digitalmars.D - Re: Biggest problems w/ D - strings
- C. Dunn (27/36) Aug 10 2007 I have a field of n chars stored on disk. It holds a null-terminated st...
- Sean Kelly (4/16) Aug 10 2007 I'm not sure I understand. Why bother computing string length in the C
- BCS (2/27) Aug 10 2007 He might be using a D char[] as an oversized buffer for a c style string...
- C. Dunn (8/25) Aug 10 2007 Exactly. This is very common in the database world. The disk record ha...
- Sean Kelly (8/36) Aug 10 2007 Oh I see. Well, it isn't much help, but std::string in C++ isn't
- Regan Heath (42/84) Aug 11 2007 Something like this: (borrowing from Derek's solution, which I quite
- kenny (34/62) Aug 11 2007 I use postgre and mysql for lots of things. Postgre is much easier to gr...
- Derek Parnell (29/35) Aug 11 2007 You could try this simpler method ...
- Derek Parnell (14/14) Aug 11 2007 On Sat, 11 Aug 2007 20:07:51 +1000, Derek Parnell wrote:
- Vladimir Panteleev (6/7) Aug 13 2007 This reminds me of the Delphi string problem. Before Delphi 3 (OSLT), yo...
- Regan Heath (9/29) Aug 13 2007 It could but the problem remains when dealing with slices, eg.
- Frits van Bommel (5/12) Aug 13 2007 It's probably to make sure a one-past-the-end pointer is also counted as...
Kirk McDonald Wrote:
> C. Dunn wrote:
>> 4) Not enough help for converting between D strings and C char*. There must be conversion functions which work regardless of whether the D string is dynamic or not, and regardless of whether the C char* is null terminated. I'm not sure what the answer is, but this has led to a large number of runtime bugs for me as a novice.
>
> The std.string module has the toStringz and toString functions.

I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).

    // Note: ConstString is not defined in this post; it is presumably an
    // alias for a read-only char[]. strncmp is the C runtime function.

    template min(T) {
        T min( T a, T b ) {
            if ( a < b ) return a;
            else return b;
        }
    }

    template max(T) {
        T max( T a, T b ) {
            if ( a < b ) return b;
            else return a;
        }
    }

    // Length up to the first zero byte, never reading past maxlen.
    size_t strnlen(char* s, size_t maxlen){
        for(size_t i=0; i<maxlen; ++i){
            if (!s[i]) return i;
        }
        return maxlen;
    }

    // Three-way compare of two possibly zero-padded buffers.
    int compare(ConstString lhs, ConstString rhs){
        char* lptr = cast(char*)lhs;
        char* rptr = cast(char*)rhs;
        size_t len_lhs = strnlen(lptr, lhs.length);
        size_t len_rhs = strnlen(rptr, rhs.length);
        int comp = strncmp(lptr, rptr, min!(size_t)(len_lhs, len_rhs));
        if (comp) return comp;
        if (len_lhs < len_rhs) return -1;
        else if (len_lhs > len_rhs) return 1;
        else return 0;
    }
Aug 10 2007
C. Dunn wrote:
> I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).

I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information?

Sean
Aug 10 2007
Reply to Sean,

> I'm not sure I understand. Why bother computing string length in the C fashion when D provides a .length property which holds this information?
>
> Sean

He might be using a D char[] as an oversized buffer for a C-style string.
Aug 10 2007
BCS Wrote:
> He might be using a D char[] as an oversized buffer for a c style string.

Exactly. This is very common in the database world. The disk record has a fixed size, so I have a struct which looks like this:

    struct Data{
        int id;
        char[32] name;
        // ...
    };

A C function produces this data. D can accept the C struct with no problems. 'name' is just a static array. But processing the name field in D is awkward. 'name.length' is 32, but 'strlen(name)' could be less (or unbounded if the string is a full 32 characters with no terminating zero, which is why I need strnlen()).
Aug 10 2007
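A minimal sketch of the distinction described in the post above, assuming a current D2 compiler rather than the 2007 toolchain used in the thread. `Data` mirrors the struct from the post; `cstrSlice` is a hypothetical helper name, not a Phobos function. It plays the role of strnlen(): it stops at the first zero byte but never looks beyond the field.

```d
// Sketch only: cstrSlice is a hypothetical helper, not a library function.
struct Data
{
    int id;
    char[32] name;   // fixed-width field, zero-padded on disk
}

/// Returns the part of `field` before the first '\0', or the whole field
/// when it is completely full and has no terminator. The result is a slice
/// of `field`, not a copy.
const(char)[] cstrSlice(const(char)[] field)
{
    foreach (i, c; field)
    {
        if (c == '\0')
            return field[0 .. i];
    }
    return field;
}

unittest
{
    Data rec;
    rec.name[] = '\0';            // D initialises char to 0xFF, so zero it explicitly
    rec.name[0 .. 5] = "derek";

    assert(rec.name.length == 32);              // storage size, always 32
    assert(cstrSlice(rec.name[]).length == 5);  // logical string length
    assert(cstrSlice(rec.name[]) == "derek");   // compares like any D string

    rec.name[] = 'a';             // full field, no terminator anywhere
    assert(cstrSlice(rec.name[]).length == 32); // bounded by the field size
}
```

Because the helper returns an ordinary slice, the usual D comparison operators apply to the result. The unittest block runs when the file is compiled with `-unittest -main`.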
C. Dunn wrote:
> But processing the name field in D is awkward. 'name.length' is 32, but 'strlen(name)' could be less (or infinity if the string is a full 32 characters sans zeroes, which is why I need strnlen()).

Oh I see. Well, it isn't much help, but std::string in C++ isn't null-terminated either, so this issue isn't unique to D. Unfortunately, I think a custom comparator, like the one you've written, is the best choice here. That or property methods to make Data act more D-like. The get/set routines could return and accept 'normal' D strings, perform length validation, etc.

Sean
Aug 10 2007
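One way the get/set idea suggested above might look, as a hedged sketch for a current D2 compiler. The `_name` field name and the error message are illustrative assumptions, and the getter returns a copy to keep the example simple; a slice-returning getter is equally possible.

```d
import std.exception : enforce;

// Sketch only: a D-friendly wrapper around a fixed-width, zero-padded field.
struct Data
{
    int id;
    char[32] _name;   // raw storage in the layout the C side expects

    /// Getter: a copy of the stored text up to the first '\0'
    /// (or of the whole field when no terminator is present).
    @property string name() const
    {
        foreach (i, c; _name)
        {
            if (c == '\0')
                return _name[0 .. i].idup;
        }
        return _name[].idup;
    }

    /// Setter: rejects input that does not fit, then copies and zero-pads.
    @property void name(const(char)[] value)
    {
        enforce(value.length <= _name.length, "name does not fit in the field");
        _name[0 .. value.length] = value[];
        _name[value.length .. $] = '\0';
    }
}

unittest
{
    Data d;
    d.name = "parnell";
    assert(d.name == "parnell");
    assert(d._name[7] == '\0' && d._name[31] == '\0');   // rest is zero-padded

    char[33] tooLong = 'x';       // one char more than the field can hold
    bool rejected = false;
    try
    {
        d.name = tooLong[];
    }
    catch (Exception e)
    {
        rejected = true;
    }
    assert(rejected);
    assert(d.name == "parnell");  // a rejected write leaves the field unchanged
}
```

Validating before writing means a failed assignment cannot leave a half-updated record behind.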
Sean Kelly wrote:
> Unfortunately, I think a custom comparator, like the one you've written, is the best choice here. That or property methods to make Data act more D-like. The get/set routines could return and accept 'normal' D strings, perform length validation, etc.

Something like this: (borrowing from Derek's solution, which I quite liked BTW)

    import std.string, std.stdio;

    // Return a slice of the leftmost portion of 'x'
    // up to but not including the first 'c'
    string lefts(string x, char c)
    {
        int p;
        p = std.string.find(x,c);
        if (p < 0)
            p = x.length;
        return x[0..p];
    }

    struct Data
    {
        int id;
        char[32] _name;
        string name() { return lefts(_name, '\0'); }
    }

    Data zero;
    Data full;
    Data some;

    static this()
    {
        zero._name[] = '\0';
        full._name[] = 'a';
        some._name[0..10] = 'a';
        some._name[10..$] = '\0';
    }

    void main()
    {
        char[] other;
        other.length = 10;
        other[] = 'a';
        assert(other != zero.name);
        assert(other != full.name);
        assert(other == some.name);
    }

Regan
Aug 11 2007
C. Dunn wrote:
> Exactly. This is very common in the database world. The disk record has a fixed size, so I have a struct which looks like this:
>
>     struct Data{ int id; char[32] name; // ... };
>
> A C function produces this data. D can accept the C struct with no problems. 'name' is just a static array. But processing the name field in D is awkward.

I use PostgreSQL and MySQL for lots of things. With PostgreSQL it is much easier to get the string length, because it is returned with the tuple. If I remember right, the schema is stored internally, and then each string looks like this:

    for varchar <= 255
    struct Firstname { ubyte length; char[size] data; }

    for varchar <= 65535
    struct Firstname { ushort length; char[size] data; }

--------------------------

Why would you want to zero-terminate your strings in a database format? It doesn't make any sense. You trade 1 byte of savings for up to 255 loop iterations to find the zero, or two bytes of savings for up to 65535 iterations.

Consider that you have two options. You can always null-terminate the field, which means that for strings shorter than 256 chars you don't save anything. Or you can do this to keep the last char:

    uint i;
    for(i = 0; i < 256; i++) {
        if(str[i] == 0) { break; }
    }
    return i;

but then that has two checks instead of one (i < 256 && str[i] != 0).

In PostgreSQL, using libpq, something like what you're describing is very easy:

    int len = PQgetlength(res, row, offset);
    if(len >= 0) {
        char* r = PQgetvalue(res, row, offset);
        char[] rr;
        rr.length = len;
        rr[0 .. len] = r[0 .. len];
    }

I really suggest storing string lengths. It will save you tons of processing power (especially if you are > 65535 chars in length). By storing the length, you also gain the ability to store binary data, because a zero in the data won't terminate the string. You may also find that people can end strings early by passing malformed UTF-8 sequences and the like.

With every C library I use that works with null-terminated strings, I quickly convert them to the dark side for the above reasons. Walter was very smart to make strings that way, for slicing purposes too :) For example, imagine a RIGHT(str, 5) function with null-terminated strings, then think of it in D:

    (str.length > 5 ? str[length-5 .. length] : str);

OK, enough rambling... I LOVE strings in D :)

Kenny
Aug 11 2007
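The two length-based techniques in the post above can be sketched in current D2 as follows. This is an illustration under stated assumptions: `copyFromC` and `right` are hypothetical helper names, and a string literal stands in for the pointer and length that a C library such as libpq would hand back, so no real database call is made.

```d
/// Copies `len` bytes starting at `p` into a new D array. The length comes
/// from the caller, so zero bytes inside the data are preserved.
char[] copyFromC(const(char)* p, size_t len)
{
    return p[0 .. len].dup;
}

/// D counterpart of RIGHT(str, n): the last `n` chars of `s`,
/// or all of `s` when it is shorter than `n`.
const(char)[] right(const(char)[] s, size_t n)
{
    return s.length > n ? s[$ - n .. $] : s;
}

unittest
{
    // Stand-in for a buffer returned by a C library together with its length.
    const(char)* raw = "ab\0cd".ptr;
    char[] s = copyFromC(raw, 5);
    assert(s.length == 5);                 // the embedded zero does not end the string
    assert(s[2] == '\0' && s[3 .. 5] == "cd");

    assert(right("hello world", 5) == "world");
    assert(right("abc", 5) == "abc");      // shorter than n: returned whole
}
```

Both helpers run in time independent of where any zero byte sits, which is the saving the post argues for.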
On Fri, 10 Aug 2007 17:49:01 -0400, C. Dunn wrote:
> I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).

You could try this simpler method ...

    import std.string;

    // Return a slice of the leftmost portion of 'x'
    // up to but not including the first 'c'
    string lefts(string x, char c)
    {
        int p;
        p = std.string.find(x,c);
        if (p < 0)
            p = x.length;
        return x[0..p];
    }

    int compare(string lhs, string rhs, char d = '\0')
    {
        return std.string.cmp( lefts(lhs,d), lefts(rhs,d) );
    }

and use it like ...

    char[32] NameA;
    char[56] NameB;
    NameA[] = ' ';
    NameB[] = ' ';
    NameA[0..5] = "derek";
    NameB[0..7] = "parnell";
    result = compare(NameA, NameB);

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Aug 11 2007
On Sat, 11 Aug 2007 20:07:51 +1000, Derek Parnell wrote:

Oops! Of course I really meant ...

and use it like ...

    char[32] NameA;
    char[56] NameB;
    NameA[] = '\0';
    NameB[] = '\0';
    NameA[0..5] = "derek";
    NameB[0..7] = "parnell";
    result = compare(NameA, NameB);

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Aug 11 2007
On Sat, 11 Aug 2007 00:49:01 +0300, C. Dunn <cdunn2001 gmail.com> wrote:
> I have a field of n chars stored on disk. It holds a null-terminated string, padded with zeroes. It is amazingly difficult to compare such a char[n] with some other char[] (which, by the dictates of D, may or may not be null-terminated).

This reminds me of the Delphi string problem. Before Delphi 3 (OSLT), you had to do crazy stuff to get a PChar (char*) out of a string. Delphi "long" strings are somewhat similar to D's strings: they have a length property and allow the string to contain zeroes. Because of that, you couldn't just typecast a string to a PChar, since there was no terminating zero. Borland solved the problem by making strings always carry a terminating null byte at their end, which allows you to typecast a string directly to a PChar.

I noticed that memory for arrays is always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related and can be used for this purpose..?

-- 
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Aug 13 2007
Vladimir Panteleev wrote:
> I noticed that memory for arrays are always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related and can be used for this purpose..?

It could, but the problem remains when dealing with slices, e.g.

    string foo = "this is a test";
    string bar = foo[5..9];

The byte following the end of the slice 'bar' is ' ', not '\0'.

I believe there was/is a hack in toStringz which checks the byte following the slice and, if it's already '\0', does nothing but return the input string.

Regan
Aug 13 2007
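The slice hazard described above can be checked directly. The sketch below assumes a current D2 compiler with `std.string.toStringz` and the C `strlen` binding from `core.stdc.string`. It relies only on the documented behaviour that D string literals are followed by a zero byte; it makes no claim about whether toStringz copies or reuses the buffer in any given case.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

unittest
{
    string foo = "this is a test";
    string bar = foo[5 .. 9];

    assert(bar == "is a");
    assert(foo[9] == ' ');            // the byte after the slice is a space, not '\0'

    // Passing bar.ptr straight to C scans on to the terminator of the
    // literal that foo points at, so C sees "is a test".
    assert(strlen(bar.ptr) == 9);

    // toStringz yields a pointer to text terminated at bar's own length.
    const(char)* z = toStringz(bar);
    assert(strlen(z) == 4);
}
```

A trailing zero after the allocation therefore helps only for whole arrays; any slice that ends early still needs an explicit conversion before it is handed to C.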
Vladimir Panteleev wrote:
> I noticed that memory for arrays are always allocated with an extra byte (internal/gc/gc.d, function _d_arraysetlengthT). I wonder if this is related and can be used for this purpose..?

It's probably to make sure a one-past-the-end pointer is also counted as a reference to a memory block by the garbage collector. Though if it's initialized to 0, it could be used for the purpose you describe as a side-effect.
Aug 13 2007