digitalmars.D - Why are string literals zero-terminated?

Anders F Björklund <afb algonet.se> writes:
Why are D string literals '\0' terminated?

Isn't the implicit length field supposed
to make that termination unnecessary now?

For instance, if I use:

string2.d:
 char* cstr = "alpha";
 char[] str = "alpha";

Then I get one pointer to the characters:
 __D7string24cstrPa:
 	.long	LC0

That's alright, just pointing to the literal:
 LC0:
 	.ascii "alpha\0"

But the D string is also terminated with a \0:
 __D7string23strAa:
 	.long	5
 	.long	LC0

Doesn't that just waste a char, now that the hack in toStringz has been proved dangerous? Or is there some internal routine that relies on them being zero-terminated?

AFAIK, this applies only to the three string array types in D (char[], wchar[], dchar[]), not to other arrays.

--anders
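
For readers following along: the layout above (a length word plus a pointer into the same zero-terminated literal) can be checked from D itself. This is only a sketch, assuming a present-day D compiler, where literals are typed `string` rather than the `char[]` used in these 2005 posts; `literalHasTerminator` is a hypothetical helper name, not part of any library.

```d
// Sketch for present-day D; `literalHasTerminator` is a made-up name.

/// Returns true if the byte just past the end of `s` is '\0'.
/// Only meaningful when `s` is a string literal, where the language
/// appends the terminator. For any other array this reads memory
/// that does not belong to the array.
bool literalHasTerminator(string s)
{
    // Indexing through .ptr is not bounds-checked, so this looks
    // one element beyond s.length on purpose.
    return s.ptr[s.length] == '\0';
}
```

Called with the literal "alpha", the check should hold while "alpha".length is still 5, which matches the `.long 5` and the `"alpha\0"` in the assembly. Called with a slice or a heap-allocated array, the result proves nothing.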
Jan 25 2005
Anders F Björklund <afb algonet.se> writes:
Earlier, I wrote:

 Why are D string literals '\0' terminated?
Never mind, it's just to make the implicit cast to (char*) possible, for use with C functions (such as printf). Otherwise one would always have to use toStringz, even with string literals.

Test code:
 static const byte[4] XXXX = [ 'X', 'X', 'X', 'X' ];
 
 static const char[4] cABC = "abc\n";
 static const byte[4] bABC = [ 'a', 'b', 'c', '\n' ];
 
 static const byte[4] YYYY = [ 'Y', 'Y', 'Y', 'Y' ];
 
 void main()
 {
   char* chello;
   byte* bhello;
 
   chello = cABC;
   bhello = bABC;
 
   printf(chello);
   printf(cast(char*) bhello);
 }
And as far as I can determine, this goes for *all* char/wchar/dchar arrays, not just the literals? (But not for byte[]/short[]/int[] and the others.)

But if toStringz() doesn't check the '\0' contract, and all string arrays are zero-terminated anyway, then of what use is it? Just avoiding null params? If that's the case, it could be done much more simply:
 char *stringz(char[] str) { return str ? str : ""; }
Or, if null is not a possibility, just "str.ptr" (or "cast(char *) str", for DMD before version 0.107).

All assuming that D strings are zero-terminated, since that seems to be the current case, right?

--anders
Jan 25 2005
Anders F Björklund <afb algonet.se> writes:
Anders F Björklund wrote:

 All assuming that D strings are zero-terminated,
 since that seems to be the current case - right ?
I was just rambling; I forgot all about the quirks of the allocator with strings of sizes 16, 32, etc.
 (16, 32, 64, 128, 256, 512, 1024, and so on)
Please ignore. (But toStringz still needs fixing.)

--anders
Jan 25 2005
Anders F Björklund <afb algonet.se> writes:
(this was not true:)
 And as far as I can determine, this goes for *all*
 char/wchar/dchar arrays - not just the literals ?

 (but not for byte[]/short[]/int[], and the others)
And here are the simplified test cases that show when a char[] is *not* zero-terminated:

1) Lengths of 16, 32, 64, 128, 256, 512, 1024, etc.
 void main()
 {
         char[] x = new char[16];
         char[] string = new char[16];
         char[] y = new char[16];
         for (int i = 0; i < 16; i++)
         {
                 x[i] = 'X';
                 string[i] = 'a' + i;
                 y[i] = 'Y';
         }
         printf("%s\n", cast(char*) string);
 }

2) Slices of already existing strings / arrays.
 void main()
 {
 	char[] hello = "hello";
 	char[] string = hello[0..3];
 	printf("%s\n", string.ptr);
 }

There could be more examples of this as well. String literals are still terminated with a '\0', which is a good thing, even if sometimes confusing.

--anders
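
One way to hand such a slice to a C function safely is to copy it into a zero-terminated buffer first. A minimal sketch, assuming present-day Phobos, where `std.string.toStringz` returns a pointer to zero-terminated data for a `const(char)[]` argument (the 2005 version discussed in this thread may have behaved differently); `cLengthOf` is a hypothetical helper, used only to make the result observable through C's `strlen`.

```d
import std.string : toStringz;
import core.stdc.string : strlen;

/// Length that a C function would see for the D slice `s` once it has
/// been converted to a zero-terminated string with toStringz.
/// `cLengthOf` is an illustrative name, not a library function.
size_t cLengthOf(const(char)[] s)
{
    // toStringz yields a '\0'-terminated copy of the slice, so
    // strlen stops at the end of the slice, not of the original array.
    return strlen(toStringz(s));
}
```

With `hello[0 .. 3]` from test case 2, a C function then sees three characters instead of running on to the end of "hello". An alternative that avoids the copy is printf's "%.*s" format, passing `cast(int) s.length` and `s.ptr`.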
Jan 25 2005