
digitalmars.D - toStringz and predictability

Ben Hinkle <Ben_member pathlink.com> writes:
There's something about toStringz that has me uncomfortable. Consider this code:

import std.string;

int main() {
    char* x;
    uint b1;
    char[4] y;
    uint b2;
    y[0] = 'a';
    y[1] = 'b';
    y[2] = 'c';
    y[3] = 'd';
    x = toStringz(y);
    printf("x length is %d, ptr %p b1 %p b2 %p\n",strlen(x),x,&b1,&b2);
    b1 = 0x11223344;
    b2 = 0x11223344;
    printf("x length is %d, ptr %p b1 %p b2 %p\n",strlen(x),x,&b1,&b2);
    return 0;
}

Here's what it prints when I run it:
x length is 4, ptr 0xfefff870 b1 0xfefff86c b2 0xfefff874
x length is 17, ptr 0xfefff870 b1 0xfefff86c b2 0xfefff874

The reason why the length changed is that toStringz looks at one past the length
of the string to see if it is 0 and does nothing to the string if it is. But the
sample code then changes the byte past the string by touching a completely
different variable and so the toStringz result is "corrupted". I have toStringz
calls sprinkled through my code when I call C functions and now I'm starting to
get nervous about the lifespans of those strings and how to figure out if they
are valid or not. Thoughts? Walter, is there a guideline I should follow? The
most extreme one that comes to mind is "only call toStringz for strings that get
immediately copied".

-Ben
Jan 18 2005
"Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <Ben_member pathlink.com> wrote in message
news:csj4hq$1cvi$1 digitaldaemon.com...
> The reason why the length changed is that toStringz looks at one past the length
> of the string to see if it is 0 and does nothing to the string if it is. But the
> sample code then changes the byte past the string by touching a completely
> different variable and so the toStringz result is "corrupted". I have toStringz
> calls sprinkled through my code when I call C functions and now I'm starting to
> get nervous about the lifespans of those strings and how to figure out if they
> are valid or not. Thoughts? Walter, is there a guideline I should follow? The
> most extreme one that comes to mind is "only call toStringz for strings that get
> immediately copied".
It's "COW" (Copy On Write) to the rescue. The idea is to modify only a string that you know is unique. If you don't know it is unique, make a copy of it before modifying it. After the toStringz(), you're modifying the argument to toStringz() but there's another reference to that string that expects it to not change.
Jan 18 2005
Ben Hinkle <Ben_member pathlink.com> writes:
In article <csjffu$1qtp$1 digitaldaemon.com>, Walter says...
> "Ben Hinkle" <Ben_member pathlink.com> wrote in message
> news:csj4hq$1cvi$1 digitaldaemon.com...
>> The reason why the length changed is that toStringz looks at one past the length
>> of the string to see if it is 0 and does nothing to the string if it is. But the
>> sample code then changes the byte past the string by touching a completely
>> different variable and so the toStringz result is "corrupted". I have toStringz
>> calls sprinkled through my code when I call C functions and now I'm starting to
>> get nervous about the lifespans of those strings and how to figure out if they
>> are valid or not. Thoughts? Walter, is there a guideline I should follow? The
>> most extreme one that comes to mind is "only call toStringz for strings that get
>> immediately copied".
> It's "COW" (Copy On Write) to the rescue. The idea is to modify only a string
> that you know is unique. If you don't know it is unique, make a copy of it
> before modifying it.
But the string doesn't necessarily own the byte after the string. It's a random piece of memory. Even if the string is living on the heap the byte one past the array can be changed at pretty much any time by anything. Modifying the byte following a string is different than modifying a string.
> After the toStringz(), you're modifying the argument to
> toStringz() [...]
Actually I'm not. I'm modifying another variable.
Jan 18 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

> But the string doesn't necessarily own the byte after the string. It's a random
> piece of memory. Even if the string is living on the heap the byte one past the
> array can be changed at pretty much any time by anything. Modifying the byte
> following a string is different than modifying a string.
The bug is in std/string.d :
 	p = &string[0] + string.length;
 
 	// Peek past end of string[], if it's 0, no conversion necessary.
 	// Note that the compiler will put a 0 past the end of static
 	// strings, and the storage allocator will put a 0 past the end
 	// of newly allocated char[]'s.
 	if (*p == 0)
 	    return string;
Yes, it does work for string literals and for dynamic arrays... But it doesn't work for slices of pointers, or static arrays?

Unless there is a way to tell them apart, it should be avoided (since with the pointers/statics, the byte after is off-limits).

--anders
Jan 18 2005
Ben Hinkle <Ben_member pathlink.com> writes:
> Yes, it does work for string literals and for dynamic arrays...
Actually it doesn't even work for dynamic arrays:

import std.string;

int main() {
    char* x;
    char[] y = new char[32];
    y[] = 0;
    char[] z = new char[32];
    z[] = 32;
    x = toStringz(z);
    printf("x length is %d\n",strlen(x));
    y[] = 32;
    printf("x length is %d\n",strlen(x));
    return 0;
}

outputs

x length is 32
x length is 67

This is due to how the memory manager allocates memory.

-Ben
Jan 18 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

>> Yes, it does work for string literals and for dynamic arrays...
> Actually it doesn't even work for dynamic arrays:
Funny, I was just writing that :-)

It breaks down when the length is a power of two (16, 32, 64, 128, 256, 512, 1024, and so on).

Sample test program:
 import std.string;
 void main()
 {
   for (int x = 15; x <= 17; x++)
   {
     char[] a = new char[x];
     char[] b = new char[x];
     char[] c = new char[x];
     a[0] = 0;
     b[0] = 0;
     c[0] = 0;
     printf("%d %p\n",a);
     printf("%d %p\n",b);
     printf("%d %p\n",c);
     char *p = &a[0] + a.length;
     if(*p != 0) printf("not 0\n"); else printf("is 0\n");
     for(int i = 0; i < b.length; i++)
       b[i] = 'A' + i;
     char *z = toStringz(b);
     for(int i = 0; i < a.length; i++)
       a[i] = 'X';
     for(int i = 0; i < c.length; i++)
       c[i] = 'X';
     printf("%s\n",z);
   }
 }
Prints:
 15 0xbf498fe0
 15 0xbf498fd0
 15 0xbf498fc0
 is 0
 ABCDEFGHIJKLMNO
 16 0xbf498fb0
 16 0xbf498fa0
 16 0xbf498f90
 not 0
 ABCDEFGHIJKLMNOPXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 17 0xbf497fa0
 17 0xbf497f80
 17 0xbf497f60
 is 0
 ABCDEFGHIJKLMNOPQ
Perhaps a bit contrived, but shows how it works... std.string.toStringz is broken.

--anders
Jan 18 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:csjffu$1qtp$1 digitaldaemon.com...
> "Ben Hinkle" <Ben_member pathlink.com> wrote in message
> news:csj4hq$1cvi$1 digitaldaemon.com...
>> The reason why the length changed is that toStringz looks at one past the length
>> of the string to see if it is 0 and does nothing to the string if it is. But the
>> sample code then changes the byte past the string by touching a completely
>> different variable and so the toStringz result is "corrupted". I have toStringz
>> calls sprinkled through my code when I call C functions and now I'm starting to
>> get nervous about the lifespans of those strings and how to figure out if they
>> are valid or not. Thoughts? Walter, is there a guideline I should follow? The
>> most extreme one that comes to mind is "only call toStringz for strings that get
>> immediately copied".
> It's "COW" (Copy On Write) to the rescue. The idea is to modify only a string
> that you know is unique. If you don't know it is unique, make a copy of it
> before modifying it. After the toStringz(), you're modifying the argument to
> toStringz() but there's another reference to that string that expects it to
> not change.
In case you need another example, I can imagine just the act of calling a function could corrupt a toStringz result. Suppose the char[] was stored on the stack and the last element of the array is at the very top of the stack and that the next item after the stack is zero (and that the stack grows up in memory). Then calling toStringz (also suppose that call was inlined, just for simplicity) wouldn't make a copy, but calling a function after that would push another stack frame which could potentially set a non-zero byte immediately following the array. That would corrupt the result of toStringz. I couldn't get this to happen on any machine I have around here; it depends on the stack architecture and how function calls work, but the problem is still there for some architectures.

So I have a suggestion. Have toStringz always copy if the array is on the stack. Have it never copy if the array is in the data segment (so literals behave as they do today) and have it check the GC capacity to ask the GC for control over the byte following the array (though the length of the array would be unchanged). To implement this toStringz would probably have to be moved out of std.string and into internal. If it copied everything except literals then I can see keeping it in std.string.

Anyhow, I agree with Anders that something should be done.

-Ben
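
One piece of this suggestion, telling GC-managed arrays apart from stack or data-segment ones, can be sketched in current D (D2, not the 2005 dialect used in this thread). The sketch assumes `core.memory.GC.addrOf`, which returns null for pointers the collector does not manage; `isGcOwned` is a hypothetical helper name, and on its own it cannot distinguish a stack array from a string literal.

```d
import core.memory : GC;

/// Hypothetical helper: true if `arr` points into a GC-managed block.
bool isGcOwned(const(char)[] arr)
{
    // addrOf yields the base address of the containing GC block,
    // or null when the memory is not managed by the collector.
    return GC.addrOf(cast(void*) arr.ptr) !is null;
}

void main()
{
    char[] heap = new char[4];   // allocated by the GC
    char[4] onStack;             // lives in main's stack frame

    assert(isGcOwned(heap));
    assert(!isGcOwned(onStack[]));
}
```

A conversion routine could use such a check to decide that anything not owned by the GC must be copied, which is the conservative half of the proposal.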
Jan 19 2005
parabolis <parabolis softhome.net> writes:
Ben Hinkle wrote:

> So I have a suggestion. Have toStringz always copy if the array is on the
> stack. Have it never copy if the array is in the data segment (so literals
> behave as they do today) and have it check the GC capacity to ask the GC for
> control over the byte following the array (though the length of the array
> would be unchanged). To implement this toStringz would probably have to be
> moved out of std.string and into internal. If it copied everything except
> literals then I can see keeping it in std.string. Anyhow, I agree with Anders
> that something should be done.

Would this implementation work?

----------------------------------------------------------------
char* toStringzz(char[] str) {
    str.length++;
    str[length-1] = cast(char)0x00;
    return cast(char*)&str;
}
----------------------------------------------------------------

That is to say, is the array resizing implementation sufficient to determine whether str is dynamic or static on its own and, if it is dynamic, deal wisely with cases where incrementing length might be sufficient? Can you break toStringzz in any of the cases that toStringz breaks?
Jan 19 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
"parabolis" <parabolis softhome.net> wrote in message
news:csmbbh$444$1 digitaldaemon.com...
> Ben Hinkle wrote:
>
>> So I have a suggestion. Have toStringz always copy if the array is on the
>> stack. Have it never copy if the array is in the data segment (so literals
>> behave as they do today) and have it check the GC capacity to ask the GC for
>> control over the byte following the array (though the length of the array
>> would be unchanged). To implement this toStringz would probably have to be
>> moved out of std.string and into internal. If it copied everything except
>> literals then I can see keeping it in std.string. Anyhow, I agree with Anders
>> that something should be done.
> Would this implementation work?
>
> ----------------------------------------------------------------
> char* toStringzz(char[] str) {
>     str.length++;
>     str[length-1] = cast(char)0x00;
>     return cast(char*)&str;
> }
> ----------------------------------------------------------------
>
> That is to say, is the array resizing implementation sufficient to determine
> whether str is dynamic or static on its own and, if it is dynamic, deal wisely
> with cases where incrementing length might be sufficient? Can you break
> toStringzz in any of the cases that toStringz breaks?
Nice idea. I think it's on the right track. I've cleaned it up a bit:

char* toStringzz(char[] str) {
    str.length = str.length+1;
    str[length-1] = 0;
    return str.ptr;
}

Also it copies string literals. If there is an easy way to check if something is a string literal we can add that to your code and have a good solution, I think.
Jan 19 2005
Lukas Pinkowski <Lukas.Pinkowski web.de> writes:
Ben Hinkle wrote:
> Nice idea. I think it's on the right track. I've cleaned it up a bit:
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     str[length-1] = 0;
>     return str.ptr;
> }
>
> Also it copies string literals. If there is an easy way to check if
> something is a string literal we can add that to your code and have a good
> solution, I think.
Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so you can leave out the str[length-1] = 0; part?

Thus better:

char* toStringzz(char[] str) {
    str.length = str.length+1;
    return str.ptr;
}

But this actually alters the parameter (is this intended?)

My version would be:

char* toStringz( in char[] str )
{
  char[] new_str;
  new_str.length = str.length + 1;
  new_str[0 .. length-2] = str[0 .. length-1];
  return &new_str[0];
}

Creating a copy of the parameter, thus not changing it as you would think for in-parameters. I checked and it works for string literals, too.
Jan 19 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
"Lukas Pinkowski" <Lukas.Pinkowski web.de> wrote in message
news:csmfl4$a4c$1 digitaldaemon.com...
> Ben Hinkle wrote:
>> Nice idea. I think it's on the right track. I've cleaned it up a bit:
>> char* toStringzz(char[] str) {
>>     str.length = str.length+1;
>>     str[length-1] = 0;
>>     return str.ptr;
>> }
>>
>> Also it copies string literals. If there is an easy way to check if
>> something is a string literal we can add that to your code and have a
>> good solution, I think.
> Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so
> you can leave out the str[length-1] = 0; part?
the initializer for char is 0xFF.
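
A minimal sketch of that point, written for a current D compiler (D2) rather than the 2005 dialect used in this thread: a default-initialized `char` and the element added by growing `.length` both hold 0xFF, so the terminator still has to be written explicitly.

```d
void main()
{
    char c;                      // default-initialized, no explicit value
    assert(c == 0xFF);
    assert(char.init == 0xFF);

    char[] s = "abc".dup;
    s.length = s.length + 1;     // the new element is char.init, not 0
    assert(s[$ - 1] == 0xFF);
}
```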
> Thus better:
>
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     return str.ptr;
> }
>
> But this actually alters the parameter (is this intended?)
an array is a pointer to data and a length. Those are passed by value, so changing the length does not change the original string passed to the function.
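
This can be checked with a small sketch in current D (D2); `grow` is a hypothetical helper used only for illustration. The callee resizes its own copy of the (pointer, length) pair, and the caller's array keeps its original length and contents.

```d
// Hypothetical helper: grows its own copy of the (pointer, length) pair.
void grow(char[] s)
{
    s.length = s.length + 1;     // affects only the local slice
    s[$ - 1] = '\0';
}

void main()
{
    char[] original = "abc".dup;
    grow(original);
    assert(original.length == 3);   // the caller's length is untouched
    assert(original == "abc");      // and so are its visible contents
}
```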
> My version would be:
>
> char* toStringz( in char[] str )
> {
>   char[] new_str;
>   new_str.length = str.length + 1;
>   new_str[0 .. length-2] = str[0 .. length-1];
>   return &new_str[0];
> }
>
> Creating a copy of the parameter, thus not changing it as you would think
> for in-parameters. I checked and it works for string literals, too.
watch out for the case when new_str.ptr is str.ptr since I expect the array copy will error if you try to copy overlapping arrays.
Jan 19 2005
Georg Wrede <georg.wrede nospam.org> writes:
(Actually, I refer here to several examples in this thread.)

char* toStringzz(char[] str) {
    str.length = str.length+1;
    str[length-1] = 0;
    return str.ptr;
}
What bothers me is, if a string gets repeatedly passed, say, between a library and the main program, and the library functions pass the string on to the OS or another library, every time using toStringz, then what keeps the string from growing at each iteration? Finally we end up with a (possibly short) string with a lot of zeros at the end.

It seems harmless at first glance, but what if later strings of this kind are concatenated (in D code) and passed on to a C-written parser? It would see a lot of "empty strings" between real data.

Or am I missing something?

In the same manner, should toStringz guarantee a valid C string? I.e. no internal zeros? At the _very least_ in the non-release build!

----

The name toStringz is misleading. Since the only use for it is to make strings edible for C code, it should be renamed toStringC. Normally, if a programmer _wants_ to slap a zero at the end, he'd use ~, wouldn't he?

Misnomers like this introduce parallax, and in this case so subtle that we don't even notice. And that's where it _really_ counts!
Jan 24 2005
Anders F Björklund <afb algonet.se> writes:
Georg Wrede wrote:

> It seems harmless at first glance, but what if later this kind of
> strings are concatenated (in D code) and passed on to a C-written
> parser? It would see a lot of "empty strings" between real data.
>
> Or am I missing something?
It would probably be easier to remove the hack altogether and just copy?
     body
     {
 	if (string.length == 0)
 	    return "";
 
 	// Need to make a copy
 	char[] copy = new char[string.length + 1];
 	copy[0..string.length] = string;
 	copy[string.length] = 0;
 	return copy;
     }
Isn't that just what "string.length = string.length + 1" does, anyway?

It would be neat if it could be optimized for string literals, but not at the expense of making the whole function unstable (like it is now).

> In the same manner, should toStringz guarantee a valid C string? I.e. no
> internal zeros? At the _very least_ in the non-release build!
The contract for toStringz specifies that the char[] is *without* '\0':
     in
     {
 	if (string)
 	{
 	    // No embedded 0's
 	    for (uint i = 0; i < string.length; i++)
 		assert(string[i] != 0);
 	}
     }
     out (result)
     {
 	if (result)
 	{   assert(strlen(result) == string.length);
 	    assert(memcmp(result, string, string.length) == 0);
 	}
     }
It also (implicitly) returns a "" string, for an input param of null.
> The name toStringz is misleading. Since the only use for it is to make
> strings edible for C code, it should be renamed toStringC. Normally, if
> a programmer _wants_ to slap a zero at the end, he'd use ~, wouldn't he.
It converts a char[] to a zero-terminated char*. No "C" about that??

(I'm not sure why it doesn't just 'return (string ~ "\0");', anyone?)

==> body { return ((string.length == 0) ? "" : string ~ "\0"); }

Besides, most of the C functions do not accept UTF-8 input anyway... To be usable from regular C, it would need to be converted to byte*? (and that would most likely involve charset encoding conversion too)

--anders
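
The always-copy approach discussed in this post can be sketched as a standalone function in current D (D2). This is an illustration only, not the Phobos implementation; `toStringzCopy` is a hypothetical name. Because the result owns its own terminator, later writes to the source array or to neighbouring memory cannot change what C code sees.

```d
import core.stdc.string : strlen;

/// Hypothetical helper (not the Phobos function): always copies,
/// never peeks past the end of the input.
char* toStringzCopy(const(char)[] s)
{
    auto copy = new char[s.length + 1];   // GC-allocated, one extra byte
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0';                // terminator owned by the copy
    return copy.ptr;
}

void main()
{
    char[4] y = "abcd";                   // static array, no terminator of its own
    char* x = toStringzCopy(y[]);
    y[] = 'X';                            // later writes to y cannot affect x
    assert(strlen(x) == 4);
    assert(x[0 .. 4] == "abcd");
    assert(strlen(toStringzCopy("")) == 0);
}
```

The cost is one allocation per call, including for string literals, which is the trade-off debated in this thread.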
Jan 24 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
another version:

char* toStringzz(char[] str) {
    str ~= 0;
    return str.ptr;
}
Jan 19 2005
Ben Hinkle <Ben_member pathlink.com> writes:
In article <csjffu$1qtp$1 digitaldaemon.com>, Walter says...
> "Ben Hinkle" <Ben_member pathlink.com> wrote in message
> news:csj4hq$1cvi$1 digitaldaemon.com...
>> The reason why the length changed is that toStringz looks at one past the length
>> of the string to see if it is 0 and does nothing to the string if it is. But the
>> sample code then changes the byte past the string by touching a completely
>> different variable and so the toStringz result is "corrupted". I have toStringz
>> calls sprinkled through my code when I call C functions and now I'm starting to
>> get nervous about the lifespans of those strings and how to figure out if they
>> are valid or not. Thoughts? Walter, is there a guideline I should follow? The
>> most extreme one that comes to mind is "only call toStringz for strings that get
>> immediately copied".
> It's "COW" (Copy On Write) to the rescue. The idea is to modify only a string
> that you know is unique. If you don't know it is unique, make a copy of it
> before modifying it. After the toStringz(), you're modifying the argument to
> toStringz() but there's another reference to that string that expects it to
> not change.
OK, one last try. Walter, I can't tell if you still think this counts as COW. So let me boil it down to a question. Given the code

char[1] str;
char* cstr = toStringz(str);
ubyte x = 1;

what is strlen(cstr)? I claim the answer is compiler dependent and depends on whether the compiler stuck the storage location for x immediately following str. Sure, running the code doesn't show a problem, due to word alignment etc., but following the language definition and the definition of toStringz the strlen is unknown.

-Ben
Jan 20 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

> There's something about toStringz that has me uncomfortable. Consider this
> code:
>
> import std.string;
>
> int main() {
>     char* x;
>     uint b1;
>     char[4] y;
>     uint b2;
>     y[0] = 'a';
>     y[1] = 'b';
>     y[2] = 'c';
>     y[3] = 'd';
>     x = toStringz(y);
>     printf("x length is %d, ptr %p b1 %p b2 %p\n",strlen(x),x,&b1,&b2);
>     b1 = 0x11223344;
>     b2 = 0x11223344;
>     printf("x length is %d, ptr %p b1 %p b2 %p\n",strlen(x),x,&b1,&b2);
>     return 0;
> }
>
> Here's what it prints when I run it:
> x length is 4, ptr 0xfefff870 b1 0xfefff86c b2 0xfefff874
> x length is 17, ptr 0xfefff870 b1 0xfefff86c b2 0xfefff874
That's dependent on the compiler, and the alignment:

GDC Linux:
x length is 4, ptr 0xbff772b8 b1 0xbff772bc b2 0xbff772b0
x length is 25, ptr 0xbff772b8 b1 0xbff772bc b2 0xbff772b0

GDC Mac OS X:
x length is 4, ptr 0xbffffaa0 b1 0xbffffa9c b2 0xbffffaa8
x length is 4, ptr 0xbffffaa0 b1 0xbffffa9c b2 0xbffffaa8

But why are you calling toStringz on a simple (char*),
without having it properly NUL-terminated at the end ?

If you change the code to : char[] y = new char[4];

Then it prints:
x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c
x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c

A more interesting question is why : x = toStringz(y[0..4]);
does *not* make a copy of the converted pointer-to-characters,
just because the next byte in memory happens to be a NUL char?
(ie. it works if first byte of "b1" is 42, but not if it's 0)

Having to use x = toStringz(y[0..4].dup); just because of
this little "optimization" feature is not exactly a given...
There should probably be a small warning printed about using
toStringz on slices (since it works with literals and arrays)

But that it fails on pointers and static arrays is not surprising?

--anders

PS. If you add a -O on Mac OS X, then it prints "12" instead.
    So just because it printed 4 above doesn't mean it works.
Jan 18 2005
Ben Hinkle <Ben_member pathlink.com> writes:
> That's dependent on the compiler, and the alignment:
>
> GDC Linux:
> x length is 4, ptr 0xbff772b8 b1 0xbff772bc b2 0xbff772b0
> x length is 25, ptr 0xbff772b8 b1 0xbff772bc b2 0xbff772b0
>
> GDC Mac OS X:
> x length is 4, ptr 0xbffffaa0 b1 0xbffffa9c b2 0xbffffaa8
> x length is 4, ptr 0xbffffaa0 b1 0xbffffa9c b2 0xbffffaa8
Even more interesting...
> But why are you calling toStringz on a simple (char*),
> without having it properly NUL-terminated at the end ?
The point of toStringz is to make a D string null terminated.
> If you change the code to : char[] y = new char[4];
>
> Then it prints:
> x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c
> x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c
That's because the "new" allocates space on the heap and so it has nothing to do with b1 and b2 after that. To corrupt the string on the heap you'll have to wait until something else gets allocated right after that string and then assign something to the first byte.
> A more interesting question is why : x = toStringz(y[0..4]);
> does *not* make a copy of the converted pointer-to-characters,
> just because the next byte in memory happens to be a NUL char?
> (ie. it works if first byte of "b1" is 42, but not if it's 0)
>
> Having to use x = toStringz(y[0..4].dup); just because of
> this little "optimization" feature is not exactly a given...
> There should probably be a small warning printed about using
> toStringz on slices (since it works with literals and arrays)
I'm starting to think the only safe usage of toStringz is on arrays where you can guarantee the byte after the string is owned by the string, which includes literals and maybe some other special cases.
> But that it fails on pointers and static arrays is not surprising?
>
> --anders
> PS. If you add a -O on Mac OS X, then it prints "12" instead.
>     So just because it printed 4 above doesn't mean it works.
OK.
Jan 18 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

>> But why are you calling toStringz on a simple (char*),
>> without having it properly NUL-terminated at the end ?
> The point of toStringz is to make a D string null terminated.
Never mind, I was thinking in C (just because it is implemented that way), forget that D treats static arrays as having lengths...

--anders
Jan 18 2005
Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

>> If you change the code to : char[] y = new char[4];
>>
>> Then it prints:
>> x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c
>> x length is 4, ptr 0xbf429fe0 b1 0xbff3c758 b2 0xbff3c74c
> That's because the "new" allocates space on the heap and so it has nothing
> to do with b1 and b2 after that. To corrupt the string on the heap you'll
> have to wait until something else gets allocated right after that string
> and then assign something to the first byte.
Right, I think I only got lucky because of how it allocates memory...

I couldn't find any traces of "the storage allocator will put a 0 past the end of newly allocated char[]'s", so that must be just DMC. In fact, I'm not sure that even DMD does it?

This test program:
 void main()
 {
   for (int i = 1; i <= 1024; i++)
   {
     char[] a = new char[i];
     char *p = &a[0] + a.length;
     if(*p != 0) printf("%d\n",i);
   }
 }
Prints out 16,32,64,128,256,512,1024 for *all* the various D compilers.

So that toStringz peeks beyond the length of the array is clearly a bug! Perhaps if it could tell that the argument is a string literal? Naah...

--anders
Jan 18 2005
parabolis <parabolis softhome.net> writes:
Ben Hinkle wrote:
> There's something about toStringz that has me uncomfortable. Consider this
> code:
There is something else that you should be uncomfortable about: the domains of C strings and D strings are not the same. The toStringz function is so named because C strings are 'Z'ero (or null) terminated. That implies they cannot contain a null character, yet D strings have no such silly limitations. So the toStringz function should probably look like this:

----------------------------------------------------------------
char* toStringz(char[] dStr) {
  char[] cStr = new char[dStr.length+1];
  foreach(int i, char dChar; dStr) {
    if(!(cStr[i] = dChar)) throw new Exception("Null char");
  }
  return &cStr;
}
----------------------------------------------------------------

Now seems like a great time for plugging the unless/until feature of Perl as being nice in this context allowing:

  unless(cStr[i] = dChar) throw new Exception("Null char");
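
The same idea can be sketched in current D (D2) as follows. This is a hedged illustration rather than a proposed library function; `toStringzChecked` is a hypothetical name. It writes the terminator explicitly, since a freshly allocated char[] is filled with 0xFF rather than 0, and it returns the array's `.ptr`.

```d
import core.stdc.string : strlen;

/// Hypothetical helper: copies `s`, rejects embedded zeros,
/// and always writes its own terminator.
char* toStringzChecked(const(char)[] s)
{
    auto copy = new char[s.length + 1];
    foreach (i, c; s)
    {
        if (c == '\0')
            throw new Exception("Null char");
        copy[i] = c;
    }
    copy[s.length] = '\0';   // new char[] holds 0xFF by default, not 0
    return copy.ptr;
}

void main()
{
    assert(strlen(toStringzChecked("abc")) == 3);

    bool threw = false;
    try
    {
        toStringzChecked("a\0b");   // embedded zero is rejected
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);
}
```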
Jan 19 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
"parabolis" <parabolis softhome.net> wrote in message
news:csmiqa$edp$1 digitaldaemon.com...
> Ben Hinkle wrote:
>> There's something about toStringz that has me uncomfortable. Consider
>> this code:
> There is something else that you should be uncomfortable about: the domains
> of C strings and D strings are not the same. The toStringz function is so
> named because C strings are 'Z'ero (or null) terminated. That implies they
> cannot contain a null character, yet D strings have no such silly limitations.
I hate to disagree but.. that doesn't bother me. I don't see anything wrong with ignoring interior zeros. toStringz just makes sure it is zero-terminated, not that there aren't any internal zeros. [snip]
Jan 19 2005
parabolis <parabolis softhome.net> writes:
Ben Hinkle wrote:
> "parabolis" <parabolis softhome.net> wrote in message
> news:csmiqa$edp$1 digitaldaemon.com...
>
>> Ben Hinkle wrote:
>>
>>> There's something about toStringz that has me uncomfortable. Consider
>>> this code:
>> There is something else that you should be uncomfortable about: the domains
>> of C strings and D strings are not the same. The toStringz function is so
>> named because C strings are 'Z'ero (or null) terminated. That implies they
>> cannot contain a null character, yet D strings have no such silly limitations.
> I hate to disagree but.. that doesn't bother me. I don't see anything wrong
> with ignoring interior zeros. toStringz just makes sure it is
> zero-terminated, not that there aren't any internal zeros. [snip]

----------------------------------------------------------------
char* toStringz(char[] dStr, bit ignoreNullsInString = true)
----------------------------------------------------------------
Jan 19 2005
"Matthew" <admin.hat stlsoft.dot.org> writes:
"parabolis" <parabolis softhome.net> wrote in message
news:csmiqa$edp$1 digitaldaemon.com...
> Ben Hinkle wrote:
>> There's something about toStringz that has me uncomfortable. Consider this
>> code:
> There is something else that you should be uncomfortable about: the domains
> of C strings and D strings are not the same. The toStringz function is so
> named because C strings are 'Z'ero (or null) terminated. That implies they
> cannot contain a null character, yet D strings have no such silly limitations.
> So the toStringz function should probably look like this:
>
> ----------------------------------------------------------------
> char* toStringz(char[] dStr) {
>   char[] cStr = new char[dStr.length+1];
>   foreach(int i, char dChar; dStr) {
>     if(!(cStr[i] = dChar)) throw new Exception("Null char");
>   }
>   return &cStr;
> }
> ----------------------------------------------------------------
>
> Now seems like a great time for plugging the unless/until feature of Perl as
> being nice in this context allowing:
>
>   unless(cStr[i] = dChar) throw new Exception("Null char");
Has there been debate about unless/until? If so, count me on the list of 'wanting'. :-)
Jan 21 2005
parabolis <parabolis softhome.net> writes:
Matthew wrote:
> "parabolis" <parabolis softhome.net> wrote in message
> news:csmiqa$edp$1 digitaldaemon.com...
>
>> ----------------------------------------------------------------
>> char* toStringz(char[] dStr) {
>>   char[] cStr = new char[dStr.length+1];
>>   foreach(int i, char dChar; dStr) {
>>     if(!(cStr[i] = dChar)) throw new Exception("Null char");
>>   }
>>   return &cStr;
>> }
>> ----------------------------------------------------------------
>>
>> Now seems like a great time for plugging the unless/until feature of Perl as
>> being nice in this context allowing:
>>
>>   unless(cStr[i] = dChar) throw new Exception("Null char");
> Has there been debate about unless/until? If so, count me on the list of
> 'wanting'. :-)
Yes, back around the time the digitalmars.d newsgroup started:
http://www.digitalmars.com/d/archives/digitalmars/D/1714.html

Walter wrote:
> "Brian Hammond" <d at brianhammond dot comBrian_member xx pathlink.com> wrote
> in message news:c8lmu2$vdm$1 xx digitaldaemon.com...
>> I really like the unless because it reads so well.
>>
>> "do this unless this is true"
> That just seems backwards to me <g>. I like things to execute forwards, not
> backwards.
However, Walter's response was long before "is" replaced "===" and so I think it at least deserves another consideration, as Perl's unless construct would give us "unless(A is null)" instead of the awkward and much maligned "if(!(A is null))".
Jan 21 2005