digitalmars.D.learn - A use case for fromStringz
- Andrej Mitrovic (58/58) Mar 31 2011 There are situations where you have to call a C dispatch function, and p...
- Jesse Phillips (7/7) Mar 31 2011 Why not:
- Andrej Mitrovic (5/12) Mar 31 2011 Nice catch! But see my second reply. If a null terminator is missing
- Andrej Mitrovic (26/26) Mar 31 2011 Actually, this still suffers from the problem when the returned char*
- Jesse Phillips (3/12) Mar 31 2011 I do not know the proper action if the string you receive is garbage. Sh...
- Andrej Mitrovic (3/3) Mar 31 2011 Oh I'm not trying to get this into Phobos, I just needed the function
- Jacob Carlborg (5/31) Apr 01 2011 In those cases, doesn't the function return the length of the filled
- Andrej Mitrovic (3/5) Apr 01 2011 I know what you mean. I would expect a C function to do just that, but
- Andrej Mitrovic (23/23) Apr 15 2011 Hmm.. now I need a function that converts a wchar* to a wchar[] or
- Andrej Mitrovic (57/57) Apr 15 2011 Microsoft has some of the most ridiculous functions. This one
- Andrej Mitrovic (3/3) Apr 16 2011 Yeah I basically took the idea from the existing D implementation.
There are situations where you have to call a C dispatch function, and pass it a void* and a selector. The selector lets you choose what the C function does. For example, an enum constant selector `kGetProductName` could ask the C function to fill a null-terminated string at the location of the void* you've passed in.

One way of doing this is to pass the .ptr field of a static or dynamic char array to the C function, letting it fill the array with a null-terminated string.

But here's the problem: If you try to print out that array in D code with e.g. writefln, it will print out the _entire length_ of the array. This is a problem because the array could quite likely be filled with garbage values after the null terminator. In fact I just had that case when interfacing with C.

to!string can convert a null-terminated C string to a D string, with the length matching the location of the null terminator. But for char arrays, it won't do any checks for null terminators. It only does this if you explicitly pass it a char*.
So I've come up with a very simple solution:

    module fromStringz2;

    import std.stdio;
    import std.conv;
    import std.traits;
    import std.string;

    enum { kGetProductName = 1 }

    // imagine this function is defined in a C DLL
    extern(C) void cDispatch(void* payload, int selector)
    {
        if (selector == kGetProductName)
        {
            char* val = cast(char*)payload;
            val[0] = 'a';
            val[1] = 'b';
            val[2] = 'c';
            val[3] = '\0';
        }
    }

    string fromStringz(T)(T value)
    {
        static if (isArray!T)
        {
            return to!string(cast(char*)value);
        }
        else
        {
            return to!string(value);
        }
    }

    string getNameOld()
    {
        static char[256] name;
        cDispatch(name.ptr, kGetProductName);
        return to!string(name);
    }

    string getNameNew()
    {
        static char[256] name;
        cDispatch(name.ptr, kGetProductName);
        return fromStringz(name);
    }

    void main()
    {
        // values after [3] could quite likely be garbage
        assert(getNameOld().length == 256);
        assert(getNameNew().length == 3);
    }

I admit I didn't take Unicode into account, so it's far from being perfect or safe. In any case I think it's useful to have such a function, since you generally do not want the part of a C string after the null terminator.
Mar 31 2011
Why not:

    string getNameOld()
    {
        static char[256] name;
        cDispatch(name.ptr, kGetProductName);
        return to!string(name.ptr);
    }
Mar 31 2011
On 3/31/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
> Why not:
>
>     string getNameOld()
>     {
>         static char[256] name;
>         cDispatch(name.ptr, kGetProductName);
>         return to!string(name.ptr);
>     }

Nice catch! But see my second reply. If a null terminator is missing and we know we're operating on a D array (which has a length), then it could be best to check for a null terminator. If there isn't one it is highly likely that the array contains garbage.
Mar 31 2011
Actually, this still suffers from the problem when the returned char* doesn't have a null terminator. It really sucks when C code does that, and I've just experienced that. There is a solution though:

Since we can detect the length of the D array passed into `fromStringz`, we can do the job of to!string ourselves and check for a null terminator. If one isn't found, we return a string of length 0. Here's an updated version which doesn't suffer from the missing null terminator problem:

    string fromStringz(T)(T value)
    {
        static if (isArray!T)
        {
            if (value is null || value.length == 0)
            {
                return "";
            }

            auto nullPos = value.indexOf("\0");

            if (nullPos == -1)
                return "";

            return to!string(value[0..nullPos]);
        }
        else
        {
            return to!string(value);
        }
    }
Mar 31 2011
Andrej Mitrovic Wrote:
> Actually, this still suffers from the problem when the returned char* doesn't have a null terminator. It really sucks when C code does that, and I've just experienced that. There is a solution though:
>
> Since we can detect the length of the D array passed into `fromStringz`, we can do the job of to!string ourselves and check for a null terminator. If one isn't found, we return a string of length 0. Here's an updated version which doesn't suffer from the missing null terminator problem:

I do not know the proper action if the string you receive is garbage. Shouldn't it throw an exception since it did not receive a string?

This to me seems like a validation issue. If the functions you are calling are expected to return improper data, _you_ must validate what you receive, and that includes running it through UTF validation.
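A minimal sketch of the throwing, validating variant suggested here, offered as one possible reading rather than anything posted in the thread. The name `fromStringzChecked` is hypothetical; `enforce` and `std.utf.validate` are the standard Phobos functions. It scans the buffer code unit by code unit, so garbage bytes are never decoded before validation.

```d
import std.exception : enforce;
import std.utf : validate, UTFException;

/// Hypothetical helper (not part of Phobos): returns a copy of the text that
/// precedes the first '\0' in `buf`. Throws an Exception if the buffer has no
/// terminator, and a UTFException if the text is not valid UTF-8.
string fromStringzChecked(const(char)[] buf)
{
    size_t end = buf.length;        // buf.length doubles as "not found"
    foreach (i, char c; buf)        // iterates raw code units, no decoding
    {
        if (c == '\0')
        {
            end = i;
            break;
        }
    }
    enforce(end != buf.length, "buffer contains no null terminator");

    const text = buf[0 .. end];
    validate(text);                 // throws UTFException on malformed UTF-8
    return text.idup;
}

unittest
{
    import std.exception : assertThrown;

    char[8] ok = 'x';               // junk after the terminator
    ok[0 .. 4] = "abc\0";
    assert(fromStringzChecked(ok[]) == "abc");

    char[4] unterminated = 'x';
    assertThrown(fromStringzChecked(unterminated[]));

    char[3] garbage = [cast(char) 0xFF, cast(char) 0xFE, '\0'];
    assertThrown!UTFException(fromStringzChecked(garbage[]));
}
```

The unittest block can be run with `dmd -unittest -main -run file.d`. Whether to throw or to return an empty string is a design choice. Throwing keeps "no name" and "corrupt buffer" distinguishable for the caller.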
Mar 31 2011
Oh I'm not trying to get this into Phobos, I just needed the function so I wrote it and am sharing it here. Maybe it should throw. For my purposes I don't need it to throw. :)
Mar 31 2011
On 3/31/11 11:18 PM, Andrej Mitrovic wrote:
> Actually, this still suffers from the problem when the returned char* doesn't have a null terminator. It really sucks when C code does that, and I've just experienced that. There is a solution though:

In those cases, doesn't the function return the length of the filled data or something like that?

> Since we can detect the length of the D array passed into `fromStringz`, we can do the job of to!string ourselves and check for a null terminator. If one isn't found, we return a string of length 0. Here's an updated version which doesn't suffer from the missing null terminator problem:
>
>     string fromStringz(T)(T value)
>     {
>         static if (isArray!T)
>         {
>             if (value is null || value.length == 0)
>             {
>                 return "";
>             }
>
>             auto nullPos = value.indexOf("\0");
>
>             if (nullPos == -1)
>                 return "";
>
>             return to!string(value[0..nullPos]);
>         }
>         else
>         {
>             return to!string(value);
>         }
>     }

--
/Jacob Carlborg
Apr 01 2011
On 4/1/11, Jacob Carlborg <doob me.com> wrote:
> In those cases, doesn't the function return the length of the filled data or something like that?

I know what you mean. I would expect a C function to do just that, but in this case it does not. It's lame but I have to deal with it.
Apr 01 2011
Hmm.. now I need a function that converts a wchar* to a wchar[] or wstring. There doesn't seem to be anything in Phobos for this type of conversion. Or maybe I haven't looked hard enough?

I don't know whether this is safe since I'm not sure how the null terminator is represented in UTF-16, but it does seem to work OK in a few test cases:

    wstring fromWStringz(wchar* value)
    {
        if (value is null)
            return "";

        auto oldPos = value;
        uint nullPos;

        while (*value++ != '\0')
        {
            nullPos++;
        }

        if (nullPos == 0)
            return "";

        return to!wstring(oldPos[0..nullPos]);
    }

I thought we would pay more attention to interfacing with C code. Since D is supposed to work side-by-side with C, we should have more functions that convert common data types between the two languages.
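On the UTF-16 question: the terminator is a single 16-bit code unit with value 0, and no code unit of any other character can be 0 (surrogate halves lie in the range 0xD800 to 0xDFFF), so scanning code units for zero is sound. Below is a small sketch of the same idea generalized over char, wchar and dchar. The name `fromStringzOf` is made up for illustration.

```d
import std.traits : isSomeChar;

/// Hypothetical generic helper: copies a zero-terminated run of code units
/// (char, wchar or dchar) into an immutable D string of the matching type.
/// Assumes `p` really is zero-terminated; a bare pointer cannot be bounds-checked.
immutable(C)[] fromStringzOf(C)(const(C)* p)
if (isSomeChar!C)
{
    if (p is null)
        return null;

    size_t len = 0;
    while (p[len] != 0)     // count code units up to the terminator
        ++len;

    return p[0 .. len].idup;
}

unittest
{
    wstring w = "hi\0junk"w;
    assert(fromStringzOf(w.ptr) == "hi"w);

    // A character outside the BMP takes two UTF-16 code units (a surrogate
    // pair); neither of them is zero, so the scan does not stop early.
    wstring s = "a\U0001F600b"w;
    assert(fromStringzOf(s.ptr).length == 4);

    dstring d = "abc"d;     // string literals are zero-terminated
    assert(fromStringzOf(d.ptr) == "abc"d);

    assert(fromStringzOf(cast(const(char)*) null) is null);
}
```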
Apr 15 2011
Microsoft has some of the most ridiculous functions. This one (GetEnvironmentStrings) returns a pointer to a block of null-terminated strings, with no information on the count of strings returned. Each string ends with a null terminator, standard stuff. But only when you find two null terminators in succession do you know that you've reached the end of the entire block of strings.

So from some example code I've seen, people usually create a count variable and increment it for every null terminator in the block until they find a double null terminator. And then they have to loop all over again when constructing a list of strings. Talk about inefficient designs.. There's also a wchar* edition of this function, I don't want to even touch it. Here's what the example code looks like:

    char *l_EnvStr;
    l_EnvStr = GetEnvironmentStrings();

    LPTSTR l_str = l_EnvStr;

    // first pass: count the strings
    int count = 0;
    while (true)
    {
        if (*l_str == 0)
            break;

        while (*l_str != 0)
            l_str++;

        l_str++;
        count++;
    }

    // second pass: print them, walking a cursor so that
    // l_EnvStr still points at the start of the block
    l_str = l_EnvStr;
    for (int i = 0; i < count; i++)
    {
        printf("%s\n", l_str);

        while (*l_str != '\0')
            l_str++;

        l_str++;
    }

    // must be called with the pointer GetEnvironmentStrings returned
    FreeEnvironmentStrings(l_EnvStr);

I wonder.. in all these years.. have they ever thought about using a convention in C where the length is embedded as a 32/64bit value at the pointed location of a pointer, followed by the array contents?
I mean something like the following (I'm pseudocoding here, this is not valid C code, and it's 7 AM.):

    // allocate memory for the length field + character count
    char* mystring = malloc(sizeof(size_t) + sizeof(char)*length);
    *(cast(size_t*)mystring) = length;  // embed the length

    // call a function expecting a char*
    printString(mystring);

    void printString(char* string)
    {
        size_t length = *(cast(size_t*)string);
        (cast(size_t*)string)++;  // skip count to reach first char

        // now print all chars one by one
        for (size_t i; i < length; i++)
        {
            printChar(*string++);
        }
    }

Well, they can always use an extra parameter in a function that has the length, but it seems many people are too lazy to even do that. I guess C programmers just *love* their nulls. :p
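For the double-null-terminated block described earlier in this post, a single pass is enough on the D side, since a dynamic array can simply grow as entries are found. This is a rough sketch only: `splitStringBlock` is a hypothetical name, and the test feeds it an in-memory stand-in instead of calling the Windows API.

```d
/// Hypothetical helper: splits a block laid out as "A=1\0B=2\0\0" into D
/// strings in one pass. Assumes the block ends with an empty string, that is,
/// two consecutive '\0' bytes, as described for GetEnvironmentStrings.
string[] splitStringBlock(const(char)* block)
{
    string[] result;
    if (block is null)
        return result;

    auto p = block;
    while (*p != '\0')                  // an empty entry marks the end of the block
    {
        size_t len = 0;
        while (p[len] != '\0')          // measure the current entry
            ++len;

        result ~= p[0 .. len].idup;     // copy it into GC-owned memory
        p += len + 1;                   // step past the entry's terminator
    }
    return result;
}

unittest
{
    // The literal's implicit terminator supplies the second '\0'.
    string block = "PATH=/bin\0HOME=/root\0";
    assert(splitStringBlock(block.ptr) == ["PATH=/bin", "HOME=/root"]);

    string empty = "\0";
    assert(splitStringBlock(empty.ptr).length == 0);
}
```

Because every entry is copied with `idup`, the original block can be released afterwards with the pointer the C function returned.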
Apr 15 2011
Yeah I basically took the idea from the existing D implementation. Although D's arrays are a struct with a length and a pointer (I think so).
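A quick way to check that claim, as a small sketch: a D slice exposes `.length` and `.ptr`, and its size is that of two machine words. The `storage` and `name` variables are placeholders for illustration.

```d
unittest
{
    char[256] storage = '\0';
    storage[0 .. 3] = "abc";

    // A slice is a (length, pointer) pair referring to existing memory.
    char[] name = storage[0 .. 3];

    assert(name.length == 3);
    assert(name.ptr is storage.ptr);
    assert(name == "abc");

    // Two machine words: one for the length, one for the pointer.
    static assert((char[]).sizeof == 2 * size_t.sizeof);
}
```

Unlike the length-prefixed layout sketched in the previous post, the length here travels next to the pointer and is not stored in front of the characters.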
Apr 16 2011