digitalmars.D - ubyte[] to string with nogc - different behaviors
- Luhrel (67/67) Aug 05 2020 Hi,
- Adam D. Ruppe (14/24) Aug 05 2020 You just leaked str_ptr there. It gets overwritten by bytes' ptr.
- Luhrel (7/33) Aug 05 2020 Unfortunately, this uses the GC.
- Steven Schveighoffer (18/45) Aug 05 2020 Array is a reference counted type -- at the end of this scope, if there
- Luhrel (19/68) Aug 05 2020 Oh, okay I see now.
- Steven Schveighoffer (21/92) Aug 05 2020 Oh crap. I was thinking this was std.container.array.Array. But it's
- Luhrel (5/21) Aug 05 2020 Maybe with a ref parameter, but I think this will only move the
- Andy Balba (10/42) Aug 09 2020 What am I missing here ??
- Steven Schveighoffer (5/53) Aug 10 2020 There were other problems in this code example, and once you fix the
Hi, I'm currently upgrading Druntime DWARF. To do so, I have to convert some bytes to a string. However, I got different behaviors between the following codes. Code made for the test, which works: --- import core.stdc.stdlib; import std.stdio; string toString(const(ubyte)* bytes_ptr, size_t len) nogc { const(ubyte)[] bytes = bytes_ptr[0 .. len]; char* str_ptr = cast(char*)malloc(len * ubyte.sizeof); string str = cast(string)str_ptr[0 .. len]; str = cast(string)bytes[]; return str; } void main() { const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 0x20, 0x67, 0x6f, 0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21]; string str = toString(data.ptr, data.length); writeln(str); // Works } --- Code that I want to implement in rt.backtrace.dwarf: --- extern(D) string readString(ref const(ubyte)[] buffer) nogc nothrow { Array!ubyte bytes; ubyte b = buffer.read!ubyte(); while (b) { bytes.insertBack(b); b = buffer.read!ubyte(); } if (bytes.length()) { import core.stdc.stdio : printf; import core.stdc.stdlib; char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[]; printf("String: %s\tLength: %lld\n", str.ptr, bytes.length); // Works return str; } return null; } unittest { const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 0x20, 0x67, 0x6f, 0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21, 0x00]; import core.stdc.stdio : printf; string res = data.readString(); printf("String: %s\tLength: %lld\n", res.ptr, res.length); // Outputs `w�h�U assert(res == "Have a good day !"); // Fails assert(data == []); data = [0x00]; assert(data.readString == null); } --- What am I missing ?
Aug 05 2020
On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:string toString(const(ubyte)* bytes_ptr, size_t len) nogc { const(ubyte)[] bytes = bytes_ptr[0 .. len]; char* str_ptr = cast(char*)malloc(len * ubyte.sizeof); string str = cast(string)str_ptr[0 .. len]; str = cast(string)bytes[];You just leaked str_ptr there. It gets overwritten by bytes' ptr. I imagine what you meant to do was str_ptr[0 .. len] = bytes_ptr[0 .. len]; string str = cast(string)str_ptr[0 .. len]; return str; or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...) You mnight be best off just doing plain return cast(string) bytes_ptr[0 .. len].idup; as the whole function.char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[];but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.
Aug 05 2020
On Wednesday, 5 August 2020 at 18:25:37 UTC, Adam D. Ruppe wrote:On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:Thanks, that's what I was looking for.string toString(const(ubyte)* bytes_ptr, size_t len) nogc { const(ubyte)[] bytes = bytes_ptr[0 .. len]; char* str_ptr = cast(char*)malloc(len * ubyte.sizeof); string str = cast(string)str_ptr[0 .. len]; str = cast(string)bytes[];You just leaked str_ptr there. It gets overwritten by bytes' ptr. I imagine what you meant to do was str_ptr[0 .. len] = bytes_ptr[0 .. len]; string str = cast(string)str_ptr[0 .. len]; return str; or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...)You mnight be best off just doing plain return cast(string) bytes_ptr[0 .. len].idup; as the whole function.Unfortunately, this uses the GC.Ooooooh I see now. The different behavior is due to `read!ubyte` which removes the byte from the buffer, that's why I can't simply write `return cast(string)bytes[];`.char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[];but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.
Aug 05 2020
On 8/5/20 2:17 PM, Luhrel wrote:extern(D) string readString(ref const(ubyte)[] buffer) nogc nothrow { Array!ubyte bytes;Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.ubyte b = buffer.read!ubyte(); while (b) { bytes.insertBack(b); b = buffer.read!ubyte(); } if (bytes.length()) { import core.stdc.stdio : printf; import core.stdc.stdlib; char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[];Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it. What you likely want is: str[] = bytes.data[]; This will copy all the data from the Array to the string. However, I don't really understand the point of this function. What does buffer.read!ubyte do? If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.printf("String: %s\tLength: %lld\n", str.ptr, bytes.length); // Works return str; } return null; }At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here. -Steve
Aug 05 2020
On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer wrote:On 8/5/20 2:17 PM, Luhrel wrote:Oh, okay I see now.extern(D) string readString(ref const(ubyte)[] buffer) nogc nothrow { Array!ubyte bytes;Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.It's a shortcut for bytes[0 .. bytes.length]ubyte b = buffer.read!ubyte(); while (b) { bytes.insertBack(b); b = buffer.read!ubyte(); } if (bytes.length()) { import core.stdc.stdio : printf; import core.stdc.stdlib; char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[];Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.What you likely want is: str[] = bytes.data[];Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.This will copy all the data from the Array to the string. However, I don't really understand the point of this function. What does buffer.read!ubyte do?It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is nogc, the source of all my problems :D. DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string. From the spec: "A string is a sequence of contiguous non-null bytes followed by one null byte." - Commonly called a C string. So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is nogc, it's kinda hard for me (I don't come from C/C++).printf("String: %s\tLength: %lld\n", str.ptr, bytes.length); // Works return str; } return null; }At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here. -Steve
Aug 05 2020
On 8/5/20 3:43 PM, Luhrel wrote:On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer wrote:Oh crap. I was thinking this was std.container.array.Array. But it's rt.util.container.array.Array. This is not reference counted, but does destroy its used memory on destruction. So it's still the same problem. But the cast is not bad from what I can tell -- bytes[] returns a ubyte[]. If this were std.container.array.Array, it would not have compiled.On 8/5/20 2:17 PM, Luhrel wrote:Oh, okay I see now.extern(D) string readString(ref const(ubyte)[] buffer) nogc nothrow { Array!ubyte bytes;Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.It's a shortcut for bytes[0 .. bytes.length]ubyte b = buffer.read!ubyte(); while (b) { bytes.insertBack(b); b = buffer.read!ubyte(); } if (bytes.length()) { import core.stdc.stdio : printf; import core.stdc.stdlib; char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof); string str = cast(string)str_ptr[0 .. bytes.length()]; str = cast(string)bytes[];Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.Yep, my mistake. Wrong Array type. But still, you are mallocing a block and throwing it away. And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data. I think there is likely a better way to do what you are trying to do.What you likely want is: str[] = bytes.data[];Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.OK, yeah I see the implementation in that file.This will copy all the data from the Array to the string. However, I don't really understand the point of this function. What does buffer.read!ubyte do?It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements. I would think: return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))]; or something like that. -SteveIf you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is nogc, the source of all my problems :D. DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string. From the spec: "A string is a sequence of contiguous non-null bytes followed by one null byte." - Commonly called a C string. So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is nogc, it's kinda hard for me (I don't come from C/C++).
Aug 05 2020
On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:... But still, you are mallocing a block and throwing it away. And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data. I think there is likely a better way to do what you are trying to do.Maybe with a ref parameter, but I think this will only move the problem outside the function.... What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements. I would think: return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))]; or something like that. -SteveWill try that.
Aug 05 2020
On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:What am I missing here ?? Prior to reading this post, I would have tried ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ; writeln ("ss ", ss); .. which outputs (as expected) ss [35, 36, 37, 38] And to convert ss to string, I would have tried : string sss = cast(string) ss; writeln ("sss ", sss); So, unless I'm missing something, isn't converting ubyte[] to string super simple ?... But still, you are mallocing a block and throwing it away. And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data. I think there is likely a better way to do what you are trying to do.Maybe with a ref parameter, but I think this will only move the problem outside the function.... What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements. I would think: return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))]; or something like that. -SteveWill try that.
Aug 09 2020
On 8/9/20 11:56 PM, Andy Balba wrote:On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:There were other problems in this code example, and once you fix the underlying problems, it's no longer a conversion of ubyte[] to string. But yes, conversion between arrays is pretty easy in D. -SteveOn Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:What am I missing here ?? Prior to reading this post, I would have tried ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ; writeln ("ss ", ss); ... which outputs (as expected) ss [35, 36, 37, 38] And to convert ss to string, I would have tried : string sss = cast(string) ss; writeln ("sss ", sss); So, unless I'm missing something, isn't converting ubyte[] to string super simple ?... But still, you are mallocing a block and throwing it away. And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data. I think there is likely a better way to do what you are trying to do.Maybe with a ref parameter, but I think this will only move the problem outside the function.... What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements. I would think: return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))]; or something like that.Will try that.
Aug 10 2020