www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - ubyte[] to string with nogc - different behaviors

reply Luhrel <lucien.perregaux gmail.com> writes:
Hi,

I'm currently upgrading Druntime DWARF. To do so, I have to 
convert some bytes to a string. However, I got different 
behaviors between the following codes.

Code made for the test, which works:
---
import core.stdc.stdlib;
import std.stdio;

string toString(const(ubyte)* bytes_ptr, size_t len)  nogc
{
     const(ubyte)[] bytes = bytes_ptr[0 .. len];
     char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
     string str = cast(string)str_ptr[0 .. len];
     str = cast(string)bytes[];
     return str;
}

void main()
{
     const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 
0x20, 0x67, 0x6f,
         0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21];

     string str = toString(data.ptr, data.length);
     writeln(str);  // Works
}
---

Code that I want to implement in rt.backtrace.dwarf:
---
extern(D) string readString(ref const(ubyte)[] buffer)  nogc 
nothrow
{
     Array!ubyte bytes;

     ubyte b = buffer.read!ubyte();
     while (b)
     {
         bytes.insertBack(b);
         b = buffer.read!ubyte();
     }

     if (bytes.length())
     {
         import core.stdc.stdio : printf;
         import core.stdc.stdlib;

         char* str_ptr = cast(char*)malloc(bytes.length() * 
ubyte.sizeof);
         string str = cast(string)str_ptr[0 .. bytes.length()];
         str = cast(string)bytes[];

         printf("String: %s\tLength: %lld\n", str.ptr, 
bytes.length);  // Works

         return str;
     }
     return null;
}

unittest
{
     const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 
0x20, 0x67, 0x6f,
         0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21, 0x00];

     import core.stdc.stdio : printf;

     string res = data.readString();
     printf("String: %s\tLength: %lld\n", res.ptr, res.length);  
// Outputs `w�h�U

     assert(res == "Have a good day !");  // Fails
     assert(data == []);

     data = [0x00];
     assert(data.readString == null);
}
---

What am I missing ?
Aug 05 2020
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:
 string toString(const(ubyte)* bytes_ptr, size_t len)  nogc
 {
     const(ubyte)[] bytes = bytes_ptr[0 .. len];
     char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
     string str = cast(string)str_ptr[0 .. len];
     str = cast(string)bytes[];
You just leaked str_ptr there. It gets overwritten by bytes' ptr. I imagine what you meant to do was str_ptr[0 .. len] = bytes_ptr[0 .. len]; string str = cast(string)str_ptr[0 .. len]; return str; or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...) You mnight be best off just doing plain return cast(string) bytes_ptr[0 .. len].idup; as the whole function.
         char* str_ptr = cast(char*)malloc(bytes.length() * 
 ubyte.sizeof);
         string str = cast(string)str_ptr[0 .. bytes.length()];
         str = cast(string)bytes[];
but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.
Aug 05 2020
parent Luhrel <lucien.perregaux gmail.com> writes:
On Wednesday, 5 August 2020 at 18:25:37 UTC, Adam D. Ruppe wrote:
 On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:
 string toString(const(ubyte)* bytes_ptr, size_t len)  nogc
 {
     const(ubyte)[] bytes = bytes_ptr[0 .. len];
     char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
     string str = cast(string)str_ptr[0 .. len];
     str = cast(string)bytes[];
You just leaked str_ptr there. It gets overwritten by bytes' ptr. I imagine what you meant to do was str_ptr[0 .. len] = bytes_ptr[0 .. len]; string str = cast(string)str_ptr[0 .. len]; return str; or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...)
Thanks, that's what I was looking for.
 You mnight be best off just doing plain

 return cast(string) bytes_ptr[0 .. len].idup;

 as the whole function.
Unfortunately, this uses the GC.
         char* str_ptr = cast(char*)malloc(bytes.length() * 
 ubyte.sizeof);
         string str = cast(string)str_ptr[0 .. bytes.length()];
         str = cast(string)bytes[];
but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.
Ooooooh I see now. The different behavior is due to `read!ubyte` which removes the byte from the buffer, that's why I can't simply write `return cast(string)bytes[];`.
Aug 05 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/5/20 2:17 PM, Luhrel wrote:
 extern(D) string readString(ref const(ubyte)[] buffer)  nogc nothrow
 {
      Array!ubyte bytes;
Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.
 
      ubyte b = buffer.read!ubyte();
      while (b)
      {
          bytes.insertBack(b);
          b = buffer.read!ubyte();
      }
 
      if (bytes.length())
      {
          import core.stdc.stdio : printf;
          import core.stdc.stdlib;
 
          char* str_ptr = cast(char*)malloc(bytes.length() *
ubyte.sizeof);
          string str = cast(string)str_ptr[0 .. bytes.length()];
          str = cast(string)bytes[];
Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it. What you likely want is: str[] = bytes.data[]; This will copy all the data from the Array to the string. However, I don't really understand the point of this function. What does buffer.read!ubyte do? If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.
 
          printf("String: %s\tLength: %lld\n", str.ptr, bytes.length);  
 // Works
 
          return str;
      }
      return null;
 }
At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here. -Steve
Aug 05 2020
parent reply Luhrel <lucien.perregaux gmail.com> writes:
On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer 
wrote:
 On 8/5/20 2:17 PM, Luhrel wrote:
 extern(D) string readString(ref const(ubyte)[] buffer)  nogc 
 nothrow
 {
      Array!ubyte bytes;
Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.
Oh, okay I see now.
 
      ubyte b = buffer.read!ubyte();
      while (b)
      {
          bytes.insertBack(b);
          b = buffer.read!ubyte();
      }
 
      if (bytes.length())
      {
          import core.stdc.stdio : printf;
          import core.stdc.stdlib;
 
          char* str_ptr = cast(char*)malloc(bytes.length() * 
 ubyte.sizeof);
          string str = cast(string)str_ptr[0 .. bytes.length()];
          str = cast(string)bytes[];
Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.
It's a shortcut for bytes[0 .. bytes.length]
 What you likely want is:
 str[] = bytes.data[];
Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.
 This will copy all the data from the Array to the string.

 However, I don't really understand the point of this function. 
 What does buffer.read!ubyte do?
It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.
 If you have a more descriptive statement on what you want to 
 accomplish, I'm sure it can be done in a simpler way than what 
 you are trying to do here.
I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is nogc, the source of all my problems :D. DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string. From the spec: "A string is a sequence of contiguous non-null bytes followed by one null byte." - Commonly called a C string. So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is nogc, it's kinda hard for me (I don't come from C/C++).
 
          printf("String: %s\tLength: %lld\n", str.ptr, 
 bytes.length);  // Works
 
          return str;
      }
      return null;
 }
At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here. -Steve
Aug 05 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/5/20 3:43 PM, Luhrel wrote:
 On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer wrote:
 On 8/5/20 2:17 PM, Luhrel wrote:
 extern(D) string readString(ref const(ubyte)[] buffer)  nogc nothrow
 {
      Array!ubyte bytes;
Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.
Oh, okay I see now.
      ubyte b = buffer.read!ubyte();
      while (b)
      {
          bytes.insertBack(b);
          b = buffer.read!ubyte();
      }

      if (bytes.length())
      {
          import core.stdc.stdio : printf;
          import core.stdc.stdlib;

          char* str_ptr = cast(char*)malloc(bytes.length() * 
 ubyte.sizeof);
          string str = cast(string)str_ptr[0 .. bytes.length()];
          str = cast(string)bytes[];
Adam is right, you are not copying data here, just reassigning the reference. Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.
It's a shortcut for bytes[0 .. bytes.length]
Oh crap. I was thinking this was std.container.array.Array. But it's rt.util.container.array.Array. This is not reference counted, but does destroy its used memory on destruction. So it's still the same problem. But the cast is not bad from what I can tell -- bytes[] returns a ubyte[]. If this were std.container.array.Array, it would not have compiled.
 What you likely want is:
 str[] = bytes.data[];
Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.
Yep, my mistake. Wrong Array type. But still, you are mallocing a block and throwing it away. And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data. I think there is likely a better way to do what you are trying to do.
 
 This will copy all the data from the Array to the string.

 However, I don't really understand the point of this function. What 
 does buffer.read!ubyte do?
It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.
OK, yeah I see the implementation in that file.
 
 If you have a more descriptive statement on what you want to 
 accomplish, I'm sure it can be done in a simpler way than what you are 
 trying to do here.
I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is nogc, the source of all my problems :D. DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string. From the spec: "A string is a sequence of contiguous non-null bytes followed by one null byte." - Commonly called a C string. So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is nogc, it's kinda hard for me (I don't come from C/C++).
What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements. I would think: return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))]; or something like that. -Steve
Aug 05 2020
parent reply Luhrel <lucien.perregaux gmail.com> writes:
On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer 
wrote:
 ...

 But still, you are mallocing a block and throwing it away.

 And I realize, this wouldn't have worked anyway, as str is a 
 string, which means you can't overwrite the data.

 I think there is likely a better way to do what you are trying 
 to do.
Maybe with a ref parameter, but I think this will only move the problem outside the function.
...

 What is the source of the incoming data? Will it be around by 
 the time you need to use this string? Can you just slice it 
 without copying? I'm not familiar with this code, so I don't 
 know the requirements.

 I would think:

 return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, 
 buffer.length))];

 or something like that.

 -Steve
Will try that.
Aug 05 2020
parent reply Andy Balba <pwplus7 gmail.com> writes:
On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:
 On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven 
 Schveighoffer wrote:
 ...

 But still, you are mallocing a block and throwing it away.

 And I realize, this wouldn't have worked anyway, as str is a 
 string, which means you can't overwrite the data.

 I think there is likely a better way to do what you are trying 
 to do.
Maybe with a ref parameter, but I think this will only move the problem outside the function.
...

 What is the source of the incoming data? Will it be around by 
 the time you need to use this string? Can you just slice it 
 without copying? I'm not familiar with this code, so I don't 
 know the requirements.

 I would think:

 return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, 
 buffer.length))];

 or something like that.

 -Steve
Will try that.
What am I missing here ?? Prior to reading this post, I would have tried ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ; writeln ("ss ", ss); .. which outputs (as expected) ss [35, 36, 37, 38] And to convert ss to string, I would have tried : string sss = cast(string) ss; writeln ("sss ", sss); So, unless I'm missing something, isn't converting ubyte[] to string super simple ?
Aug 09 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/9/20 11:56 PM, Andy Balba wrote:
 On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:
 On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:
 ...

 But still, you are mallocing a block and throwing it away.

 And I realize, this wouldn't have worked anyway, as str is a string, 
 which means you can't overwrite the data.

 I think there is likely a better way to do what you are trying to do.
Maybe with a ref parameter, but I think this will only move the problem outside the function.
 ...

 What is the source of the incoming data? Will it be around by the 
 time you need to use this string? Can you just slice it without 
 copying? I'm not familiar with this code, so I don't know the 
 requirements.

 I would think:

 return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, 
 buffer.length))];

 or something like that.
Will try that.
What am I missing here ?? Prior to reading this post, I would have tried  ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ;    writeln ("ss ", ss); ... which outputs (as expected) ss  [35, 36, 37, 38] And to convert ss to string, I would have tried :     string sss = cast(string) ss;           writeln ("sss ", sss); So, unless I'm missing something, isn't converting ubyte[] to string super simple ?
There were other problems in this code example, and once you fix the underlying problems, it's no longer a conversion of ubyte[] to string. But yes, conversion between arrays is pretty easy in D. -Steve
Aug 10 2020