digitalmars.D.learn - interfacing with C: strings and byte vectors

yawniek <dlang srtnwz.com> writes:
my C library works a lot with strings defined in C as:

struct vec_t {
     char *base;
     size_t len;
};

is there an easy way to feed regular D strings to functions that
accept vec_t* without creating a vec_t every time?
or do i write wrappers for these functions, and if so, what is the
most elegant way to build them?

so far i defined vec_t as:

struct vec_t {
     char *base;
     size_t len;

     this(string s) { base = s.ptr; len = s.length; }

     nothrow @nogc inout(char)[] toString() inout @property { return base[0 .. len]; }

     nothrow @nogc @property const(char)[] toSlice()
     {
         return cast(string) base[0 .. len];
     }
     alias toString this;
}

but i guess there is a more elegant way?!
Jun 11 2016
Mike Parker <aldacron gmail.com> writes:
On Saturday, 11 June 2016 at 09:32:54 UTC, yawniek wrote:

 so far i defined vec_t as:

 struct vec_t {
     char *base;
     size_t len;

     this(string s) { base = s.ptr; len = s.length; }

     nothrow @nogc inout(char)[] toString() inout @property { return base[0 .. len]; }

     nothrow @nogc @property const(char)[] toSlice()
     {
         return cast(string) base[0 .. len];
     }
     alias toString this;
 }

 but i guess there is a more elegant way?!
No, you've got the right idea with the constructor. You just need to change your implementation. You have two big problems with your current constructor. First, only string literals in D (e.g. "foo") are guaranteed to be null terminated. If you construct a vec_t with a string that is not a literal, then you can have a problem on the C side where it is expected to be null terminated. Second, if the string you pass to the constructor was allocated on the GC heap and at some point is collected, then the base member of the vec_t will point to an invalid memory location. A simple approach would be:

     this(string s)
     {
         import std.string : toStringz;
         // toStringz returns immutable(char)*, so this assignment needs
         // base to be declared as immutable(char)* or const(char)*.
         base = s.toStringz();
         len = s.length;
     }

Your toString and toSlice implementations are also potentially problematic. Aside from the fact that toString is returning a slice and toSlice is returning a string, the slice you create there will always point to the same location as base. If base becomes invalid, then so will the slice. For the toString implementation, you really should be doing something like this:

     string toString() nothrow inout
     {
         import std.conv : to;
         return to!string(base);
     }

I don't know what benefit you are expecting from the toSlice method. If you absolutely need it, you should implement it like so:

     char[] toSlice() nothrow inout
     {
         return base[0 .. len].dup;
     }

This will give you a character array that is modifiable without worrying about what happens to base. If you really want, you can declare the return type as const(char)[].

Rather than toSlice, which isn't something the compiler is aware of, it would be much more appropriate to implement opSlice along with opDollar. Then it's possible to do this:

     auto slice = myVec[0 .. $];

Of course, if you don't want to allow arbitrary slices, then perhaps toSlice is better.
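
A minimal sketch of the opSlice/opDollar suggestion, for illustration only. It assumes the struct is allowed to own a mutable copy of the string and that a trailing '\0' is wanted; both are assumptions, not something the thread confirms about the C library.

```d
// Same field layout as the C struct; member functions do not change it.
struct vec_t
{
    char* base;
    size_t len;

    // Assumption: copying is acceptable. The copy is mutable and carries a
    // trailing '\0' that is not counted in len.
    this(string s)
    {
        auto copy = new char[s.length + 1];
        copy[0 .. s.length] = s[];
        copy[s.length] = '\0';
        base = copy.ptr;
        len = s.length;
    }

    // Value of `$` inside vec[...] expressions.
    size_t opDollar() const { return len; }

    // vec[a .. b]: a view into the buffer, not a copy.
    inout(char)[] opSlice(size_t a, size_t b) inout
    {
        assert(a <= b && b <= len, "slice out of range");
        return base[a .. b];
    }

    // An independent immutable copy of the contents.
    string toString() const { return base[0 .. len].idup; }
}

unittest
{
    auto v = vec_t("hello");
    assert(v[0 .. $] == "hello");
    assert(v[1 .. 3] == "el");
    assert(v.base[v.len] == '\0');
}
```

The slices returned by opSlice alias the buffer, so they stay valid only as long as the buffer does; toString returns a copy for callers that need to keep the data.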
Jun 11 2016
yawniek <dlang srtnwz.com> writes:
On Saturday, 11 June 2016 at 10:26:17 UTC, Mike Parker wrote:
 On Saturday, 11 June 2016 at 09:32:54 UTC, yawniek wrote:
thanks mike for the in depth answer. i forgot to add a few important points:

- the strings in vec_t are not c strings
- vec_t might contain other data than strings

the original ctor i pasted actually doesn't even work, temporarily i solved it like

     this(string s) {
         char[] si = cast(char[]) s; //i'm scared
         base = si.ptr;
         len = si.length;
     }

is there a better solution than to fearlessly cast away immutability? i guess i could just make a second vec_t that has immutable base and length that can be used in D to stay clean, would that be worth anything?

now what i still don't have a proper idea for is how can i create wrappers for the methods accepting vec_t in a clean way. for the vec_t that are allocated in D-land we can state that the C libs will not modify the data. there are a lot of functions that accept vec_t. is there no way to have strings auto cast to vec_t?

another way i see is a UDA that generates the wrapper function(s). other ideas?
Jun 11 2016
ag0aep6g <anonymous example.com> writes:
On 06/11/2016 01:59 PM, yawniek wrote:
 i forgot to add a few important points:
 - the strings in vec_t are not c strings
 - vec_t might contain other data than strings

 the original ctor i pasted actually doesn't even work, temporarily i
 solved it like

      this(string s) {
          char[] si = cast(char[]) s; //i'm scared

Rightfully so. Casting away immutable shouldn't be done lightly.

          base = si.ptr;
          len = si.length;
      }
 is there a better solution than to fearlessly cast away immutability?
 i guess i could just make a second vec_t that has immutable base and length
 that can be used in D to stay clean, would that be worth anything?
Sure. You wouldn't have to do the risky cast then.
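
A rough sketch of what such a second type could look like. The names const_vec_t and count_byte are hypothetical, and the "C" function is written in D here only so the example links; a real binding would be a bare extern (C) prototype. It assumes the C side only reads through the pointer and does not keep it after the call.

```d
// Hypothetical D-side twin of vec_t with a const pointee.
struct const_vec_t
{
    const(char)* base;
    size_t len;

    // Borrows the caller's memory; nothing is copied and nothing is cast.
    this(const(char)[] s) { base = s.ptr; len = s.length; }
}

// Both fields are word-sized, so the layout matches a char* / size_t pair.
static assert(const_vec_t.sizeof == (char*).sizeof + size_t.sizeof);

// Stand-in for a C function that only reads the data (an assumption).
extern (C) size_t count_byte(const(const_vec_t)* v, char c) nothrow @nogc
{
    size_t n;
    foreach (i; 0 .. v.len)
        if (v.base[i] == c)
            ++n;
    return n;
}

unittest
{
    string s = "a b c";
    auto v = const_vec_t(s);   // no cast needed for a string
    assert(count_byte(&v, ' ') == 2);
}
```

Because the constructor takes const(char)[], it accepts both string and char[] arguments.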
 now what i still don't have a proper idea for is how can i create
 wrappers for the
 methods accepting vec_t in a clean way.
 for the vec_t that are allocated in D-land we can state that the C libs
 will not modify the data.
 there are a lot of functions that accept vec_t.
 is there no way to have strings auto cast to vec_t?
I don't think so, no. You can make a new type and use alias this to have it convert implicitly to vec_t, but you can't enable that for string. There is no feature to implicitly convert *from* another type, only *to* another type.
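
To illustrate the direction of the conversion, here is a small sketch. DVec, by_value and by_pointer are hypothetical names, and the two functions are D stand-ins for C functions. Note that alias this covers passing by value; for a vec_t* parameter the member's address still has to be taken explicitly.

```d
struct vec_t
{
    char* base;
    size_t len;
}

// Hypothetical wrapper type: owns a mutable copy and exposes it as a vec_t.
struct DVec
{
    vec_t vec;
    alias vec this;   // DVec converts implicitly *to* vec_t

    this(const(char)[] s)
    {
        char[] copy = s.dup;   // mutable copy, so no cast is needed
        vec = vec_t(copy.ptr, copy.length);
    }
}

// Stand-ins for C functions, written in D only so the sketch links.
extern (C) size_t by_value(vec_t v) nothrow @nogc { return v.len; }
extern (C) size_t by_pointer(vec_t* v) nothrow @nogc { return v.len; }

unittest
{
    auto d = DVec("hello");
    assert(by_value(d) == 5);          // implicit DVec -> vec_t
    assert(by_pointer(&d.vec) == 5);   // pointer must be taken explicitly

    // A plain string is not converted: there is no implicit conversion
    // *from* string to vec_t.
    static assert(!__traits(compiles, by_value("hello")));
}
```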
 another way i see is a UDA that generates the wrapper function(s).
 other ideas?
Generating the wrappers seems ok to me. Regarding a UDA: I haven't really used them, but you'd still have to iterate over all functions that have the attribute, right? At that point, the UDA might be pointless. You can also iterate over everything and check if there's a vec_t parameter.
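
One possible shape for such a generated wrapper, as a sketch rather than a finished design: a single template that inspects the parameter list of the wrapped function and converts string arguments wherever the parameter type is vec_t*. The names wrap, vec_len and vec_count are hypothetical, the two functions are D stand-ins for C functions, and copying the string is an assumption made to avoid the cast.

```d
import std.traits : Parameters;

struct vec_t
{
    char* base;
    size_t len;
}

// Stand-ins for C functions, written in D only so the sketch links.
extern (C) size_t vec_len(vec_t* v) nothrow @nogc { return v.len; }

extern (C) size_t vec_count(vec_t* v, char c) nothrow @nogc
{
    size_t n;
    foreach (i; 0 .. v.len)
        if (v.base[i] == c)
            ++n;
    return n;
}

// Hypothetical generic wrapper: calls fn, replacing each string argument
// that lines up with a vec_t* parameter by a pointer to a temporary vec_t.
auto wrap(alias fn, Args...)(Args args)
{
    alias Params = Parameters!fn;
    static assert(Args.length == Params.length, "argument count mismatch");

    Params converted;               // one variable per parameter of fn
    vec_t[Params.length] storage;   // temporaries that live for the call

    foreach (i, P; Params)          // unrolled at compile time
    {
        static if (is(P == vec_t*) && is(Args[i] : const(char)[]))
        {
            char[] copy = args[i].dup;   // mutable copy instead of a cast
            storage[i] = vec_t(copy.ptr, copy.length);
            converted[i] = &storage[i];
        }
        else
        {
            converted[i] = args[i];      // pass everything else through
        }
    }
    return fn(converted);
}

unittest
{
    assert(wrap!vec_len("hello") == 5);
    assert(wrap!vec_count("a b c", ' ') == 2);
}
```

The temporaries only live for the duration of the call, so this fits functions that do not keep the pointer afterwards.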
Jun 11 2016