digitalmars.D.learn - C string to D without memory allocation?

Shriramana Sharma <samjnaa_dont_spam_me gmail.com> writes:
Hello. I have the following code:

import std.stdio, std.conv;
extern(C) const(char) * textAttrN(const (char) * specString, size_t n);
string textAttr(const(char)[] specString)
{
    const(char) * ptr = textAttrN(specString.ptr, specString.length);
    writeln(ptr);
    return to!string(ptr);
}
void main()
{
    auto s = textAttr("w /g");
    writeln(s.ptr);
}

Now I'm getting different pointer values printed, like:

7F532A85A440
7F532A954000

Is it possible to get D to create a D string from a C string but not 
allocate memory? 

I thought perhaps the allocation is because C does not guarantee 
immutability but a D string has to. So I tried changing the return type of 
textAttr to const(char)[] but I find it is still allocating for the return 
value. Is this because a slice can potentially be appended to but it may 
overflow a C buffer?

Finally, I just want to return a safe D type encapsulating a C string but 
avoid allocation – is it possible or not?

Thanks!

Dec 20 2015
Rikki Cattermole <alphaglosined gmail.com> writes:
size_t strLen = ...;
char* ptr = ...;

string myCString = cast(string)ptr[0 .. strLen];

I can't remember if it will include the null terminator or not, but if it does just decrease strLen by 1.
Dec 20 2015
Shriramana Sharma <samjnaa_dont_spam_me gmail.com> writes:
Rikki Cattermole wrote:

 string myCString = cast(string)ptr[0 .. strLen];

Thanks, but does this require that one doesn't attempt to append to the returned string using ~= or such? In which case it is not safe, right?
Dec 20 2015
Rikki Cattermole <alphaglosined gmail.com> writes:
On 21/12/15 6:41 PM, Shriramana Sharma wrote:
 Thanks but does this require that one doesn't attempt to append to the returned string using ~= or such? In which case it is not safe, right?

Correct, ~= should only be used by GC controlled memory. Which this is not.
Dec 20 2015
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 21 December 2015 at 05:43:04 UTC, Rikki Cattermole wrote:
 Correct, ~= should only be used by GC controlled memory. Which this is not.

This is just plain wrong. No idea where you got this from.
Dec 20 2015
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 21 December 2015 at 05:41:31 UTC, Shriramana Sharma wrote:
 Thanks but does this require that one doesn't attempt to append to the returned string using ~= or such? In which case it is not safe, right?

Growing operations like ~= will copy the array to a GC-allocated, druntime-managed array if it isn't one already.
Dec 20 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, December 21, 2015 05:43:59 Jakob Ovrum via Digitalmars-d-learn wrote:
 Growing operations like ~= will copy the array to a GC-allocated, druntime-managed array if it isn't one already.

Exactly. As long as the GC has not been disabled, there is sufficient memory to allocate, and appending elements does not result in an exception being thrown (which it wouldn't with arrays of char), ~= should always work.

When ~= is used, the runtime looks at the capacity of the dynamic array to see whether it has enough room to grow to fit the new elements. If it does, then the array is grown into that space. If it doesn't, then a block of GC memory is allocated, the elements are copied into that memory, and the dynamic array is set to point to that block of memory.

Whether the array is pointing to a GC-allocated block of memory or not when ~= is called is irrelevant. If it isn't, all that means is that the array's capacity will be 0, so it's going to have to reallocate, whereas if it were GC-allocated, it might have enough capacity to not need to reallocate. In either case, the operation will work.

- Jonathan M Davis
Dec 21 2015
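
The capacity behaviour described above can be checked with a small sketch. It assumes a malloc'd buffer as a stand-in for memory owned by a C library; the helper name appendChar is made up for illustration and does not come from the thread.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

/// Hypothetical helper: appends one character and returns the resulting slice.
char[] appendChar(char[] s, char c)
{
    s ~= c; // reallocates into GC memory when the slice has no spare capacity
    return s;
}

void main()
{
    // Assumed stand-in for a buffer owned by a C library.
    char* buf = cast(char*) malloc(4);
    assert(buf !is null);
    scope (exit) free(buf);
    memcpy(buf, "abc".ptr, 4); // D string literals carry a trailing '\0'

    char[] view = buf[0 .. 3];  // slice over the C buffer, no allocation
    assert(view.ptr is buf);
    assert(view.capacity == 0); // not a GC-managed block

    char[] grown = appendChar(view, 'd');
    assert(grown == "abcd");
    assert(grown.ptr !is buf);      // the data was copied to a new GC block
    assert(buf[0 .. 4] == "abc\0"); // the C buffer itself is untouched
}
```

The append never writes past the end of the C buffer, which is why returning such a slice does not make ~= dangerous for the caller.
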
Shriramana Sharma <samjnaa_dont_spam_me gmail.com> writes:
Jonathan M Davis via Digitalmars-d-learn wrote:

 If it isn't, all that means is that the
 array's capacity will be 0, so it's going to have to reallocate

So it's safe to return a string produced by fromStringz without having to worry that the user would append to it? Then why is it marked @system? Only because one cannot be sure that the input pointer refers to a valid null-terminated string?
Dec 21 2015
Marc Schütz <schuetzm gmx.net> writes:
On Monday, 21 December 2015 at 09:46:58 UTC, Shriramana Sharma wrote:
 So it's safe to return a string produced by fromStringz without having to worry that the user would append to it?

Yes.

 Then why is it marked @system? Only because one cannot be sure that the input pointer refers to a valid null-terminated string?

Exactly.
Dec 21 2015
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 21 December 2015 at 05:39:32 UTC, Rikki Cattermole wrote:
 size_t strLen = ...;
 char* ptr = ...;

 string myCString = cast(string)ptr[0 .. strLen];

 I can't remember if it will include the null terminator or not, but if it does just decrease strLen by 1.

Strings from C libraries shouldn't be cast to immutable. If the characters of the C string are truly immutable, mark it as immutable(char)* in the binding (and, needless to say, use fromStringz instead of explicit counting when the length isn't known).
Dec 20 2015
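
A minimal sketch of the immutable(char)* suggestion above. The function libraryName is a hypothetical stand-in for a C binding whose result really is immutable (for example a pointer into a static string table); it is not part of the code discussed in this thread.

```d
import std.string : fromStringz;

// Hypothetical stand-in for a binding declared as returning immutable(char)*.
immutable(char)* libraryName()
{
    return "textlib".ptr; // a literal: immutable and null-terminated
}

string libraryNameD()
{
    // fromStringz keeps the qualifier: immutable(char)* in,
    // immutable(char)[] (that is, string) out, with no cast.
    return fromStringz(libraryName());
}

void main()
{
    string name = libraryNameD();
    assert(name == "textlib");
    assert(name.ptr is libraryName()); // a view of the same memory, not a copy
}
```

Declaring the binding as immutable is a promise about the C side; if the library can ever modify or free that memory, const(char)* is the honest type.
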
Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, December 21, 2015 18:39:32 Rikki Cattermole via Digitalmars-d-learn wrote:
 size_t strLen = ...;
 char* ptr = ...;

 string myCString = cast(string)ptr[0 .. strLen];

 I can't remember if it will include the null terminator or not, but if
 it does just decrease strLen by 1.

Casting to string is almost always wrong. Casting anything to immutable is almost always wrong. If you want to use C strings in D without allocating, they're going to have to be const(char)[], not immutable(char)[], which is what string is. Otherwise, you risk violating the type system.

Slicing explicitly like this can work just fine, and it will result in either char[] or const(char)[] depending on whether the pointer is char* or const(char)*, but the cast should _not_ be done.

There's also fromStringz that Jakob suggests using elsewhere in this thread, but that really just boils down to

    return cString ? cString[0 .. strlen(cString)] : null;

So, using that over simply slicing is primarily for documentation purposes, though it does make it so that you don't have to call strlen directly or check for null before calling it.

Regardless, do _not_ cast something to immutable if it's not already immutable. It's just begging for trouble.

- Jonathan M Davis
Dec 21 2015
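
The two non-allocating approaches above, manual slicing and fromStringz, can be compared side by side. This is only a sketch: fakeCString is a hypothetical stand-in for an extern(C) function such as textAttrN, and viewManual is a made-up name for the hand-written slice.

```d
import core.stdc.string : strlen;
import std.string : fromStringz;

// Hypothetical stand-in for a C function returning a null-terminated string.
const(char)* fakeCString()
{
    return "w /g".ptr;
}

// Manual version: null check, strlen, then slice. No cast, no allocation.
const(char)[] viewManual(const(char)* p)
{
    return p ? p[0 .. strlen(p)] : null;
}

void main()
{
    const(char)* p = fakeCString();

    const(char)[] a = viewManual(p);
    const(char)[] b = fromStringz(p);

    assert(a == "w /g" && b == "w /g");
    assert(a.ptr is p && b.ptr is p); // both are views of the C memory
    assert(viewManual(null) is null);

    // string s = a; // would not compile: const(char)[] is not immutable
}
```

Returning const(char)[] keeps the type system honest: callers can read the data and append to their own copy, but nothing claims the C memory is immutable.
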
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 21 December 2015 at 08:35:22 UTC, Jonathan M Davis wrote:
 There's also fromStringz that Jakob suggests using elsewhere in
 this thread, but that really just boils down to

     return cString ? cString[0 .. strlen(cString)] : null;

 So, using that over simply slicing is primarily for
 documentation purposes, though it does make it so that you
 don't have to call strlen directly or check for null before
 calling it.

To add to this, the main motivation behind `fromStringz` is that `cString` is often a non-trivial expression, such as a function call. With `fromStringz`, this expression can always be put in the argument list and it will only be evaluated once. Otherwise a variable has to be added:

    auto cString = foo();
    return cString[0 .. strlen(cString)];
Dec 21 2015
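
A short sketch of the single-evaluation point above. Here foo is a hypothetical C-style function, and the calls counter exists only to make the number of invocations visible.

```d
import std.string : fromStringz;

int calls; // counts how often the hypothetical function is invoked

// Hypothetical stand-in for a C function returning a null-terminated string.
const(char)* foo()
{
    ++calls;
    return "w /g".ptr;
}

const(char)[] fooD()
{
    // The call sits directly in the argument list; no temporary is needed.
    return fromStringz(foo());
}

void main()
{
    immutable before = calls;
    assert(fooD() == "w /g");
    assert(calls == before + 1); // foo ran exactly once
}
```
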
Jakob Ovrum <jakobovrum gmail.com> writes:
Use std.string.fromStringz. to!string assumes that pointers to characters are null-terminated strings which is not safe or general (unlike std.format, which safely assumes they are pointers to single characters); it is a poor design. fromStringz is explicit about this assumption. That said, to!string shouldn't allocate when given immutable(char)*.
Dec 20 2015
Shriramana Sharma <samjnaa_dont_spam_me gmail.com> writes:
Jakob Ovrum wrote:

 Use std.string.fromStringz. to!string assumes that pointers to
 characters are null-terminated strings which is not safe or
 general

I suppose what you mean is, the onus of guaranteeing that const(char)* refers to a null-terminated string is upon the person calling the to! function? Yes, I understand, and the Phobos documentation does say that using a pointer for input makes this "@system". Wouldn't it be better to just reject pointers as input and force people to use fromStringz?

 (unlike std.format, which safely assumes they are
 pointers to single characters);

I see that "%s".format(str) where str is a const(char)* just prints the pointer value in hex. So perhaps one should say that std.format just treats it like any other pointer (and not specifically that it treats it as a pointer to a single char)?

 it is a poor design. fromStringz
 is explicit about this assumption.

OK, thank you.
Dec 20 2015
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 21 December 2015 at 06:00:45 UTC, Shriramana Sharma wrote:
 I suppose what you mean is, the onus of guaranteeing that
 const(char)* refers to a null-terminated string is upon the
 person calling the to! function? Yes I understand, and Phobos
 documentation does say that using a pointer for input makes
 this "@system". Wouldn't it be better to just reject pointer as
 input and force people to use fromStringz?

It could also simply have done the same as format("%p", p). The problem isn't that to!string is too accepting, but that it makes an unsafe assumption with an inconspicuous, generic interface.

 I see that "%s".format(str) where str is a const(char)* just
 prints the pointer value in hex. So perhaps one should say that
 std.format just treats it like any other pointer (and not
 specifically that it treats it as a pointer to a single char)?

Indeed.
Dec 20 2015
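
The difference between to!string and std.format discussed above can be seen in a small sketch. The helper names asText and asAddress are made up for illustration, and the exact textual form of the formatted pointer is deliberately not asserted.

```d
import std.conv : to;
import std.format : format;

// to!string treats the pointer as a null-terminated C string and copies it.
string asText(const(char)* p)
{
    return to!string(p);
}

// format("%s", ...) treats it like any other pointer and prints its value.
string asAddress(const(char)* p)
{
    return format("%s", p);
}

void main()
{
    const(char)* p = "hello".ptr; // stand-in for a pointer returned from C

    string copy = asText(p);
    assert(copy == "hello");
    assert(copy.ptr !is p);          // a fresh copy, hence the differing addresses

    assert(asAddress(p) != "hello"); // the pointer value, not the characters
}
```
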