digitalmars.D.learn - string vs. const(char)[] on the function signature
- Ali Çehreli (150/150) Jul 13 2012 tl;dr - Please see the conclusion section below.
- Jonathan M Davis (34/40) Jul 13 2012 In a lot of cases, the answer is to simply template on string type so th...
- bearophile (5/6) Jul 13 2012 One case for your discussion:
- Steven Schveighoffer (64/80) Sep 13 2012 [snip]
tl;dr - Please see the conclusion section below.

Although the question is valid for other reference types as well, I will only use strings in this post.

The question is whether a function should take a string parameter as 'string' or as 'const(char)[]'.

string on the function API is a restriction: Such a function wants the caller to provide an immutable string:

    void foo(string s)    // <-- too restrictive
    {
        // ... doesn't mutate s ...
    }

    void main()
    {
        char[] s;
        foo(s);    // <-- ERROR: not immutable
    }

The caller can copy the string before calling the function, but that would only be to satisfy the limitation that the immutable parameter brings.

So, a better parameter type is 'const char[]' (or const(char)[]) because it can be bound to both mutable and immutable:

    void foo(const char[] s)    // <-- welcoming
    {
        // ... doesn't mutate s ...
    }

    void main()
    {
        char[] s;
        foo(s);     // <-- works
        foo("");    // <-- works
    }

So far so good.

Unfortunately, a problem arises when foo() decides to make a call to a function that really (or unnecessarily) needs an immutable string:

    void bar(string s)    // <-- really needs immutable
    {
        // ...
    }

    void foo(const char[] s)
    {
        // ... doesn't mutate s ...
        bar(s);    // <-- ERROR: not immutable
    }

A solution is to make an immutable copy of the string before calling bar(), but that would be the same inefficiency as the original caller making a copy before calling foo():

    import std.conv;

    void foo(const char[] s)
    {
        // ... doesn't mutate s ...
        bar(to!string(s));    // <-- sometimes an unnecessary copy
    }

Although to!string is a no-op when the original string is immutable, there will always be a copy above because the 'const char[]' (or const(char)[]) parameter erases the mutability attribute of the string. The compiler can't know whether the original string was mutable or immutable.

In order to retain the mutability information, one can think of inout.
Unfortunately, conditional compilation as in the following code can't work with inout:

    void foo(inout(char)[] s)
    {
        // ... doesn't mutate s ...

        // NOT A SOLUTION
        static if (is (typeof(s[0]) == immutable)) {
            writeln("immutable");
        } else {
            writeln("mutable");
        }
    }

This is not surprising: Since inout is not a template, the compiler will generate the same code for mutable, const, and immutable. So, in order to support all, such parameters cannot be mutated in the function body.

Another way of retaining the mutability information is using templates. Since this time the actual "type" would be erased, we can (and should) use a template constraint to accept only strings:

    void foo(T)(T[] s)
        if (is(Unqual!T == char))
    {
        // ... doesn't mutate s ...
    }

(Note: Of course the constraint could also be isSomeChar!T, but I wanted to continue with the same example that uses 'char'.)

This is great because now we can pass the parameter to to!string unconditionally and don't pay anything if the original is already an immutable string. Here is a program that demonstrates that nothing is being copied for originally immutable strings:

    import std.stdio;
    import std.traits;
    import std.conv;

    void foo(T)(T[] s_param)
        if (is(Unqual!T == char))
    {
        writeln('\n', s_param);
        writeln("in foo: ", s_param.ptr);
        auto s = to!(string)(s_param);
        bar(s);
    }

    void bar(string s)
    {
        writeln("in bar: ", s.ptr);
    }

    void main()
    {
        char[] m = "originally mutable".dup;
        foo(m);
        foo("originally immutable");
    }

Here is the output on my system:

    originally mutable
    in foo: 2B70E4DE7F80
    in bar: 2B70E4DE7F60    // <-- copied

    originally immutable
    in foo: 47D820
    in bar: 47D820          // <-- not copied

CONCLUSION:

* For parameters of reference types that are not modified in the function, const is a better choice than immutable because const can take both mutable and immutable.
  (I still include this among the guidelines under the "How to use" section here: http://ddili.org/ders/d.en/const_and_immutable.html )

* The choice above complicates matters when the parameter needs to be forwarded to a function that takes as immutable, because 'const' erases the mutability information of the actual variable.

* The solution is to make the function a template so that the actual type is retained. This solution prevents unnecessary copies when the actual variable is already immutable.

QUESTIONS:

What do you think about all of this? Can you see better idioms? Should we simply ignore this issue and stick with immutable anyway, especially for strings since they are everywhere? Should the original foo() take string and have the callers make a copy if the original variable is mutable?

Thank you,
Ali

P.S. I had opened a similar thread about the return types of functions:

http://forum.dlang.org/thread/itr5o1$poi$1 digitalmars.com

Since then, I have learned that pure functions can simply return by mutable because the return values of pure functions can implicitly be converted to immutable:

    char[] foo() pure    // <-- returns mutable
    {
        char[] result;
        return result;
    }

    void main()
    {
        char[] m = foo();    // <-- works
        string s = foo();    // <-- works
    }
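The three conclusion points can be checked side by side with pointer comparisons instead of printed addresses. This is only a sketch: viaTemplate, viaConst, and this particular bar are hypothetical names made up for the illustration, and the asserts rely on to!string returning its argument unchanged when it is already a string.

```d
import std.conv : to;
import std.traits : Unqual;

// Hypothetical stand-in for an API that needs an immutable string.
// It returns the address it received so the caller can compare.
const(char)* bar(string s)
{
    return s.ptr;
}

// Template version: T keeps the caller's qualifier, so to!string can
// hand back the argument unchanged when it is already immutable.
const(char)* viaTemplate(T)(T[] s)
    if (is(Unqual!T == char))
{
    return bar(to!string(s));
}

// const version: the qualifier is erased, so to!string has to copy.
const(char)* viaConst(const(char)[] s)
{
    return bar(to!string(s));
}

void main()
{
    string imm = "originally immutable";
    char[] mut = "originally mutable".dup;

    assert(viaTemplate(imm) is imm.ptr);    // forwarded without a copy
    assert(viaTemplate(mut) !is mut.ptr);   // copied to become immutable
    assert(viaConst(imm) !is imm.ptr);      // copied although already immutable
    assert(viaConst(mut) !is mut.ptr);      // copied
}
```

The third assert is the cost the second conclusion point describes: once the parameter is const(char)[], the function can no longer tell that the caller's string was immutable.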
Jul 13 2012
On Friday, July 13, 2012 10:19:34 Ali Çehreli wrote:
> QUESTIONS:
>
> What do you think about all of this? Can you see better idioms? Should we simply ignore this issue and stick with immutable anyway, especially for strings since they are everywhere? Should the original foo() take string and have the callers make a copy if the original variable is mutable?

In a lot of cases, the answer is to simply template on string type so that you don't care. In other cases, you're going to need a string anyway, so why bother templatizing (e.g. you need to assign the string to a member variable or pass it to a C function)? And in some cases, there's no real gain in taking anything other than string. It was debated for std.file whether it should take anything other than string, and we stuck with string, because it just wasn't worth bothering with anything else. Any string-related costs were nothing in comparison to the I/O going on, and whether string or wstring was what was needed for calling the C functions depended on the OS. So, we just stuck with string.

I believe that std.net.curl ends up using a combination of const(char)[] and string depending on what the string is ultimately used for. std.path and std.string on the other hand templatize everything so that they work with any string type, but they're operating on strings and returning them, not saving them or using them with other APIs (let alone C APIs).

In general, I would argue that if you're going to operate on a string and then return it, you should templatize on string type (or even use a range, if the function can be genericized beyond strings). But in cases where you're going to need a specific string type, it's often best to just take that string type (usually string). In some of those cases though (particularly if you're going to have to copy it anyway), taking const(char)[] makes more sense.
Classes are one case where you're likely to be forced to take either string or const(char)[], because you can't templatize virtual functions.

So, ultimately, what makes the most sense depends entirely on the situation. I'm not sure that we can really give any hard and fast rules. It mostly comes down to balancing flexibility and performance. We should make functions as flexible as possible with regard to string type but restrict it when it makes sense to do so for performance or when the extra flexibility just isn't warranted (or isn't possible - e.g. virtual functions).

You make very good points overall, and I think that you understand the situation fairly well, but I don't know how we'd really go about giving much in the way of concrete guidelines, since it's so situation-dependent.

- Jonathan M Davis
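The class case mentioned above might look like the following sketch. Named and Shouting are hypothetical classes invented for the illustration; the point is only that a stored argument is taken as string while a read-only argument is taken as const(char)[], and that both stay ordinary virtual functions.

```d
class Named
{
    protected string name_;

    // The argument is stored, so the parameter is string: callers that
    // already hold an immutable string pay nothing, others call .idup.
    void rename(string name)
    {
        name_ = name;
    }

    // The argument is only read, so const(char)[] accepts any qualifier.
    bool hasName(const(char)[] candidate) const
    {
        return name_ == candidate;
    }
}

class Shouting : Named
{
    // Overriding works because rename is an ordinary virtual function;
    // a member function template could not be overridden this way.
    override void rename(string name)
    {
        import std.uni : toUpper;
        name_ = name.toUpper;
    }
}

void main()
{
    Named n = new Shouting;
    char[] buffer = "ali".dup;

    n.rename(buffer.idup);          // mutable data is copied explicitly
    buffer[0] = 'x';                // later mutation cannot reach the stored name

    assert(n.hasName("ALI"));       // immutable argument
    assert(n.hasName("ALI".dup));   // mutable argument is accepted too
}
```

Here the copy for mutable input is visible at the call site, which is the trade-off of taking string in an API that has to keep the data anyway.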
Jul 13 2012
Ali Çehreli:
> tl;dr - Please see the conclusion section below.

One case for your discussion:
http://d.puremagic.com/issues/show_bug.cgi?id=8164

Bye,
bearophile
Jul 13 2012
I know this is really old, but just catching up on old posts.

On Fri, 13 Jul 2012 13:19:34 -0400, Ali Çehreli <acehreli yahoo.com> wrote:
> tl;dr - Please see the conclusion section below.

[snip]

> CONCLUSION:
>
> * For parameters of reference types that are not modified in the function, const is a better choice than immutable because const can take both mutable and immutable.
>
>   (I still include this among the guidelines under the "How to use" section here: http://ddili.org/ders/d.en/const_and_immutable.html )
>
> * The choice above complicates matters when the parameter needs to be forwarded to a function that takes as immutable, because 'const' erases the mutability information of the actual variable.
>
> * The solution is to make the function a template so that the actual type is retained. This solution prevents unnecessary copies when the actual variable is already immutable.

IMO, a function that does not utilize the benefits of immutable should actually be re-labeled const or inout.

For example (please, don't suggest I use some tricks to make this a one liner, it's an example):

    int foo(immutable(int)[] arr)
    {
        int x = 0;
        foreach(m ; arr) x += m;
        return x;
    }

Clearly, this does not need to be immutable. Making it immutable does not help or change anything.

However, we have this special case of char[] arrays, because the most common type used is 'string', which is immutable.

But using string has benefits -- you can simply store the string somewhere without worrying it gets changed or erased.

However, the library (phobos) should not force you into immutable ever. Yes, strings are immutable, and we can have some benefits for that, but phobos shouldn't make it difficult to avoid immutable, it should not have an opinion there.

Almost all phobos functions that accept 'strings' should take const(char)[] or inout(char)[], not string.
Now, we cannot control what we don't write, so it's quite easy to see that someone might label something as string when it should have been inout(char)[], and you simply have to deal with it. I think the correct solution is to define both an inout/const version, which uses .idup, and an immutable version which does not. There is no reason to have a mutable version.

It would be nice to be able to make a recommended pattern that allows you to avoid code duplication. Something like this:

    template constOrImmutable(T)
    {
        static if(is(T == immutable))
            alias T constOrImmutable;
        else
            alias const(T) constOrImmutable;
    }

    void foo(T)(constOrImmutable!(T)[] arg)
    {
        bar(to!(immutable(T)[])(arg)); // should idup if T is not immutable
    }

Of course, this still results in two identical instantiations for mutable and const, even though the resulting code is the same. Hopefully the compiler optimizes this out.

-Steve
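The "define both versions" suggestion can be sketched with two plain overloads. The module-level variable stored and this particular bar are assumptions made up for the illustration, standing in for a third-party function that keeps its string argument. Note also that, as far as I can tell, the compiler cannot deduce T through constOrImmutable!(T)[] at the call site, so the template pattern above would probably need explicit instantiation; overloads avoid that question.

```d
// Hypothetical sink that keeps the string, so it really needs immutable data.
string stored;

void bar(string s)
{
    stored = s;
}

// Immutable version: forwards the argument as is, no copy.
void foo(string s)
{
    bar(s);
}

// const version: accepts mutable and const data and copies exactly once.
void foo(const(char)[] s)
{
    bar(s.idup);
}

void main()
{
    string imm = "immutable";
    foo(imm);                          // exact match picks foo(string)
    assert(stored.ptr is imm.ptr);     // same memory, nothing copied

    char[] mut = "mutable".dup;
    foo(mut);                          // only foo(const(char)[]) matches
    assert(stored.ptr !is mut.ptr);    // a private immutable copy was made

    mut[0] = 'M';                      // changing the original afterwards
    assert(stored == "mutable");       // does not affect the stored copy
}
```

Only two function bodies exist here, one for immutable and one shared by mutable and const, which matches the remark that a separate mutable version is not needed.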
Sep 13 2012