digitalmars.D - Shouldn't hasSwappableElements work on char arrays?
- Andrej Mitrovic (63/63) Feb 24 2011 Would it be wrong if hasSwappableElements worked on char arrays?
- Andrej Mitrovic (2/2) Feb 24 2011 Now I see why using char[] fails. It's because [ElementType!(R).init];
- Jesse Phillips (2/4) Feb 24 2011 Yep, Unicode for the win. dchar[] is swappable.
- Andrej Mitrovic (9/13) Feb 24 2011 Oh looks like you're right. I can use reverse on dchar[]. Weird, I
- Steven Schveighoffer (10/26) Feb 24 2011 A string literal is immutable, dchar[] is mutable. These should work:
- Andrej Mitrovic (9/14) Feb 24 2011 Ah right, the postfix form. That's what i was looking for. I know a
- Jesse Phillips (3/7) Feb 24 2011 Well, aside from discussions to create a proper string type, Text in Tan...
- Andrei Alexandrescu (4/11) Feb 24 2011 Swapping a char[] correctly (preserving the proper code units) without
- bearophile (4/8) Feb 24 2011 There's a need for both unicode strings, and simpler strings of 7 bit AS...
- Jonathan M Davis (10/21) Feb 24 2011 Honestly, I think that the need for actual ASCII strings is quite rare a...
- bearophile (4/6) Feb 24 2011 I need ASCII strings (or mutable/immutable arrays of ASCII chars) all th...
- Jonathan M Davis (15/21) Feb 24 2011 And I would strongly argue that you shouldn't be using ASCII for text un...
- Jonathan M Davis (13/30) Feb 24 2011 It's because the type of an expression has nothing to do with what it's ...
- Ali Çehreli (13/14) Feb 24 2011 Sorry to take it out of context but that statement is not always
- Andrej Mitrovic (2/4) Feb 24 2011 Can't the compiler figure that out on its own?
- Steven Schveighoffer (9/13) Feb 24 2011 It did figure that out (that it was bad) and told you not to do it :)
- Andrej Mitrovic (12/18) Feb 24 2011 Only when the lhs is a mutable type. If it's immutable (string), then
- Steven Schveighoffer (12/15) Feb 24 2011 It's a 'hidden allocation'. It leads to low performance code that looks...
Would it be wrong if hasSwappableElements worked on char arrays? Look:

    import std.stdio;
    import std.algorithm;
    import std.range;
    import std.traits;

    void main()
    {
        char[] r = "abc".dup;

        // fails:
        static assert(hasSwappableElements!(char[]));

        // fails because reverse uses hasSwappableElements(r)
        // as a constraint:
        reverse(r);

        // But this works just fine..
        swap(r[0], r[2]);
        assert(r == "cba");
    }

If you comment out the static assert and the reverse, you'll see that swap works fine on char arrays if you give it an index.

Here's an experimental implementation of hasSwappableElements that could work for char[]'s:

    import std.stdio;
    import std.algorithm;
    import std.range : isForwardRange, ElementType;
    import std.traits;

    template hasSwappableElements(R)
    {
        enum bool hasSwappableElements = isForwardRange!(R) && is(typeof(
        {
            auto r = [ElementType!(R).init];
            swap(r[0], r[0]);
        }()));
    }

    void main()
    {
        char[] r = "abc".dup;

        // now works:
        static assert(hasSwappableElements!(char[]));

        swap(r[0], r[2]);
        assert(r == "cba");
    }

Here's another thing that's interesting. If I replace "auto r" with "R r" in the modified hasSwappableElements implementation, then the assert fails:

    import std.stdio;
    import std.algorithm;
    import std.range : isForwardRange, ElementType;
    import std.traits;

    template hasSwappableElements(R)
    {
        enum bool hasSwappableElements = isForwardRange!(R) && is(typeof(
        {
            R r = [ElementType!(R).init];
            swap(r[0], r[0]);
        }()));
    }

    void main()
    {
        // Fails
        static assert(hasSwappableElements!(char[]));
    }

The same thing happens if I replace "R" with "char[]". This might be some kind of bug?
Feb 24 2011
Now I see why using char[] fails. It's because [ElementType!(R).init]; returns a dchar[].
Feb 24 2011
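Andrej's diagnosis can be checked directly. This is a minimal sketch that assumes a current DMD/Phobos toolchain; the 2011 compiler may differ in details. It shows that Phobos reports dchar as the element type of char[], so the array literal inside the experimental trait builds a dchar[]. The "auto r" variant was therefore testing dchar[] rather than char[], and the "R r" variant cannot compile.

```d
import std.range : ElementType;

alias R = char[];

// Phobos treats char[] as a range of decoded code points (dchar).
static assert(is(ElementType!R == dchar));

// So the literal used in the experimental trait is a dchar[], not an R.
static assert(is(typeof([ElementType!(R).init]) == dchar[]));

// Hence "R r = [ElementType!(R).init];" is rejected: the literal holds
// dchar.init (U+FFFF), which cannot be narrowed to a single char.
static assert(!__traits(compiles, { R r = [ElementType!(R).init]; }));

void main() {}
```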
Andrej Mitrovic Wrote:
> Now I see why using char[] fails. It's because [ElementType!(R).init]; returns a dchar[].

Yep, Unicode for the win. dchar[] is swappable.
Feb 24 2011
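Jesse's point can be confirmed with a short sketch, assuming a current Phobos. dchar[] passes hasSwappableElements and works with reverse, while char[] fails the trait.

```d
import std.algorithm : reverse;
import std.range : hasSwappableElements;

// dchar[]: front, back and indexing all yield ref dchar, so elements swap.
static assert(hasSwappableElements!(dchar[]));

// char[]: front decodes UTF-8 into a dchar rvalue, so the trait is false.
static assert(!hasSwappableElements!(char[]));

void main()
{
    dchar[] d = "abc"d.dup;
    reverse(d);
    assert(d == "cba"d);
}
```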
On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
> Andrej Mitrovic Wrote:
>> Now I see why using char[] fails. It's because [ElementType!(R).init]; returns a dchar[].
>
> Yep, Unicode for the win. dchar[] is swappable.

Oh, looks like you're right. I can use reverse on dchar[]. Weird, I thought I'd already tried that.

P.S. Why do I have to use this gibberish syntax?:

    dchar[] test = to!(dchar[])("test");

Isn't the compiler smart enough to do this for me automatically? It's a string literal..

Still, I don't see why char arrays should fail on hasSwappableElements when swap can be used on char arrays?
Feb 24 2011
On Thu, 24 Feb 2011 14:42:38 -0500, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
>> Andrej Mitrovic Wrote:
>>> Now I see why using char[] fails. It's because [ElementType!(R).init]; returns a dchar[].
>>
>> Yep, Unicode for the win. dchar[] is swappable.
>
> Oh, looks like you're right. I can use reverse on dchar[]. Weird, I thought I'd already tried that.
>
> P.S. Why do I have to use this gibberish syntax?:
> dchar[] test = to!(dchar[])("test");
>
> Isn't the compiler smart enough to do this for me automatically? It's a string literal..

A string literal is immutable, dchar[] is mutable. These should work:

    immutable(dchar)[] test = "test";
    dstring test = "test";
    auto test = "test"d;

> Still, I don't see why char arrays should fail on hasSwappableElements when swap can be used on char arrays?

wait, you thought char[] was an array? You poor poor soul ;)

I predict we shall get 1-2 questions/claims of incredulity like this a month until we get a real string type.

-Steve
Feb 24 2011
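Steve's three declarations, plus the "test"d.dup form that comes up later in the thread, can be summarized in a small sketch that assumes current D semantics. The suffix fixes the literal's element type, and .dup turns the immutable literal into a mutable array.

```d
// The suffix fixes a literal's element type; without one, the literal's
// own type is string.
static assert(is(typeof("test") == string));    // immutable(char)[]
static assert(is(typeof("test"w) == wstring));  // immutable(wchar)[]
static assert(is(typeof("test"d) == dstring));  // immutable(dchar)[]

void main()
{
    // The three immutable spellings from the post (renamed to coexist).
    immutable(dchar)[] a = "test";
    dstring b = "test";
    auto c = "test"d;
    assert(a == b && b == c);

    // A mutable dchar[] needs an explicit copy of the immutable literal.
    dchar[] m = "test"d.dup;
    m[0] = 'b';
    assert(m == "best"d);
}
```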
On 2/24/11, Steven Schveighoffer <schveiguy yahoo.com> wrote:
> A string literal is immutable, dchar[] is mutable. These should work:
>
> immutable(dchar)[] test = "test";
> dstring test = "test";
> auto test = "test"d;

Ah right, the postfix form. That's what I was looking for. I know a literal is immutable; I've tried "test".dup, but it still complained that it's not of type dchar. Anywho..

> wait, you thought char[] was an array? You poor poor soul ;)

Yes. And you know what's going to happen next, right? Everyone is going to create their own implementation of a string type because of these non-issues. Happens in C/C++ all the time, I see it in almost every mid-large codebase out there.

But w/e, strings are perfect in D etc etc..
Feb 24 2011
Andrej Mitrovic Wrote:
> Yes. And you know what's going to happen next, right? Everyone is going to create their own implementation of a string type because of these non-issues. Happens in C/C++ all the time, I see it in almost every mid-large codebase out there.

Well, aside from discussions to create a proper string type, Text in Tango and mText on dprogramming.org, I have yet to see people using a special string type.

I'm not sure why you are complaining that the compiler is preventing you from doing something stupid. An array of char is not swappable, a range of char is not swappable. D is not in the habit of hiding complexity; trying to swap a string requires a conversion to dchar or at least handling a sequence of char, so it is best to state that is what is happening in the code.
Feb 24 2011
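Jesse's suggestion to state the conversion explicitly might look like the following sketch. The helper name reversedByCodePoint is hypothetical. The approach decodes to dchar[] with std.conv.to, reverses the code points, and re-encodes to UTF-8. The cost is a temporary buffer of 4 bytes per code point.

```d
import std.algorithm : reverse;
import std.conv : to;

// Hypothetical helper: decode, reverse the code points, re-encode.
char[] reversedByCodePoint(const(char)[] s)
{
    dchar[] d = to!(dchar[])(s);   // explicit UTF-8 -> UTF-32 conversion
    reverse(d);                     // dchar[] has swappable elements
    return to!(char[])(d);          // back to UTF-8
}

void main()
{
    char[] r = "héllo".dup;         // 'é' is two UTF-8 code units
    assert(r.length == 6);
    assert(reversedByCodePoint(r) == "olléh");
}
```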
On 2/24/11 3:13 PM, Jesse Phillips wrote:
> Andrej Mitrovic Wrote:
>> Yes. And you know what's going to happen next, right? Everyone is going to create their own implementation of a string type because of these non-issues. Happens in C/C++ all the time, I see it in almost every mid-large codebase out there.
>
> Well, aside from discussions to create a proper string type, Text in Tango and mText on dprogramming.org, I have yet to see people using a special string type.
>
> I'm not sure why you are complaining that the compiler is preventing you from doing something stupid. An array of char is not swappable, a range of char is not swappable. D is not in the habit of hiding complexity; trying to swap a string requires a conversion to dchar or at least handling a sequence of char, so it is best to state that is what is happening in the code.

Swapping a char[] correctly (preserving the proper code units) without using additional storage is a very interesting problem.

Andrei
Feb 24 2011
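Andrei's problem can be read as reversing a char[] in place. One way to approach it is a two-pass trick. First, reverse the code units inside each multi-unit code point. Then reverse the whole buffer, which puts every code point's units back in their original order. This is a sketch that assumes valid UTF-8 input, and reverseUtf8InPlace is a hypothetical name. It works on s.representation, a ubyte[] view of the same memory, so that reverse swaps raw bytes rather than code points.

```d
import std.algorithm : reverse;
import std.string : representation;
import std.utf : stride;

// Hypothetical helper: reverse UTF-8 in place with O(1) extra storage,
// keeping each code point intact. std.utf.stride throws on invalid UTF-8.
void reverseUtf8InPlace(char[] s)
{
    // Raw code units, so reverse() swaps bytes, not decoded code points.
    ubyte[] units = s.representation;

    // Pass 1: reverse the code units inside each code point.
    size_t i = 0;
    while (i < s.length)
    {
        immutable n = stride(s, i);   // code units in the code point at i
        reverse(units[i .. i + n]);
        i += n;
    }

    // Pass 2: reverse everything; each code point's units flip back into order.
    reverse(units);
}

void main()
{
    char[] s = "héllo".dup;
    reverseUtf8InPlace(s);
    assert(s == "olléh");
}
```

Later Phobos releases added a reverse overload for mutable narrow strings that preserves multi-unit sequences. On a current toolchain, reverse(r) on a char[] therefore compiles, even though hasSwappableElements!(char[]) is still false.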
Steven Schveighoffer:
> wait, you thought char[] was an array? You poor poor soul ;) I predict we shall get 1-2 questions/claims of incredulity like this a month until we get a real string type.

There's a need for both Unicode strings and simpler strings of 7-bit ASCII chars (both mutable and immutable; the immutable ones must not allow their length to be changed, and their hash value may be computed lazily, even for the immutable strings). A ubyte[] is not a good enough replacement for an ASCII string. Even a puny language like Python3 has recognized this.

Bye,
bearophile
Feb 24 2011
On Thursday, February 24, 2011 13:55:43 bearophile wrote:
> Steven Schveighoffer:
>> wait, you thought char[] was an array? You poor poor soul ;) I predict we shall get 1-2 questions/claims of incredulity like this a month until we get a real string type.
>
> There's a need for both Unicode strings and simpler strings of 7-bit ASCII chars (both mutable and immutable; the immutable ones must not allow their length to be changed, and their hash value may be computed lazily, even for the immutable strings). A ubyte[] is not a good enough replacement for an ASCII string. Even a puny language like Python3 has recognized this.

Honestly, I think that the need for actual ASCII strings is quite rare and that it _should_ not be encouraged. However, it would be trivial to declare wrappers for char and wchar (e.g. charRange and wcharRange) which actually use char or wchar as their element type if it's really needed. In most cases, however, using Unicode strings is what should be happening, so the fact that char[] doesn't work as a range is a _good_ thing. The only real problem with it is the fact that foreach doesn't use dchar as its default iteration type when iterating over arrays of char or wchar.

- Jonathan M Davis
Feb 24 2011
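Jonathan's wrapper idea can be sketched as follows, assuming a current Phobos. CharRange is a hypothetical name in the spirit of his charRange. It exposes a char[] as a random-access range whose elements are ref char code units, so it passes hasSwappableElements and reverse swaps individual code units. As the rest of the thread stresses, that is only meaningful for ASCII-only data. Later Phobos releases added std.utf.byCodeUnit for a similar purpose.

```d
import std.algorithm : reverse, swap;
import std.range.primitives : hasSwappableElements, isRandomAccessRange;

// Hypothetical wrapper: a char[] viewed as a range of char code units.
struct CharRange
{
    char[] data;

    @property bool empty() const { return data.length == 0; }
    @property ref char front() { return data[0]; }
    @property ref char back() { return data[$ - 1]; }
    void popFront() { data = data[1 .. $]; }
    void popBack() { data = data[0 .. $ - 1]; }
    @property CharRange save() { return this; }
    ref char opIndex(size_t i) { return data[i]; }
    @property size_t length() const { return data.length; }
}

static assert(isRandomAccessRange!CharRange);
static assert(hasSwappableElements!CharRange);

void main()
{
    char[] s = "abc".dup;
    reverse(CharRange(s));       // swaps individual code units in s
    assert(s == "cba");

    auto r = CharRange(s);
    swap(r[0], r[2]);            // element swap through the wrapper
    assert(s == "abc");
}
```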
Jonathan M Davis:
> Honestly, I think that the need for actual ASCII strings is quite rare and that it _should_ not be encouraged.

I need ASCII strings (or mutable/immutable arrays of ASCII chars) all the time; they come from English text, genomic data, etc.

Bye,
bearophile
Feb 24 2011
On Thursday, February 24, 2011 14:55:34 bearophile wrote:
> Jonathan M Davis:
>> Honestly, I think that the need for actual ASCII strings is quite rare and that it _should_ not be encouraged.
>
> I need ASCII strings (or mutable/immutable arrays of ASCII chars) all the time; they come from English text, genomic data, etc.

And I would strongly argue that you shouldn't be using ASCII for text unless you _need_ to. Unicode does the job just fine and doesn't run into problems when you end up having to have non-ASCII characters. Far, far too many programs have been written with the assumption that ASCII was good enough and then had to be altered to work with Unicode later.

Using pure ASCII should be an optimization and only done if it's necessary. For something like genomic data, there's a good chance that such an optimization would be necessary because you needed a random-access range and you risk using too much memory using dstrings (and you know that all of the characters are valid chars, because they're limited to the few characters used to hold genomic data). But most people aren't dealing with genomic data. They're usually dealing with text, and text should be Unicode. Assuming that all you're ever going to need is ASCII characters is generally unwise when dealing with text.

- Jonathan M Davis
Feb 24 2011
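Jonathan's "ASCII as an optimization" can be sketched like this, assuming the data has first been checked to be pure ASCII. In that case one char is one code point, and std.string.representation gives a random-access ubyte[] view of the same memory at one byte per character. The helper isPureAscii and the sample sequence are hypothetical.

```d
import std.algorithm : all, reverse;
import std.string : representation;

// Hypothetical helper: true when every code unit is 7-bit ASCII,
// i.e. each char is a whole code point.
bool isPureAscii(const(char)[] s)
{
    return s.representation.all!(c => c < 0x80);
}

void main()
{
    char[] seq = "GATTACA".dup;          // hypothetical genomic sample
    assert(isPureAscii(seq));

    // A ubyte[] view of the same memory: random access, one byte per element.
    ubyte[] units = seq.representation;
    reverse(units);                       // byte swaps are safe for pure ASCII
    assert(seq == "ACATTAG");

    // The same data as dchar[] would take four times the memory.
    static assert(dchar.sizeof == 4 * char.sizeof);
}
```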
On Thursday, February 24, 2011 11:42:38 Andrej Mitrovic wrote:
> On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
>> Andrej Mitrovic Wrote:
>>> Now I see why using char[] fails. It's because [ElementType!(R).init]; returns a dchar[].
>>
>> Yep, Unicode for the win. dchar[] is swappable.
>
> Oh, looks like you're right. I can use reverse on dchar[]. Weird, I thought I'd already tried that.
>
> P.S. Why do I have to use this gibberish syntax?:
> dchar[] test = to!(dchar[])("test");
>
> Isn't the compiler smart enough to do this for me automatically? It's a string literal..

It's because the type of an expression has nothing to do with what it's assigned to. So, the type of "test" is string, not dchar[] (on top of the fact that - on Linux at least - "test" _is_ immutable, so assigning it to a dchar[] without duping it is bad anyway). So, the result of the expression on the right-hand side of the assignment does _not_ match the type of the variable being assigned to (or initialized in this case).

> Still, I don't see why char arrays should fail on hasSwappableElements when swap can be used on char arrays?

It's because char arrays are ranges of dchar, _not_ char. So, you _can't_ swap them. And since each individual char is potentially meaningless on its own (since in anything other than straight ASCII, you're going to need multiple chars per character), swapping them makes no sense. hasSwappableElements deals with ranges, not arrays. And char[] is a range of dchar, not char.

- Jonathan M Davis
Feb 24 2011
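Jonathan's distinction between the array view and the range view can be made concrete with two hypothetical traits, swappableViaFront and swappableViaIndex. These are illustrative names only, not Phobos's own definitions, and the sketch assumes a current DMD/Phobos. The last lines of main also show the foreach default he mentioned in his earlier reply: iteration over a char array yields code units unless dchar is requested.

```d
import std.algorithm.mutation : swap;
import std.range.primitives : front, isForwardRange;

// Hypothetical traits for illustration only; these are not Phobos's own
// definitions. Range view: can elements reached through front be swapped?
enum bool swappableViaFront(R) = isForwardRange!R
    && is(typeof((ref R r) { swap(r.front, r.front); }));

// Array view: can elements reached by indexing be swapped?
enum bool swappableViaIndex(R) = is(typeof((ref R r) { swap(r[0], r[0]); }));

// char[]: front decodes a code point into a dchar rvalue, which swap rejects,
// but r[0] is a single char code unit and an lvalue.
static assert(!swappableViaFront!(char[]));
static assert(swappableViaIndex!(char[]));

// dchar[] passes both.
static assert(swappableViaFront!(dchar[]));
static assert(swappableViaIndex!(dchar[]));

void main()
{
    string s = "héllo";                                  // 'é' is two code units
    static assert(is(typeof(s.front) == dchar));         // range view
    static assert(is(typeof(s[0]) == immutable(char)));  // array view

    // foreach over a char array iterates code units unless dchar is requested.
    size_t units, points;
    foreach (c; s) ++units;
    foreach (dchar c; s) ++points;
    assert(units == 6 && points == 5);
}
```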
On 02/24/2011 12:14 PM, Jonathan M Davis wrote:
> the type of "test" is string

Sorry to take it out of context but that statement is not always correct. String literals can be string, wstring, or dstring:

    void foo(string c, wstring w, dstring d)
    {}

    void main()
    {
        foo("c", "w", "d");    // <- this compiles

        // But the following fails to compile:
        // string s;
        // foo(s, s, s);
    }

Ali
Feb 24 2011
On 2/24/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
> "test" _is_ immutable, so assigning it to a dchar[] without duping it is bad anyway).

Can't the compiler figure that out on its own?
Feb 24 2011
On Thu, 24 Feb 2011 15:33:52 -0500, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> On 2/24/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
>> "test" _is_ immutable, so assigning it to a dchar[] without duping it is bad anyway).
>
> Can't the compiler figure that out on its own?

It did figure that out (that it was bad) and told you not to do it :)

But what you are asking is for the compiler to implicitly dup it. I have thought this might be good to have in the past as well, but it's also not too bad to have to type "test"d.dup. So while having the compiler save you a bit of typing would be good, it's not the end of the world to require it.

-Steve
Feb 24 2011
On 2/24/11, Steven Schveighoffer <schveiguy yahoo.com> wrote:
> But what you are asking is for the compiler to implicitly dup it.

Only when the lhs is a mutable type. If it's immutable (string), then you don't have to dup it. Hence:

    string a = "abc";
    string b = "abc";
    assert(&a[0] == &b[0]);

There's no point in duping the literal in this case, it would just waste memory.

> I have thought this might be good to have in the past as well, but it's also not too bad to have to type "test"d.dup. So while having the compiler save you a bit of typing would be good, it's not the end of the world to require it.

Of course it's not that hard. But when things can be safely automated, I don't see why they shouldn't be. Unless I'm missing some important factor of duping string literals that was not mentioned already.

Btw, "test"d.dup is actually pretty nice. I would have used it before, but I didn't know I could use a postfix form /and/ dup it like that.
Feb 24 2011
On Thu, 24 Feb 2011 16:21:08 -0500, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
> Of course it's not that hard. But when things can be safely automated, I don't see why they shouldn't be. Unless I'm missing some important factor of duping string literals that was not mentioned already.

It's a 'hidden allocation'. It leads to low-performance code that looks like it's really fast. There are plenty of examples of hidden allocation in D already which I would hope we could get rid of; I wouldn't want to add more.

For example, try using an AA literal as an enum, and then use that enum in lots of places. Guess what? Each time you use it, the runtime constructs a new instance of the AA!

I think we should strive to require explicit requests for allocations as much as possible.

-Steve
Feb 24 2011
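Steve's AA example can be reproduced in a small sketch, assuming current D semantics. Each mention of the enum pastes in the literal, so two uses yield two independent AAs. The second half shows one way to build such a table only once: an immutable module-level AA initialized in a shared static constructor, which is the pattern the D language reference gives for static AA initialization. The names table and onceTable are hypothetical.

```d
// Manifest constant: every use of "table" re-creates the AA from the literal.
enum int[string] table = ["a": 1, "b": 2];

// Built once, before main runs, by the module constructor below.
immutable int[string] onceTable;

shared static this()
{
    onceTable = ["a": 1, "b": 2];
}

void main()
{
    auto x = table;      // a new AA
    auto y = table;      // another new AA
    x["c"] = 3;
    assert("c" !in y);   // y is unaffected: x and y are separate instances

    assert(onceTable["b"] == 2);
}
```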