digitalmars.D.learn - Auto-casting in range based functions?

"Andrew Stanton" <refefer gmail.com> writes:
I have been playing around with D as a scripting tool and have 
been running into the following issue:

-----------------------------------
import std.algorithm;

struct Delim {
     char delim;
     this(char d) {
         delim = d;
     }
}

void main() {
     char[] d = ['a', 'b', 'c'];
     auto delims = map!Delim(d);
}

/*
Compiling gives me the following error:
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: 
constructor test.Delim.this (char d) is not callable using 
argument types (dchar)
/usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot 
implicitly convert expression ('\U0000ffff') of type dchar to char

*/

-----------------------------------

As someone who most of the time doesn't need to handle unicode, 
is there a way I can convince these functions to not upcast char 
to dchar?  I can't think of a way to make the code more explicit 
in its typing.
May 13 2012
Artur Skawina <art.08.09 gmail.com> writes:
On 05/13/12 19:49, Andrew Stanton wrote:
> As someone who most of the time doesn't need to handle unicode, is there a way
> I can convince these functions to not upcast char to dchar?  I can't think of a
> way to make the code more explicit in its typing.
Well, if you don't want/need utf8 at all:

    alias ubyte ascii;

    int main() {
        ascii[] d = ['a', 'b', 'c'];
        auto delims = map!Delim(d);
        //...

and if you want to avoid utf8 just for this case (ie you "know" 'd[]' contains
just ascii) something like this should work:

    char[] d = ['a', 'b', 'c'];
    auto delims = map!((c){assert(c<128); return Delim(cast(char)c);})(d);

(it's probably more efficient when written as

    auto delims = map!Delim(cast(ascii[])d);

but you lose the safety checks)

artur
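For reference, here is a self-contained sketch of the two variants above. The helper names checkedDelims and rawDelims are made up for illustration and are not from the thread. Both lambdas narrow with an explicit cast, so they do not depend on an implicit ubyte-to-char conversion. The static asserts show what Phobos reports as the element type of each array, which is why the original map!Delim(d) call saw a dchar.

```d
import std.algorithm : map;
import std.array : array;
import std.range : ElementType;

struct Delim {
    char delim;
    this(char d) { delim = d; }
}

// Phobos range primitives decode char[] into dchar code points,
// but treat ubyte[] as a plain range of ubyte.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(ubyte[]) == ubyte));

// Hypothetical helper: checked variant. Every decoded code point must be
// ASCII before it is narrowed to char (the assert vanishes in -release builds).
Delim[] checkedDelims(char[] d) {
    return map!((dchar c) { assert(c < 128); return Delim(cast(char) c); })(d).array;
}

// Hypothetical helper: unchecked variant. The bytes are reinterpreted as
// ubyte, so no UTF-8 decoding and no range check takes place.
Delim[] rawDelims(char[] d) {
    return map!((ubyte b) => Delim(cast(char) b))(cast(ubyte[]) d).array;
}

void main() {
    char[] d = ['a', 'b', 'c'];
    // For pure ASCII input both variants produce the same result.
    assert(checkedDelims(d) == rawDelims(d));
    assert(checkedDelims(d)[1].delim == 'b');
}
```

The two variants only agree for ASCII input. With multi-byte UTF-8 in d, the checked version trips its assert, while the raw version yields one Delim per code unit.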
May 13 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, May 13, 2012 19:49:00 Andrew Stanton wrote:
> As someone who most of the time doesn't need to handle unicode,
> is there a way I can convince these functions to not upcast char
> to dchar?  I can't think of a way to make the code more explicit
> in its typing.
_All_ string types are considered ranges of dchar and treated as such. That
means that narrow strings (e.g. arrays of char or wchar) are not random-access
ranges and have no length property as far as range-based functions are
concerned. So, you can _never_ have char[] treated as a range of char by any
Phobos functions. char[] is UTF-8 by definition, and range-based functions in
Phobos operate on code points, not code units. If you want a char[] to be
treated as a range of char, then you're going to have to use ubyte[] instead.
e.g.

    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(cast(ubyte[])d);

Now, personally, I would argue that you should just use dchar, not char,
because regardless of what you are or aren't doing with unicode right now, the
odds are that you'll end up processing unicode at some point, and if you're in
the habit of using char, you're going to get all kinds of bugs. So, if you just
did

    struct Delim {
        dchar delim;
        this(dchar d) {
            delim = d;
        }
    }

    void main() {
        char[] d = ['a', 'b', 'c'];
        auto delims = map!Delim(d);
    }

then it should work just fine. And if you really need a char instead of dchar
for some reason, you can always just use std.conv.to - to!char(value) - which
will then throw if you're trying to convert a code point that won't fit in a
char.

In general, any code which declares a standalone variable of type char or
wchar, rather than using those types only as array elements, is a red flag
which indicates a likely bug or bad design. In specific circumstances, you may
need to do so, but in general, it's just asking for bugs. And you're going to
have to be fighting Phobos all the time if you try and use ranges of code units
rather than ranges of code points.

- Jonathan M Davis
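A minimal sketch of the dchar-based version together with the to!char check, assuming a current Phobos where narrow strings are still auto-decoded. The helper toDelims and the sample input are made up for illustration, and the map call uses a lambda instead of passing Delim directly. The euro sign U+20AC is used because its value is above char.max, so the checked conversion has to fail for it.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.exception : assertThrown;

struct Delim {
    dchar delim;
    this(dchar d) { delim = d; }
}

// Hypothetical helper: the string is walked code point by code point,
// so each element handed to Delim is already a dchar.
Delim[] toDelims(const(char)[] s) {
    return s.map!(c => Delim(c)).array;
}

void main() {
    // "a", ",", and the euro sign: three code points, five UTF-8 code units.
    auto delims = toDelims("a,\u20AC");
    assert(delims.length == 3);
    assert(delims[2].delim == '\u20AC');

    // Narrowing back to char is checked: ASCII fits ...
    assert(to!char(delims[0].delim) == 'a');
    // ... but a code point above char.max makes to!char throw.
    assertThrown(to!char(delims[2].delim));
}
```

Note that the check is purely numeric. A code point such as U+00E9 fits into a char as a value even though that single byte is not valid UTF-8 on its own, so to!char only protects against values above 0xFF.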
May 13 2012