digitalmars.D - Range functions expand char to dchar
- Matt Kline (32/32) Sep 08 2015 After seeing Walter's DConf presentation from this year, I've
- Matt Kline (4/5) Sep 08 2015 My apologies for double-posting, but is this intended behavior,
- Dmitry Olshansky (7/12) Sep 08 2015 Historical consequence of enabling auto-decoding for arrays of char and
- anonymous (43/76) Sep 08 2015 By design with regrets:
- Matt Kline (5/19) Sep 08 2015 A bit verbose, but I suppose that will do.
After seeing Walter's DConf presentation from this year, I've been making an effort to use range algorithms more, such as using chain() and joiner() as an alternative to array concatenation and std.array.join. Unfortunately, doing so with strings has been problematic, as these algorithms expand strings into dstrings.

An example:

----
import std.algorithm;
import std.range;
import std.stdio;
import std.regex;

void main()
{
    // One would expect this to be a range of chars
    auto test = chain("foo", "bar", "baz");
    // prints "dchar"
    writeln(typeid(typeof(test.front)));

    auto arr = ["foo", "bar", "baz"];
    auto joined = joiner(arr, ", ");
    // Also "dchar"
    writeln(typeid(typeof(joined.front)));

    // Problems ensue if one assumes the result of joined is a char string.
    auto r = regex(joined); // Compiler error: joined is not a string
    matchFirst("won't compile", r);
}
----

Whether by design or by oversight, this is quite undesirable. It violates the principle of least astonishment (one wouldn't expect joining a bunch of strings would result in a dstring), causing issues such as the one shown above. And, if I aim to use UTF-8 consistently throughout my applications (see http://utf8everywhere.org/), what am I to do?
Sep 08 2015
On Tuesday, 8 September 2015 at 17:52:13 UTC, Matt Kline wrote:
> Whether by design or by oversight, this is quite undesirable.

My apologies for double-posting, but is this intended behavior, or an unfortunate consequence of the metaprogramming used to determine the resulting type of these range functions?
Sep 08 2015
On 08-Sep-2015 20:57, Matt Kline wrote:
> On Tuesday, 8 September 2015 at 17:52:13 UTC, Matt Kline wrote:
>> Whether by design or by oversight, this is quite undesirable.
>
> My apologies for double-posting, but is this intended behavior, or an unfortunate consequence of the metaprogramming used to determine the resulting type of these range functions?

It's a historical consequence of enabling auto-decoding for arrays of char and wchar (and only those). Today it's recognized that one should explicitly wrap an array of char as either a code unit range or a code point range, using the byUTF helper.

-- 
Dmitry Olshansky
Sep 08 2015
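To illustrate Dmitry's point, here is a minimal sketch (not code from the thread) that wraps each string with std.utf.byUTF, so the element type is chosen explicitly instead of by auto-decoding. It assumes a reasonably recent Phobos in which byUTF accepts a plain string. Unqual is applied because the code-unit elements may come back qualified (for example as immutable(char)), and the exact qualifier is not the point here.

```d
import std.range : chain;
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.traits : Unqual;
import std.utf : byUTF;

// No wrapper: chain sees the char arrays through auto-decoding.
alias Implicit = Unqual!(ElementType!(typeof(chain("foo", "bar"))));

// byUTF!char asks for code units, byUTF!dchar asks for code points.
alias Units  = Unqual!(ElementType!(typeof(chain("foo".byUTF!char, "bar".byUTF!char))));
alias Points = Unqual!(ElementType!(typeof(chain("foo".byUTF!dchar, "bar".byUTF!dchar))));

void main()
{
    writeln(Implicit.stringof); // dchar
    writeln(Units.stringof);    // char
    writeln(Points.stringof);   // dchar
}
```

Spelling out byUTF!char or byUTF!dchar at each use site makes the choice visible in the code, which is the explicitness Dmitry describes.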
On Tuesday 08 September 2015 19:52, Matt Kline wrote:
> An example:
> [example code snipped]
> Whether by design or by oversight,

By design with regrets:
http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com

> this is quite undesirable. It violates the principle of least astonishment (one wouldn't expect joining a bunch of strings would result in a dstring),

Actually, the result is a range of dchars; strictly speaking, it is not a dstring.

> causing issues such as the one shown above. And, if I aim to use UTF-8 consistently throughout my applications (see http://utf8everywhere.org/), what am I to do?

You can use std.utf.byCodeUnit to get ranges of chars:

----
import std.algorithm;
import std.array: array;
import std.range;
import std.stdio;
import std.regex;
import std.utf: byCodeUnit;

void main()
{
    auto test = chain("foo".byCodeUnit, "bar".byCodeUnit, "baz".byCodeUnit);
    pragma(msg, typeof(test.front)); /* "immutable(char)" */

    auto arr = ["foo".byCodeUnit, "bar".byCodeUnit, "baz".byCodeUnit];
    auto joined = joiner(arr, ", ".byCodeUnit);
    pragma(msg, typeof(joined.front)); /* "immutable(char)" */

    /* Having char elements isn't enough. Need to turn the range into an
       array via std.array.array: */
    auto r = regex(joined.array);
    matchFirst("won't compile", r); /* compiles */
}
----

Alternatively, since you have to materialize `joined` into an array anyway, you can use the dchar range and make a string from it when passing it to `regex`:

----
import std.algorithm;
import std.conv: to;
import std.stdio;
import std.regex;

void main()
{
    auto arr = ["foo", "bar", "baz"];
    auto joined = joiner(arr, ", ");
    pragma(msg, typeof(joined.front)); /* "dchar" */

    /* to!string now: */
    auto r = regex(joined.to!string);
    matchFirst("won't compile", r); /* compiles */
}
----
Sep 08 2015
On Tuesday, 8 September 2015 at 18:21:34 UTC, anonymous wrote:
> By design with regrets:
> http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com

On Thursday, 25 September 2014 at 19:40:29 UTC, Walter Bright wrote:
> Top of my list would be the auto-decoding behavior of std.array.front() on character arrays. Every time I'm faced with that I want to throw a chair through the window.

At least I'm not alone. :)

> You can use std.utf.byCodeUnit to get ranges of chars:

A bit verbose, but I suppose that will do.

> /* Having char elements isn't enough. Need to turn the range into an
>    array via std.array.array: */
> auto r = regex(joined.array);
> matchFirst("won't compile", r); /* compiles */
> }

If we have a range of char elements, won't that do? regex() uses the standard isSomeString!S constraint to take any range of chars.
Sep 08 2015
On Tuesday 08 September 2015 20:28, Matt Kline wrote:
> If we have a range of char elements, won't that do? regex() uses the standard isSomeString!S constraint to take any range of chars.

isSomeString!S doesn't check whether S is a range. It checks whether S is "some string", meaning: "Char[], where Char is any of char, wchar or dchar, with or without qualifiers".
http://dlang.org/phobos/std_traits.html#isSomeString

To check for ranges, use isInputRange, isForwardRange, etc.
http://dlang.org/phobos/std_range_primitives.html
Sep 08 2015
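The distinction anonymous draws can be checked at compile time. The following sketch (an illustration, not code from the thread) uses static assert to show that a joined byCodeUnit range passes isInputRange but fails isSomeString, and that materializing it with std.array.array yields an array that passes, which is why regex accepts `joined.array` but not `joined`.

```d
import std.algorithm : joiner;
import std.array : array;
import std.range.primitives : isInputRange;
import std.stdio : writeln;
import std.traits : isSomeString;
import std.utf : byCodeUnit;

// Type of a lazily joined range of char code units.
alias Joined = typeof(joiner(["foo".byCodeUnit, "bar".byCodeUnit], ", ".byCodeUnit));

// A plain string is both "some string" and an input range.
static assert(isSomeString!string && isInputRange!string);

// The joined range is an input range of chars, but not a Char[] array,
// so a template constrained with isSomeString!S rejects it.
static assert(isInputRange!Joined && !isSomeString!Joined);

// std.array.array materializes it into an array, which passes the check.
static assert(isSomeString!(typeof(Joined.init.array)));

void main()
{
    auto joined = joiner(["foo".byCodeUnit, "bar".byCodeUnit], ", ".byCodeUnit);
    writeln(joined.array); // foo, bar
}
```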
On Tuesday, 8 September 2015 at 18:28:40 UTC, Matt Kline wrote:
> A bit verbose, but I suppose that will do.

You could use map:

---
import std.algorithm : map;
import std.utf : byCodeUnit;
import std.array : array;

void main()
{
    // Wrap every string with byCodeUnit in one step.
    auto arr = ["foo", "bar", "baz"].map!(a => a.byCodeUnit).array;
}
---
Sep 09 2015
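Putting the map suggestion together with joiner and regex from earlier in the thread, a sketch might look like the following. The helper name alternation and the "|" separator are hypothetical, chosen only to build a regex alternation; the one behavior relied on is what anonymous demonstrates earlier in the thread, namely that std.array.array turns a joined range of char code units into a string that regex accepts.

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: builds an alternation pattern such as "foo|bar|baz"
// without auto-decoding, by wrapping every element with byCodeUnit via map.
auto alternation(string[] words)
{
    auto units = words.map!(a => a.byCodeUnit); // lazy range of code-unit ranges
    return joiner(units, "|".byCodeUnit).array; // char elements, materialized for regex
}

void main()
{
    auto r = regex(alternation(["foo", "bar", "baz"]));
    writeln(matchFirst("xxbarxx", r).hit); // bar
}
```

Because map is lazy and joiner accepts any input range of input ranges, the intermediate `.array` after map is not strictly needed when the result goes straight into joiner; only the final `.array` is required, since regex wants an actual string.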