digitalmars.D.learn - Why does enumerate over range return dchar, when ranging without
- James Blachly (38/38) May 02 2018 I am puzzled why enumerating in a foreach returns a dchar (which
- rikki cattermole (10/59) May 02 2018 The first example uses auto-decoding (UTF-8 codepoints into a single
- ag0aep6g (9/56) May 03 2018 The first example (foreach over a char[]) doesn't do any decoding. UTF-8...
- rikki cattermole (17/78) May 03 2018 Hmm, I swear this use to work.
- Jonathan M Davis (4/84) May 03 2018 The standard way to get around auto-decoding is std.utf.byCodeUnit.
I am puzzled why enumerating in a foreach returns a dchar (which forces me to cast), whereas without the enumerate the range returns a char as expected.

Example:

```
import std.stdio;
import std.range : enumerate;

void main()
{
    char[] s = ['a','b','c'];
    char[3] x;
    auto i = 0;
    foreach(c; s) {
        x[i] = c;
        i++;
    }
    writeln(x);
}
```

Above works without cast.

```
import std.stdio;
import std.range : enumerate;

void main()
{
    char[] s = ['a','b','c'];
    char[3] x;
    foreach(i, c; enumerate(s)) {
        x[i] = c;
        i++;
    }
    writeln(x);
}
```

Above fails without casting c to type char.

The function signature for enumerate shows "auto" return type, so that does not help me understand.

Kind regards
May 02 2018
On 03/05/2018 5:44 PM, James Blachly wrote:
> I am puzzled why enumerating in a foreach returns a dchar (which forces me to cast), whereas without the enumerate the range returns a char as expected.
> [...]

The first example uses auto-decoding (UTF-8 code points into a single UTF-32 one). This is considered a bad thing. But the compiler can disable it and leave it as a UTF-8 code point upon request.

The second example returns a Voldemort type (meaning it has no name) which happens to be an input range. There it can't disable anything and has been told that it is returning a dchar. See [0] for where this gets decoded.

Writing two small functions to replace it (and popFront) will override this behavior.

[0] https://dlang.org/phobos/std_range_primitives.html#.front
May 02 2018
On 05/03/2018 07:56 AM, rikki cattermole wrote:
> [...]
> The first example uses auto-decoding (UTF-8 codepoints into a single UTF-32 one). This is considered a bad thing. But the compiler can disable it and leave it as UTF-8 code point upon request.

The first example (foreach over a char[]) doesn't do any decoding. UTF-8 stays UTF-8. Also, a `char` is a UTF-8 code *unit*, not a code *point*.

> The second example returns a Voldemort type (means no-name) which happens to be an input range. Where it can't disable anything and has been told that it is returning a dchar. See[0] as to where this gets decoded.

This is auto decoding.

> Writing two small functions to replace it (and popFront), will override this behavior.

This sounds like you can disable auto decoding by providing your own range primitives in your own module. That doesn't work, because Phobos would still use the ones from std.range.primitives.

> [0] https://dlang.org/phobos/std_range_primitives.html#.front
May 03 2018
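The distinction ag0aep6g draws above can be checked at compile time. The following is a minimal sketch, assuming a Phobos version in which auto-decoding of narrow strings is still in effect (as it was at the time of this thread). The variable names are illustrative only. It also shows that a plain foreach decodes only when the loop variable is explicitly declared as dchar.

```d
import std.range : enumerate;
import std.range.primitives : ElementType, front;

void main()
{
    char[] s = ['a', 'b', 'c'];

    // A plain foreach over an array yields its elements (code units): no decoding.
    foreach (c; s)
    {
        static assert(is(typeof(c) == char));
    }

    // Decoding in a plain foreach happens only when requested via the type.
    foreach (dchar c; s)
    {
        static assert(is(typeof(c) == dchar));
    }

    // The range primitives for char[] decode, so the element type is dchar.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!(char[]) == dchar));

    // enumerate is built on those primitives, so its value component is a dchar.
    foreach (i, c; enumerate(s))
    {
        static assert(is(typeof(c) == dchar));
    }
}
```

If this compiles, every static assert held, which matches the behavior described in the original question: char from the plain foreach, dchar from enumerate.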
On 03/05/2018 9:50 PM, ag0aep6g wrote:
> [...]
> This sounds like you can disable auto decoding by providing your own range primitives in your own module. That doesn't work, because Phobos would still use the ones from std.range.primitives.

Hmm, I swear this used to work. Oh well, easy fix:

```
import std.algorithm;

// Wraps a char[] and defines its own char-based range primitives
// as members, so they are used in place of the decoding free functions.
struct Wrapper {
    char[] input;
    alias input this;

    @property char front() {
        return input[0];
    }

    @property bool empty() {
        return input.length == 0;
    }

    void popFront() {
        input = input[1 .. $];
    }
}

void main() {
    char[] text = ['1', '2', '3'];

    foreach(c; Wrapper(text).filter!(a => a != '\0')) {
        pragma(msg, typeof(c)); // reports the element type at compile time
    }
}
```
May 03 2018
On Thursday, May 03, 2018 22:00:04 rikki cattermole via Digitalmars-d-learn wrote:
> [...]

The standard way to get around auto-decoding is std.utf.byCodeUnit.

- Jonathan M Davis
May 03 2018
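Applied to the code from the original question, the std.utf.byCodeUnit suggestion might look like the sketch below. This is an illustration rather than code posted in the thread: it reuses the names s and x from the question and drops the stray i++ from the loop body, since enumerate already supplies the index.

```d
import std.range : enumerate;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main()
{
    char[] s = ['a', 'b', 'c'];
    char[3] x;

    // byCodeUnit wraps the array in a range of code units, so enumerate
    // yields char values and no cast is needed when copying into x.
    foreach (i, c; s.byCodeUnit.enumerate)
    {
        // The element converts implicitly to char; a dchar would not.
        static assert(is(typeof(c) : char));
        x[i] = c;
    }

    assert(x[] == "abc");
    writeln(x);
}
```

Compared with the hand-written wrapper struct, this keeps the code-unit semantics explicit at the call site and relies only on Phobos.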