digitalmars.D.learn - Why does enumerate over range return dchar, when ranging without

James Blachly <james.blachly gmail.com> writes:
I am puzzled why enumerating in a foreach returns a dchar (which 
forces me to cast), whereas without the enumerate the range 
returns a char as expected.

Example:

```
import std.stdio;
import std.range : enumerate;

void main()
{
     char[] s = ['a','b','c'];

     char[3] x;
     auto i = 0;
     foreach(c; s) {
         x[i] = c;
         i++;
     }

     writeln(x);
}
```
Above works without cast.

```
import std.stdio;
import std.range : enumerate;

void main()
{
     char[] s = ['a','b','c'];

     char[3] x;
     foreach(i, c; enumerate(s)) {
         x[i] = c;
         i++;
     }

     writeln(x);
}
```
Above fails without casting c to type char.

The function signature for enumerate shows "auto" return type, so 
that does not help me understand.

Kind regards
May 02 2018
rikki cattermole <rikki cattermole.co.nz> writes:
The first example uses auto-decoding (UTF-8 codepoints into a single UTF-32 one). This is considered a bad thing. But the compiler can disable it and leave it as UTF-8 code point upon request.

The second example returns a Voldemort type (means no-name) which happens to be an input range. Where it can't disable anything and has been told that it is returning a dchar. See [0] as to where this gets decoded.

Writing two small functions to replace it (and popFront), will override this behavior.

[0] https://dlang.org/phobos/std_range_primitives.html#.front
May 02 2018
ag0aep6g <anonymous example.com> writes:
On 05/03/2018 07:56 AM, rikki cattermole wrote:
 The first example uses auto-decoding (UTF-8 codepoints into a single 
 UTF-32 one). This is considered a bad thing. But the compiler can 
 disable it and leave it as UTF-8 code point upon request.
The first example (foreach over a char[]) doesn't do any decoding. UTF-8 stays UTF-8. Also, a `char` is a UTF-8 code *unit*, not a code *point*.
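
To make the code unit / code point distinction concrete, here is a minimal sketch. The helper names `countCodeUnits` and `countCodePoints` are made up for illustration and are not part of Phobos. A plain `foreach` over a `char[]` visits each UTF-8 code unit as a `char`; only when the loop variable is explicitly typed as `dchar` does the language-level `foreach` decode.

```d
import std.stdio : writeln;

// Hypothetical helper: counts UTF-8 code units.
// With an inferred loop variable, foreach over char[] does no decoding.
size_t countCodeUnits(char[] s)
{
    size_t n;
    foreach (c; s)
    {
        static assert(is(typeof(c) == char));
        ++n;
    }
    return n;
}

// Hypothetical helper: counts code points.
// Typing the loop variable as dchar asks foreach to decode UTF-8.
size_t countCodePoints(char[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // U+00E9 is one code point encoded as two UTF-8 code units.
    char[] s = "\u00E9".dup;
    writeln(countCodeUnits(s), " code units, ",
            countCodePoints(s), " code point");
}
```

Neither loop goes through the range primitives from std.range.primitives, which is why the original first example needs no cast.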
 The second example returns a Voldemort type (means no-name) which 
 happens to be an input range. Where it can't disable anything and has 
 been told that it is returning a dchar. See[0] as to where this gets 
 decoded.
This is auto decoding.
 Writing two small functions to replace it (and popFront), will 
 override this behavior.
This sounds like you can disable auto decoding by providing your own range primitives in your own module. That doesn't work, because Phobos would still use the ones from std.range.primitives.
 [0] https://dlang.org/phobos/std_range_primitives.html#.front
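
As a rough check of where the dchar comes from, the following sketch only uses compile-time assertions. It assumes a Phobos version in which auto decoding of narrow strings is still in place: indexing a `char[]` yields `char`, while the range primitive `front` and `ElementType` report `dchar`, and `enumerate` simply passes that element type along.

```d
import std.range : enumerate, ElementType, front;

void main()
{
    char[] s = ['a', 'b', 'c'];

    // Indexing is plain array access: a UTF-8 code unit.
    static assert(is(typeof(s[0]) == char));

    // The range primitive decodes narrow strings: a code point.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!(char[]) == dchar));

    // enumerate is built on the range primitives, so c is a dchar here.
    foreach (i, c; s.enumerate)
    {
        static assert(is(typeof(c) == dchar));
    }
}
```

If this compiles, it confirms that the cast in the original second example is needed only because the loop iterates the range interface rather than the array itself.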
May 03 2018
rikki cattermole <rikki cattermole.co.nz> writes:
On 03/05/2018 9:50 PM, ag0aep6g wrote:
 This sounds like you can disable auto decoding by providing your own range primitives in your own module. That doesn't work, because Phobos would still use the ones from std.range.primitives.
Hmm, I swear this used to work. Oh well, easy fix:

    import std.algorithm;

    struct Wrapper {
        char[] input;
        alias input this;

        @property char front() { return input[0]; }
        @property bool empty() { return input.length == 0; }
        void popFront() { input = input[1 .. $]; }
    }

    void main() {
        char[] text = ['1', '2', '3'];

        foreach(c; Wrapper(text).filter!(a => a != '\0')) {
            pragma(msg, typeof(c));
        }
    }
May 03 2018
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
The standard way to get around auto-decoding is std.utf.byCodeUnit. - Jonathan M Davis
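
Applied to the example from the original question, that might look like the sketch below. The function name `copyCodeUnits` is made up for illustration; `byCodeUnit` and `enumerate` are the Phobos functions named in this thread. Wrapping the array in `byCodeUnit` before `enumerate` gives a range whose elements are `char`, so no cast is needed.

```d
import std.range : enumerate;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: copies s one code unit at a time,
// using enumerate for the index and byCodeUnit to avoid decoding.
char[] copyCodeUnits(char[] s)
{
    auto x = new char[](s.length);
    foreach (i, c; s.byCodeUnit.enumerate)
    {
        static assert(is(typeof(c) == char)); // no dchar, no cast
        x[i] = c;
    }
    return x;
}

void main()
{
    char[] s = ['a', 'b', 'c'];
    writeln(copyCodeUnits(s));
}
```

Note that this iterates code units, so non-ASCII text is copied byte for byte rather than character by character, which is the same behaviour as the plain foreach in the first example.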
May 03 2018