digitalmars.D.learn - Why does enumerate over range return dchar, when ranging without


James Blachly <james.blachly gmail.com> writes:

I am puzzled why enumerating in a foreach returns a dchar (which 
forces me to cast), whereas without the enumerate the range 
returns a char as expected.

Example:

```
import std.stdio;
import std.range : enumerate;

void main()
{
     char[] s = ['a','b','c'];

     char[3] x;
     auto i = 0;
     foreach(c; s) {
         x[i] = c;
         i++;
     }

     writeln(x);
}
```
Above works without cast.

```
import std.stdio;
import std.range : enumerate;

void main()
{
     char[] s = ['a','b','c'];

     char[3] x;
     foreach(i, c; enumerate(s)) {
         x[i] = c;
         i++;
     }

     writeln(x);
}
```
Above fails without casting c to type char.

The function signature for enumerate shows "auto" return type, so 
that does not help me understand.

Kind regards

May 02 2018

rikki cattermole <rikki cattermole.co.nz> writes:

The first example uses auto-decoding (UTF-8 codepoints into a single 
UTF-32 one). This is considered a bad thing. But the compiler can 
disable it and leave it as UTF-8 code point upon request.

The second example returns a Voldemort type (means no-name) which 
happens to be an input range. Where it can't disable anything and has 
been told that it is returning a dchar. See[0] as to where this gets 
decoded. Writing two small functions to replace it (and popFront), will 
override this behavior.

[0] https://dlang.org/phobos/std_range_primitives.html#.front

May 02 2018

ag0aep6g <anonymous example.com> writes:

On 05/03/2018 07:56 AM, rikki cattermole wrote:
 The first example uses auto-decoding (UTF-8 codepoints into a single 
 UTF-32 one). This is considered a bad thing. But the compiler can 
 disable it and leave it as UTF-8 code point upon request.

The first example (foreach over a char[]) doesn't do any decoding. UTF-8 
stays UTF-8.

Also, a `char` is a UTF-8 code *unit*, not a code *point*.
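
To make the code unit / code point distinction concrete, here is a minimal sketch, assuming a Phobos version that still auto-decodes narrow strings (as in this thread). The helper names `countCodeUnits` and `countCodePoints` are made up for illustration. A plain `foreach` over a string yields `char` code units, asking for `dchar` in the `foreach` makes it decode, and the range primitive `front` always decodes.

```d
import std.range.primitives : front;
import std.stdio : writeln;

/// Counts UTF-8 code units: a plain foreach over a string does not decode.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)
    {
        // c is a (qualified) char here, i.e. one UTF-8 code unit
        static assert(is(immutable(typeof(c)) == immutable(char)));
        ++n;
    }
    return n;
}

/// Counts code points: declaring the loop variable as dchar makes foreach decode.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    writeln(countCodeUnits(s));  // 6
    writeln(countCodePoints(s)); // 5

    // The range primitive front decodes, so its type is dchar, not char.
    static assert(is(typeof(s.front) == dchar));
}
```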

 The second example returns a Voldemort type (means no-name) which 
 happens to be an input range. Where it can't disable anything and has 
 been told that it is returning a dchar. See[0] as to where this gets 
 decoded.

This is auto decoding.

 Writing two small functions to replace it (and popFront), will 
 override this behavior.

This sounds like you can disable auto decoding by providing your own 
range primitives in your own module. That doesn't work, because Phobos 
would still use the ones from std.range.primitives.
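
A small sketch of why a local replacement does not help, assuming current auto-decoding behavior. The module-level `front` below is a hypothetical local "override", and `enumeratedCharType` is a helper invented for this example. Code in the same module picks up the local function through UFCS, but `enumerate` resolves `front` inside Phobos and still sees `dchar`.

```d
import std.range : enumerate;

// Hypothetical local replacement for the range primitive; not part of Phobos.
char front(char[] a) { return a[0]; }

/// Name of the value type that enumerate yields for a char[].
string enumeratedCharType()
{
    char[] s;
    return typeof(enumerate(s).front.value).stringof;
}

void main()
{
    char[] s = ['a', 'b', 'c'];

    // Inside this module, UFCS finds the local function, so front is a char...
    static assert(is(typeof(s.front) == char));

    // ...but enumerate uses std.range.primitives internally and still decodes.
    static assert(is(typeof(enumerate(s).front.value) == dchar));

    assert(enumeratedCharType() == "dchar");
}
```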

 [0] https://dlang.org/phobos/std_range_primitives.html#.front

May 03 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 03/05/2018 9:50 PM, ag0aep6g wrote:
 This sounds like you can disable auto decoding by providing your own 
 range primitives in your own module. That doesn't work, because Phobos 
 would still use the ones from std.range.primitives.

Hmm, I swear this used to work.

Oh well, easy fix:

import std.algorithm;

// Wraps a char[] and exposes it as a range of code units (char), not dchar.
struct Wrapper {
    char[] input;
    alias input this;

    @property char front() { return input[0]; }
    @property bool empty() { return input.length == 0; }
    void popFront() { input = input[1 .. $]; }
}

void main() {
    char[] text = ['1', '2', '3'];

    foreach(c; Wrapper(text).filter!(a => a != '\0')) {
        pragma(msg, typeof(c)); // reports the element type at compile time
    }
}

May 03 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

The standard way to get around auto-decoding is std.utf.byCodeUnit.
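
Applied to the original question, that could look like the sketch below. The function name `copyCodeUnits` is made up for illustration. `byCodeUnit` presents the `char[]` as a range of `char`, so `enumerate` yields `char` values and no cast is needed.

```d
import std.range : enumerate;
import std.stdio : writeln;
import std.utf : byCodeUnit;

/// Copies s code unit by code unit, using enumerate for the index.
char[] copyCodeUnits(char[] s)
{
    auto x = new char[](s.length);
    foreach (i, c; s.byCodeUnit.enumerate)
    {
        // c converts implicitly to char here (a dchar would not)
        static assert(is(typeof(c) : char));
        x[i] = c;
    }
    return x;
}

void main()
{
    char[] s = ['a', 'b', 'c'];
    writeln(copyCodeUnits(s)); // abc
}
```

For a plain array, `foreach (i, c; s)` without `enumerate` also gives the index together with a `char`, since `foreach` over an array does not go through the range primitives.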

- Jonathan M Davis

May 03 2018
