digitalmars.D - opNext: Simplified ranges

Tomer Weka (59/59) Nov 04 2022 I find the empty/front/popNext protocol very cumbersome to

Joseph Rushton Wakeling (15/20) Nov 05 2022 Out of curiosity: what do you think are the pros and cons of this
rikki cattermole (20/20) Nov 05 2022 ```d

rikki cattermole (4/5) Nov 05 2022 return !_haveValue;

Steven Schveighoffer (20/23) Nov 05 2022 Not sure what this means. breaking a foreach does not consume the next

Paul Backus (37/48) Nov 05 2022 You'd think, but if you actually try this, it turns out it

Dukc (15/23) Nov 05 2022 This works:
Steven Schveighoffer (24/79) Nov 05 2022 What is happening is the compiler is not using duck typing exactly, and

Walter Bright (2/2) Nov 05 2022 We didn't use the "next" protocol for ranges because one cannot test for...

Tomer Weka (5/7) Nov 06 2022 But that's exactly the issue - I have to produce the element in

Imperatorn (2/10) Nov 06 2022 Doesn't one the solutions from Steven or Dukec work?
Steven Schveighoffer (11/18) Nov 06 2022 The issue there is that you are making a copy, and the copy has already

Per =?UTF-8?B?Tm9yZGzDtnc=?= (6/8) Nov 06 2022 That's not a blocker if we can afford implementing a second

Per =?UTF-8?B?Tm9yZGzDtnc=?= (4/8) Nov 06 2022 It all boils down to fitting the design to the most common use

Sebastiaan Koppe (6/10) Nov 06 2022 I actually think the most lowest range should be one with

Per =?UTF-8?B?Tm9yZGzDtnc=?= (4/6) Nov 06 2022 I don't grasp what you mean. Can you give enlighten me with a

Per =?UTF-8?B?Tm9yZGzDtnc=?= (3/9) Nov 06 2022 1. Ahh, is this inspired by the concurrency primitives?

Tejas (10/20) Nov 06 2022 You have talked about redesigning ranges multiple times before as

Sebastiaan Koppe (8/14) Nov 07 2022 I am afraid such a document, if done properly, is tantamount to

Timon Gehr (3/14) Nov 07 2022 This does more than just check for existence; the range will have to

Joseph Rushton Wakeling (12/14) Nov 10 2022 Aren't there cases (e.g. some kinds of IO) where that's

Joseph Rushton Wakeling (10/13) Nov 10 2022 BTW, note that the `empty` check is a more information-limited
victoroak (5/20) Nov 10 2022 Yes, in some cases you have to consume the inner stream in either

Tomer Weka <tomer weka.io> writes:

I find the empty/front/popNext protocol very cumbersome to 
implement for most cases (at least in the Weka codebase), which 
causes people to just prefer opApply (even though it has its own 
set of issues).

Today I even ran into a problem that cannot be solved using the 
existing protocol: a consuming range (think a file stream) that 
should not consume the element if the foreach has been broken. 
Using opApply, this is easy (the delegate returns nonzero), but 
using popNext it's next to impossible (or at least really really 
cumbersome). I don't want to go into all the details, so let's 
just jump to the suggested protocol.

```
bool opNext(out T elem) {...}
```

`opNext` returns true if it "produced" an element, false if it 
reached the end. That way

```
foreach (elem; range) {
     ...
}
```

Simply gets lowered to
```
T elem;
while (range.opNext(elem)) {
    ...
}
```

Of course this protocol can live side-by-side with the existing 
protocol, just take precedence in the lowering rules. A concrete 
example:

```
struct LinkedList {
     int value;
     LinkedList* next;

     struct Range {
         LinkedList* cursor;

         bool opNext(out int val) nothrow  nogc {
             if (cursor is null) {
                 return false;
             }
             val = cursor.value;
             return true;
         }
     }

     auto members() {return Range(&this);}
}

foreach (val; myList.members) {
     ...
}
```

Notes:
* `save()` can work for the new protocol as well, it just needs 
to duplicate the iterator's state
* Maybe it would be wise to complement this with opPrev for 
bidirectional ranges, but I have to say I've rarely seen any use 
of them
* I think chaining with map/filter/etc. is even easier to 
implement in the new protocol

Nov 04 2022

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Saturday, 5 November 2022 at 06:01:54 UTC, Tomer Weka wrote:
 ```
 bool opNext(out T elem) {...}
 ```

 `opNext` returns true if it "produced" an element, false if it 
 reached the end.

Out of curiosity: what do you think are the pros and cons of this 
versus, say, something like:

```
Option!(T) opNext() { ... }
```

Where `Option` is an implementation of the option type, and hence 
encapsulates the question of whether an element was produced or 
not?  (IIRC this has been another suggestion for a 
simplified/redesigned range concept.)

An `Option`-based design has the advantage of not requiring a 
persistent `front`, which may fit better with true input ranges 
(e.g. reading from a stream).  OTOH it may make things more 
clunky for the case where there is persistent underlying data 
that a `front` method could refer to.

Nov 05 2022

rikki cattermole <rikki cattermole.co.nz> writes:

```d
mixin template NextToRange(ElementType) {
	ElementType _value;
	bool _haveValue;

	 property {
		bool empty() {
			return _haveValue;
		}

		ElementType front() {
			assert(_haveValue);
			return _value;
		}
	}

	void popFront() {
		_haveValue = nextValue(_value);
	}
}
```

You don't need to break a large percentage of the standard library to 
use this ;)

Nov 05 2022

rikki cattermole <rikki cattermole.co.nz> writes:

On 06/11/2022 12:34 AM, rikki cattermole wrote:
              return _haveValue;

return !_haveValue;

Oops, that should be reversed.

Thanks Bruce Carneal!

Nov 05 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 11/5/22 2:01 AM, Tomer Weka wrote:
 Today I even ran into a problem that cannot be solved using the existing 
 protocol: a consuming range (think a file stream) that should not 
 consume the element if the foreach has been broken.

Not sure what this means. breaking a foreach does not consume the next 
element.

Now, foreach *does* make a copy of the range. Which means that the copy 
is going to consume the current, while the original will not get it (and 
the copy is gone once the loop is over).

This can be solved by a wrapper:

```d
struct RefOf(T)
{
   private T *_val;
   ref T _get() { return *_val; }
   alias _get this;
}

auto refOf(T)(return ref T val) { return RefOf!T(&val); }

foreach(elem; range.refOf) // now doesn't make a copy.
```

Could also be solved with syntax changes to foreach. Maybe 
`foreach(elem; ref range)`?

-Steve

Nov 05 2022

Paul Backus <snarwin gmail.com> writes:

On Saturday, 5 November 2022 at 15:17:42 UTC, Steven 
Schveighoffer wrote:
 This can be solved by a wrapper:

 ```d
 struct RefOf(T)
 {
   private T *_val;
   ref T _get() { return *_val; }
   alias _get this;
 }

 auto refOf(T)(return ref T val) { return RefOf!T(&val); }

 foreach(elem; range.refOf) // now doesn't make a copy.
 ```

You'd think, but if you actually try this, it turns out it 
doesn't work: the compiler is "smart" enough to see through the 
`alias this`, and makes a copy of `range.refOf._get` (the actual 
range object) instead of `range.refOf` (the wrapper).

So, for example, this usage code prints `[1, 2, 3]`, 
demonstrating that the range is *not* consumed by the `foreach` 
loop:

```d
void main()
{
     import std.range, std.stdio;

     auto r = only(1, 2, 3);
     foreach (e; refOf(r)) {}
     writeln(r);
}
```

In the `-vcg-ast` output, you can see the call to `_get` inserted 
by the compiler:

```d
void main()
{
	import std.range;
	import std.stdio;
	OnlyResult!(int, int, int) r = only(1, 2, 3);
	{
		OnlyResult!(int, int, int) __r47 = refOf(r)._get().opSlice();
		for (; !__r47.empty(); __r47.popFront())
		{
			int e = __r47.front();
		}
	}
	writeln(r);
	return 0;
}
```

Nov 05 2022

Dukc <ajieskola gmail.com> writes:

On Saturday, 5 November 2022 at 18:08:39 UTC, Paul Backus wrote:
 On Saturday, 5 November 2022 at 15:17:42 UTC, Steven 
 Schveighoffer wrote:
 This can be solved by a wrapper:

 
 You'd think, but if you actually try this, it turns out it 
 doesn't work: the compiler is "smart" enough to see through the 
 `alias this`, and makes a copy of `range.refOf._get` (the 
 actual range object) instead of `range.refOf` (the wrapper).

This works:

```D
void main()
{   import std.range : inputRangeObject, refRange;
     import std.stdio;

     auto testPiece = [1,2,3,4,5];
     foreach(el; (&testPiece).refRange.inputRangeObject)
     {   if(el == 3) break;
     }
     writeln(testPiece); // [3, 4, 5]
}
```

It's far from satisfying though. Ugly to begin with, and worse, 
incompatible with any function attributes.

Nov 05 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 11/5/22 2:08 PM, Paul Backus wrote:
 On Saturday, 5 November 2022 at 15:17:42 UTC, Steven Schveighoffer wrote:
 This can be solved by a wrapper:

 ```d
 struct RefOf(T)
 {
   private T *_val;
   ref T _get() { return *_val; }
   alias _get this;
 }

 auto refOf(T)(return ref T val) { return RefOf!T(&val); }

 foreach(elem; range.refOf) // now doesn't make a copy.
 ```

 
 You'd think, but if you actually try this, it turns out it doesn't work: 
 the compiler is "smart" enough to see through the `alias this`, and 
 makes a copy of `range.refOf._get` (the actual range object) instead of 
 `range.refOf` (the wrapper).

It's not "smart", just picky.

 
 So, for example, this usage code prints `[1, 2, 3]`, demonstrating that 
 the range is *not* consumed by the `foreach` loop:
 
 ```d
 void main()
 {
      import std.range, std.stdio;
 
      auto r = only(1, 2, 3);
      foreach (e; refOf(r)) {}
      writeln(r);
 }
 ```
 
 In the `-vcg-ast` output, you can see the call to `_get` inserted by the 
 compiler:
 
 ```d
 void main()
 {
      import std.range;
      import std.stdio;
      OnlyResult!(int, int, int) r = only(1, 2, 3);
      {
          OnlyResult!(int, int, int) __r47 = refOf(r)._get().opSlice();
          for (; !__r47.empty(); __r47.popFront())
          {
              int e = __r47.front();
          }
      }
      writeln(r);
      return 0;
 }
 ```

What is happening is the compiler is not using duck typing exactly, and 
it also has some pretty puzzling logic.

1. If it has the ability to slice, it will *always do this at least 
once*. See the bug reported long ago: 
https://issues.dlang.org/show_bug.cgi?id=14619
2. The definition of a range is that it contains a `front` *member*. 
This cannot be via `alias this`.

So this works:

```d
struct RefOf(T)
{
   private T *_val;
   auto front() { return _get.front; }
   RefOf opSlice() { return this; }
   ref T _get() { return *_val; }
   alias _get this;
}
```

Take away from that what you will.

(also I forgot there actually is something called `std.range.refRange` 
which *is supposed to do this*, but apparently still broken).

-Steve

Nov 05 2022

Walter Bright <newshound2 digitalmars.com> writes:

We didn't use the "next" protocol for ranges because one cannot test for 
existence of the next element without consuming it.

Nov 05 2022

Tomer Weka <tomer weka.io> writes:

On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.

But that's exactly the issue - I have to produce the element in 
the range's ctor, which essentially requires a "non-popping 
popNext" to determine if it's empty and assign the front

-tomer

Nov 06 2022

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Sunday, 6 November 2022 at 07:55:13 UTC, Tomer Weka wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one 
 cannot test for existence of the next element without 
 consuming it.

 But that's exactly the issue - I have to produce the element in 
 the range's ctor, which essentially requires a "non-popping 
 popNext" to determine if it's empty and assign the front

 -tomer

Doesn't one the solutions from Steven or Dukec work?

Nov 06 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 11/6/22 2:55 AM, Tomer Weka wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot test 
 for existence of the next element without consuming it.

 
 But that's exactly the issue - I have to produce the element in the 
 range's ctor, which essentially requires a "non-popping popNext" to 
 determine if it's empty and assign the front

The issue there is that you are making a copy, and the copy has already 
consumed the front element. It would be no different with your API as 
well -- if you break from your range type, the element is gone.

What we need is non-copyable input ranges. I have advocated for this for 
a while. In essence, a forward range should be copyable, an input range 
should not be, and we should remove `save`. Then when you foreach it, 
you *can't* make a copy, so it doesn't let you discard the element you 
are looking at.

The other option is -- don't use foreach. Use a while loop like you wrote.

-Steve

Nov 06 2022

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.

That's not a blocker if we can afford implementing a second 
function for the "peek" case. This is the path Rust took via its 
`Peekable` trait and `peek()` member as described at

https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html

.

Nov 06 2022

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 6 November 2022 at 21:06:01 UTC, Per Nordlöw wrote:
 That's not a blocker if we can afford implementing a second 
 function for the "peek" case. This is the path Rust took via 
 its `Peekable` trait and `peek()` member as described at

 https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html

It all boils down to fitting the design to the most common use 
case. So is the most common case to be able to peek or not? I 
really don't know right. Does anybody have any insights into this?

Nov 06 2022

Sebastiaan Koppe <mail skoppe.eu> writes:

On Sunday, 6 November 2022 at 21:08:10 UTC, Per Nordlöw wrote:
 It all boils down to fitting the design to the most common use 
 case. So is the most common case to be able to peek or not? I 
 really don't know right. Does anybody have any insights into 
 this?

I actually think the most lowest range should be one with 
destructive read semantics; non-peekable, non-copyable. You can 
build everything on top of that.

I also think ranges should be redesigned to have a 2 stage api. 
One to build the pipeline, one to iterate.

Nov 06 2022

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
wrote:
 I also think ranges should be redesigned to have a 2 stage api. 
 One to build the pipeline, one to iterate.

I don't grasp what you mean. Can you give enlighten me with a 
show case?

Nov 06 2022

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 6 November 2022 at 22:47:59 UTC, Per Nordlöw wrote:
 On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
 wrote:
 I also think ranges should be redesigned to have a 2 stage 
 api. One to build the pipeline, one to iterate.

 I don't grasp what you mean. Can you give enlighten me with a 
 show case?

1. Ahh, is this inspired by the concurrency primitives?

2. Are there any other languages that use such a 2 stage api?

Nov 06 2022

Tejas <notrealemail gmail.com> writes:

On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
wrote:
 On Sunday, 6 November 2022 at 21:08:10 UTC, Per Nordlöw wrote:
 It all boils down to fitting the design to the most common use 
 case. So is the most common case to be able to peek or not? I 
 really don't know right. Does anybody have any insights into 
 this?

 I actually think the most lowest range should be one with 
 destructive read semantics; non-peekable, non-copyable. You can 
 build everything on top of that.

 I also think ranges should be redesigned to have a 2 stage api. 
 One to build the pipeline, one to iterate.

You have talked about redesigning ranges multiple times before as 
well, plus have expierience using them non-trivially for quite 
some time.

Could you please create a post/document that contains all your 
ideas/thoughts about what a redesigned range would look like? The 
maybe someone could actually work towards implementing it. If not 
directly within phobos, then atleast they could create a package 
and we could test the new ranges' efficacy for real world usage

Nov 06 2022

Sebastiaan Koppe <mail skoppe.eu> writes:

On Monday, 7 November 2022 at 02:49:13 UTC, Tejas wrote:
 Could you please create a post/document that contains all your 
 ideas/thoughts about what a redesigned range would look like? 
 The maybe someone could actually work towards implementing it. 
 If not directly within phobos, then atleast they could create a 
 package and we could test the new ranges' efficacy for real 
 world usage

I am afraid such a document, if done properly, is tantamount to 
doing the full redesign. Otherwise it is just a loose collection 
of rough ideas, valuable to no one but myself.

The reason I brought them up is to gauge interest in and of them.

The next step for me would be to experiment with them, and see if 
they hold up. If that is fruitful we will have something real to 
discuss.

Nov 07 2022

Timon Gehr <timon.gehr gmx.ch> writes:

On 11/6/22 22:06, Per Nordlöw wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot test 
 for existence of the next element without consuming it.

 
 That's not a blocker if we can afford implementing a second function for 
 the "peek" case. This is the path Rust took via its `Peekable` trait and 
 `peek()` member as described at
 
 https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html
 
 .

This does more than just check for existence; the range will have to 
cache the next element, which is the issue OP is trying to fix I think.

Nov 07 2022

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.

Aren't there cases (e.g. some kinds of IO) where that's 
unavoidable in any case?

The "next" approach is surely the most information-limited case, 
where you really can't separate between checking for existence of 
the next element, and consuming it.  Other possibilities, such as 
the presence of a check whether the range is `empty`, or the 
possibility to `peek`, would then be less information-limited 
options on top of that.

I apologize if I'm out of date on some discussions which have 
already taken place around the range design, and so forcing you 
to re-explain something that's already been worked through.

Nov 10 2022

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Thursday, 10 November 2022 at 15:25:30 UTC, Joseph Rushton 
Wakeling wrote:
 Other possibilities, such as the presence of a check whether 
 the range is `empty`, or the possibility to `peek`, would then 
 be less information-limited options on top of that.

BTW, note that the `empty` check is a more information-limited 
version of peekability.  For example, an algorithmic range (such 
as a range-based view of random number generation) can tell you 
that it's not empty, but may not be able to tell you what the 
next value will be.

OTOH I would imagine that most (all?) ranges across the contents 
of containers should be able both to tell you if the range is 
empty or not _and_ allow you to peek at the next in sequence.

Nov 10 2022

victoroak <jackpboy gmail.com> writes:

On Thursday, 10 November 2022 at 15:25:30 UTC, Joseph Rushton 
Wakeling wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one 
 cannot test for existence of the next element without 
 consuming it.

 Aren't there cases (e.g. some kinds of IO) where that's 
 unavoidable in any case?

 The "next" approach is surely the most information-limited 
 case, where you really can't separate between checking for 
 existence of the next element, and consuming it.  Other 
 possibilities, such as the presence of a check whether the 
 range is `empty`, or the possibility to `peek`, would then be 
 less information-limited options on top of that.

 I apologize if I'm out of date on some discussions which have 
 already taken place around the range design, and so forcing you 
 to re-explain something that's already been worked through.

Yes, in some cases you have to consume the inner stream in either 
the constructor or in the empty call. I would also like for a 
more simpler range with only `next`.

Nov 10 2022

D Programming

C/C++ Programming

Other

digitalmars.D - opNext: Simplified ranges