www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - opNext: Simplified ranges

reply Tomer Weka <tomer weka.io> writes:
I find the empty/front/popNext protocol very cumbersome to 
implement for most cases (at least in the Weka codebase), which 
causes people to just prefer opApply (even though it has its own 
set of issues).

Today I even ran into a problem that cannot be solved using the 
existing protocol: a consuming range (think a file stream) that 
should not consume the element if the foreach has been broken. 
Using opApply, this is easy (the delegate returns nonzero), but 
using popNext it's next to impossible (or at least really really 
cumbersome). I don't want to go into all the details, so let's 
just jump to the suggested protocol.

```
bool opNext(out T elem) {...}
```

`opNext` returns true if it "produced" an element, false if it 
reached the end. That way

```
foreach (elem; range) {
     ...
}
```

Simply gets lowered to
```
T elem;
while (range.opNext(elem)) {
    ...
}
```

Of course this protocol can live side-by-side with the existing 
protocol, just take precedence in the lowering rules. A concrete 
example:

```
struct LinkedList {
     int value;
     LinkedList* next;

     struct Range {
         LinkedList* cursor;

         bool opNext(out int val) nothrow  nogc {
             if (cursor is null) {
                 return false;
             }
             val = cursor.value;
             return true;
         }
     }

     auto members() {return Range(&this);}
}

foreach (val; myList.members) {
     ...
}
```

Notes:
* `save()` can work for the new protocol as well, it just needs 
to duplicate the iterator's state
* Maybe it would be wise to complement this with opPrev for 
bidirectional ranges, but I have to say I've rarely seen any use 
of them
* I think chaining with map/filter/etc. is even easier to 
implement in the new protocol
Nov 04 2022
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 5 November 2022 at 06:01:54 UTC, Tomer Weka wrote:
 ```
 bool opNext(out T elem) {...}
 ```

 `opNext` returns true if it "produced" an element, false if it 
 reached the end.
Out of curiosity: what do you think are the pros and cons of this versus, say, something like: ``` Option!(T) opNext() { ... } ``` Where `Option` is an implementation of the option type, and hence encapsulates the question of whether an element was produced or not? (IIRC this has been another suggestion for a simplified/redesigned range concept.) An `Option`-based design has the advantage of not requiring a persistent `front`, which may fit better with true input ranges (e.g. reading from a stream). OTOH it may make things more clunky for the case where there is persistent underlying data that a `front` method could refer to.
Nov 05 2022
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
```d
mixin template NextToRange(ElementType) {
	ElementType _value;
	bool _haveValue;

	 property {
		bool empty() {
			return _haveValue;
		}

		ElementType front() {
			assert(_haveValue);
			return _value;
		}
	}

	void popFront() {
		_haveValue = nextValue(_value);
	}
}
```

You don't need to break a large percentage of the standard library to 
use this ;)
Nov 05 2022
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 06/11/2022 12:34 AM, rikki cattermole wrote:
              return _haveValue;
return !_haveValue; Oops, that should be reversed. Thanks Bruce Carneal!
Nov 05 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/5/22 2:01 AM, Tomer Weka wrote:
 Today I even ran into a problem that cannot be solved using the existing 
 protocol: a consuming range (think a file stream) that should not 
 consume the element if the foreach has been broken.
Not sure what this means. breaking a foreach does not consume the next element. Now, foreach *does* make a copy of the range. Which means that the copy is going to consume the current, while the original will not get it (and the copy is gone once the loop is over). This can be solved by a wrapper: ```d struct RefOf(T) { private T *_val; ref T _get() { return *_val; } alias _get this; } auto refOf(T)(return ref T val) { return RefOf!T(&val); } foreach(elem; range.refOf) // now doesn't make a copy. ``` Could also be solved with syntax changes to foreach. Maybe `foreach(elem; ref range)`? -Steve
Nov 05 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 5 November 2022 at 15:17:42 UTC, Steven 
Schveighoffer wrote:
 This can be solved by a wrapper:

 ```d
 struct RefOf(T)
 {
   private T *_val;
   ref T _get() { return *_val; }
   alias _get this;
 }

 auto refOf(T)(return ref T val) { return RefOf!T(&val); }

 foreach(elem; range.refOf) // now doesn't make a copy.
 ```
You'd think, but if you actually try this, it turns out it doesn't work: the compiler is "smart" enough to see through the `alias this`, and makes a copy of `range.refOf._get` (the actual range object) instead of `range.refOf` (the wrapper). So, for example, this usage code prints `[1, 2, 3]`, demonstrating that the range is *not* consumed by the `foreach` loop: ```d void main() { import std.range, std.stdio; auto r = only(1, 2, 3); foreach (e; refOf(r)) {} writeln(r); } ``` In the `-vcg-ast` output, you can see the call to `_get` inserted by the compiler: ```d void main() { import std.range; import std.stdio; OnlyResult!(int, int, int) r = only(1, 2, 3); { OnlyResult!(int, int, int) __r47 = refOf(r)._get().opSlice(); for (; !__r47.empty(); __r47.popFront()) { int e = __r47.front(); } } writeln(r); return 0; } ```
Nov 05 2022
next sibling parent Dukc <ajieskola gmail.com> writes:
On Saturday, 5 November 2022 at 18:08:39 UTC, Paul Backus wrote:
 On Saturday, 5 November 2022 at 15:17:42 UTC, Steven 
 Schveighoffer wrote:
 This can be solved by a wrapper:
You'd think, but if you actually try this, it turns out it doesn't work: the compiler is "smart" enough to see through the `alias this`, and makes a copy of `range.refOf._get` (the actual range object) instead of `range.refOf` (the wrapper).
This works: ```D void main() { import std.range : inputRangeObject, refRange; import std.stdio; auto testPiece = [1,2,3,4,5]; foreach(el; (&testPiece).refRange.inputRangeObject) { if(el == 3) break; } writeln(testPiece); // [3, 4, 5] } ``` It's far from satisfying though. Ugly to begin with, and worse, incompatible with any function attributes.
Nov 05 2022
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/5/22 2:08 PM, Paul Backus wrote:
 On Saturday, 5 November 2022 at 15:17:42 UTC, Steven Schveighoffer wrote:
 This can be solved by a wrapper:

 ```d
 struct RefOf(T)
 {
   private T *_val;
   ref T _get() { return *_val; }
   alias _get this;
 }

 auto refOf(T)(return ref T val) { return RefOf!T(&val); }

 foreach(elem; range.refOf) // now doesn't make a copy.
 ```
You'd think, but if you actually try this, it turns out it doesn't work: the compiler is "smart" enough to see through the `alias this`, and makes a copy of `range.refOf._get` (the actual range object) instead of `range.refOf` (the wrapper).
It's not "smart", just picky.
 
 So, for example, this usage code prints `[1, 2, 3]`, demonstrating that 
 the range is *not* consumed by the `foreach` loop:
 
 ```d
 void main()
 {
      import std.range, std.stdio;
 
      auto r = only(1, 2, 3);
      foreach (e; refOf(r)) {}
      writeln(r);
 }
 ```
 
 In the `-vcg-ast` output, you can see the call to `_get` inserted by the 
 compiler:
 
 ```d
 void main()
 {
      import std.range;
      import std.stdio;
      OnlyResult!(int, int, int) r = only(1, 2, 3);
      {
          OnlyResult!(int, int, int) __r47 = refOf(r)._get().opSlice();
          for (; !__r47.empty(); __r47.popFront())
          {
              int e = __r47.front();
          }
      }
      writeln(r);
      return 0;
 }
 ```
What is happening is the compiler is not using duck typing exactly, and it also has some pretty puzzling logic. 1. If it has the ability to slice, it will *always do this at least once*. See the bug reported long ago: https://issues.dlang.org/show_bug.cgi?id=14619 2. The definition of a range is that it contains a `front` *member*. This cannot be via `alias this`. So this works: ```d struct RefOf(T) { private T *_val; auto front() { return _get.front; } RefOf opSlice() { return this; } ref T _get() { return *_val; } alias _get this; } ``` Take away from that what you will. (also I forgot there actually is something called `std.range.refRange` which *is supposed to do this*, but apparently still broken). -Steve
Nov 05 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
We didn't use the "next" protocol for ranges because one cannot test for 
existence of the next element without consuming it.
Nov 05 2022
next sibling parent reply Tomer Weka <tomer weka.io> writes:
On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.
But that's exactly the issue - I have to produce the element in the range's ctor, which essentially requires a "non-popping popNext" to determine if it's empty and assign the front -tomer
Nov 06 2022
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 6 November 2022 at 07:55:13 UTC, Tomer Weka wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one 
 cannot test for existence of the next element without 
 consuming it.
But that's exactly the issue - I have to produce the element in the range's ctor, which essentially requires a "non-popping popNext" to determine if it's empty and assign the front -tomer
Doesn't one the solutions from Steven or Dukec work?
Nov 06 2022
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/6/22 2:55 AM, Tomer Weka wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot test 
 for existence of the next element without consuming it.
But that's exactly the issue - I have to produce the element in the range's ctor, which essentially requires a "non-popping popNext" to determine if it's empty and assign the front
The issue there is that you are making a copy, and the copy has already consumed the front element. It would be no different with your API as well -- if you break from your range type, the element is gone. What we need is non-copyable input ranges. I have advocated for this for a while. In essence, a forward range should be copyable, an input range should not be, and we should remove `save`. Then when you foreach it, you *can't* make a copy, so it doesn't let you discard the element you are looking at. The other option is -- don't use foreach. Use a while loop like you wrote. -Steve
Nov 06 2022
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.
That's not a blocker if we can afford implementing a second function for the "peek" case. This is the path Rust took via its `Peekable` trait and `peek()` member as described at https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html .
Nov 06 2022
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Sunday, 6 November 2022 at 21:06:01 UTC, Per Nordlöw wrote:
 That's not a blocker if we can afford implementing a second 
 function for the "peek" case. This is the path Rust took via 
 its `Peekable` trait and `peek()` member as described at

 https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html
It all boils down to fitting the design to the most common use case. So is the most common case to be able to peek or not? I really don't know right. Does anybody have any insights into this?
Nov 06 2022
parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 6 November 2022 at 21:08:10 UTC, Per Nordlöw wrote:
 It all boils down to fitting the design to the most common use 
 case. So is the most common case to be able to peek or not? I 
 really don't know right. Does anybody have any insights into 
 this?
I actually think the most lowest range should be one with destructive read semantics; non-peekable, non-copyable. You can build everything on top of that. I also think ranges should be redesigned to have a 2 stage api. One to build the pipeline, one to iterate.
Nov 06 2022
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
wrote:
 I also think ranges should be redesigned to have a 2 stage api. 
 One to build the pipeline, one to iterate.
I don't grasp what you mean. Can you give enlighten me with a show case?
Nov 06 2022
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Sunday, 6 November 2022 at 22:47:59 UTC, Per Nordlöw wrote:
 On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
 wrote:
 I also think ranges should be redesigned to have a 2 stage 
 api. One to build the pipeline, one to iterate.
I don't grasp what you mean. Can you give enlighten me with a show case?
1. Ahh, is this inspired by the concurrency primitives? 2. Are there any other languages that use such a 2 stage api?
Nov 06 2022
prev sibling parent reply Tejas <notrealemail gmail.com> writes:
On Sunday, 6 November 2022 at 22:32:00 UTC, Sebastiaan Koppe 
wrote:
 On Sunday, 6 November 2022 at 21:08:10 UTC, Per Nordlöw wrote:
 It all boils down to fitting the design to the most common use 
 case. So is the most common case to be able to peek or not? I 
 really don't know right. Does anybody have any insights into 
 this?
I actually think the most lowest range should be one with destructive read semantics; non-peekable, non-copyable. You can build everything on top of that. I also think ranges should be redesigned to have a 2 stage api. One to build the pipeline, one to iterate.
You have talked about redesigning ranges multiple times before as well, plus have expierience using them non-trivially for quite some time. Could you please create a post/document that contains all your ideas/thoughts about what a redesigned range would look like? The maybe someone could actually work towards implementing it. If not directly within phobos, then atleast they could create a package and we could test the new ranges' efficacy for real world usage
Nov 06 2022
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Monday, 7 November 2022 at 02:49:13 UTC, Tejas wrote:
 Could you please create a post/document that contains all your 
 ideas/thoughts about what a redesigned range would look like? 
 The maybe someone could actually work towards implementing it. 
 If not directly within phobos, then atleast they could create a 
 package and we could test the new ranges' efficacy for real 
 world usage
I am afraid such a document, if done properly, is tantamount to doing the full redesign. Otherwise it is just a loose collection of rough ideas, valuable to no one but myself. The reason I brought them up is to gauge interest in and of them. The next step for me would be to experiment with them, and see if they hold up. If that is fruitful we will have something real to discuss.
Nov 07 2022
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/6/22 22:06, Per Nordlöw wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot test 
 for existence of the next element without consuming it.
That's not a blocker if we can afford implementing a second function for the "peek" case. This is the path Rust took via its `Peekable` trait and `peek()` member as described at https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html .
This does more than just check for existence; the range will have to cache the next element, which is the issue OP is trying to fix I think.
Nov 07 2022
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one cannot 
 test for existence of the next element without consuming it.
Aren't there cases (e.g. some kinds of IO) where that's unavoidable in any case? The "next" approach is surely the most information-limited case, where you really can't separate between checking for existence of the next element, and consuming it. Other possibilities, such as the presence of a check whether the range is `empty`, or the possibility to `peek`, would then be less information-limited options on top of that. I apologize if I'm out of date on some discussions which have already taken place around the range design, and so forcing you to re-explain something that's already been worked through.
Nov 10 2022
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Thursday, 10 November 2022 at 15:25:30 UTC, Joseph Rushton 
Wakeling wrote:
 Other possibilities, such as the presence of a check whether 
 the range is `empty`, or the possibility to `peek`, would then 
 be less information-limited options on top of that.
BTW, note that the `empty` check is a more information-limited version of peekability. For example, an algorithmic range (such as a range-based view of random number generation) can tell you that it's not empty, but may not be able to tell you what the next value will be. OTOH I would imagine that most (all?) ranges across the contents of containers should be able both to tell you if the range is empty or not _and_ allow you to peek at the next in sequence.
Nov 10 2022
prev sibling parent victoroak <jackpboy gmail.com> writes:
On Thursday, 10 November 2022 at 15:25:30 UTC, Joseph Rushton 
Wakeling wrote:
 On Sunday, 6 November 2022 at 01:48:04 UTC, Walter Bright wrote:
 We didn't use the "next" protocol for ranges because one 
 cannot test for existence of the next element without 
 consuming it.
Aren't there cases (e.g. some kinds of IO) where that's unavoidable in any case? The "next" approach is surely the most information-limited case, where you really can't separate between checking for existence of the next element, and consuming it. Other possibilities, such as the presence of a check whether the range is `empty`, or the possibility to `peek`, would then be less information-limited options on top of that. I apologize if I'm out of date on some discussions which have already taken place around the range design, and so forcing you to re-explain something that's already been worked through.
Yes, in some cases you have to consume the inner stream in either the constructor or in the empty call. I would also like for a more simpler range with only `next`.
Nov 10 2022