www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Difference between input range and forward range

reply Shachar Shemesh <shachar weka.io> writes:
Please consider the following program:
import std.stdio;
import std.range;

struct Range {
     int a;

      disable this(this);
public:
     int front() {
         return a;
     }

     void popFront() {
         a--;
     }

     bool empty() {
         return a<=0;
     }
}

void main()
{
     pragma(msg, isInputRange!Range);
     pragma(msg, isForwardRange!Range);
     auto r = Range(5);

     foreach( i; r ) {
         writeln(i);
     }

     foreach( i; r ) {
         writeln(i);
     }
}

It seems that "foreach" requires that the range be copyable, in direct 
contradiction to the fact that Range is not a forward range.

Removing the  disable, the program compiles and run, but prints the 
range twice instead of once.

If you couple that with the extremely amorphous definition of "save" for 
forward ranges, there seems to be quite a confusion between the two.

Am I missing something?

Shachar
Nov 10 2015
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, 10 November 2015 at 14:28:15 UTC, Shachar Shemesh 
wrote:
 Am I missing something?
The short answer is that input ranges need to be copyable to work with foreach, because of how foreach is defined for ranges. The long answer is... Input ranges _are_ copyable. It's just that their state isn't guaranteed to be copied (and if it is copied, then it probably should have been a forward range anyway). The semantics of what happens when copying any type of range are undefined, because it depends on how the range is implemented. For example, if you have auto copy = orig; copy.popFront(); and the type of orig and copy is a reference type, then popping off an element from copy popped off an element from orig, whereas if the range is a value type, then popping off an element from copy has no effect on orig. And if the range is a pseudo-reference type, then it'll probably do something like pop off an element from orig but not affect orig's front, but it could also be that it has no effect on orig (what exactly happens depends on how the range is implemented). Basically, if you copy a range, you can't do _anything_ with the original unless you assign it a value, because what happens when you call anything on the original depends on how its implemented and thus is undefined behavior in general. I talked about this problem in my dconf 2015 talk: http://dconf.org/2015/talks/davis.html Now, with regards to foreach specifically, foreach(e; range) { // do stuff } becomes for(auto __c = range; !__c.empty; __c.popFront()) { auto e = _c.front; // do stuff } Notice that the range is copied. So, yes, for a range to work with foreach, it's going to have to be copyable. disabling the postblit constructor is just going to cause you trouble. But the key thing here is that this means that once you use a range in a foreach loop, you can't use it for anything else. In generic code, you have to consider it to be consumed, because the state of range you passed to foreach is now undefined, since what happens when copying the range is undefined. This is true even if you put a break out of the loop, because the range was copied, and you simply cannot rely on the state of the range you passed to foreach after that copy. Now, if you know the exact semantics of a particular range type, and the code you're writing is not generic, then you have more leeway, but in generic code, you have to be very careful to make sure that you don't use a range that has been copied unless it's been assigned a new value - and that includes not using a range after passing it to foreach. We're frequently lax with this due to the fact that most ranges are value types or pseudo-reference types that act like value types, so they're not only forward ranges, but they're implicitly saved by a copy, and so we frequently get away with using a range after it's been copied, but it doesn't work in the general case. - Jonathan M Davis
Nov 10 2015
parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis 
wrote:
 generic code, you have to consider it to be consumed, because 
 the state of range you passed to foreach is now undefined, 
 since what happens when copying the range is undefined. This is 
 true even if you put a break out of the loop, because the range 
 was copied, and you simply cannot rely on the state of the 
 range you passed to foreach after that copy.
The problem I find with generic code is when the desire is to consume the data. Take this example of parsing a data stream. auto osmData = datastream.take(size).array; datastream.popFrontN(size); auto header = BlobHeader(osmData); http://he-the-great.livejournal.com/49636.html How do I know if popFrontN is needed? If I was given a value base range then it is. If I was given a reference range (in its many forms) the extra call to popFrontN will result in an unneeded data jump. I could require that a forward range is passed in, then I can save() before calling .array and thus always require popFrontN. The best option is probably to use the RefRange wrapper, but it does create an annoying element of surprise.
Nov 11 2015
next sibling parent Ur nuz <neuranuz gmail.com> writes:
     auto osmData = datastream.take(size).array;
     datastream.popFrontN(size);
     auto header = BlobHeader(osmData);

 http://he-the-great.livejournal.com/49636.html

 How do I know if popFrontN is needed? If I was given a value 
 base range then it is. If I was given a reference range (in its 
 many forms) the extra call to popFrontN will result in an 
 unneeded data jump. I could require that a forward range is 
 passed in, then I can save() before calling .array and thus 
 always require popFrontN.

 The best option is probably to use the RefRange wrapper, but it 
 does create an annoying element of surprise.
Yes. I agree. It's good example of problem. I just had problem like this with *take* function, *startsWith* and others. So we have two kinds of ranges: with reference and value semantics. But both of them could be of reference or value *types* (struct or class). So how would we determine at CT (by generic algorithms) wheter current range has ref or value semantics (we couldn't just rely on testing if it's class or struct)!? I think it's important, because it can make influence on programme logic. And again I want to say that we must explicitly say (in doc) that we have two logical range categories, so new users would not make stupid mistakes. Also unittests would be good for ref and value ranges to illustrate our intentions.
Nov 11 2015
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, 11 November 2015 at 22:34:32 UTC, Jesse Phillips 
wrote:
 On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis 
 wrote:
 generic code, you have to consider it to be consumed, because 
 the state of range you passed to foreach is now undefined, 
 since what happens when copying the range is undefined. This 
 is true even if you put a break out of the loop, because the 
 range was copied, and you simply cannot rely on the state of 
 the range you passed to foreach after that copy.
The problem I find with generic code is when the desire is to consume the data. Take this example of parsing a data stream. auto osmData = datastream.take(size).array; datastream.popFrontN(size); auto header = BlobHeader(osmData); http://he-the-great.livejournal.com/49636.html How do I know if popFrontN is needed? If I was given a value base range then it is. If I was given a reference range (in its many forms) the extra call to popFrontN will result in an unneeded data jump. I could require that a forward range is passed in, then I can save() before calling .array and thus always require popFrontN. The best option is probably to use the RefRange wrapper, but it does create an annoying element of surprise.
Well, if we're talking forward ranges, then the only way to be 100% consistent with this is to basically consider datastream unusable after the call to take, because its state is undefined. So, the correct way to handle this would be to do something like auto osmData = datastream.save.take(size).array; datastream.popFrontN(size); Now, that's ugly, but it does ensure that the code will work correctly regardless of whether the range is a reference type, value type, or pseudo-reference type. And if you wanted to guaranteed reference semantics, as you say, you could use RefRange, though you probably do have to be careful about that. Regardless, it highlights how save needs to be called all over the place if you want ranges which are reference types to work consistently with value types. The bigger problem is pure input ranges. If you do auto osmData = datastream.take(size).array; then the state of datastream is undefined (at least in generic code), and you can't do _anything_ with it. If datastream is reference type, then using it would work just fine, since the first size elements would have been consumed, and we shouldn't have to worry about value types (because they can always be forward ranges - though it's technically possible for someone to not make them forward ranges even when they should be). However, we _do_ have a problem with pseudo-reference types. A pure input range pretty much has to be a reference type with regards to its elements (otherwise it could be a forward range), but stuff like caching front can turn it into a pseudo-reference type and totally break code like this. With a full-on reference type auto osmData = datastream.take(size).array; results in datastream being the same as it would have been had you called datastream.popFrontN(size) instead. But with a cached front, while the subsequent elements would be correct, front would be wrong. So, really, you're highlighting a really nasty aspect of this problem. As long as we allow pseudo-reference types for pure input ranges (and as I understand it, existing stuff like vibe.d has them), I don't see how we can make this code work correctly and access any of the elements in the range that were after the elements that were accessed via take. Actually, even with full-on reference types, we're kind of screwed with take and input ranges. take is lazy, so if you do auto osmData = datastream.take(size); datastream.popFrontN(size); and datastream is a reference type, then take will end up referring to to the second n elements, not the first. *sigh* Pure input ranges suck. It's ugly as all get out with forward ranges, but liberal use of save can guarantee consistent semantics. But with pure input ranges... I suspect that there's a whole pile of algorithms that technically should never be used with pure input ranges or which require that you be _very_ careful with them. It would be ludicrous to not have take work with input ranges, but it's quite clear that with a pure input range, calling take _and_ accessing elements beyond the ones taken isn't going to work unless you know that the range is a reference type, and you make sure that you iterate through the result of take _before_ doing anything with the rest of the range. I think that it's pretty clear that we need to re-examine how pure input ranges should work and either make a change to them or have some very clear guidelines on how to use them (which is not likely going to be easy to do correctly) and probably disallow them for most algorithms. Yuck. I'm definitely going to have to stew on this one. I've always thought that pure input ranges were too restrictive to be useful in most cases (though for some stuff you're pretty much stuck with them without doing a lot of buffering), but they seem to get worse every time we examine them in depth. - Jonathan M Davis
Nov 12 2015
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, 10 November 2015 at 14:28:15 UTC, Shachar Shemesh 
wrote:
 It seems that "foreach" requires that the range be copyable, in 
 direct contradiction to the fact that Range is not a forward 
 range.
To add to what I said, whether a range is "copyable" has nothing to do with whether it is a forward range or not. What's supposed to be guaranteed with any and all ranges is that if you copy it, it has the exact same state as the original had - but _not_ that that state is separate from the original. So, if you have auto copy = orig; then you can pop elements off of copy like you would have popped them off of orig, but you must on longer pop elements off of orig. _copy_ is now the range. This is critical when you consider that most ranges will be copied when they're passed to a function. If they weren't copyable, they'd be almost useless.
 If you couple that with the extremely amorphous definition of 
 "save" for forward ranges, there seems to be quite a confusion 
 between the two.
There is nothing amorphous about the definition of save. save duplicates a range such that you have two ranges of the same type which are independent of one another and which contain exactly the same elements. save provides a way to get a copy of the range's state regardless of whether it's value type, reference type, or pseudo-reference type. Consider, for instance, a range which is implemented as a class. It's a full-on reference type, and copying it around will never result in its state being copied. Without save, there would be no way to duplicate it. And save provides the _only_ generic means of duplicating a range. It's a bug any time that generic code depends on the state of a range being duplicated when it's copied. Unfortunately, because most ranges _do_ duplicate their state when they're copied, there's plenty of code out there which is buggy and relies on ranges being duplicated when they're copied. That's why it's so critical to test algorithms with a variety of range types. - Jonathan M Davis
Nov 10 2015
prev sibling next sibling parent reply Ur nuz <neuranuz gmail.com> writes:
I agree with these considerations. When I define non-copyable 
range (with disabled this) lot of standard phobos functions fails 
to compile instead of using *save* method. So logical question is 
in which cases we should use plain old struct copy or and when we 
should use *save* on forward ranges.

Also good question is should we have input ranges copyable (or 
for what types of ranges they can be copyable)? Good example is 
network socket as input range, because we can't save the state of 
socket stream and get consumed data again so as I thing copying 
of such range looks meaningless (in my opinion). If we want to 
pass it somewhere it's better pass it by reference.

Also passing range somewhere to access it in two different places 
simultaneously is also bad idea. The current state looks like we 
have current approach with range postblit constructor and +save+, 
because we have it for structs and it works somehow (yet) for 
trivial cases. But we don't have clear intentions about how it 
should really work.

Copying and passing ranges should also be specifyed as part of 
range protocol, because it's very common use case and shouldn't 
be ambigous.

Also as far as range could be class object we must consider how 
should they behave?

Is there some ideas and understanding of what I am talking about?
Nov 10 2015
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, 10 November 2015 at 16:33:02 UTC, Ur nuz wrote:
 I agree with these considerations. When I define non-copyable 
 range (with disabled this) lot of standard phobos functions 
 fails to compile instead of using *save* method. So logical 
 question is in which cases we should use plain old struct copy 
 or and when we should use *save* on forward ranges.

 Also good question is should we have input ranges copyable (or 
 for what types of ranges they can be copyable)? Good example is 
 network socket as input range, because we can't save the state 
 of socket stream and get consumed data again so as I thing 
 copying of such range looks meaningless (in my opinion). If we 
 want to pass it somewhere it's better pass it by reference.
Passing by reference really doesn't work with ranges. Consider that most range-based functions are lazy and wrap the range that they're given in a new range. e.g. auto r = filter!pred(range); or auto r = map!func(range); The range has to be copied for that to work. And even if you could make it so that the result of functions like map or filter referred to the original range by reference, their return value would not be returned by ref, so if a function required that its argument by passed by ref, then you couldn't chain it. So, requiring that ranges be passed by ref would pretty much kill function chaining.
 Also passing range somewhere to access it in two different 
 places simultaneously is also bad idea. The current state looks 
 like we have current approach with range postblit constructor 
 and +save+, because we have it for structs and it works somehow 
 (yet) for trivial cases. But we don't have clear intentions 
 about how it should really work.
It's mostly clear, but it isn't necessarily straightforward to get it right. If you want to duplicate a range, then you _must_ use save. Copying a range by assigning it to another range is not actually copying it per the range API. You pretty much have to consider it a move and consider the original unusable after the copy. The problem is that for arrays and many of the common ranges, copying the range and calling save are semantically the same, so it's very easy to write code which assumes that behavior and then doesn't work with other types of changes. That's why it's critical to test range-based functions with a variety of ranges types - particularly reference types in addition to value types or dynamic arrays.
 Copying and passing ranges should also be specifyed as part of 
 range protocol, because it's very common use case and shouldn't 
 be ambigous.
The semantics of copying a range depend heavily on how a range is implemented and cannot be defined in the general case: auto copy = orig; Dynamic arrays and classes will function fundamentally differently, and with structs, there are a variety of different semantics that that copy could have. What it ultimately comes down to is that while the range API can require that the copy be in the exact same state that the original was in, it can't say anything about the state of the original after the copy. Well-behaved range-based code has to assume that once orig has been copied, it is unusable. If the code wants to actually get a duplicate of the range, then it will have to use save, and the semantics of that _are_ well-defined and do not depend on the type of the range.
 Also as far as range could be class object we must consider how 
 should they behave?
There's really nothing to consider here. It's known how they should behave. There's really only one way that they _can_ behave. One of the main reasons that save exists is because of classes. While copying a dynamic array or many struct types is equivalent to save, it _can't_ be equivalent with a class. When you consider that fact, the required behavior of ranges pretty much falls into place on its own. We may very well need to be far clearer about what those semantics are and how that affects best practices, but there really isn't much (if any) wiggle room in what the range API does and doesn't guarantee and how it should be used. The problem is whether it's _actually_ used that way. If a range-based function is tested with a variety of range types - dynamic arrays, value types, reference types, etc. then it becomes clear very quickly when calls to save are required and how the function must be written to work for all of those range types. But far too often, range-based functions are tested with dynamic arrays and a few struct range types that wrap dynamic arrays, and bugs with regards to reference type ranges are not found. So, there's almost certainly a lot of range-based code out there that works fantastically with dynamic arrays but would fail miserably with a number of other range types. For the most part, I think that it's pretty clear how ranges have to act and how they need to be used based on their API when you actually look at how the range API interacts with different types of ranges, but we often do not go much beyond dynamic arrays and miss out on some of the subtleties. We really do need some good write-ups on ranges and their best practices. I've worked on that before but never managed to spend the time to finish it. Clearly, I need to fix that. - Jonathan M Davis
Nov 10 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/10/15 9:28 AM, Shachar Shemesh wrote:
 Please consider the following program:
The problem is that you defined a forward range without implementing all the primitives :) That is, you can easily make a copy of the range so it iterates from the current spot (the quintessential feature of forward ranges). You actually have to do *extra work* to make it a true input-but-not-forward-range. I've never made it a secret that I dislike the 'save' requirement. In my experience, the only time you ever implement 'save' is to do the same thing as copying the range via =. To the point where people simply don't use the .save member, and their code always works, so what is the point? It may as well have been an enum isCopyable. -Steve
Nov 10 2015
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven 
Schveighoffer wrote:
 I've never made it a secret that I dislike the 'save' 
 requirement. In my experience, the only time you ever implement 
 'save' is to do the same thing as copying the range via =. To 
 the point where people simply don't use the .save member, and 
 their code always works, so what is the point? It may as well 
 have been an enum isCopyable.
Well, it's not that hard to come up with a range that has to implement save, because copying it doesn't save it. The obvious example is if it's implemented as a class. Another would be std.range.refRange which exists specifically to turn a non-reference type range into a reference type range. And really, any range that's an input range but not a forward range is in that boat, because if copying it duplicated it, then it could be a forward range. I really don't see any way around having something like save without artificially restricting ranges to types which are implicitly saved when copied (which would pretty much mean getting rid of pure input ranges), but even if there were clearly a better way, that's a pretty big change at this stage. - Jonathan M Davis
Nov 10 2015
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Nov 10, 2015 at 06:38:22PM +0000, Jonathan M Davis via Digitalmars-d
wrote:
 On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven Schveighoffer wrote:
I've never made it a secret that I dislike the 'save' requirement. In
my experience, the only time you ever implement 'save' is to do the
same thing as copying the range via =. To the point where people
simply don't use the .save member, and their code always works, so
what is the point?  It may as well have been an enum isCopyable.
Well, it's not that hard to come up with a range that has to implement save, because copying it doesn't save it. The obvious example is if it's implemented as a class. Another would be std.range.refRange which exists specifically to turn a non-reference type range into a reference type range. And really, any range that's an input range but not a forward range is in that boat, because if copying it duplicated it, then it could be a forward range. I really don't see any way around having something like save without artificially restricting ranges to types which are implicitly saved when copied (which would pretty much mean getting rid of pure input ranges), but even if there were clearly a better way, that's a pretty big change at this stage.
[...] There are a few ranges in Phobos that needs to do non-trivial things in .save, that otherwise would not work properly. IIRC groupBy is one of them. I'm pretty sure there are one or two others. I've also needed to do this in my own code. All it takes is for one range to need non-trivial implementation in .save, and all the other wrapper ranges will need to do the same, in order to be semantically correct (e.g., if they wrap around a range in .r, then the copy returned by .save must have r = this.r.save, otherwise the returned range will have side-effects on the original range). T -- Meat: euphemism for dead animal. -- Flora
Nov 10 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/10/15 1:38 PM, Jonathan M Davis wrote:
 On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven Schveighoffer wrote:
 I've never made it a secret that I dislike the 'save' requirement. In
 my experience, the only time you ever implement 'save' is to do the
 same thing as copying the range via =. To the point where people
 simply don't use the .save member, and their code always works, so
 what is the point? It may as well have been an enum isCopyable.
Well, it's not that hard to come up with a range that has to implement save, because copying it doesn't save it. The obvious example is if it's implemented as a class.
IMO, that shouldn't be a forward range. But in any case, the correct mechanism is: forward range -> a = b works and makes a copy of the iteration. non-forward range -> a = b fails, you'd have to use a = b.getRef or something like that -or- a = b is a moving operation (a is no longer usable) Classes wouldn't work with this because you can't overload assignment, but the whole range hierarchy system cannot be changed at this point anyway. Moving on... -Steve
Nov 10 2015
next sibling parent Ur nuz <neuranuz gmail.com> writes:
I think that we should document it somewhere in order to prevent 
future mistakes in algorithms that copies and +save+s ranges. I 
think that lots of algorithms could rely on that initializing new 
range or copying via opAssign will create independent range, but 
not really all do it. So we must somehow pay attention to it, 
write some article on working with ranges, because there are a 
lot of aspects that not clear for me and other. This could make 
library much better.
Nov 10 2015
prev sibling parent reply Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Tuesday, 10 November 2015 at 18:57:31 UTC, Steven 
Schveighoffer wrote:
 IMO, that shouldn't be a forward range. But in any case, the 
 correct mechanism is:

 forward range -> a = b works and makes a copy of the iteration.
 non-forward range -> a = b fails, you'd have to use a = 
 b.getRef or something like that -or- a = b is a moving 
 operation (a is no longer usable)
Hmm. You mean "b is no longer usable", right? So, any algorithm requiring a copy should better do something like static if(isForwardRange!range) copy = range; else copy = range.save();
Nov 11 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/11/15 4:20 AM, Dominikus Dittes Scherkl wrote:
 On Tuesday, 10 November 2015 at 18:57:31 UTC, Steven Schveighoffer wrote:
 IMO, that shouldn't be a forward range. But in any case, the correct
 mechanism is:

 forward range -> a = b works and makes a copy of the iteration.
 non-forward range -> a = b fails, you'd have to use a = b.getRef or
 something like that -or- a = b is a moving operation (a is no longer
 usable)
Hmm. You mean "b is no longer usable", right?
Heh, right :)
 So, any algorithm requiring a copy should better do something like

 static if(isForwardRange!range)
     copy = range;
 else
     copy = range.save();
No, it should do this: static if(!isForwardRange!range) assert(0, "I need a forward range") save doesn't enter the picture, it would be eliminated. But again, this isn't going to happen. Note my specification above is for a fictional proposal of what I would have done. The problem with the current save regime is that simple assignment for forward ranges works too, but it doesn't always do what you want (and diabolically, 99% of the time it DOES do what you want). -Steve
Nov 12 2015
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, 12 November 2015 at 15:29:19 UTC, Steven 
Schveighoffer wrote:
 (and diabolically, 99% of the time it DOES do what you want).
_This_ is the big problem. It wouldn't surprise me in the least if the vast majority of range-based code out there does not actually work properly with ranges which aren't implicitly saved when they're copied. I mean, technically, you have to do stuff like haystack.save.startsWith(needle) instead of haystack.startsWith(needle) if you want your code to work right with all forward ranges, but almost no one does that. And 99% of the time the code works - but not always. - Jonathan M Davis
Nov 12 2015
parent Tobias Pankrath <tobias pankrath.net> writes:
On Thursday, 12 November 2015 at 15:43:50 UTC, Jonathan M Davis 
wrote:
 On Thursday, 12 November 2015 at 15:29:19 UTC, Steven 
 Schveighoffer wrote:
 (and diabolically, 99% of the time it DOES do what you want).
_This_ is the big problem. It wouldn't surprise me in the least if the vast majority of range-based code out there does not actually work properly with ranges which aren't implicitly saved when they're copied. I mean, technically, you have to do stuff like haystack.save.startsWith(needle) instead of haystack.startsWith(needle) if you want your code to work right with all forward ranges, but almost no one does that. And 99% of the time the code works - but not always. - Jonathan M Davis
https://issues.dlang.org/show_bug.cgi?id=11951
Nov 12 2015