digitalmars.D - Input ranges

"Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> (8/8) Apr 20 2015 Hi,

Jonathan M Davis via Digitalmars-d (79/87) Apr 20 2015 Honestly, the cleanest behavior is that all ranges be reference types, a...

"Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> writes:

Hi,

I tried to learn about input ranges and got some anonymous advice 
http://forum.dlang.org/thread/dezlxxygufocmafvleen forum.dlang.org 
, still the topic seems to warrant broader attention.

The question is how pure input ranges (as in non-forward) should 
be build. Algorithms like take and groupBy require reference 
semantics. I did not find that communicated. Now I am looking for 
the official ruling. As in: What the D rule book says.

Apr 20 2015

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Monday, April 20, 2015 07:54:32 via Digitalmars-d wrote:
 Hi,

 I tried to learn about input ranges and got some anonymous advice
 http://forum.dlang.org/thread/dezlxxygufocmafvleen forum.dlang.org
 , still the topic seems to warrant broader attention.

 The question is how pure input ranges (as in non-forward) should
 be build. Algorithms like take and groupBy require reference
 semantics. I did not find that communicated. Now I am looking for
 the official ruling. As in: What the D rule book says.

Honestly, the cleanest behavior is that all ranges be reference types, and
save be required to copy the range. However, there are no actual
requirements that that be the case, and structs and arrays are what is most
frequently used for ranges, and they tend to be half-reference types or
value types. So, passing a range to a function usually ends up being
equivalent to a call to save, which makes it so that many range-based
functions don't work properly with class ranges, since they usually aren't
tested with them. And since arrays certainly aren't going to be full
reference types, it just isn't possible to require that a forward range be a
full reference type, so we need to support value type ranges, half reference
type ranges, and full reference type ranges when it comes to forward ranges,
and we have to deal with the fact that copying them _can_ be equivalent to
saving them without assuming that it is. :|

So, as far as forward ranges go, you shouldn't rely on copying a range being
equivalent to save (otherwise save wouldn't need to exist), but
realistically, if you pass a range to a function, you can't simply use that
range after. Many ranges will have saved themselves in the process, but not
all of them will have. The range in the caller is still valid (fortunately),
but whether it's in the same state that it was before calling the other
function depends entirely on whether copying the range is equivalent to save
or not, meaning that you just can't use a range after you've passed it to
another function unless you called save on it (forcing copy semantics) or
passed it by ref (forcing reference semantics). Unfortunately, there are
almost certainly lingering bugs in Phobos due to functions which assume that
passing range to another function implicitly saves it, simply because that's
what happens with such a large percentage of ranges.

With regard to pure input ranges, they're not forward ranges, because what
they're iterating over cannot be saved. So, by their very nature, they have
to be at least partially reference types and folks will often assume that
they are full reference types. The problem is that they can also be half
reference types in that they could have saved some state internally (e.g.
front) but still not be able to actually have a proper save function which
saves all of their state. So, if you pass a pure input range to a function,
then just like forward ranges, you cannot assume that you can use it again
without passing it by reference, even though by definition, their internal
state will have changed at least partially due to whatever iteration is done
within the function that it's passed to (since it has to be at least a
partial reference type to be a pure input range). However, unlike with a
forward range, the input range is not a full reference type, then it won't
be left in a valid state, since it won't have be saved, just had some of its
contents passed by reference and some of them by value.

All in all, I think that input ranges are incredibly annoying to work with,
and I'm frequently tempted to argue that they just shouldn't exist. They're
just too restrictive and tend to have weird behaviors, but unfcortunately,
there are cases where it's prohibitively expensive to have a forward range.
:|

In any case, all that is presently _required_ of ranges is what the traits
isInputRange, isForwardRange, etc. test for. We can't actually require more
than that, because the compiler can't check for more. What we _can_ do (and
has been discussed but never fully sorted out) is write up some additional
rules about what is expected by Phobos and encourage the community at large
to follow them with their ranges, but those can't actually be enforced, and
we've never been able to agree on those rules (particularly since, in most
cases, they aren't necessary, meaning that it only comes up in the corner
cases and hasn't been enough of a pain point to force change). For instance,
many of us argued that it should not be legal for empty to do any work, but
counter-examples were given where it was necessary. So, it becomes very
difficult to place additional requirements beyond what the API itself says -
even as guidelines with regards to what the standard library expects.

Now, as to pure input ranges and reference semantics, allowing half
reference type ranges is very dangerous, because any time that they're
copied, and the copy is iterated at all, the original will be left in an
invalid state (unlike with forward ranges). So, we need to either require
that all pure input ranges be full reference types or require that pure
input ranges never be used after they're copied (since whether the original
is still valid or not will depend on whether it's a full reference type or
not).

I'd like to require that all pure input ranges be full reference types, but
if a function is going to operate both on pure input ranges and on forward
ranges, then you can't assume that the range is a reference type anyway,
because the forward range probably won't be. That being the case, what we
probably need to do is to publish the guideline that _no_ range ever be used
after it has been copied (be it a forward range or a pure input range).
Then, any function which wants reference semantics will need to pass by ref
or use std.range.RefRange to wrap the range, and any function which doesn't
want reference semantics, will need to restrict itself to forward ranges and
use save explicitly. I don't think that we really have any other option.

- Jonathan M Davis

Apr 20 2015

D Programming

C/C++ Programming

Other

digitalmars.D - Input ranges