
digitalmars.D - Issue with forward ranges which are reference types

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
Sorry that this is long, but it's very important IMHO, and I don't know how to 
make it much shorter and cover what it's supposed to cover. 

Okay. Your typical forward range is either an array or a struct which is a value 
type (that is, copying it creates an independent range which points to the 
same elements and is not altered if the original range is altered - the 
elements that it points to aren't copied, of course). So, when you want to get 
a copy, it's as easy as

auto rangeCopy = range;

It was previously determined that this would be a problem for ranges which are 
reference types (classes in particular, but it affects structs as well, if 
copying them doesn't create an independent range). So, we added the save 
property.

auto rangeCopy = range.save;

That way, when we need to save the state of a range, we always use save, even 
if that particular range would be copied by a simple assignment. Okay. So far 
so good. There is one major problem with the current situation though: the 
behavior of value-type and reference-type forward ranges is very different when 
they are passed to range-based functions.

We use save within a particular algorithm when we know that we need to copy a 
range's state, but simply passing a range to a function may or may not copy a 
range. Take this possible implementation of a drop function, for instance:

R drop(R)(R range, size_t n)
    if(isInputRange!R)
{
    popFrontN(range, n);
    return range;
}

It pops n elements off of the range and returns the range sans those elements. 
In the case of an input range, the original range is n elements shorter - as 
expected. In the case of forward ranges though, it varies. If you pass an 
array or a struct with value semantics to drop, then the original range is not 
altered. Just passing it to drop is equivalent to calling save. However, if 
you use a range which is a reference type, then just like with an input range, 
the original range is altered. So, the behavior of code could change 
drastically depending on whether it's given a range which is a value type or a 
range which is a reference type - even if they're both forward ranges.

auto valRange = "hello world";
assert(equal(drop(valRange, 5), " world"));
assert(equal(valRange, "hello world"));

auto refRange = createRefRange("hello world");
assert(equal(drop(refRange, 5), " world"));
assert(equal(refRange, " world"));
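
For anyone who wants to run the difference, here is a minimal sketch. RefRange is a hypothetical class written for this illustration (it is not a Phobos type), and it iterates an int[] rather than a string to keep things short. Since equal would itself consume a reference-type range, the sketch checks front instead of comparing the whole range a second time.

```d
import std.algorithm : equal;
import std.range : isForwardRange, popFrontN;

// Hypothetical reference-type forward range over an int[] (not part of Phobos).
final class RefRange
{
    private int[] data;

    this(int[] data) { this.data = data; }

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // save hands back an independent range over the same elements.
    @property RefRange save() { return new RefRange(data); }
}

static assert(isForwardRange!RefRange);

// Same shape as the drop in the post: it never calls save.
R drop(R)(R range, size_t n)
{
    popFrontN(range, n);
    return range;
}

void main()
{
    int[] valRange = [1, 2, 3, 4, 5];
    assert(equal(drop(valRange, 2), [3, 4, 5]));
    assert(equal(valRange, [1, 2, 3, 4, 5])); // the slice was copied by the call

    auto refRange = new RefRange([1, 2, 3, 4, 5]);
    auto rest = drop(refRange, 2);
    assert(rest is refRange);    // the very same object comes back
    assert(refRange.front == 3); // so the caller's range was advanced
}
```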

So, the question is, should a range-based function have the same behavior for 
all forward ranges regardless of whether they're value types or reference 
types? Or should the caller be aware of whether a range is a value type or a 
reference type and call save if necessary? Or should the caller just always 
call save when passing a forward range to a function?

Option 1: If we make it so that all functions behave identically for all 
forward ranges, then we're going to need something like this at the beginning 
of every range-based function for all ranges passed to that function:

static if(isForwardRange!R) range = range.save;

This is definitely a bit tedious, but it's completely doable. And with some 
proper tests for it, it's easy to catch whether a function actually does this 
like it's supposed to. However, while in most cases, the compiler should be 
able to optimize out this assignment for structs, it can't always do it. IIUC, 
if the struct defines a postblit constructor or if any of its member variables 
define a postblit constructor (or if any of their member variables declare a 
postblit constructor, or any of their member variables' member variables...), 
then the save call and assignment won't be able to be optimized out, and there 
will be a performance cost. However, it's likely that very few range types 
will have postblit constructors, given the usual simplicity of their design 
and the fact that if they actually need a postblit constructor, they can 
probably just forgo it in favor of using the save property for all copying, 
making them reference types.
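
A rough sketch of what option 1 could look like when applied to drop. RefRange is again a hypothetical class made up for the illustration, and this is only one way to write the guard, not what Phobos actually does.

```d
import std.range : isForwardRange, isInputRange, popFrontN, save;

// Hypothetical reference-type forward range (not part of Phobos).
final class RefRange
{
    private int[] data;

    this(int[] data) { this.data = data; }

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property RefRange save() { return new RefRange(data); }
}

// Option 1: the function detaches itself from the caller's range up front.
R drop(R)(R range, size_t n)
    if (isInputRange!R)
{
    static if (isForwardRange!R)
        range = range.save; // pure input ranges have no save and are consumed
    popFrontN(range, n);
    return range;
}

void main()
{
    auto refRange = new RefRange([1, 2, 3, 4, 5]);
    auto rest = drop(refRange, 2);

    assert(rest !is refRange);   // a separate range object now
    assert(rest.front == 3);     // the result skipped two elements
    assert(refRange.front == 1); // the caller's range is untouched

    int[] arr = [1, 2, 3];
    assert(drop(arr, 1) == [2, 3]);
    assert(arr == [1, 2, 3]);    // same behavior as before for arrays
}
```

With the guard in place, value-type and reference-type forward ranges behave the same from the caller's point of view, at the cost of one save per call.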

Option 2: On the other hand, we could make it so that the caller just has to 
be aware of whether a range is a value type or a reference type and always call 
save when passing a reference type forward range to a function. This avoids 
the potential performance penalty but is very error-prone. Instead of the 
range-based function worrying about it, now _every_ function that calls a 
range-based function must worry about it. It's also likely to be somewhat 
problematic in generic code (which range-based code frequently is), since you 
can't exactly test for whether a range is a reference type or not, forcing you 
to pretty much call save all of the time when passing ranges to functions in 
generic code.

Option 2 is essentially what we've been doing in Phobos except that we don't 
actually test reference type ranges at all, and the odds are that a lot of 
Phobos doesn't actually handle reference type ranges correctly - meaning that 
even if the caller remembers to call save before passing such a range to a 
Phobos range-based function, there's a good chance that the function won't 
work correctly.

Option 3: Or, we could just say that you should _always_ call save when 
passing a forward range to a function. That way, you avoid the issue of trying 
to figure out whether a range is a reference type or not. It's also less error-
prone in that the fact that you're always doing it makes you less likely to 
forget to do it when you need to. However, you have the irritation of having 
to check for input ranges (since you can't call save on them) and essentially 
end up doing the exact same thing as option 1 except that you're doing it at 
every call point instead of once inside of the function definition. So, you 
really haven't gained much over putting it inside of the function, and you've 
made it more error-prone than option 1, since you have to remember to do it 
everywhere that you call a function instead of just the once in its definition.
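
To make the call-site burden of option 3 concrete, here is a sketch of the kind of helper a caller would probably end up writing. saveIfForward and OneShot are hypothetical names invented for this example; nothing like them is claimed to exist in Phobos.

```d
import std.range : isForwardRange, isInputRange, popFrontN, save;

// Hypothetical helper: save when possible, pass the range through otherwise.
auto saveIfForward(R)(R range)
    if (isInputRange!R)
{
    static if (isForwardRange!R)
        return range.save;
    else
        return range; // a pure input range cannot be saved
}

// Hypothetical input range with no save, to exercise the else branch.
struct OneShot
{
    int[] data;

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

static assert(isInputRange!OneShot && !isForwardRange!OneShot);

void main()
{
    int[] arr = [1, 2, 3];
    auto copy = saveIfForward(arr);
    popFrontN(copy, 1);
    assert(arr.length == 3 && copy.length == 2); // the original is unaffected

    auto once = saveIfForward(OneShot([1, 2, 3]));
    assert(once.front == 1); // still usable, just not independently saved
}
```

Every call to a range-based function would have to be wrapped this way, which is the same static if as in option 1, only repeated at each call point.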

So, the question is which option is the better one? Or is there another option 
that I haven't thought of? It's quite clear to me that we're going to need to 
add unit tests to Phobos to verify the behavior of Phobos functions when 
dealing with ranges which are reference types, but I think that we need a 
clear strategy on how to deal with the fact that value-type forward ranges are 
automatically saved when they're passed to a function whereas reference-type 
forward ranges are not.

Thoughts?

- Jonathan M Davis
Aug 16 2011
next sibling parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/16/2011 9:05 PM, Jonathan M Davis wrote:
 Sorry that this is long, but it's very important IMHO, and I don't know how to
 make it much shorter and cover what it's supposed to cover.

 Okay. Your typical forward range is either an array a struct which is a value
 type (that is, copying it creates an independent range which points to the
 same elements and is not altered if the original range is altered - the
 elements that it points to aren't copied of course).<snip>
 Thoughts?

 - Jonathan M Davis
Funny, I was also thinking about this recently. The trouble is that that's not the only issue. There's also the issue with polymorphism -- i.e., InputRangeObject is pretty much *useless* right now because no function ever checks for it (AFAIK... am I wrong?). So if you pass a random-access range object as an InputRange, the callee will just assume it's an InputRange and would reject it. So you're forced to downcast every time, which is really tedious. Things don't "just work" anymore.
Aug 16 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 16, 2011 21:17:31 Mehrdad wrote:
 Funny, I was also thinking about this recently. The trouble is that that's
 not the only issue. There's also the issue with polymorphism -- i.e.,
 InputRangeObject is pretty much *useless* right now because no function ever
 checks for it (AFAIK... am I wrong?). So if you pass a random-access range
 object as an InputRange, the callee will just assume it's an InputRange and
 would reject it. So you're forced to downcast every time, which is really
 tedious. Things don't "just work" anymore.
Phobos' functions pretty much always have template constraints to verify that they're given the correct type of range, and if they don't then they're supposed to. So, lots of functions have isInputRange!R, and most of the other range types are subtypes of input ranges, so checking for them also checks for input ranges - e.g. isForwardRange!R.

Polymorphism has _nothing_ to do with the range API though. If you're dealing with a range type which is a class or interface, then it must implement all of the appropriate functions for the type (or types) of range(s) that it's supposed to be. If the calls happen to be polymorphic, that's fine, but the range-based functions don't care. Whether a type is a particular type of range or not is _entirely_ a matter of its API. So, I don't quite understand what your issue is here.

- Jonathan M Davis
Aug 16 2011
prev sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Tue, 16 Aug 2011 21:17:31 -0700, Mehrdad wrote:

 Funny, I was also thinking about this recently. The trouble is that that's
 not the only issue. There's also the issue with polymorphism -- i.e.,
 InputRangeObject is pretty much *useless* right now because no function ever
 checks for it (AFAIK... am I wrong?). So if you pass a random-access range
 object as an InputRange, the callee will just assume it's an InputRange and
 would reject it. So you're forced to downcast every time, which is really
 tedious. Things don't "just work" anymore.
All of the range functions check for functionality, so if your random-access range object contains popFront, front, and empty (which it is required to in order to be a random-access range), then it will be accepted as an InputRange. Considering your work, I'm sure you know this, so I'm probably misunderstanding what point you are making?
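
A small sketch of the point that the traits only look at the API. Countdown and countElements are hypothetical names for this example; the traits themselves are the ones from std.range.

```d
import std.range : empty, isForwardRange, isInputRange, isRandomAccessRange,
                   popFront;

// Hypothetical class that offers only the input range primitives.
final class Countdown
{
    private int n;

    this(int n) { this.n = n; }

    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// The traits test for the functions, not for any base class or interface.
static assert(isInputRange!Countdown);
static assert(!isForwardRange!Countdown); // no save, so not a forward range

// A random-access range also passes the weaker input range check.
static assert(isRandomAccessRange!(int[]));
static assert(isInputRange!(int[]));

// Hypothetical function constrained only on isInputRange.
size_t countElements(R)(R r)
    if (isInputRange!R)
{
    size_t count;
    for (; !r.empty; r.popFront())
        ++count;
    return count;
}

void main()
{
    assert(countElements(new Countdown(3)) == 3);
    assert(countElements([1, 2, 3, 4]) == 4);
}
```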
Aug 16 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/16/2011 9:37 PM, Jesse Phillips wrote:
 All of the range functions check for functionality, so if your random-access
 range object contains, popFront, front, empty (which it is required to to be
 random-access range) then it will be accepted as an InputRange.
Right, but the problem is that none of this template business (e.g. isInputRange!T, hasLength!T, etc.) works if the input is an Object that implements InputRange.

For example, consider this:

      static Object getItems()
      { return inputRangeObject([1, 2]); }

      Object collection = getItems();
      if (collection.empty)  //Whoops...
      {
          ...
      }

The caller has no idea what kind of range is returned by getItems(), but he still needs to be able to check whether it's empty.

How can he figure this out? He would be forced to cast (which is by itself a pretty bad option), but what can he cast the object to? InputRange!Object doesn't work because it could be an InputRange!string or something. There's really NO way (that I know of) for the caller to test and see if the collection is an input range, unless he knows the Java).

Hope that makes sense...
Aug 16 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 16, 2011 23:20:14 Mehrdad wrote:
 Right, but the problem is that none of this template business (e.g.
 isInputRange!T, hasLength!T, etc.) works if the input is an Object that
 implements InputRange.

 For example, consider this:

       static Object getItems()
       { return inputRangeObject([1, 2]); }

       Object collection = getItems();
       if (collection.empty)  //Whoops...
       {
           ...
       }

 The caller has no idea what kind of range is returned by getItems(), but
 he still needs to be able to check whether it's empty.

 How can he figure this out? He would be forced to cast (which is by
 itself a pretty bad option), but what can he cast the object to?
 InputRange!Object doesn't work because it could be an InputRange!string
 or something. There's really NO way (that I know of) for the caller to
 test and see if the collection is an input range, unless he knows the
 Java).

 Hope that makes sense...
If you're dealing with a reference for a type that implements the functions for input range or forward range or whatever, then it's not an issue. It'll work with functions that require those range types. If you're dealing with a reference that doesn't implement the appropriate functions, then it isn't even if the actual type does. You could cast to the actual type and use that, but that pretty much assumes that you know the actual type - or if you don't you end up having to do something like

auto ir = cast(InputRange)obj;
if(ir)
    //call range func
else
    //do whatever you do if you can't call the range func

However, the way that OO _normally_ works is that you use a reference which is the type that you want to treat the object as. So, all of the code using that reference only deals with functionality that that reference type has. You don't usually cast it to other types to try and do other stuff. So, if a function needs the InputRange class/interface, then that's the reference that you use for that particular variable, and then whatever you assign it to (from a function parameter or a return function or whatever) is a type which derives from or implements InputRange. And it all works just fine.

It's usually _bad_ OO to be casting between object types. In particular, actually using the base Object class is usually a _bad_ idea. Some languages do that for their containers because they lack proper templates, but then you have to worry about casting the objects to the correct type when you get them but when they added generics, they made it so that only the internal implementation works that way. The generics take care of keeping track of the actual type of the objects in the container and you don't have to cast anymore (though the casts still occur underneath the hood). So, that improves the situation considerably.

But regardless, good OO design does not usually require casting from a base class or interface to a derived class. So, the issue that you're describing just doesn't happen in good OO code.

- Jonathan M Davis
Aug 16 2011
parent Mehrdad <wfunction hotmail.com> writes:
On 8/16/2011 11:41 PM, Jonathan M Davis wrote:
 On Tuesday, August 16, 2011 23:20:14 Mehrdad wrote:
 Right, but the problem is that none of this template business (e.g.
 isInputRange!T, hasLength!T, etc.) works if the input is an Object that
 implements InputRange.

 For example, consider this:

       static Object getItems()
       { return inputRangeObject([1, 2]); }

       Object collection = getItems();
       if (collection.empty)  //Whoops...
       {
           ...
       }

 The caller has no idea what kind of range is returned by getItems(), but
 he still needs to be able to check whether it's empty.

 How can he figure this out? He would be forced to cast (which is by
 itself a pretty bad option), but what can he cast the object to?
 InputRange!Object doesn't work because it could be an InputRange!string
 or something. There's really NO way (that I know of) for the caller to
 test and see if the collection is an input range, unless he knows the

 Java).

 Hope that makes sense...
 If you're dealing with a reference for a type that implements the functions
 for input range or forward range or whatever, then it's not an issue. It'll
 work with functions that require those range types. If you're dealing with a
 reference that doesn't implement the appropriate functions, then it isn't even
 if the actual type does. You could cast to the actual type and use that, but
 that pretty much assumes that you know the actual type - or if you don't you
 end up having to do something like

 auto ir = cast(InputRange)obj;
That doesn't compile. I think you missed the entire point of my comment -- you have no idea what it's an input range OF. Read below.
 if(ir)
      //call range func
 else
     //do whatever you do if you can't call the range func

 However, that OO _normally_ works is that you use a reference which is the
 type that you want to treat the object as. So, all of the code using that
 reference only deals with functionality that that reference type has. You
 don't usually cast it to other types to try and do other stuff. So, if a
 function needs the InputRange class/interface, then that's the reference that
 you use for that particular variable, and then whatever you assign it to (from
 a function parameter or a return function or whatever) is a type which derives
 from or implements InputRange. And it all works just fine.

 It's usually _bad_ OO to be casting between object types. In particular,
 actually using the base Object class is usually a _bad_ idea. Some languages
 do that for their containers because they lack proper templates, but then you
 have to worry about casting the objects to the correct type when you get them

 but when they added generics, they made it so that only the internal
 implementation works that way. The generics take care of keeping track of the
 actual type of the objects in the container and you don't have to cast anymore
 (though the casts still occur underneath the hood). So, that improves the
 situation considerably.

 But regardless, good OO design does not usually require casting from a base
 class or interface to a derived class. So, the issue that you're describing
 just doesn't happen in good OO code.

 - Jonathan M Davis
I think you missed my point. My point wasn't "What if all you have is an Object reference?", but rather "What if you don't know the _kind_ of InputRange(T) an object is?". i.e. You might know very well that a piece of code returns an InputRange(T) where T is _SOME_ subclass of a class you know, but not have any idea what T is.

When does that happen? When you're dealing with covariance and contravariance. You get back a container of SOME kind of object reference, but you don't know what kind of container it is, so there's NO way for you to "just cast it" to InputRange(T) because you don't know what T would be.

situation by letting you return InputRange!T where T is some BASE class of what you have. But you can't do that in D (...AFAIK?) so you're forced to return an Object, hence my example.

Of course, your argument is that we need to use The Template Hammer, and make the CALLER be a template, so that it can accept anything. The problem with which is that now your template leaks, i.e. now you're forcing a whole bunch of other code to become templated, when it really doesn't need to be.

You consider that a good idea, but I think you're completely ignoring the fact that the entire concept of a shared library is to SHARE code. Once you "templatize" a piece of code and then force everything else to follow suit, then you can't share the same code -- it's NEW code EVERY time.

So unless you also consider shared libraries to be an indicator of Bad Coding ("only n00bs don't know EVERYTHING at compile time") I'm just confused at how you can think that The Template Hammer should be used for every nail. It /just fails/ when you're making a shared library.
Aug 17 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 02:20:14 -0400, Mehrdad <wfunction hotmail.com> wrote:

 Right, but the problem is that none of this template business (e.g.
 isInputRange!T, hasLength!T, etc.) works if the input is an Object that
 implements InputRange.

 For example, consider this:

       static Object getItems()
       { return inputRangeObject([1, 2]); }

       Object collection = getItems();
       if (collection.empty)  //Whoops...
       {
           ...
       }

 The caller has no idea what kind of range is returned by getItems(), but
 he still needs to be able to check whether it's empty.
What you are looking for is dynamic typing. That is not supported directly by Object. That is, you have to know *statically* (i.e. at compile time) that *all* instances returned by getItems have an empty property. Object does not have that property, so you used the wrong return type.
 How can he figure this out? He would be forced to cast (which is by  
 itself a pretty bad option), but what can he cast the object to?   
 InputRange!Object doesn't work because it could be an InputRange!string  
 or something. There's really NO way (that I know of) for the caller to  
 test and see if the collection is an input range, unless he knows the  

 Java).
Casting is actually the correct solution. A cast from a base to a derived class is not unsafe as long as you forward the type modifiers (like const):

if(auto irange = cast(InputRangeObject)collection)
{
   // now you can use irange
   if(collection.empty) // success!
   {
      ...
   }
}

BTW, since input range is the lowest level range, I'd recommend getItems return InputRangeObject instead of Object. Furthermore, since I'm 100% against class-based ranges, I would recommend not using them at all :) Use a struct instead, or don't use the range concept here.

-Steve
Aug 17 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/17/2011 7:14 AM, Steven Schveighoffer wrote:
 Casting is actually the correct solution.

 if(auto irange = cast(InputRangeObject)collection)
 {
    // now you can use irange
    if(collection.empty) // success!
    {
       ...
    }
 } 
The correct solution? It doesn't even compile. (See my last post, which was after the one you replied to.)
Aug 17 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 10:56:02 -0400, Mehrdad <wfunction hotmail.com> wrote:

 On 8/17/2011 7:14 AM, Steven Schveighoffer wrote:
 Casting is actually the correct solution.

 if(auto irange = cast(InputRangeObject)collection)
 {
    // now you can use irange
    if(collection.empty) // success!
    {
       ...
    }
 }
 The correct solution? It doesn't even compile. (See my last post, which was
 after the one you replied to.)
Oh, right, InputRangeObject is a template. Sorry, I forgot about that aspect.

So actually, that isn't possible if you are returning Object; you need to return the correct InputRange(T) type (in this case InputRange!int).

Another good reason to avoid class-based ranges :)

-Steve
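
A sketch of what returning the interface looks like, using InputRange and inputRangeObject from std.range. The getItems below is a reworked version of the example from earlier in the thread, not anyone's actual code. The second half shows the limitation being discussed: starting from a plain Object, the cast only works if the element type is already known.

```d
import std.range : InputRange, inputRangeObject;

// Returning the interface instead of Object keeps empty/front/popFront visible.
InputRange!int getItems()
{
    return inputRangeObject([1, 2]);
}

void main()
{
    auto items = getItems();
    assert(!items.empty); // compiles, since InputRange!int declares empty
    assert(items.front == 1);

    // With only an Object in hand, the element type has to be guessed.
    Object obj = inputRangeObject([1, 2]);
    auto asInts = cast(InputRange!int) obj;
    auto asStrings = cast(InputRange!string) obj;
    assert(asInts !is null);   // right element type: the dynamic cast succeeds
    assert(asStrings is null); // wrong element type: the cast yields null
}
```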
Aug 17 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/17/2011 8:26 AM, Steven Schveighoffer wrote:
 Oh, right, InputRangeObject is a template. Sorry, I forgot about that
 aspect. So actually, that isn't possible if you are returning Object, you
 need to return the correct InputRange(T) type. (in this case InputRange!int)
 Another good reason to avoid class-based ranges :)

 -Steve
Er, if they aren't supported then please just remove them altogether... hasn't that been the philosophy so far?
Aug 17 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 22:36:14 -0400, Mehrdad <wfunction hotmail.com> wrote:

 Er, if they aren't supported then please just remove them altogether...
 hasn't that been the philosophy so far?
It's not my call. My opinion differs from the others, especially Andrei.

-Steve
Aug 17 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/16/11 11:05 PM, Jonathan M Davis wrote:
 Sorry that this is long, but it's very important IMHO, and I don't know how to
 make it much shorter and cover what it's supposed to cover.
[snip]

Keep things as they are. Algorithms operate on ranges as specified in their signatures. If they need to create additional copies thereof, they use .save. If client code needs to pass a copy of a range to an algorithm, it passes .save.

Andrei
Aug 16 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 16, 2011 23:26:37 Andrei Alexandrescu wrote:
 On 8/16/11 11:05 PM, Jonathan M Davis wrote:
 Sorry that this is long, but it's very important IMHO, and I don't know
 how to make it much shorter and cover what it's supposed to cover.
 [snip] Keep things as they are. Algorithms operate on ranges as specified in
 their signatures. If they need to create additional copies thereof, they use
 .save. If client code needs to pass a copy of a range to an algorithm, it
 passes .save.
I expect that the result of that is that reference type ranges aren't going to work a lot of the time. Now, how much of that is broken implementations and how much of it is actual design issues, I don't know.

It's clear to me however that we need to start having unit tests for reference type ranges in Phobos (probably both of the struct and the class variety to be on the safe side) to make sure that functions at least work correctly when they're passed a reference type range, regardless of whether the range gets consumed in the process. I expect that we have quite a few bugs in Phobos stemming from the fact that pretty much all of the ranges that we test with (and that most people use in general at this point) are value type ranges.

- Jonathan M Davis
Aug 16 2011
prev sibling next sibling parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 17/08/11 5:05 AM, Jonathan M Davis wrote:
 It was previously determined that this would be a problem for ranges which are
 reference types (classes in particular, but it affects structs as well, if
 copying them doesn't create an independent range). So, we added the save
 property.

 <snip>

 Thoughts?
Apologies for my ignorance, but I haven't really been following all this ranges stuff. I must be missing something, why would you ever expect an algorithm that works with value types to work with reference types as well?
Aug 17 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, August 17, 2011 09:03:52 Peter Alexander wrote:
 On 17/08/11 5:05 AM, Jonathan M Davis wrote:
 It was previously determined that this would be a problem for ranges
 which are reference types (classes in particular, but it affects
 structs as well, if copying them doesn't create an independent range).
 So, we added the save property.
 
 <snip>
 
 Thoughts?
Apologies for my ignorance, but I haven't really been following all this ranges stuff. I must be missing something, why would you ever expect an algorithm that works with value types to work with reference types as well?
A range is any type which has the appropriate functions on it. It doesn't matter whether it's an array, a struct, or a class. And if it's a struct, it could be either a value type or a reference type. So, a range could be either a value type or a reference type. In the general case, you can't know without reading the code whether a particular range is a value type or a reference type (though obviously in the case of classes, you know that it's a reference type), and traits can't tell you which of the two it is. So, range-based functions can't assume that a range is a value type, and they can't assume that a range is a reference type. This has nothing to do with the elements in the range, mind you. It's purely a matter of the type of the range itself.

So, in order to deal with the issue that

auto rangeCopy = range;

doesn't necessarily copy, save was introduced so that you can guarantee that you're getting a copy:

auto rangeCopy = range.save;

The issue that I'm bringing up is that you still get different behavior between value type and reference type ranges when you pass them to a function. The only way to guarantee the same behavior is to either call save before passing a range into a function or to call it once it's been passed in.

In any case, essentially what it comes down to is that you have no idea in the general case whether a range is a value type or a reference type, and you _have_ to code in a manner which works with both or you're going to end up with buggy code.

- Jonathan M Davis
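The difference described in this post can be reproduced with a small sketch. ValueRange, ClassRange and dropByValue are hypothetical names made up for illustration (dropByValue mirrors the drop implementation from the start of the thread); only popFrontN and isForwardRange come from Phobos.

```d
import std.range : isForwardRange, popFrontN;

// Hypothetical value-type forward range: copying the struct yields an
// independent range over the same elements.
struct ValueRange
{
    int[] data;
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ValueRange save() { return this; }
}

// Hypothetical reference-type forward range: copying the class reference
// does not copy the iteration state, so save has to allocate a new object.
final class ClassRange
{
    int[] data;
    this(int[] d) { data = d; }
    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ClassRange save() { return new ClassRange(data); }
}

static assert(isForwardRange!ValueRange);
static assert(isForwardRange!ClassRange);

// Takes its range by value, like the drop example earlier in the thread.
R dropByValue(R)(R range, size_t n)
{
    popFrontN(range, n);
    return range;
}

void main()
{
    auto v = ValueRange([1, 2, 3, 4]);
    auto vRest = dropByValue(v, 2);
    assert(v.front == 1);      // the caller's struct range is untouched
    assert(vRest.front == 3);

    auto c = new ClassRange([1, 2, 3, 4]);
    auto cRest = dropByValue(c, 2);
    assert(c.front == 3);      // the caller's class range was consumed
    assert(cRest is c);        // both names refer to the same object

    auto d = new ClassRange([1, 2, 3, 4]);
    auto dRest = dropByValue(d.save, 2);
    assert(d.front == 1);      // passing .save protects the original
    assert(dRest.front == 3);
}
```

The same generic function therefore leaves one caller's range intact and consumes the other's, and nothing in its signature tells the caller which of the two will happen.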
Aug 17 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 00:05:54 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 So, the question is, should a range-based function have the same  
 behavior for
 all forward ranges regardless of whether they're value types or reference
 types? Or should the caller be aware of whether a range is a value type  
 or a
 reference type and call save if necessary? Or should the caller just  
 always
 call save when passing a forward range to a function?
Probably not helpful, since the establishment seems to be set in their opinions, but I'd recommend saying ranges are always structs, and get rid of the save concept, replacing it with an enum solution. The current save regime is a fallacy, because it's not enforced. It's as bad as c++ const. At the very least, let's wait until someone actually comes up with a valid use case for reference-based forward ranges before changing any code. So far, all I've seen is boilerplate *RangeObject, no real usages. -Steve
Aug 17 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Wed, 17 Aug 2011 10:19:31 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 00:05:54 -0400, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 
 So, the question is, should a range-based function have the same
 behavior for
 all forward ranges regardless of whether they're value types or
 reference types? Or should the caller be aware of whether a range is a
 value type or a
 reference type and call save if necessary? Or should the caller just
 always
 call save when passing a forward range to a function?
Probably not helpful, since the establishment seems to be set in their opinions, but I'd recommend saying ranges are always structs, and get rid of the save concept, replacing it with an enum solution. The current save regime is a fallacy, because it's not enforced. It's as bad as c++ const. At the very least, let's wait until someone actually comes up with a valid use case for reference-based forward ranges before changing any code. So far, all I've seen is boilerplate *RangeObject, no real usages.
As long as most functions in std.algorithm don't take the ranges as ref arguments, you need to use a reference-based range whenever you want the function to consume the original range. BTW, this is why I suggested earlier that we add a byRef range.

If you absolutely want the function foo() to consume your range, write

foo(byRef(myRange));

If you absolutely *don't* want the function to consume your range, write

foo(myRange.save);

If you don't intend to use the range afterwards, and therefore don't care whether it is consumed or not, write

foo(myRange);

-Lars
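For illustration, here is a minimal sketch of what such a byRef wrapper might look like. byRef is only a proposal in this thread, so the ByRef struct and the byRef helper below are assumptions written for this example, not an existing Phobos API (Phobos today offers std.range.refRange for a similar purpose). The wrapper is deliberately an input range only, and it must not outlive the range it points to.

```d
import std.algorithm.comparison : equal;
import std.range; // take, isInputRange, and the array range primitives

// Hypothetical wrapper: stores a pointer to the caller's range and
// forwards the input range primitives to it.
struct ByRef(R)
    if (isInputRange!R)
{
    private R* source;
    @property bool empty() { return (*source).empty; }
    @property auto front() { return (*source).front; }
    void popFront() { (*source).popFront(); }
}

// Hypothetical helper so that callers can write byRef(myRange).
ByRef!R byRef(R)(ref R range)
{
    return ByRef!R(&range);
}

void main()
{
    int[] input = [1, 2, 3, 4, 5];

    // Plain pass: take works on its own copy of the slice,
    // so input still has all of its elements afterwards.
    assert(equal(take(input, 2), [1, 2]));
    assert(input.length == 5);

    // Through the wrapper: consuming the Take also consumes input.
    assert(equal(take(byRef(input), 2), [1, 2]));
    assert(input == [3, 4, 5]);
}
```

Because the wrapper is itself a small value type, it can be nested inside Take, Until and similar adaptors while every copy keeps advancing the same underlying range.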
Aug 17 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 13:15:27 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Wed, 17 Aug 2011 10:19:31 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 00:05:54 -0400, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 So, the question is, should a range-based function have the same
 behavior for
 all forward ranges regardless of whether they're value types or
 reference types? Or should the caller be aware of whether a range is a
 value type or a
 reference type and call save if necessary? Or should the caller just
 always
 call save when passing a forward range to a function?
Probably not helpful, since the establishment seems to be set in their opinions, but I'd recommend saying ranges are always structs, and get rid of the save concept, replacing it with an enum solution. The current save regime is a fallacy, because it's not enforced. It's as bad as c++ const. At the very least, let's wait until someone actually comes up with a valid use case for reference-based forward ranges before changing any code. So far, all I've seen is boilerplate *RangeObject, no real usages.
As long as most functions in std.algorithm don't take the ranges as ref arguments, you need to use a reference-based range whenever you want the function to consume the original range. BTW, this is why I suggested earlier that we add a byRef range. If you absolutely want the function foo() to consume your range, write foo(byRef(myRange));
Do you have a real example besides foo which makes sense on both byRef and by value ranges? I think it's rather important for the function implementation to know what's happening with its range while using it. -Steve
Aug 17 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Wed, 17 Aug 2011 14:15:52 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 13:15:27 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 
 On Wed, 17 Aug 2011 10:19:31 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 00:05:54 -0400, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 So, the question is, should a range-based function have the same
 behavior for
 all forward ranges regardless of whether they're value types or
 reference types? Or should the caller be aware of whether a range is
 a value type or a
 reference type and call save if necessary? Or should the caller just
 always
 call save when passing a forward range to a function?
Probably not helpful, since the establishment seems to be set in their opinions, but I'd recommend saying ranges are always structs, and get rid of the save concept, replacing it with an enum solution. The current save regime is a fallacy, because it's not enforced. It's as bad as c++ const. At the very least, let's wait until someone actually comes up with a valid use case for reference-based forward ranges before changing any code. So far, all I've seen is boilerplate *RangeObject, no real usages.
As long as most functions in std.algorithm don't take the ranges as ref arguments, you need to use a reference-based range whenever you want the function to consume the original range. BTW, this is why I suggested earlier that we add a byRef range. If you absolutely want the function foo() to consume your range, write foo(byRef(myRange));
Do you have a real example besides foo which makes sense on both byRef and by value ranges?
Well, I did try my hand at writing a parser for a wiki-style markup language a while ago, which got its input from an input range. It would look at the front of the range, determine what kind of element was there (paragraph, heading, bullet list, etc.), and pass the range on to a specialised function for dealing with that kind of element (parseHeading(), etc.). Of course, those functions had to consume the original range, otherwise the same element would be repeated over and over again. For simple cases, this was only a matter of parseWhatever() taking the range by ref, and everything would work nicely. Sometimes, however, the range would be wrapped by another range (such as Take or Until). If I wanted these to keep consuming the original range, I had to wrap it with byRef(). This happened often enough, and became annoying enough, that I ended up using InputRange objects everywhere instead. -Lars
Aug 17 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Aug 2011 14:53:53 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Wed, 17 Aug 2011 14:15:52 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 13:15:27 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:

 On Wed, 17 Aug 2011 10:19:31 -0400, Steven Schveighoffer wrote:

 On Wed, 17 Aug 2011 00:05:54 -0400, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:

 So, the question is, should a range-based function have the same
 behavior for
 all forward ranges regardless of whether they're value types or
 reference types? Or should the caller be aware of whether a range is
 a value type or a
 reference type and call save if necessary? Or should the caller just
 always
 call save when passing a forward range to a function?
Probably not helpful, since the establishment seems to be set in their opinions, but I'd recommend saying ranges are always structs, and get rid of the save concept, replacing it with an enum solution. The current save regime is a fallacy, because it's not enforced. It's as bad as c++ const. At the very least, let's wait until someone actually comes up with a valid use case for reference-based forward ranges before changing any code. So far, all I've seen is boilerplate *RangeObject, no real usages.
As long as most functions in std.algorithm don't take the ranges as ref arguments, you need to use a reference-based range whenever you want the function to consume the original range. BTW, this is why I suggested earlier that we add a byRef range. If you absolutely want the function foo() to consume your range, write foo(byRef(myRange));
Do you have a real example besides foo which makes sense on both byRef and by value ranges?
Well, I did try my hand at writing a parser for a wiki-style markup language a while ago, which got its input from an input range. It would look at the front of the range, determine what kind of element was there (paragraph, heading, bullet list, etc.), and pass the range on to a specialised function for dealing with that kind of element (parseHeading(), etc.). Of course, those functions had to consume the original range, otherwise the same element would be repeated over and over again. For simple cases, this was only a matter of parseWhatever() taking the range by ref, and everything would work nicely. Sometimes, however, the range would be wrapped by another range (such as Take or Until). If I wanted these to keep consuming the original range, I had to wrap it with byRef(). This happened often enough, and became annoying enough, that I ended up using InputRange objects everywhere instead
The problem here seems to be that an input range is used as the base of a forward range. A forward range is much different than an input range, in that an input range destroys the data as it iterates, whereas the forward range does not. I would say that anything that is a forward range or above should never be a reference type, but anything that is strictly an input range *should* actually be a reference type (hey, I switched opinions!). The issue is that all forward ranges are input ranges.

Note that while I asked for a real example of an *algorithm*, you gave me an example of a *type* that doesn't forward the desired behavior. I see the point, however, and I think a different style of thinking is needed. That is, accepting a ref range is not guaranteed to make any range into a destructive input range; you do need a byRef range.

In any case, I think I found a more straightforward example of something that can accept either: walkLength. walkLength doesn't care whether the data gets destroyed or not, it's just counting stuff. So there is a legitimate reason for something to accept both an input and a forward+ range.

So what I think we need is one more isXRange to determine "is this an input range and *only* an input range?" That is, is this a *destructive* input range. In the current implementation, this would mean

isInputRange!R && !isForwardRange!R

I still dislike save and how useless it is, though.

-Steve
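The check proposed in this post can be sketched as a one-line trait. The name isDestructiveInputRange and the Countdown range are hypothetical and exist only for this example; the condition itself is the one given above, and walkLength is the Phobos function the post mentions.

```d
import std.range : isForwardRange, isInputRange, walkLength;

// Hypothetical trait name; the condition is the one from the post.
enum isDestructiveInputRange(R) = isInputRange!R && !isForwardRange!R;

// Hypothetical input-only range: it has no save, so it is not a forward
// range. The trait only sees the missing save; it cannot tell whether
// copying the struct would in fact produce an independent range.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

static assert( isDestructiveInputRange!Countdown);
static assert(!isDestructiveInputRange!(int[]));

void main()
{
    // walkLength only counts elements, so it accepts both kinds of range.
    assert(walkLength(Countdown(3)) == 3);
    assert(walkLength([10, 20, 30]) == 3);
}
```

An algorithm could use such a trait in its template constraint to state that it consumes its argument, while algorithms like walkLength keep accepting any input range.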
Aug 17 2011