www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Difference between range `save` and copy constructor

reply uranuz <neuranuz gmail.com> writes:
I am interested in current circumstances when we have new copy 
constructor feature what is the purpose of having range `save` 
primitive? For me they look like doing basicaly the same thing. 
And when looking in some source code of `range` module the most 
common thing that `save` does is that it use constructor 
typeof(this) to create a new instance and use `save` on the 
source range:
https://github.com/dlang/phobos/blob/v2.090.1/std/range/package.d

So what is conceptual difference between `save` and copy 
contructor of range?
Feb 15 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/15/20 5:53 AM, uranuz wrote:
 I am interested in current circumstances when we have new copy 
 constructor feature what is the purpose of having range `save` 
 primitive? For me they look like doing basicaly the same thing. And when 
 looking in some source code of `range` module the most common thing that 
 `save` does is that it use constructor typeof(this) to create a new 
 instance and use `save` on the source range:
 https://github.com/dlang/phobos/blob/v2.090.1/std/range/package.d
 
 So what is conceptual difference between `save` and copy contructor of 
 range?
Nothing. IMO, any time you are doing anything in save other than `return this;`, you shouldn't have implemented it. The original impetus for the save requirement was so that forward ranges could have a tangible checkable thing that allows introspection (does the range have a save method?). I'm not entirely sure if disabled postblit was even available at the time. The correct way to do it would be to treat ranges that can be copied (regardless of whether they have a copy constructor) as forward ranges, and treat ones that cannot be copied as input ranges. But it's hard to redo ranges like this with all existing code out there. -Steve
Feb 15 2020
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, February 15, 2020 7:34:42 AM MST Steven Schveighoffer via 
Digitalmars-d-learn wrote:
 On 2/15/20 5:53 AM, uranuz wrote:
 I am interested in current circumstances when we have new copy
 constructor feature what is the purpose of having range `save`
 primitive? For me they look like doing basicaly the same thing. And when
 looking in some source code of `range` module the most common thing that
 `save` does is that it use constructor typeof(this) to create a new
 instance and use `save` on the source range:
 https://github.com/dlang/phobos/blob/v2.090.1/std/range/package.d

 So what is conceptual difference between `save` and copy contructor of
 range?
Nothing. IMO, any time you are doing anything in save other than `return this;`, you shouldn't have implemented it. The original impetus for the save requirement was so that forward ranges could have a tangible checkable thing that allows introspection (does the range have a save method?). I'm not entirely sure if disabled postblit was even available at the time. The correct way to do it would be to treat ranges that can be copied (regardless of whether they have a copy constructor) as forward ranges, and treat ones that cannot be copied as input ranges. But it's hard to redo ranges like this with all existing code out there.
Actually, as I understand it, the main reason that save was introduced was so that classes could be forward ranges. While it would be possible to use the postblit constructor or copy constructor with structs, that obviously won't work for classes - hence when save is required. Personally, I think that we'd be better of simply requiring that forward rangse be copyable and force classes that want to be forward ranges to be wrapped by structs, but that would require reworking the range API, and it's far from a trivial change. In practice though, classes should almost never be used as forward ranges, because calling save on them would requires allocating a now object, and that gets expensive fast. As part of testing dxml, I tested it with forward ranges that were classes in order to make sure that they were handled correctly, and their performance was absolutely terrible in comparison to ranges that were structs or strings. - Jonathan M Davis
Feb 15 2020
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/15/20 9:45 AM, Jonathan M Davis wrote:
 On Saturday, February 15, 2020 7:34:42 AM MST Steven Schveighoffer via
 Digitalmars-d-learn wrote:
 On 2/15/20 5:53 AM, uranuz wrote:
 I am interested in current circumstances when we have new copy
 constructor feature what is the purpose of having range `save`
 primitive? For me they look like doing basicaly the same thing. And when
 looking in some source code of `range` module the most common thing that
 `save` does is that it use constructor typeof(this) to create a new
 instance and use `save` on the source range:
 https://github.com/dlang/phobos/blob/v2.090.1/std/range/package.d

 So what is conceptual difference between `save` and copy contructor of
 range?
Nothing. IMO, any time you are doing anything in save other than `return this;`, you shouldn't have implemented it. The original impetus for the save requirement was so that forward ranges could have a tangible checkable thing that allows introspection (does the range have a save method?). I'm not entirely sure if disabled postblit was even available at the time. The correct way to do it would be to treat ranges that can be copied (regardless of whether they have a copy constructor) as forward ranges, and treat ones that cannot be copied as input ranges. But it's hard to redo ranges like this with all existing code out there.
Actually, as I understand it, the main reason that save was introduced was so that classes could be forward ranges. While it would be possible to use the postblit constructor or copy constructor with structs, that obviously won't work for classes - hence when save is required.
I remember the discussions as being about how an actual implementation detail was required, not just a mark. I remember a suggestion for just putting an enum isForward member in the range being rejected because of this. Except people don't call save. They just copy, and it works for nearly all forward ranges in existence. And a class allocating a new class for saving a forward range is a mislabeled "cheap" operation IMO.
 Personally, I think that we'd be better of simply requiring that forward
 rangse be copyable and force classes that want to be forward ranges to be
 wrapped by structs, but that would require reworking the range API, and it's
 far from a trivial change.
Yep.
 In practice though, classes should almost never be used as forward ranges,
 because calling save on them would requires allocating a now object, and
 that gets expensive fast. As part of testing dxml, I tested it with forward
 ranges that were classes in order to make sure that they were handled
 correctly, and their performance was absolutely terrible in comparison to
 ranges that were structs or strings.
Not surprised ;) -Steve
Feb 15 2020
prev sibling next sibling parent uranuz <neuranuz gmail.com> writes:
 Actually, as I understand it, the main reason that save was 
 introduced was so that classes could be forward ranges
I have use of ranges as a classes in my code that rely on classes and polymorthism, but it's usually an InputRange that implements Phobos interface: https://dlang.org/phobos/std_range_interfaces.html#.InputRange I have virtual opSlice operator that returns InputRange. And sometimes implementation of range is very different. So it's difficult to write one range as a struct. I have a pattern in my code that looks like the following: interface IContainer { InputRange opSlice(); } class MyContainer1: IContainer { class Range1: InputRange { //... one implementation } override InputRange opSlice() { return new Range1(this); } } class MyContainer2: IContainer { class Range2: InputRange { //... another implementation } override InputRange opSlice() { return new Range2(this); } } In this example I need a range to be a class, but not a struct. Another problemme is that `copy contructor` is defined only for structs, but not classes. For the class that uses another class instance of the `same` type to initialize from it would be a regular constructor with parameter. A copy constructor in struct semantics requires that the source would be actually `the same` type. But for classes source could be instance of another class that is inherited from current class. And we cannot prove statically that it's actually the same type. And also if we talk about range interface constructor cannot be a part of it. So we cannot add `copy contructor` (if we would have it for class) to interface and check for it's presence in generic code. So here we have this workaround with `save` method... I don't like that primitive concept has two ways to do the same thing. And it's unclear what is the primary way of doing this (copy constructor or save). It introduce the situation when half of the code would require range being copyable, but another part would require it to to have a save method. Not the situation is that there are a lot of algorothms in Phobos that are not working with ranges that have disabled postblit, but have `save` method that could be used to make a copy. Still I want to be able to create ranges as classes...
Feb 16 2020
prev sibling next sibling parent uranuz <neuranuz gmail.com> writes:
Also I see the problemme that someone can think that it creates 
an input range, because he doesn't provide `save` method, but 
actually it creates forward range unexpectedly, because it is 
copyable. And it makes what is actually happening in code more 
difficult. Some algrorithm can take ranges by value, but others 
take them by reference. So result can be completely different. In 
first case range is being consumed, but in another in is not. 
Personally I prefer to take range by reference in all of my 
algrorithms except cases where I is always a class (because it's 
a reference already). But I still don't know what is the right 
way. There are no official guidelines about it. So every time 
it's a problemme. Although it looks like that range is a simple 
concept, but it's actually not.
Feb 16 2020
prev sibling parent reply uranuz <neuranuz gmail.com> writes:
I have reread presentation:
http://dconf.org/2015/talks/davis.pdf
We declare that `pure` input range cannot be `unpoped` and we 
can't return to the previous position of it later at the time. So 
logically there is no sence of copying input range at all. So 
every Phobos algorithm that declares that it's is working with 
InputRange must be tested for working with range with disabled 
copy constructor and postblit. And if it is not it means that 
this algroithm actually requires a forward range and there we 
missing `save` calls?
Because as it was written in this presentation a range copy is 
undefined (without call to save). So it's illegal to create copy 
of range in Phobos algorithms without `save`?
So we need a test for every algorithm that it is working with 
range with disabled copy constructor and postblit if we declare 
that we only use `save` for range copy?
Feb 16 2020
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, February 16, 2020 3:41:31 AM MST uranuz via Digitalmars-d-learn 
wrote:
 I have reread presentation:
 http://dconf.org/2015/talks/davis.pdf
 We declare that `pure` input range cannot be `unpoped` and we
 can't return to the previous position of it later at the time. So
 logically there is no sence of copying input range at all. So
 every Phobos algorithm that declares that it's is working with
 InputRange must be tested for working with range with disabled
 copy constructor and postblit. And if it is not it means that
 this algroithm actually requires a forward range and there we
 missing `save` calls?
 Because as it was written in this presentation a range copy is
 undefined (without call to save). So it's illegal to create copy
 of range in Phobos algorithms without `save`?
 So we need a test for every algorithm that it is working with
 range with disabled copy constructor and postblit if we declare
 that we only use `save` for range copy?
A range that can't be copied is basically useless. Not only do almost all range-based algorithms take their argumenst by value (and thus copy them), but foreach copies any range that it's given, meaning that if a range isn't copyable, you can't even use it with foreach. And since many range-based algorithms function by wrapping one range with another, the ability to copy ranges is fundamental to most range-based code. That being said, the semantics of copying a range are not actually defined by the range API. Whether iterating over a copy affects the original depends on how a range was implemented. e.g. In code such as void foo(R)(R r) if(isInputRange!R) { r.popFront(); } foo(range); whether the range in the original range in the calling code is affected by the element being popped from the copy inside of foo is implementation dependent. If it's a class or a struct that's a full-on reference type, then mutating the copy does affect the original, whereas if copying a range is equivalent to save, then mutating the copy has no effect on the original. And with pseudo-reference types, it's even worse, because you could end up _partially_ mutating the original by mutating the copy, meaning that you can get some pretty serious bugs if you attempt to use a range after it's been copied. This means that in practice, in generic code, you can never use a range once it's been copied unless you overwrite it with a new value. Passing a range to a function or using it with foreach basically means that you should not continue to use that range, and if you want to be able to continue to use it, you need to call save and pass that copy to the function or foreach instead of passing the range directly to a function or foreach. In order to fix it so that you can rely on the semantics of using a range after it's been copied, we'd have to rework the range API and make it so that the semantics of copying a range were well-defined, and that gets into a huge discussion on its own. As things stand, if you want to test range-based code to ensure that it works correctly (including calling save correctly), you have to test it with a variety of different range types, including both ranges where copying is equivalent to calling save and ranges which are reference types so that copying them simply results in another reference to the same data such that iterating one copy iterates all copies. - Jonathan M Davis
Feb 16 2020
next sibling parent reply uranuz <neuranuz gmail.com> writes:
It's very bad. Because there seem that when I use range based 
algorithm I need to take two things into account. The first is 
how algrorithm is implemented. If it creates copies of range 
inside or pass it by reference. And the second is how the range 
is implemented if it has value or reference semantics. So every 
time I need to look into implementation and I can't rely on API 
description in most of the cases. In a lot of cases Phobos uses 
value semantics. But there are cases where I want the range 
actually be consumed, but it's not. And the other problemme is 
when algorithm expects range to have value semantics, but it's 
not. So it's a buggy mess that it's hard to think about. In 
trivial cases this is working although. But in more complex cases 
it's simplier to implement some algorithms by own hands so that 
it would work as I expect it myself rather that thinking about 
all these value-ref-range mess. But still can't say that I 
implement it correctly, because range specification actually 
sucks as yo say.
It's just horrible
Feb 16 2020
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, February 16, 2020 6:52:17 AM MST uranuz via Digitalmars-d-learn 
wrote:
 It's very bad. Because there seem that when I use range based
 algorithm I need to take two things into account. The first is
 how algrorithm is implemented. If it creates copies of range
 inside or pass it by reference. And the second is how the range
 is implemented if it has value or reference semantics. So every
 time I need to look into implementation and I can't rely on API
 description in most of the cases. In a lot of cases Phobos uses
 value semantics. But there are cases where I want the range
 actually be consumed, but it's not. And the other problemme is
 when algorithm expects range to have value semantics, but it's
 not. So it's a buggy mess that it's hard to think about. In
 trivial cases this is working although. But in more complex cases
 it's simplier to implement some algorithms by own hands so that
 it would work as I expect it myself rather that thinking about
 all these value-ref-range mess. But still can't say that I
 implement it correctly, because range specification actually
 sucks as yo say.
 It's just horrible
The current situation is definitely not ideal, but it can be used quite consistently. Just follow the rule that once you copy a range, you do not use that range ever again unless you assign it a new value. So, if you do something like auto result = r.find(e); then you should not be using r again unless you assign it a new value. e.g. r = r.find(e); r.popFront(); auto result = r.find(e); r = otherRange; r.popFront(); would be fine, because r was assigned a new value, whereas auto result = r.find(e); r.popFront(); should never happen in your code unless you know that typeof(r) is a type where copying it is equivalent to calling save on it. In generic code, that means that you should never use a range again after passing it to a function unless you assign it a new value, or that function accepted the argument by ref (which almost no range-based functions do). If you want to be able to use the range again after passing it to a function without assigning it a new value, then you need to call save so that you pass an independent copy. e.g. auto result = r.find.save(e); r.popFront(); The only copying going on here is with save, so there's no problem, whereas if r were passed to find directly, the behavior is implementation-dependent - hence why you should not be using a range after it's been copied (which includes passing it to a function). The only time that generic code can reuse a range after passing it to a function is if that function accepts its argument by ref, which almost no range-based code does. Far more frequently, it returns the result as a wrapper range, in which case, you can't use the original range any longer unless you used save when calling the function or if the code is not generic and you're coding based on the specific behavior of that particular range type - which usually isn't something that code should be doing. By no means do I claim that the status quo here is desirable, but if you just follow the simple rule that you don't ever use a range once it's been copied (unless that copy came from save), then you shouldn't run into problems related to the fact that different ranges have different copying semantics unless the function that you're calling is buggy. If you're going to run into a bug along those lines though, it's likely going to be because a function didn't call save when it was supposed to, and it was only tested with range types where copying them is equivalent to save. That's why it's important to test range-based code with both range types where copying them is equivalent to save and range types which are full-on reference types (and thus copying just results in another reference). In general though, any range that is a forward range should have copying it be equivalent to save, and using reference types for forward ranges tends to be inefficient and error-prone even if range-based functions (especially those in Phobos) should be able to handle them correctly. - Jonathan M Davis
Feb 16 2020
prev sibling next sibling parent uranuz <neuranuz gmail.com> writes:
In general for value-semantics and ref-semantics the different 
code is actually needed. But generic algorithm try to pretend 
that the logic is the same. But it's not true. But in wide subset 
of trivial algorithm it's true. So it's incorrectly interpolated 
that it's true for every case. The very bad thing if range is 
passed by value it still can have value or reference semantic. 
And algorithm cannot say which is it actually. There is not such 
problemme for classes. So as I already said when passing ranges 
by ref in algorithms they behave predictible. And if I want 
algrorithm to operate on copy of algorithm then I can just create 
this copy before passing it to this algorithm. And again 
intention is more clear. But Phobos algorithms don't work like 
that. It's why I can't use them in some cases, because they are 
looking unpredictable for me.
Feb 16 2020
prev sibling parent reply uranuz <neuranuz gmail.com> writes:
On Sunday, 16 February 2020 at 12:38:51 UTC, Jonathan M Davis 
wrote:
 On Sunday, February 16, 2020 3:41:31 AM MST uranuz via 
 Digitalmars-d-learn wrote:
 I have reread presentation:
 http://dconf.org/2015/talks/davis.pdf
 We declare that `pure` input range cannot be `unpoped` and we
 can't return to the previous position of it later at the time. 
 So
 logically there is no sence of copying input range at all. So
 every Phobos algorithm that declares that it's is working with
 InputRange must be tested for working with range with disabled
A range that can't be copied is basically useless. Not only do almost all range-based algorithms take their argumenst by value (and thus copy them), but foreach copies any range that it's given, meaning that if a range isn't copyable, you can't even use it with foreach. And since many range-based algorithms function by wrapping one range with another, the ability to copy ranges is fundamental to most range-based code.
This is working fine with disabled postblit... import std; struct SS { disable this(this); // Disabled copy bool _empty = false; bool empty() property { return _empty; } void popFront() { _empty = true; } int front() property { return 10; } } void main() { foreach( it; SS() ) { writeln(it); } } Am I missing something?
Feb 16 2020
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, February 16, 2020 7:29:11 AM MST uranuz via Digitalmars-d-learn 
wrote:
 On Sunday, 16 February 2020 at 12:38:51 UTC, Jonathan M Davis

 wrote:
 On Sunday, February 16, 2020 3:41:31 AM MST uranuz via

 Digitalmars-d-learn wrote:
 I have reread presentation:
 http://dconf.org/2015/talks/davis.pdf
 We declare that `pure` input range cannot be `unpoped` and we
 can't return to the previous position of it later at the time.
 So
 logically there is no sence of copying input range at all. So
 every Phobos algorithm that declares that it's is working with
 InputRange must be tested for working with range with disabled
A range that can't be copied is basically useless. Not only do almost all range-based algorithms take their argumenst by value (and thus copy them), but foreach copies any range that it's given, meaning that if a range isn't copyable, you can't even use it with foreach. And since many range-based algorithms function by wrapping one range with another, the ability to copy ranges is fundamental to most range-based code.
This is working fine with disabled postblit... import std; struct SS { disable this(this); // Disabled copy bool _empty = false; bool empty() property { return _empty; } void popFront() { _empty = true; } int front() property { return 10; } } void main() { foreach( it; SS() ) { writeln(it); } } Am I missing something?
That code compiles, because you're passing a temporary to foreach. So, the compiler does a move instead of a copy. It's the difference between auto ss = SS(); and SS ss; auto ss2 = ss; If your main were void main() { SS ss; foreach( it; ss ) { writeln(it); } } then it would not compile. foreach(e; range) {...} basically gets lowered to for(auto __r = range; !__r.empty; __r.popFront()) { auto e = __r.front; ... } So, foreach( it; SS() ) { writeln(it); } would become for(auto __r = SS(); !__r.empty; __r.popFront()) { auto it = __r.front; writeln(it); } whereas SS ss; foreach( it; ss ) { writeln(it); } would become SS ss; for(auto __r = ss; !__r.empty; __r.popFront()) { auto it = __r.front; writeln(it); } - Jonathan M Davis
Feb 16 2020
parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 16 February 2020 at 17:10:24 UTC, Jonathan M Davis 
wrote:
 On Sunday, February 16, 2020 7:29:11 AM MST uranuz via
 This is working fine with disabled postblit...
 import std;

 struct SS
 {
       disable this(this); // Disabled copy

      bool _empty = false;

      bool empty()  property {
          return _empty;
      }

      void popFront() {
         _empty = true;
      }

      int front()  property { return 10; }
 }


 void main()
 {
      foreach( it; SS() ) { writeln(it); }
 }

 Am I missing something?
That code compiles, because you're passing a temporary to foreach. So, the compiler does a move instead of a copy. It's the difference between auto ss = SS(); and SS ss; auto ss2 = ss; If your main were void main() { SS ss; foreach( it; ss ) { writeln(it); } } then it would not compile.
On the other hand, this does work: void main() { SS ss; foreach( it; move(ss) ) { writeln(it); } } So perhaps the correct approach is to use `move` when copying input ranges.
Feb 16 2020
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, February 16, 2020 10:53:36 AM MST Paul Backus via Digitalmars-d-
learn wrote:
 On Sunday, 16 February 2020 at 17:10:24 UTC, Jonathan M Davis

 wrote:
 On Sunday, February 16, 2020 7:29:11 AM MST uranuz via

 This is working fine with disabled postblit...
 import std;

 struct SS
 {

       disable this(this); // Disabled copy

      bool _empty = false;

      bool empty()  property {

          return _empty;

      }

      void popFront() {

         _empty = true;

      }

      int front()  property { return 10; }

 }


 void main()
 {

      foreach( it; SS() ) { writeln(it); }

 }

 Am I missing something?
That code compiles, because you're passing a temporary to foreach. So, the compiler does a move instead of a copy. It's the difference between auto ss = SS(); and SS ss; auto ss2 = ss; If your main were void main() { SS ss; foreach( it; ss ) { writeln(it); } } then it would not compile.
On the other hand, this does work: void main() { SS ss; foreach( it; move(ss) ) { writeln(it); } } So perhaps the correct approach is to use `move` when copying input ranges.
Given that the way that almost all range-based functions work is to copy the range that they're given (often then wrapping it in another range that's returned), I don't see how it would make sense to use move outside of very specific circumstances. If you pass a range to a function, and it's a basic input range, then you just use the range via the return value (be it the same range returned directly or returned within a wraper range), and if it's a forward range, you call save before passing it to the function if you want to be able to use the range directly again. Either way, generic code should never be using a range after it's been copied, and copying is a key part of how idiomatic, range-based code works in D. And really, using move just to be able to use an uncopyable range with foreach doesn't make a lot of sense, since if that's what you want to do, you can always just use a normal for loop. Regardless, there isn't much point in declaring a range type that can't be copied, since it's pretty much only going to work with code that you write. - Jonathan M Davis
Feb 16 2020
parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 16 February 2020 at 18:11:11 UTC, Jonathan M Davis 
wrote:
 Either way, generic code should never be using a range after 
 it's been copied, and copying is a key part of how idiomatic, 
 range-based code works in D.
"Copy and then never use the original again" is conceptually the same thing as "move", right? In which case, generic code can accommodate non-copyable ranges *and* more clearly communicate its intent by using `move` instead of a naked copy.
Feb 16 2020
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, February 16, 2020 12:22:01 PM MST Paul Backus via Digitalmars-d-
learn wrote:
 On Sunday, 16 February 2020 at 18:11:11 UTC, Jonathan M Davis

 wrote:
 Either way, generic code should never be using a range after
 it's been copied, and copying is a key part of how idiomatic,
 range-based code works in D.
"Copy and then never use the original again" is conceptually the same thing as "move", right? In which case, generic code can accommodate non-copyable ranges *and* more clearly communicate its intent by using `move` instead of a naked copy.
We already have enough of a mess with save without making things even worse by trying to add moves into the mix. Also, non-copyable ranges have never really been a thing, and I really don't want to see things complicated even further trying to support such an uncommon use case. There are too many weird corner cases with ranges as it is. - Jonathan M Davis
Feb 16 2020
parent uranuz <neuranuz gmail.com> writes:
 Either way, generic code should never be using a range after 
 it's been copied, and copying is a key part of how 
 idiomatic, range-based code works in D.
OK. Thanks for instructions. I shall give it a try.
Feb 17 2020