www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - isInputRange is false for non-copyable types

reply Shachar Shemesh <shachar weka.io> writes:
import std.range;
import std.stdio;

struct S {
     int a;

      disable this(this);
}

void main() {
     writeln(isInputRange!(S[])); // Prints false
}

The reason, as far as I can tell, is that an input range requires that 
we can do:

auto a = ir.front; // Assuming ir isn't empty

That doesn't work for non-copyable types, for obvious reasons.

With that said, that seems more like a deficiency in the input range 
definition than anything. There is no reason we shouldn't be able to 
iterator type operations over non-copyable types.

Shachar
Apr 14 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
 import std.range;
 import std.stdio;

 struct S {
      int a;

       disable this(this);
 }

 void main() {
      writeln(isInputRange!(S[])); // Prints false
 }

 The reason, as far as I can tell, is that an input range requires that
 we can do:

 auto a = ir.front; // Assuming ir isn't empty

 That doesn't work for non-copyable types, for obvious reasons.

 With that said, that seems more like a deficiency in the input range
 definition than anything. There is no reason we shouldn't be able to
 iterator type operations over non-copyable types.
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. import std.stdio; struct S { this(int i) { _i = i; } this(this) { writefln("copied: %s", _i); } int _i; } void main() { foreach(e; [S(0), S(1), S(2)]) { } } prints copied: 0 copied: 1 copied: 2 If non-copyable types were allowed, then no function which accepted a range would be allowed to copy front without first checking that front was copyable (something that most code simply isn't going to do), and some functions outright require being able to copy front. They could test for copyability, but that would complicate a lot of range-based functions for a relatively rare corner case (much as it obviously matters to any code that has that corner case). Remember also that a large percentage of ranges that don't wrap dynamic arrays put their elements on the stack, which means that whenever the range is copied, the elements are copied. e.g. if the above example uses std.range.only instead of an array literal, it actually prints copied: 0 copied: 1 copied: 2 copied: 0 copied: 1 copied: 2 copied: 0 copied: 1 copied: 2 I'm not sure why it ends up printing each of them 3 times rather than 2, but the fact that foreach copies any range that it's given means that in order to have ranges allow non-copyable types, we'd have to change how foreach worked, which would break a lot of code. Right now, foreach(e; range) { } is lowered to something like for(auto __range = range; !__range.empty; __range.popFront()) { auto e = __range.front; } That requires both copying the range and copying front. We could theoretically change it so that everywhere that e was used, it would be replaced with __range.front so that front wasn't copied and probably wouldn't break code in the process (though it could affect performance), but if we made it so that the range itself wasn't copied, then instead of iterating over a copy, foreach would consume the original range, which would definitely break a lot of code. Now, generic range-based code really shouldn't ever use a range after it's been copied, since whether mutating the copy affects the original depends on the implementation of the range (and thus generic code should assume that foreach may have consumed the range and call save if it doesn't want that to happen), but lots of code does it anyway, and if the code isn't generic, it can work just fine, since then the code can depend on the semantics of that specific range type instead of having to work with the semantics of arbitrary ranges. e.g. if code is written specifically for dynamic arrays rather than ranges in general, then calling save to iterate over a separate slice would be silly, but if the code doesn't currently call save, and foreach were changed to not copy the range, then all of a sudden code like string str = "hello world"; foreach(c; str) { } would consume the dynamic array instead of iterating over a separate slice. Also, in order for a range to work with a non-copyable type, either front would have to return by ref or it would have to construct a new object to return so that it could be moved rather than copied. I expect that quite a few ranges would have problems with such a restriction, and certainly, even if a range could have its front changed to return by ref, the range would no longer be able to guarantee that the caller couldn't mutate the value of front, which would cause problems in at least some cases. In addition, it's quite possible that forcing functions to not copy front would hurt performance. In some cases, by using front directly, you can avoid a copy (e.g. it returns by ref), but in others, the value of front is calculated every time it's called. So, depending on the implementation of the range, calling front repeatedly can be more expensive than simply copying the value and reusing it. As such, it's not necessarily a good idea to force range-based code to not copy front. Also, it's not uncommon for ranges to store the value of front (e.g. it's calculated every time that popFront is called), and that wouldn't work with non-copyable types. Also, if such ranges were changed to calculate the value in front to then work with non-copyable types, whatever calculation they were doing would then have to be done every time front was called instead of being guaranteed to only happen once, which would harm performance in many cases. So, while aside from the issues with foreach, some specific cases might work with non-copyable types, it causes a lot of complications for ranges in general, and best case, we'd end up with most existing range-based code failing to compile with non-copyable types, and we'd have to go to a bunch of extra work in Phobos to either rework stuff so that it would work with non-copyable types or make it test for copyability. Either way, most range-based code outside of Phobos would almost certainly continue to fail to compile with non-copyable types. We have enough problems already because of save frequently not being used correctly and how badly reference type ranges interact with everything because of that. Having to support non-copyable types would then add yet one more thing that many range implementations would fail miserably at and that solid range implementations would have to test for. And of course, I don't see how we could support non-copyable types even if we wanted to because of the issues with foreach. Just like with save, we could take a different approach if we didn't care about breaking code, but there's no way that we could make a change like that unless we could find a clean deprecation path to change the behavior, and I have no clue how we could do that with foreach. Either way, a really compelling case would have to be made as to why all of the changes required to support non-copyable types would be worth it, especially considering that this would affect foreach rather than just a stray function or two in Phobos and as such would potentially affect almost every D program in existence. IIRC, when this has come up previously, Andrei ruled that supporting non-copyable types simply wasn't worth the extra complication, though a quick search doesn't turn up anything where he talked about it. So, I can't verify that at the moment. - Jonathan M Davis
Apr 14 2018
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway. Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
 [...]
Apr 15 2018
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Sunday, 15 April 2018 at 10:14:20 UTC, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway. Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
On a related note this also destroys idea of “lock-free” or “concurrent” input ranges since front/empty/popFront sequence of calls can’t be cheaply synchronized in as a whole in presence of multiple consumers.
 [...]
Apr 15 2018
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via

 Digitalmars-d wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway. Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
If that were the route that we were going to go, then front and popFront wouldn't make any sense at all. Rather, you'd treat it more like in iterator in Java and have next() return the next element and pop it off at the same time. But even if we did that, it wouldn't have been appropriate in the general case to move the element unless we put serious restrictions on what could be an input range (e.g. that would make no sense whatsoever with dynamic arrays or any kind of container). In some respects it would be nice if input ranges were completely separate beasts from forward ranges, since ideally, forward ranges would all be value types, and input ranges would all be reference types (in which case, save wouldn't be a thing), but that would cause it's own set of problems, because then it wouldn't really work to have input ranges and forward ranges use the same code, which could get really annoying. Input ranges are kind of an abomination IMHO , because they're so painfully limited, but unfortunately, it simply doesn't work to have all ranges be forward ranges if you don't want to have to do stuff like buffer data. So, figuring out what we would ideally do with input ranges and forward ranges gets to be a bit of an interesting question. Regardless, even if we could all agree on how ranges would ideally be redesigned if we were doing this stuff from scratch, we're not doing it from scratch, and for better or worse, a lot of existing code depends on the current design. So, while I'd love to have the opportunity to sit down and redesign the range API, I don't see how we really can - not and actually have it implemented as part of the language or Phobos anyway. - Jonathan M Davis
Apr 15 2018
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via 
 Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via

 Digitalmars-d wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway. Input range should “produce” elements not keep cache of them, forward ranges may cache current item alright.
If that were the route that we were going to go, then front and popFront wouldn't make any sense at all. Rather, you'd treat it more like in iterator in Java and have next() return the next element and pop it off at the same time.
Surprisingly, they might have got that right.
 But even if we did that, it wouldn't have been appropriate in 
 the general case to move the element unless we put serious 
 restrictions on what could be an input range (e.g. that would 
 make no sense whatsoever with dynamic arrays or any kind of 
 container).
There it would copy. You get rvalue either way. Another option is ref scope T. I do not see incompatibility you seem to imply.
 So, while I'd love to have the opportunity to sit down and 
 redesign the range API, I don't see how we really can - not and 
 actually have it implemented as part of the language or Phobos 
 anyway.
Yeah, I think as surely as we can redesign postblit. Let’s not focus on learned helplessnes and “it’s just the way things are”. Recent opMove shows there is great flexibility in what can be done. Seriously, it’s just fixing a library design issue even if its use proved viral.
 - Jonathan M Davis
Apr 15 2018
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, April 15, 2018 12:29:17 Dmitry Olshansky via Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
 So, while I'd love to have the opportunity to sit down and
 redesign the range API, I don't see how we really can - not and
 actually have it implemented as part of the language or Phobos
 anyway.
Yeah, I think as surely as we can redesign postblit. Let’s not focus on learned helplessnes and “it’s just the way things are”. Recent opMove shows there is great flexibility in what can be done. Seriously, it’s just fixing a library design issue even if its use proved viral.
Replacing postblit is trivial in comparison to redesigning ranges. The worst that's going to happen with postblit is that all postblit constructors are going to be deprecated in favor of some other kind of constructor, which would result in a deprecation warning for types with postblit constructors until they've replaced one constructor with the other. As far as breaking changes go, it shouldn't be much more complicated than renaming a function and requiring everyone to update their code. It shouldn't even affect anyone using types with postblit constructors except maybe if they're using __traits(getMembers, ...) to get at __postblit or __xpostblit. It's just the types themselves that need to be updated. In that sense, it's actually less disruptive to replace the postblit constructor than it would be to rename a function, because all that has to be changed is the function itself, not every place that it gets called. And if we get some kind of opMove solution, that wouldn't break any existing code. It would be invisible to anything but new code that was written to use it. That's the easiest kind of improvement. Ranges, on the other hand, are integral to how a large percentage of Phobos works, and a lot of existing code depends on them heavily. And unless you're talking about simply renaming popFront to something else, changing them gets pretty hairy. Changing ranges in any serious way risks having to essentially fork Phobos (or at least large portions of it). Any major redesign would almost certainly require renaming most or all of the range traits. All functions accepting ranges would have to be rewritten. Any function returning a range would have to either be renamed or somehow work with both the old and new range API (and since containers use an operator rather than a function name to provide ranges, that could make updating containers rather interesting unless we're willing to give up on slicing being how we get a range from a container). We may be able to make small changes here and there, but I don't see how anything major is possible unless we're willing to break a ton of existing code. Now, maybe someone can come up with something clever to make some decent improvements to what we have without doing a major redesign, but I think that a major redesign requires effectively forking the range API, which means either ending up with two different range APIs that everyone has to deal with, or requiring all range-based code to eventually be updated from one range API to another, which is an enormous change and far more than simply swapping one function out for another. So, if we decide that we're willing to make changes that effectively require that almost every D program make changes that are potentially quite invasive, then sure, we can redesign the range API. And it would be great to be able to do that. But I have a hard time buying that such a large change would be acceptable at this point. - Jonathan M Davis
Apr 15 2018
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d 
 wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
Ugh, then you get things like fungetc. However, there are certainly cases where it's expensive to compute front, and therefore, requires a cache. A new primitive that accesses front and pops it at the same time would be useful. We can have selected algorithms that can deal with it use that primitive instead (and instrument foreach to do it). Potentially, we can create a wrapper for all input ranges such that they support it.
 Input range should “produce” elements not keep cache of them, forward 
 ranges may cache current item alright.
Well, sometimes you HAVE to cache it. For instance File.byLine is clearly an input range, and not a forward range. But it MUST cache because it's i/o that has to be buffered to be looked at! No reason to force caching on the user at that point. -Steve
Apr 17 2018
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 17, 2018 at 11:14:02AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
[...]
 Input range should “produce” elements not keep cache of them,
 forward ranges may cache current item alright.
Well, sometimes you HAVE to cache it. For instance File.byLine is clearly an input range, and not a forward range. But it MUST cache because it's i/o that has to be buffered to be looked at! No reason to force caching on the user at that point.
[...] In the past I've used range-like (i.e., iteration) APIs that do not cache the current element. To be quite frank, they were a royal pain to work with, because they force user code to be needlessly cluttered with ad hoc caching schemes that pollute the code all over the place. Instead of having the range handle the caching, all downstream code that uses the range has to implement their own caching and/or otherwise implement ugly ad hoc workarounds for the lack thereof. Andrei got it right when he designed the range API with a cached .front, and I would be opposed to any range API design that involves a non-caching .front. For generative ranges like std.range.generate, well, .generate exists precisely for that reason: to be the single implementation of a caching .front that wraps around a non-caching generating function, so that range implementers don't have to write their own caching code, but only need to care about writing the generating function and let .generate handle the caching for them. The downstream user also doesn't need to worry about the odd semantics of a non-caching .front, but can use the range API as usual. Win-win. T -- Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
Apr 17 2018
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 17 April 2018 at 15:14:02 UTC, Steven Schveighoffer 
wrote:
 On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]
It's extremely common for range-based functions to copy front. Even foreach does it. e.g. [...]
Arguably it should “move” them. This would have worked if input ranges didn’t have to cache front to support multiple ‘front’ calls before popFront. Which is a painful misfeature really as all algorithms would optimize for touching front once anyway.
Ugh, then you get things like fungetc.
For many algorithms indeed you can’t stop w/o peeking ahead. It’s just that lookahead of 1 is hardcoded in the range definition b/c it’s enough for a lot of things. But then once you are past lookahead of 1, it has to do .save + restore if overshot. ungetc of 2 if you’d like.
 However, there are certainly cases where it's expensive to 
 compute front, and therefore, requires a cache. A new primitive 
 that accesses front and pops it at the same time would be 
 useful. We can have selected algorithms that can deal with it 
 use that primitive instead (and instrument foreach to do it).
Might be an option. But as long as you have front/popFront/empty as well, no concurrent stuff.
 Potentially, we can create a wrapper for all input ranges such 
 that they support it.

 Input range should “produce” elements not keep cache of them, 
 forward ranges may cache current item alright.
Well, sometimes you HAVE to cache it. For instance File.byLine
File caches, byLine actually does “read up until EOL and give me slice of that” I do not see where caching of char[] plays any significant role. Except that due to how ranges work it likely has to do the readline at _empty_ or on construction + popFront.
 is clearly an input range, and not a forward range. But it MUST 
 cache because it's i/o that has to be buffered to be looked at! 
 No reason to force caching on the user at that point.
 -Steve
Apr 17 2018
prev sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 15/04/18 09:39, Jonathan M Davis wrote:
 
 It's extremely common for range-based functions to copy front. Even foreach
 does it. e.g.
 
Not necessarily: foreach(ref e; [S(0), S(1), S(2)]){} While that would work with foreach, it will not work with anything that expects an inputRange (and since D defines a forward range as an extension, this is also not a forward range).
 If non-copyable types were allowed, then no function which accepted a range
 would be allowed to copy front without first checking that front was
 copyable (something that most code simply isn't going to do)
No, that's not correct. If non-copyable types were allowed, then functions that accept a range would fail to compile on non-copyable types if they do a copy. But the flip side is that if non-copyable would be allowed, a lot of functions that current copy would not. There are far too many unnecessary copies in the code that serve no purpose, but they are invisible, so they are not fixed.
 
 Remember also that a large percentage of ranges that don't wrap dynamic
 arrays put their elements on the stack, which means that whenever the range
 is copied, the elements are copied.
If I am guaranteed to be able to copy an input range, what's the difference between it and a forward range? Strike that, I have a better one: If I am allowed to copy an input range, what does the "save" function in forward range even mean?
 I'm not sure why it ends up printing each of them 3 times rather than 2,
Because some code somewhere does an unnecessary copy?
 the fact that foreach copies any range that it's given means that in order
 to have ranges allow non-copyable types, we'd have to change how foreach
 worked, which would break a lot of code. Right now,
Like I said above, foreach(ref) works with "input ranges" that define ref return from front to uncopyable objects. Foreach without ref doesn't, but that's mandatory from the interface: You cannot iterate copies of a range returning uncopyable objects. As for ranges that store the values inside them: you REALLY don't want to copy those around. They may get quite big (not to mention that I don't see how you can define one over a non-copyable type).
 
 foreach(e; range)
 {
 }
 
 is lowered to something like
 
 for(auto __range = range; !__range.empty; __range.popFront())
 {
      auto e = __range.front;
 }
And that indeed generates a lot of confusion regarding what running two foreaches in a row does. This is a bug that already affects more than just uncopyable objects.
 
 That requires both copying the range and copying front. We could
 theoretically change it so that everywhere that e was used, it would be
 replaced with __range.front
Like I said, already solved for foreach(ref)
 Now, generic range-based code really shouldn't ever use a range after it's
 been copied, since whether mutating the copy affects the original depends on
 the implementation of the range (and thus generic code should assume that
 foreach may have consumed the range and call save if it doesn't want that to
 happen), but lots of code does it anyway,
And allowing ranges with uncopyable members would turn this potential deadly run-time bugs into easy to find compile time errors. Sound like a great reason to allow it, to me.
 and if the code isn't generic, it
 can work just fine, since then the code can depend on the semantics of that
 specific range type
Such as it being copyable? Non-generic code is of no relevance to this discussion, as far as I can tell.
 Also, in order for a range to work with a non-copyable type, either front
 would have to return by ref or it would have to construct a new object to
 return so that it could be moved rather than copied.
Correct
 I expect that quite a
 few ranges would have problems with such a restriction,
Then those ranges will not work with non-copyable objects. That's no reason to make it impossible for *any* range to work with non-copyable.
 In addition, it's quite possible that forcing functions to not copy front
 would hurt performance.
I'm going to stop responding now, because, frankly, I think you and I are having very different discussions. You seem to be advocating against making *all* ranges support non-copyable types, while I'm merely asking that *any* range be allowed to support such types. Current situation is that a range with uncopyable types is not considered an input range, and cannot use any phobos range functions, including some that it should be able to use without a problem. There is no reason to block such ranges from using "map" or "any", except the fact they require that the type answer "isInputRange" with true.
 IIRC, when this has come up previously, Andrei ruled that supporting
 non-copyable types simply wasn't worth the extra complication, though a
 quick search doesn't turn up anything where he talked about it. So, I can't
 verify that at the moment.
I will trust Andrei to weigh in on this point if he thinks he should. Until he does, please feel free to only bring up points you are willing to argue yourself. Shachar
Apr 15 2018
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, April 15, 2018 15:32:37 Shachar Shemesh via Digitalmars-d wrote:
 On 15/04/18 09:39, Jonathan M Davis wrote:
 It's extremely common for range-based functions to copy front. Even
 foreach does it. e.g.
Not necessarily: foreach(ref e; [S(0), S(1), S(2)]){} While that would work with foreach, it will not work with anything that expects an inputRange (and since D defines a forward range as an extension, this is also not a forward range).
The fact that foreach copies the range that it's given makes using ref with anything other than arrays a bit of an iffy proposition. It will compile regardless of whether front returns by ref or not (which arguably is a bug), but it will only affect the original elements if copying the range doesn't copy the elements - which in the case of a basic input range rather than a forward range, is generally true, otherwise, it would be a forward range. But as it stands, using ref with foreach in generic code is not going to go well. If anything, it's actually a code smell. For it to work in general, we'd either have to have foreach stop copying ranges (which would break a _lot_ of code, including code that just uses dynamic arrays), or we'd have to have some way to mark or detect a range as working correctly with ref when it's copied - which would mean a new range trait specifically for that, be that one that tests for a feature (such as hasLength) or one that tests for a whole new range type. And I suspect that we would have to use an enum or somesuch to tag the range as having the right semantics, since I don't think that that can be determined just by looking at the types of its member variables.
 If non-copyable types were allowed, then no function which accepted a
 range would be allowed to copy front without first checking that front
 was copyable (something that most code simply isn't going to do)
No, that's not correct. If non-copyable types were allowed, then functions that accept a range would fail to compile on non-copyable types if they do a copy. But the flip side is that if non-copyable would be allowed, a lot of functions that current copy would not. There are far too many unnecessary copies in the code that serve no purpose, but they are invisible, so they are not fixed.
And if a templated function fails to compile if it's given a particular argument, and that failure is not due to the template constraint, that usually means that the template constraint is incomplete. And that means that any and all templated functions which copy front would then have to test whether front was copyable. Yes, in some cases, that would mean that a range-based function would be rewritten to not copy front, but either way, it would mean that _every_ range-based function then has to worry about non-copyable elements - either by testing for them in its template constraint or by being written specifically in a way that works with them and then adding that particular use case to their unit tests. So, allowing non-copyable types complicates range-based code across the board. And the reality of the matter is that in many cases, copying front is actually more efficient. Not all algorithms can call front just once without copying it, and many ranges (e.g. map) calculate front on each call, meaning that repeated calls to front can be more expensive than just calling it once and copying the result. Dealing with that in the most efficient way possible would probably mean having generic code test whether front returns by ref so that it can copy it if it doesn't and just keep referring to front directly if it does, but that would tend to make for some really messy code in any generic function where front needs to be accessed multiple times. As it stands, because of functions like map, it's almost certainly better for any function which accesses front multiple times to copy it.
 Remember also that a large percentage of ranges that don't wrap dynamic
 arrays put their elements on the stack, which means that whenever the
 range is copied, the elements are copied.
If I am guaranteed to be able to copy an input range, what's the difference between it and a forward range? Strike that, I have a better one: If I am allowed to copy an input range, what does the "save" function in forward range even mean?
You can copy input ranges as much as you like. You just don't get a guaranteed independent copy that can be iterated separately. The classic example would be that a struct copies its members, whereas a class is just a reference. So with structs, you generally (but not always) have a value type, whereas with classes, you have a reference type, and copies of each act completely differently. One is independent, whereas the other is the same as the original aside from assignment. To make matters worse, a range could be a pseudo-reference type in that copying it doesn't provide a completely independent copy, and it doesn't act like a reference type either. Dynamic arrays are one example of that. In their case, the result is a range that's independent for iteration purposes but where mutating any elements affects the original. Similarly, any time that a range is defined with multiple members and they're not all value types, you get a pseudo-reference type. At that point, how mutating the copy affects the original depends on what each of the members does. It could act like a dynamic array, or you could end up in a situation where mutating the copy only partially affects the state of the iteration in the original, in which case, trying to use the original after the copy has been made would be buggy regardless of whether the code is generic or not. e.g. there's a real risk of a situation like that when you do something like have a range wrapping a socket - especially if it does something like put the bytes it grabs in a buffer in the range when it reads them. If the state of the iteration is shared between value types and reference types, you can get all kinds of weird behaviors if you try to use the original after mutating the copy. The result of all of this is that in generic code, you have no clue whatsoever what the semantics are of copying a range around, because it depends entirely on the copying semantics of whatever range that code is currently being instantiated with. As such, generic code basically has to assume that once you copy a range, you can't mutate the original (since it may or may not be affected by mutating the copy and may or may not be cleanly affected by mutating the copy), and as such, if you want an independent copy to iterate over, you need a specific way to get one other than just copying. That's what save does. save was introduced to deal with ranges that were classes (though since it generally requires allocating a new object with every call to save, it's not necessrily a great idea to implement such a range). However, the issues related to copying ranges in general weren't as clear at that point in time. It was just clear that if you wanted to be able to generically get an independent copy, a solution was needed. But of course, since _most_ ranges act just like save when they're copied, save gets frequently forgotten, making it a bit of a thorn in our side. And the fact that generic code can't ever use a range after copying it (even after passing it to foreach, since that does a copy) is something that's frequently missed, causing subtle bugs. So, the fact that ranges weren't better thought through in the beginning with regards to copying means that we have something that works quite well in typical use cases but gets fairly thorny when you're writing generic code (especially if you're then distributing it to other folks to use with ranges that you've never seen or heard of). A lot of the problems that we have with ranges stem from the fact that they're basically designed around being an abstraction of dynamic arrays, and any time that a range doesn't act like a dynamic array does, we start ending up with cracks in how things work. In general, we've gotten it to work fairly well with just front, popFront, and empty if you can assume that ranges copy like dynamic arrays do, but of course, many don't, and there, things start to fall apart. Adding save helped, but it didn't really fix the problem, since it can so often be ignored and still have code work, and it leaves the problem of basic input ranges, which by definition, can't be copied like dynamic arrays are. If a range can't have save, then it's basically because there's something about its iteration state that either can't be saved or can't be saved efficiently. Ideally, we'd be able to do something like require that basic input ranges be reference types (since they obviously can't be value types, or they'd be forward ranges) and require that forward ranges act like value types (at least insofar as copying them copies the iteration state and thus acts like save). And in that case, if we could have a clean way to differentiate between input ranges and forward ranges, we could drop the whole save thing and even rely on how ranges behave when they're copied, but it would make certain types of ranges that work right now not work (e.g. classes couldn't forward be ranges without being wrapped in a struct with a postblit constructor, and basic input ranges would either have to be on the heap or involve a pointer to the stack). It would probably also require that we treat input ranges and forward ranges as incompatible, which would be great in the case where we want to deal with a forward range but not so great in the case where we want to be able to just iterate without ever having to look at any part of the range multiple times, since then we'd basically have to have separate code for input ranges and forward ranges. So, some redesign of ranges would be great. However, they're highly integrated into Phobos and heavily used in existing code, making it very difficult to have any kind of clean deprecation path so that we could transition from the current state of things to wherever we want to be. We could do it if we were willing to pretty much outright fork Phobos, but I'm not sure that it can be done otherwise. And there's still the issue of foreach in that if we want to change its semantics, that would be a breaking change in the language itself, meaning that forking the standard library wouldn't fix that one. If we want to change foreach at all, we'd have to figure out how to do so in place with a clean deprecation path.
 foreach(e; range)
 {
 }

 is lowered to something like

 for(auto __range = range; !__range.empty; __range.popFront())
 {

      auto e = __range.front;

 }
And that indeed generates a lot of confusion regarding what running two foreaches in a row does. This is a bug that already affects more than just uncopyable objects.
It was the behavior that that was required in order to have ranges behave the same as dynamic arrays with foreach without changing how dynamic arrays work with foreach. Yes, that becomes a problem once you start having ranges that are reference types or have any range type where copying it is not equivalent to calling save, but given that ranges were added on top of the existing behavior, nothing else would have made sense. Maybe foreach should have been changed for dynamic arrays, but I don't think that the implications of copying ranges in generic code had been particularly thought through at the point in time that ranges were added to foreach, and I don't know how easy it would have been to talk Walter into changing the behavior of foreach for dynamic arrays at the time. At this point in time, it would require a way to cleanly transition from the old behavior to the new without immediately breaking existing code. If someone can come up with that, and the arguments for changing foreach are compelling enough, then maybe it can be changed, but as it stands, it's just one of those annoyances that stems from the fact that ranges tried to emulate dynamic arrays without requiring that they act like dynamic arrays.
 Now, generic range-based code really shouldn't ever use a range after
 it's been copied, since whether mutating the copy affects the original
 depends on the implementation of the range (and thus generic code
 should assume that foreach may have consumed the range and call save if
 it doesn't want that to happen), but lots of code does it anyway,
And allowing ranges with uncopyable members would turn this potential deadly run-time bugs into easy to find compile time errors. Sound like a great reason to allow it, to me.
Ranges have to be copyable. If they're not, you have to use ref all over the place, and it becomes impossible to create a function that returns a new range without putting it on the heap, completely destroying function chaining unless we want to start allocating a whole bunch of ranges on the heap, which would destroy performance. It would be great if we could clean up the semantics of copying ranges, but making it so that you can't rely on ranges being copied would destroy our ability to use ranges as we do now.
 I'm going to stop responding now, because, frankly, I think you and I
 are having very different discussions.

 You seem to be advocating against making *all* ranges support
 non-copyable types, while I'm merely asking that *any* range be allowed
 to support such types.

 Current situation is that a range with uncopyable types is not
 considered an input range, and cannot use any phobos range functions,
 including some that it should be able to use without a problem.

 There is no reason to block such ranges from using "map" or "any",
 except the fact they require that the type answer "isInputRange" with
 true.
If isInputRange accepted non-copyable element types, then all range-based code would have to deal with non-copyable element types. Yes, in many cases, that would mean adding a check to their template constraints which made them require that the element type be copyable rather than refactoring them to actually work with non-copyable element types, but all well-written, range-based code would then have to take non-copyable types into consideration. And any range-based code that was going to have to work with non-copyable types would then have to be very particular about what it did. We'd be complicating all range-based code to handle yet another corner case. We have too many problems already because of ranges trying to be too many things at once and not locking down their semantics in ways that either require things that are not currently required or disallow things that are currently allowed. - Jonathan M Davis
Apr 15 2018
parent reply Shachar Shemesh <shachar weka.io> writes:
On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref with
 anything other than arrays a bit of an iffy proposition. It will compile
 regardless of whether front returns by ref or not (which arguably is a bug),
 but it will only affect the original elements if copying the range doesn't
 copy the elements
You keep referring to ranges that contain the data, but I am racking my mind and failing to find examples. You can have ranges over arrays,linked lists, trees (though you'd probably use opApply there), and any other container, as well as external data sources (files, sockets, pipes). In none of those cases would your range contain actual data. I need to see the use case for ranges *with* embedded data. Plus, as I've said before, if ranges are expected to contain data, copying them seems like an incredibly bad idea. The whole idea behind casually copying the range about is that it's cheap to do.
 which in the case of a basic input range rather than a
 forward range
I don't see how that statement is true.
 is generally true, otherwise, it would be a forward range.
A forward range is an input range.
 But as it stands, using ref with foreach in generic code is not going to go
 well. If anything, it's actually a code smell.
You are conflating ref with owning data, which I find unsubstantiated.
 If non-copyable types were allowed, then functions that accept a range
 would fail to compile on non-copyable types if they do a copy.

 But the flip side is that if non-copyable would be allowed, a lot of
 functions that current copy would not. There are far too many
 unnecessary copies in the code that serve no purpose, but they are
 invisible, so they are not fixed.
And if a templated function fails to compile if it's given a particular argument, and that failure is not due to the template constraint, that usually means that the template constraint is incomplete.
I do not accept this argument outright. In fact, our experience at Weka has drove us to the exact opposite direction. We deliberately set up functions to include invalid cases and then either let them fail because of lacking constraints, or even set up a static assert to fail the function generation rather than fail matching. We found that the error messages produced by the alternative are simply too difficult to parse out and figure what went wrong. With that said, there is some merit to setting up the function constraints so that it matches what the function actually need. It allows overloading the missing combinations by another module. I don't think it is a good argument in this particular case, however. I think we should set up to document which phobos functions require copying. It will flesh out bugs and unnecessary copying in phobos, as well as force phobos developers to think about this scenario when they write code. It shouldn't even take that much time to do this run: just go over all UTs and add instantiation over a non-copyable type. The UTs that fail need work.
 it would mean that _every_ range-based function then has to worry about
 non-copyable elements
No, just those in libraries. The same way they have to worry about big O complexity, type safety, nogc and a bunch of other things. When you write a library, you have to think about your users. This doesn't change anything about that.
 So, allowing
 non-copyable types complicates range-based code across the board.
Like I said above, it is more likely to find bugs than to introduce needless complications. If you code cannot be easily implemented over non-copyable types, exclude them. Don't shut me out of the entire game.
 Not all algorithms can call front just once without
 copying it
Huh? Where did that come from? Phobos already assumes that calling front needs to be an O(1) operation. You can call front as many times as you like. Why would you assume otherwise?
 and many ranges (e.g. map) calculate front on each call, meaning
 that repeated calls to front can be more expensive than just calling it once
 and copying the result.
Not by a lot, and definitely not in any way the generic code has any right to assume. I reject this argument outright.
 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has to
 assume that once you copy a range, you can't mutate the original (since it
 may or may not be affected by mutating the copy and may or may not be
 cleanly affected by mutating the copy),
Jonathan, I'm sorry, but what you are describing is called "a bug". Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range. A forward range with ref front absolutely promises that both iterations reference the same elements. I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*. The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do. Hell, let's write UTs to pass forward ranges that are non-copyable (but return a copy via "save") and flush out those bugs as well. The point of "development" is to make our code better, not defend the status-quo. Especially since we've just identified a fairly simple way to find the places that need work and easily work on them.
 But of course, since _most_ ranges act just like save when they're copied,
 save gets frequently forgotten, making it a bit of a thorn in our side.
Then rejoice: I just suggested a way to *easily* find and fix those places.
 require that forward ranges act like value types (at
 least insofar as copying them copies the iteration state and thus acts like
 save).
The brackets are important. Just because I'm returning a reference to elements does not mean I don't support the range itself restarting a scan.
 So, some redesign of ranges would be great.
Yes, but please note that I don't think it is *necessary*. Yes, save is annoying, but it *is* workable as is, and since we can harness the compiler to locate where we're deviating from the righteous way, not that expensively if we could just be bothered to do it.
 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like a
 great reason to allow it, to me.
Ranges have to be copyable.
And why would a range iterating over uncopyable objects not be copyable? How are the two related? Shachar
Apr 15 2018
next sibling parent ag0aep6g <anonymous example.com> writes:
On 04/16/2018 06:15 AM, Shachar Shemesh wrote:
 Input ranges provide no guarantees about copying the range itself. If 
 you do it and expect *anything* (which it seems you do), you have a bug. 
 If you need to copy the range itself, you absolutely need to require a 
 forward range.
 
 A forward range with ref front absolutely promises that both iterations 
 reference the same elements.
 
 I find myself sincerely hoping that you are making these examples up in 
 order to win this discussion turned into an argument for some reason, 
 instead of the way phobos is actually written, because if you're right 
 phobos is *broken*.
 
 The good news is that if Phobos is broken, this is a great way to flush 
 out the places it's broken. Let's add UTs calling all Phobos functions 
 accepting input ranges (as opposed to forward ranges) with non-copyable 
 ranges (i.e. - the range itself is non-copyable) and fix the compilation 
 errors. This shouldn't take more than a few days to do.
I'm currently attempting something similar: fix Phobos to work with RefRange. RefRange is copyable, but assignment is weird. Assignment isn't part of the range interface, so I'm replacing it with `move`: https://issues.dlang.org/show_bug.cgi?id=18657 https://github.com/dlang/phobos/pull/6346 That PR doesn't fix all of Phobos, it's just a start. Going over all of Phobos (and getting it merged) might take a while. Thinking about it, those `move`s probably fix the code for non-copyable ranges as well. There might be considerable overlap here.
Apr 15 2018
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
 On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref
 with anything other than arrays a bit of an iffy proposition. It will
 compile regardless of whether front returns by ref or not (which
 arguably is a bug), but it will only affect the original elements if
 copying the range doesn't copy the elements
You keep referring to ranges that contain the data, but I am racking my mind and failing to find examples.
Pretty much any generative range does. e.g. iota contains data, and each element is then calculated from the current value. The same goes for stuff like random number generators. It's quite common for ranges to have popFront calculate the next front and store it as a member rather than have front do the calculation, since then multiple calls to front don't incur the cost of whatever has to be done to get the next element, and they can just return the same value each time. In fact, that's generally considered to be best practice. It's true that most ranges do not contain every element directly in a struct, but some do (e.g. std.range.only does, which is a way to generate a range from a list of elements without allocating anything on the heap), and many ranges at least store the value for front. So, I'd say that a fairly large percentage of ranges store at least one value directly in the range object.
 Plus, as I've said before, if ranges are expected to contain data,
 copying them seems like an incredibly bad idea. The whole idea behind
 casually copying the range about is that it's cheap to do.
Yes, it's generally assumed that it's cheap to copy a range, but it's also generally assumed by D code in general that copying is cheap. Relatively few functions in Phobos accept arguments by ref. Some do accept by auto ref - either to forward the refness or to avoid a copy - but since passing by ref doesn't work with rvalues, most code in Phobos doesn't do so unless it's intended to mutate the argument and give the caller access to the result. I would strongly suggest that anyone who is looking to put an object that is expensive to copy in a range consider making it so that that object is on the heap rather than directly in the range.
 But as it stands, using ref with foreach in generic code is not going to
 go well. If anything, it's actually a code smell.
You are conflating ref with owning data, which I find unsubstantiated.
My point is that if you have foreach(ref e; range) { } then unless range is a dynamic array, the odds are very high that it's a bug, because then you risk problems like foreach(ref e; range) { e = foo; } Outside of dynamic arrays, that assignment is usually pointless. ref implies that it's going to actually affect the range that you passed to foreach, but in most cases, it won't. Because the range is copied, in order for ref to affect elements of the range that is passed to foreach, the range must be a reference type or must act like a dynamic array in the sense that each copy refers to exactly the same objects - which is true for some ranges (e.g. those over a container) - but for many ranges, it isn't (e.g. generative ranges aren't usually backed by anything). Also, the front of most ranges does not return by ref, which would be required for the ref on foreach to actually allow you to mutate the element. As such, I'd argue that the fact that ref is allowed on foreach for all ranges is a bug. At minimum, it should require that front return by ref, and even then, depending on the range, it won't necessary work to assign to the element and have it affect the range that was copied when it was passed to foreach. So, while it will work in _some_ cases to pass a range to foreach and then use ref, in general, I would consider it a code smell, because the odds are high that it does not do what the programmer who wrote it assumed that it would do.
 And if a templated function fails to compile if it's given a particular
 argument, and that failure is not due to the template constraint, that
 usually means that the template constraint is incomplete.
I do not accept this argument outright. In fact, our experience at Weka has drove us to the exact opposite direction. We deliberately set up functions to include invalid cases and then either let them fail because of lacking constraints, or even set up a static assert to fail the function generation rather than fail matching. We found that the error messages produced by the alternative are simply too difficult to parse out and figure what went wrong.
For better or for worse, it's generally considered a bug in Phobos if it's possible to write code that gets past the template constraint and then does not compile. There are reasons why that can be a problematic approach, but that is the approach that we've taken thus far. As such, if isInputRange started allowing non-copyable elements, it would be expected that all range-based functions in Phobos would then have to either require copyability in their template constraint or actually work with non-copyable types.
 it would mean that _every_ range-based function then has to worry about
 non-copyable elements
No, just those in libraries. The same way they have to worry about big O complexity, type safety, nogc and a bunch of other things. When you write a library, you have to think about your users. This doesn't change anything about that.
It changes it by adding yet another complication to worry about when implementing ranges and range-based functions that no one implementing range-based code currently has to worry about. And writing bug-free generic, range-based code can already be surprisingly difficult without adding yet another complication into the mix.
 Not all algorithms can call front just once without
 copying it
Huh? Where did that come from? Phobos already assumes that calling front needs to be an O(1) operation. You can call front as many times as you like. Why would you assume otherwise?
Sure, you can call front as many times as you want. But in some cases, it's more expensive to call front multiple times than it is to call it once and store the result. Also, while the result of front may be equivalent every time, it may not be exactly the same object every time. For instance, auto r2 = range.map!(a => to!string())(); is going to to result in a range that allocates a string every time that front is called. This sort of thing is not an uncommon thing to do with map, and while many ranges are implemented to do most of their work in popFront (thereby making calling front cheap), there's no requirement that they do so. So, there are ranges that do their work in front rather than popFront, and in the case of map, it actually has to be done in front, because map can be random-access, and using popFront to calculate the value only works for front, not arbitrary elements. In such cases, calling front multiple times is definitely worse than just calling it once and storing the result. I think that pretty much the only time that it's better to call front multiple times rathre than calling it once and storing the result is when it returns by ref, because if it doesn't return by ref, you're either going to get a copy every time you call front, or the range is going to be creating a new object every time you call front. In those cases, storing the result of front and reusing it would be cheaper. And since most ranges do not return by ref, most ranges are not avoiding copies or any expensive calculations by calling front multiple times. If anything, they're incurring them. Now, arguably, in the cases where a generic function needs to access front multiple times, it would be best if it tested front for refness and then used static ifs so that it stored the result of front when it didn't return by ref and called it multiple times when it did, but that would get ugly fast, and I'm not aware of any Phobos code that currently does that.
 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has
 to
 assume that once you copy a range, you can't mutate the original (since
 it may or may not be affected by mutating the copy and may or may not
 be cleanly affected by mutating the copy),
Jonathan, I'm sorry, but what you are describing is called "a bug". Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range. A forward range with ref front absolutely promises that both iterations reference the same elements. I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*. The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do.
I'm not making anything up. Very little range-based code passes ranges by reference. They're copied by value all the time, even if they're not forward ranges. foreach itself copies a range before iterating over it. Yes, if you want to iterate over two independent copies, you need a forward range, and you need to call save, but ranges are copied constantly. It's just that it's then a bug to do anything with the original after the copy has been made. As it stands, a range that cannot be copied is going to work with almost nothing in Phobos and would be borderline useless. And adding ref everywhere to range-based functions to change that would destroy our ability to chain function calls with ranges. Also, a large percentage of range-based functions return a range that wraps another range, and that means copying the range that's passed in. It simply wouldn't work with many functions to accept a range by ref without then copying the range inside the function anyway. The assumption that input ranges are copyable is built into almost everything range-based in all of Phobos. The rare exceptions are functions like std.conv.parse which are specifically intended to mutate the range that's passed in rather than copying it.
 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like
 a
 great reason to allow it, to me.
Ranges have to be copyable.
And why would a range iterating over uncopyable objects not be copyable? How are the two related?
Because many ranges do stuff like store the value of front after calculating it with popFront or store some set of values to calculate new values. No such range would work with non-copyable element types even if isInputRange allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to work with non-copyable element types, but the fact that ranges must be copyable means that any range that worked with non-copyable element types could not be a struct that stored any of its elements as a direct member. - Jonathan M Davis
Apr 16 2018
next sibling parent Shachar Shemesh <shachar weka.io> writes:
This is going nowhere.

Let's flesh this out face to face at dconf.

Shachar

On 16/04/18 12:01, Jonathan M Davis wrote:
 On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
 On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref
 with anything other than arrays a bit of an iffy proposition. It will
 compile regardless of whether front returns by ref or not (which
 arguably is a bug), but it will only affect the original elements if
 copying the range doesn't copy the elements
You keep referring to ranges that contain the data, but I am racking my mind and failing to find examples.
Pretty much any generative range does. e.g. iota contains data, and each element is then calculated from the current value. The same goes for stuff like random number generators. It's quite common for ranges to have popFront calculate the next front and store it as a member rather than have front do the calculation, since then multiple calls to front don't incur the cost of whatever has to be done to get the next element, and they can just return the same value each time. In fact, that's generally considered to be best practice. It's true that most ranges do not contain every element directly in a struct, but some do (e.g. std.range.only does, which is a way to generate a range from a list of elements without allocating anything on the heap), and many ranges at least store the value for front. So, I'd say that a fairly large percentage of ranges store at least one value directly in the range object.
 Plus, as I've said before, if ranges are expected to contain data,
 copying them seems like an incredibly bad idea. The whole idea behind
 casually copying the range about is that it's cheap to do.
Yes, it's generally assumed that it's cheap to copy a range, but it's also generally assumed by D code in general that copying is cheap. Relatively few functions in Phobos accept arguments by ref. Some do accept by auto ref - either to forward the refness or to avoid a copy - but since passing by ref doesn't work with rvalues, most code in Phobos doesn't do so unless it's intended to mutate the argument and give the caller access to the result. I would strongly suggest that anyone who is looking to put an object that is expensive to copy in a range consider making it so that that object is on the heap rather than directly in the range.
 But as it stands, using ref with foreach in generic code is not going to
 go well. If anything, it's actually a code smell.
You are conflating ref with owning data, which I find unsubstantiated.
My point is that if you have foreach(ref e; range) { } then unless range is a dynamic array, the odds are very high that it's a bug, because then you risk problems like foreach(ref e; range) { e = foo; } Outside of dynamic arrays, that assignment is usually pointless. ref implies that it's going to actually affect the range that you passed to foreach, but in most cases, it won't. Because the range is copied, in order for ref to affect elements of the range that is passed to foreach, the range must be a reference type or must act like a dynamic array in the sense that each copy refers to exactly the same objects - which is true for some ranges (e.g. those over a container) - but for many ranges, it isn't (e.g. generative ranges aren't usually backed by anything). Also, the front of most ranges does not return by ref, which would be required for the ref on foreach to actually allow you to mutate the element. As such, I'd argue that the fact that ref is allowed on foreach for all ranges is a bug. At minimum, it should require that front return by ref, and even then, depending on the range, it won't necessary work to assign to the element and have it affect the range that was copied when it was passed to foreach. So, while it will work in _some_ cases to pass a range to foreach and then use ref, in general, I would consider it a code smell, because the odds are high that it does not do what the programmer who wrote it assumed that it would do.
 And if a templated function fails to compile if it's given a particular
 argument, and that failure is not due to the template constraint, that
 usually means that the template constraint is incomplete.
I do not accept this argument outright. In fact, our experience at Weka has drove us to the exact opposite direction. We deliberately set up functions to include invalid cases and then either let them fail because of lacking constraints, or even set up a static assert to fail the function generation rather than fail matching. We found that the error messages produced by the alternative are simply too difficult to parse out and figure what went wrong.
For better or for worse, it's generally considered a bug in Phobos if it's possible to write code that gets past the template constraint and then does not compile. There are reasons why that can be a problematic approach, but that is the approach that we've taken thus far. As such, if isInputRange started allowing non-copyable elements, it would be expected that all range-based functions in Phobos would then have to either require copyability in their template constraint or actually work with non-copyable types.
 it would mean that _every_ range-based function then has to worry about
 non-copyable elements
No, just those in libraries. The same way they have to worry about big O complexity, type safety, nogc and a bunch of other things. When you write a library, you have to think about your users. This doesn't change anything about that.
It changes it by adding yet another complication to worry about when implementing ranges and range-based functions that no one implementing range-based code currently has to worry about. And writing bug-free generic, range-based code can already be surprisingly difficult without adding yet another complication into the mix.
 Not all algorithms can call front just once without
 copying it
Huh? Where did that come from? Phobos already assumes that calling front needs to be an O(1) operation. You can call front as many times as you like. Why would you assume otherwise?
Sure, you can call front as many times as you want. But in some cases, it's more expensive to call front multiple times than it is to call it once and store the result. Also, while the result of front may be equivalent every time, it may not be exactly the same object every time. For instance, auto r2 = range.map!(a => to!string())(); is going to to result in a range that allocates a string every time that front is called. This sort of thing is not an uncommon thing to do with map, and while many ranges are implemented to do most of their work in popFront (thereby making calling front cheap), there's no requirement that they do so. So, there are ranges that do their work in front rather than popFront, and in the case of map, it actually has to be done in front, because map can be random-access, and using popFront to calculate the value only works for front, not arbitrary elements. In such cases, calling front multiple times is definitely worse than just calling it once and storing the result. I think that pretty much the only time that it's better to call front multiple times rathre than calling it once and storing the result is when it returns by ref, because if it doesn't return by ref, you're either going to get a copy every time you call front, or the range is going to be creating a new object every time you call front. In those cases, storing the result of front and reusing it would be cheaper. And since most ranges do not return by ref, most ranges are not avoiding copies or any expensive calculations by calling front multiple times. If anything, they're incurring them. Now, arguably, in the cases where a generic function needs to access front multiple times, it would be best if it tested front for refness and then used static ifs so that it stored the result of front when it didn't return by ref and called it multiple times when it did, but that would get ugly fast, and I'm not aware of any Phobos code that currently does that.
 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has
 to
 assume that once you copy a range, you can't mutate the original (since
 it may or may not be affected by mutating the copy and may or may not
 be cleanly affected by mutating the copy),
Jonathan, I'm sorry, but what you are describing is called "a bug". Input ranges provide no guarantees about copying the range itself. If you do it and expect *anything* (which it seems you do), you have a bug. If you need to copy the range itself, you absolutely need to require a forward range. A forward range with ref front absolutely promises that both iterations reference the same elements. I find myself sincerely hoping that you are making these examples up in order to win this discussion turned into an argument for some reason, instead of the way phobos is actually written, because if you're right phobos is *broken*. The good news is that if Phobos is broken, this is a great way to flush out the places it's broken. Let's add UTs calling all Phobos functions accepting input ranges (as opposed to forward ranges) with non-copyable ranges (i.e. - the range itself is non-copyable) and fix the compilation errors. This shouldn't take more than a few days to do.
I'm not making anything up. Very little range-based code passes ranges by reference. They're copied by value all the time, even if they're not forward ranges. foreach itself copies a range before iterating over it. Yes, if you want to iterate over two independent copies, you need a forward range, and you need to call save, but ranges are copied constantly. It's just that it's then a bug to do anything with the original after the copy has been made. As it stands, a range that cannot be copied is going to work with almost nothing in Phobos and would be borderline useless. And adding ref everywhere to range-based functions to change that would destroy our ability to chain function calls with ranges. Also, a large percentage of range-based functions return a range that wraps another range, and that means copying the range that's passed in. It simply wouldn't work with many functions to accept a range by ref without then copying the range inside the function anyway. The assumption that input ranges are copyable is built into almost everything range-based in all of Phobos. The rare exceptions are functions like std.conv.parse which are specifically intended to mutate the range that's passed in rather than copying it.
 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like
 a
 great reason to allow it, to me.
Ranges have to be copyable.
And why would a range iterating over uncopyable objects not be copyable? How are the two related?
Because many ranges do stuff like store the value of front after calculating it with popFront or store some set of values to calculate new values. No such range would work with non-copyable element types even if isInputRange allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to work with non-copyable element types, but the fact that ranges must be copyable means that any range that worked with non-copyable element types could not be a struct that stored any of its elements as a direct member. - Jonathan M Davis
Apr 16 2018
prev sibling parent Uranuz <neuranuz gmail.com> writes:
 foreach(ref e; range)
 {
 }
On idea is to have `ref` foreach to say that you would like to iterate your range without copying. The syntax could be: foreach(e; ref range) { } or: ref foreach(e; range) { } At least it will not break existing code. But this means that in each case you should make a decision about `ref`/non ref.
Apr 20 2018