digitalmars.D - isInputRange is false for non-copyable types

Shachar Shemesh (17/17) Apr 14 2018 import std.range;

Jonathan M Davis (129/145) Apr 14 2018 It's extremely common for range-based functions to copy front. Even fore...

Dmitry Olshansky (8/15) Apr 15 2018 Arguably it should “move” them.

Dmitry Olshansky (5/24) Apr 15 2018 On a related note this also destroys idea of “lock-free” or
Jonathan M Davis (26/43) Apr 15 2018 If that were the route that we were going to go, then front and popFront

Dmitry Olshansky (10/47) Apr 15 2018 There it would copy. You get rvalue either way. Another option is

Jonathan M Davis (44/55) Apr 15 2018 Replacing postblit is trivial in comparison to redesigning ranges. The w...

Steven Schveighoffer (13/32) Apr 17 2018 Ugh, then you get things like fungetc.

H. S. Teoh (23/31) Apr 17 2018 [...]
Dmitry Olshansky (14/48) Apr 17 2018 For many algorithms indeed you can’t stop w/o peeking ahead. It’s

Shachar Shemesh (51/100) Apr 15 2018 Not necessarily:

Jonathan M Davis (166/225) Apr 15 2018 The fact that foreach copies the range that it's given makes using ref w...

Shachar Shemesh (75/123) Apr 15 2018 You keep referring to ranges that contain the data, but I am racking my

ag0aep6g (10/28) Apr 15 2018 I'm currently attempting something similar: fix Phobos to work with
Jonathan M Davis (115/183) Apr 16 2018 Pretty much any generative range does. e.g. iota contains data, and each

Shachar Shemesh (4/226) Apr 16 2018 This is going nowhere.
Uranuz (11/14) Apr 20 2018 On idea is to have `ref` foreach to say that you would like to

Shachar Shemesh <shachar weka.io> writes:

import std.range;
import std.stdio;

struct S {
     int a;

      disable this(this);
}

void main() {
     writeln(isInputRange!(S[])); // Prints false
}

The reason, as far as I can tell, is that an input range requires that 
we can do:

auto a = ir.front; // Assuming ir isn't empty

That doesn't work for non-copyable types, for obvious reasons.

With that said, that seems more like a deficiency in the input range 
definition than anything. There is no reason we shouldn't be able to 
iterator type operations over non-copyable types.

Shachar

Apr 14 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d wrote:
 import std.range;
 import std.stdio;

 struct S {
      int a;

       disable this(this);
 }

 void main() {
      writeln(isInputRange!(S[])); // Prints false
 }

 The reason, as far as I can tell, is that an input range requires that
 we can do:

 auto a = ir.front; // Assuming ir isn't empty

 That doesn't work for non-copyable types, for obvious reasons.

 With that said, that seems more like a deficiency in the input range
 definition than anything. There is no reason we shouldn't be able to
 iterator type operations over non-copyable types.

It's extremely common for range-based functions to copy front. Even foreach
does it. e.g.

import std.stdio;

struct S
{
    this(int i)
    {
        _i = i;
    }

    this(this)
    {
        writefln("copied: %s", _i);
    }

    int _i;
}

void main()
{
    foreach(e; [S(0), S(1), S(2)])
    {
    }
}

prints

copied: 0
copied: 1
copied: 2

If non-copyable types were allowed, then no function which accepted a range
would be allowed to copy front without first checking that front was
copyable (something that most code simply isn't going to do), and some
functions outright require being able to copy front. They could test for
copyability, but that would complicate a lot of range-based functions for a
relatively rare corner case (much as it obviously matters to any code that
has that corner case).

Remember also that a large percentage of ranges that don't wrap dynamic
arrays put their elements on the stack, which means that whenever the range
is copied, the elements are copied. e.g. if the above example uses
std.range.only instead of an array literal, it actually prints

copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2
copied: 0
copied: 1
copied: 2

I'm not sure why it ends up printing each of them 3 times rather than 2, but
the fact that foreach copies any range that it's given means that in order
to have ranges allow non-copyable types, we'd have to change how foreach
worked, which would break a lot of code. Right now,

foreach(e; range)
{
}

is lowered to something like

for(auto __range = range; !__range.empty; __range.popFront())
{
    auto e = __range.front;
}

That requires both copying the range and copying front. We could
theoretically change it so that everywhere that e was used, it would be
replaced with __range.front so that front wasn't copied and probably
wouldn't break code in the process (though it could affect performance), but
if we made it so that the range itself wasn't copied, then instead of
iterating over a copy, foreach would consume the original range, which would
definitely break a lot of code.

Now, generic range-based code really shouldn't ever use a range after it's
been copied, since whether mutating the copy affects the original depends on
the implementation of the range (and thus generic code should assume that
foreach may have consumed the range and call save if it doesn't want that to
happen), but lots of code does it anyway, and if the code isn't generic, it
can work just fine, since then the code can depend on the semantics of that
specific range type instead of having to work with the semantics of
arbitrary ranges. e.g. if code is written specifically for dynamic arrays
rather than ranges in general, then calling save to iterate over a separate
slice would be silly, but if the code doesn't currently call save, and
foreach were changed to not copy the range, then all of a sudden code like

string str = "hello world";
foreach(c; str)
{
}

would consume the dynamic array instead of iterating over a separate slice.

Also, in order for a range to work with a non-copyable type, either front
would have to return by ref or it would have to construct a new object to
return so that it could be moved rather than copied. I expect that quite a
few ranges would have problems with such a restriction, and certainly, even
if a range could have its front changed to return by ref, the range would no
longer be able to guarantee that the caller couldn't mutate the value of
front, which would cause problems in at least some cases.

In addition, it's quite possible that forcing functions to not copy front
would hurt performance. In some cases, by using front directly, you can
avoid a copy (e.g. it returns by ref), but in others, the value of front is
calculated every time it's called. So, depending on the implementation of
the range, calling front repeatedly can be more expensive than simply
copying the value and reusing it. As such, it's not necessarily a good idea
to force range-based code to not copy front. Also, it's not uncommon for
ranges to store the value of front (e.g. it's calculated every time that
popFront is called), and that wouldn't work with non-copyable types. Also,
if such ranges were changed to calculate the value in front to then work
with non-copyable types, whatever calculation they were doing would then
have to be done every time front was called instead of being guaranteed to
only happen once, which would harm performance in many cases.

So, while aside from the issues with foreach, some specific cases might work
with non-copyable types, it causes a lot of complications for ranges in
general, and best case, we'd end up with most existing range-based code
failing to compile with non-copyable types, and we'd have to go to a bunch
of extra work in Phobos to either rework stuff so that it would work with
non-copyable types or make it test for copyability. Either way, most
range-based code outside of Phobos would almost certainly continue to fail
to compile with non-copyable types. We have enough problems already because
of save frequently not being used correctly and how badly reference type
ranges interact with everything because of that. Having to support
non-copyable types would then add yet one more thing that many range
implementations would fail miserably at and that solid range implementations
would have to test for.

And of course, I don't see how we could support non-copyable types even if
we wanted to because of the issues with foreach. Just like with save, we
could take a different approach if we didn't care about breaking code, but
there's no way that we could make a change like that unless we could find a
clean deprecation path to change the behavior, and I have no clue how we
could do that with foreach. Either way, a really compelling case would have
to be made as to why all of the changes required to support non-copyable
types would be worth it, especially considering that this would affect
foreach rather than just a stray function or two in Phobos and as such would
potentially affect almost every D program in existence.

IIRC, when this has come up previously, Andrei ruled that supporting
non-copyable types simply wasn't worth the extra complication, though a
quick search doesn't turn up anything where he talked about it. So, I can't
verify that at the moment.

- Jonathan M Davis

Apr 14 2018

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]

 It's extremely common for range-based functions to copy front. 
 Even foreach does it. e.g.

 [...]


Arguably it should “move” them.

This would have worked if input ranges didn’t have to cache front 
to support multiple ‘front’ calls before popFront. Which is a 
painful misfeature really as all algorithms would optimize for 
touching front once anyway.

Input range should “produce” elements not keep cache of them, 
forward ranges may cache current item alright.

 [...]

Apr 15 2018

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Sunday, 15 April 2018 at 10:14:20 UTC, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]

 It's extremely common for range-based functions to copy front. 
 Even foreach does it. e.g.

 [...]


 Arguably it should “move” them.

 This would have worked if input ranges didn’t have to cache 
 front to support multiple ‘front’ calls before popFront. Which 
 is a painful misfeature really as all algorithms would optimize 
 for touching front once anyway.

 Input range should “produce” elements not keep cache of them, 
 forward ranges may cache current item alright.

On a related note this also destroys idea of “lock-free” or 
“concurrent” input ranges since front/empty/popFront sequence of 
calls can’t be cheaply synchronized in as a whole in presence of 
multiple consumers.


 [...]

Apr 15 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via

 Digitalmars-d wrote:
 [...]

 It's extremely common for range-based functions to copy front.
 Even foreach does it. e.g.

 [...]

 Arguably it should “move” them.

 This would have worked if input ranges didn’t have to cache front
 to support multiple ‘front’ calls before popFront. Which is a
 painful misfeature really as all algorithms would optimize for
 touching front once anyway.

 Input range should “produce” elements not keep cache of them,
 forward ranges may cache current item alright.

If that were the route that we were going to go, then front and popFront
wouldn't make any sense at all. Rather, you'd treat it more like in iterator
in Java and have next() return the next element and pop it off at the same
time. But even if we did that, it wouldn't have been appropriate in the
general case to move the element unless we put serious restrictions on what
could be an input range (e.g. that would make no sense whatsoever with
dynamic arrays or any kind of container).

In some respects it would be nice if input ranges were completely separate
beasts from forward ranges, since ideally, forward ranges would all be value
types, and input ranges would all be reference types (in which case, save
wouldn't be a thing), but that would cause it's own set of problems, because
then it wouldn't really work to have input ranges and forward ranges use the
same code, which could get really annoying. Input ranges are kind of an
abomination IMHO , because they're so painfully limited, but unfortunately,
it simply doesn't work to have all ranges be forward ranges if you don't
want to have to do stuff like buffer data. So, figuring out what we would
ideally do with input ranges and forward ranges gets to be a bit of an
interesting question.

Regardless, even if we could all agree on how ranges would ideally be
redesigned if we were doing this stuff from scratch, we're not doing it from
scratch, and for better or worse, a lot of existing code depends on the
current design. So, while I'd love to have the opportunity to sit down and
redesign the range API, I don't see how we really can - not and actually
have it implemented as part of the language or Phobos anyway.

- Jonathan M Davis

Apr 15 2018

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 10:14:20 Dmitry Olshansky via 
 Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via

 Digitalmars-d wrote:
 [...]

 It's extremely common for range-based functions to copy 
 front. Even foreach does it. e.g.

 [...]

 Arguably it should “move” them.

 This would have worked if input ranges didn’t have to cache 
 front to support multiple ‘front’ calls before popFront. Which 
 is a painful misfeature really as all algorithms would 
 optimize for touching front once anyway.

 Input range should “produce” elements not keep cache of them, 
 forward ranges may cache current item alright.

 If that were the route that we were going to go, then front and 
 popFront wouldn't make any sense at all. Rather, you'd treat it 
 more like in iterator in Java and have next() return the next 
 element and pop it off at the same time.

Surprisingly, they might have got that right.

 But even if we did that, it wouldn't have been appropriate in 
 the general case to move the element unless we put serious 
 restrictions on what could be an input range (e.g. that would 
 make no sense whatsoever with dynamic arrays or any kind of 
 container).

There it would copy. You get rvalue either way. Another option is 
ref scope T. I do not see incompatibility you seem to imply.


 So, while I'd love to have the opportunity to sit down and 
 redesign the range API, I don't see how we really can - not and 
 actually have it implemented as part of the language or Phobos 
 anyway.

Yeah, I think as surely as we can redesign postblit. Let’s not 
focus on learned helplessnes and “it’s just the way things are”. 
Recent opMove shows there is great flexibility in what can be 
done.

Seriously, it’s just fixing a library design issue even if its 
use proved viral.

 - Jonathan M Davis

Apr 15 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, April 15, 2018 12:29:17 Dmitry Olshansky via Digitalmars-d wrote:
 On Sunday, 15 April 2018 at 10:39:32 UTC, Jonathan M Davis wrote:
 So, while I'd love to have the opportunity to sit down and
 redesign the range API, I don't see how we really can - not and
 actually have it implemented as part of the language or Phobos
 anyway.

 Yeah, I think as surely as we can redesign postblit. Let’s not
 focus on learned helplessnes and “it’s just the way things are”.
 Recent opMove shows there is great flexibility in what can be
 done.

 Seriously, it’s just fixing a library design issue even if its
 use proved viral.

Replacing postblit is trivial in comparison to redesigning ranges. The worst
that's going to happen with postblit is that all postblit constructors are
going to be deprecated in favor of some other kind of constructor, which
would result in a deprecation warning for types with postblit constructors
until they've replaced one constructor with the other. As far as breaking
changes go, it shouldn't be much more complicated than renaming a function
and requiring everyone to update their code. It shouldn't even affect anyone
using types with postblit constructors except maybe if they're using
__traits(getMembers, ...) to get at __postblit or __xpostblit. It's just the
types themselves that need to be updated. In that sense, it's actually less
disruptive to replace the postblit constructor than it would be to rename a
function, because all that has to be changed is the function itself, not
every place that it gets called.

And if we get some kind of opMove solution, that wouldn't break any existing
code. It would be invisible to anything but new code that was written to use
it. That's the easiest kind of improvement.

Ranges, on the other hand, are integral to how a large percentage of Phobos
works, and a lot of existing code depends on them heavily. And unless you're
talking about simply renaming popFront to something else, changing them gets
pretty hairy. Changing ranges in any serious way risks having to essentially
fork Phobos (or at least large portions of it). Any major redesign would
almost certainly require renaming most or all of the range traits. All
functions accepting ranges would have to be rewritten. Any function
returning a range would have to either be renamed or somehow work with both
the old and new range API (and since containers use an operator rather than
a function name to provide ranges, that could make updating containers
rather interesting unless we're willing to give up on slicing being how we
get a range from a container).

We may be able to make small changes here and there, but I don't see how
anything major is possible unless we're willing to break a ton of existing
code. Now, maybe someone can come up with something clever to make some
decent improvements to what we have without doing a major redesign, but I
think that a major redesign requires effectively forking the range API,
which means either ending up with two different range APIs that everyone has
to deal with, or requiring all range-based code to eventually be updated
from one range API to another, which is an enormous change and far more than
simply swapping one function out for another.

So, if we decide that we're willing to make changes that effectively require
that almost every D program make changes that are potentially quite
invasive, then sure, we can redesign the range API. And it would be great to
be able to do that. But I have a hard time buying that such a large change
would be acceptable at this point.

- Jonathan M Davis

Apr 15 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via Digitalmars-d 
 wrote:
 [...]

 It's extremely common for range-based functions to copy front. Even 
 foreach does it. e.g.

 [...]

 
 
 Arguably it should “move” them.
 
 This would have worked if input ranges didn’t have to cache front to 
 support multiple ‘front’ calls before popFront. Which is a painful 
 misfeature really as all algorithms would optimize for touching front 
 once anyway.

Ugh, then you get things like fungetc.

However, there are certainly cases where it's expensive to compute 
front, and therefore, requires a cache. A new primitive that accesses 
front and pops it at the same time would be useful. We can have selected 
algorithms that can deal with it use that primitive instead (and 
instrument foreach to do it). Potentially, we can create a wrapper for 
all input ranges such that they support it.

 Input range should “produce” elements not keep cache of them, forward 
 ranges may cache current item alright.

Well, sometimes you HAVE to cache it. For instance File.byLine is 
clearly an input range, and not a forward range. But it MUST cache 
because it's i/o that has to be buffered to be looked at! No reason to 
force caching on the user at that point.

-Steve

Apr 17 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Apr 17, 2018 at 11:14:02AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 4/15/18 6:14 AM, Dmitry Olshansky wrote:

[...]
 Input range should “produce” elements not keep cache of them,
 forward ranges may cache current item alright.

 
 Well, sometimes you HAVE to cache it. For instance File.byLine is
 clearly an input range, and not a forward range. But it MUST cache
 because it's i/o that has to be buffered to be looked at! No reason to
 force caching on the user at that point.

[...]

In the past I've used range-like (i.e., iteration) APIs that do not
cache the current element. To be quite frank, they were a royal pain to
work with, because they force user code to be needlessly cluttered with
ad hoc caching schemes that pollute the code all over the place.
Instead of having the range handle the caching, all downstream code that
uses the range has to implement their own caching and/or otherwise
implement ugly ad hoc workarounds for the lack thereof.  Andrei got it
right when he designed the range API with a cached .front, and I would
be opposed to any range API design that involves a non-caching .front.

For generative ranges like std.range.generate, well, .generate exists
precisely for that reason: to be the single implementation of a caching
.front that wraps around a non-caching generating function, so that
range implementers don't have to write their own caching code, but only
need to care about writing the generating function and let .generate
handle the caching for them.  The downstream user also doesn't need to
worry about the odd semantics of a non-caching .front, but can use the
range API as usual.  Win-win.


T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for
Volkswagen.

Apr 17 2018

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Tuesday, 17 April 2018 at 15:14:02 UTC, Steven Schveighoffer 
wrote:
 On 4/15/18 6:14 AM, Dmitry Olshansky wrote:
 On Sunday, 15 April 2018 at 06:39:43 UTC, Jonathan M Davis 
 wrote:
 On Sunday, April 15, 2018 07:26:54 Shachar Shemesh via 
 Digitalmars-d wrote:
 [...]

 It's extremely common for range-based functions to copy 
 front. Even foreach does it. e.g.

 [...]

 
 
 Arguably it should “move” them.
 
 This would have worked if input ranges didn’t have to cache 
 front to support multiple ‘front’ calls before popFront. Which 
 is a painful misfeature really as all algorithms would 
 optimize for touching front once anyway.

 Ugh, then you get things like fungetc.

For many algorithms indeed you can’t stop w/o peeking ahead. It’s 
just that lookahead of 1 is hardcoded in the range definition b/c 
it’s enough for a lot of things.

But then once you are past lookahead of 1, it has to do .save + 
restore if overshot. ungetc of 2 if you’d like.

 However, there are certainly cases where it's expensive to 
 compute front, and therefore, requires a cache. A new primitive 
 that accesses front and pops it at the same time would be 
 useful. We can have selected algorithms that can deal with it 
 use that primitive instead (and instrument foreach to do it).

Might be an option. But as long as you have front/popFront/empty 
as well, no concurrent stuff.

 Potentially, we can create a wrapper for all input ranges such 
 that they support it.

 Input range should “produce” elements not keep cache of them, 
 forward ranges may cache current item alright.

 Well, sometimes you HAVE to cache it. For instance File.byLine

File caches, byLine actually does
“read up until EOL and give me slice of that”

I do not see where caching of char[] plays any significant role. 
Except that due to how ranges work it likely has to do the 
readline at _empty_ or on construction + popFront.

 is clearly an input range, and not a forward range. But it MUST 
 cache because it's i/o that has to be buffered to be looked at! 
 No reason to force caching on the user at that point.
 -Steve

Apr 17 2018

Shachar Shemesh <shachar weka.io> writes:

On 15/04/18 09:39, Jonathan M Davis wrote:
 
 It's extremely common for range-based functions to copy front. Even foreach
 does it. e.g.
 

Not necessarily:

foreach(ref e; [S(0), S(1), S(2)]){}

While that would work with foreach, it will not work with anything that 
expects an inputRange (and since D defines a forward range as an 
extension, this is also not a forward range).

 If non-copyable types were allowed, then no function which accepted a range
 would be allowed to copy front without first checking that front was
 copyable (something that most code simply isn't going to do)

No, that's not correct.

If non-copyable types were allowed, then functions that accept a range 
would fail to compile on non-copyable types if they do a copy.

But the flip side is that if non-copyable would be allowed, a lot of 
functions that current copy would not. There are far too many 
unnecessary copies in the code that serve no purpose, but they are 
invisible, so they are not fixed.

 
 Remember also that a large percentage of ranges that don't wrap dynamic
 arrays put their elements on the stack, which means that whenever the range
 is copied, the elements are copied.

If I am guaranteed to be able to copy an input range, what's the 
difference between it and a forward range?

Strike that, I have a better one: If I am allowed to copy an input 
range, what does the "save" function in forward range even mean?


 I'm not sure why it ends up printing each of them 3 times rather than 2,

Because some code somewhere does an unnecessary copy?

 the fact that foreach copies any range that it's given means that in order
 to have ranges allow non-copyable types, we'd have to change how foreach
 worked, which would break a lot of code. Right now,

Like I said above, foreach(ref) works with "input ranges" that define 
ref return from front to uncopyable objects. Foreach without ref 
doesn't, but that's mandatory from the interface: You cannot iterate 
copies of a range returning uncopyable objects.

As for ranges that store the values inside them: you REALLY don't want 
to copy those around. They may get quite big (not to mention that I 
don't see how you can define one over a non-copyable type).

 
 foreach(e; range)
 {
 }
 
 is lowered to something like
 
 for(auto __range = range; !__range.empty; __range.popFront())
 {
      auto e = __range.front;
 }

And that indeed generates a lot of confusion regarding what running two 
foreaches in a row does. This is a bug that already affects more than 
just uncopyable objects.

 
 That requires both copying the range and copying front. We could
 theoretically change it so that everywhere that e was used, it would be
 replaced with __range.front

Like I said, already solved for foreach(ref)

 Now, generic range-based code really shouldn't ever use a range after it's
 been copied, since whether mutating the copy affects the original depends on
 the implementation of the range (and thus generic code should assume that
 foreach may have consumed the range and call save if it doesn't want that to
 happen), but lots of code does it anyway,

And allowing ranges with uncopyable members would turn this potential 
deadly run-time bugs into easy to find compile time errors. Sound like a 
great reason to allow it, to me.

 and if the code isn't generic, it
 can work just fine, since then the code can depend on the semantics of that
 specific range type

Such as it being copyable? Non-generic code is of no relevance to this 
discussion, as far as I can tell.

 Also, in order for a range to work with a non-copyable type, either front
 would have to return by ref or it would have to construct a new object to
 return so that it could be moved rather than copied.

Correct

 I expect that quite a
 few ranges would have problems with such a restriction,

Then those ranges will not work with non-copyable objects. That's no 
reason to make it impossible for *any* range to work with non-copyable.

 In addition, it's quite possible that forcing functions to not copy front
 would hurt performance.

I'm going to stop responding now, because, frankly, I think you and I 
are having very different discussions.

You seem to be advocating against making *all* ranges support 
non-copyable types, while I'm merely asking that *any* range be allowed 
to support such types.

Current situation is that a range with uncopyable types is not 
considered an input range, and cannot use any phobos range functions, 
including some that it should be able to use without a problem.

There is no reason to block such ranges from using "map" or "any", 
except the fact they require that the type answer "isInputRange" with true.

 IIRC, when this has come up previously, Andrei ruled that supporting
 non-copyable types simply wasn't worth the extra complication, though a
 quick search doesn't turn up anything where he talked about it. So, I can't
 verify that at the moment.

I will trust Andrei to weigh in on this point if he thinks he should. 
Until he does, please feel free to only bring up points you are willing 
to argue yourself.

Shachar

Apr 15 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, April 15, 2018 15:32:37 Shachar Shemesh via Digitalmars-d wrote:
 On 15/04/18 09:39, Jonathan M Davis wrote:
 It's extremely common for range-based functions to copy front. Even
 foreach does it. e.g.

 Not necessarily:

 foreach(ref e; [S(0), S(1), S(2)]){}

 While that would work with foreach, it will not work with anything that
 expects an inputRange (and since D defines a forward range as an
 extension, this is also not a forward range).

The fact that foreach copies the range that it's given makes using ref with
anything other than arrays a bit of an iffy proposition. It will compile
regardless of whether front returns by ref or not (which arguably is a bug),
but it will only affect the original elements if copying the range doesn't
copy the elements - which in the case of a basic input range rather than a
forward range, is generally true, otherwise, it would be a forward range.
But as it stands, using ref with foreach in generic code is not going to go
well. If anything, it's actually a code smell.

For it to work in general, we'd either have to have foreach stop copying
ranges (which would break a _lot_ of code, including code that just uses
dynamic arrays), or we'd have to have some way to mark or detect a range as
working correctly with ref when it's copied - which would mean a new range
trait specifically for that, be that one that tests for a feature (such as
hasLength) or one that tests for a whole new range type. And I suspect that
we would have to use an enum or somesuch to tag the range as having the
right semantics, since I don't think that that can be determined just by
looking at the types of its member variables.

 If non-copyable types were allowed, then no function which accepted a
 range would be allowed to copy front without first checking that front
 was copyable (something that most code simply isn't going to do)

 No, that's not correct.

 If non-copyable types were allowed, then functions that accept a range
 would fail to compile on non-copyable types if they do a copy.

 But the flip side is that if non-copyable would be allowed, a lot of
 functions that current copy would not. There are far too many
 unnecessary copies in the code that serve no purpose, but they are
 invisible, so they are not fixed.

And if a templated function fails to compile if it's given a particular
argument, and that failure is not due to the template constraint, that
usually means that the template constraint is incomplete. And that means
that any and all templated functions which copy front would then have to
test whether front was copyable. Yes, in some cases, that would mean that a
range-based function would be rewritten to not copy front, but either way,
it would mean that _every_ range-based function then has to worry about
non-copyable elements - either by testing for them in its template
constraint or by being written specifically in a way that works with them
and then adding that particular use case to their unit tests. So, allowing
non-copyable types complicates range-based code across the board.

And the reality of the matter is that in many cases, copying front is
actually more efficient. Not all algorithms can call front just once without
copying it, and many ranges (e.g. map) calculate front on each call, meaning
that repeated calls to front can be more expensive than just calling it once
and copying the result. Dealing with that in the most efficient way possible
would probably mean having generic code test whether front returns by ref so
that it can copy it if it doesn't and just keep referring to front directly
if it does, but that would tend to make for some really messy code in any
generic function where front needs to be accessed multiple times. As it
stands, because of functions like map, it's almost certainly better for any
function which accesses front multiple times to copy it.

 Remember also that a large percentage of ranges that don't wrap dynamic
 arrays put their elements on the stack, which means that whenever the
 range is copied, the elements are copied.

 If I am guaranteed to be able to copy an input range, what's the
 difference between it and a forward range?

 Strike that, I have a better one: If I am allowed to copy an input
 range, what does the "save" function in forward range even mean?

You can copy input ranges as much as you like. You just don't get a
guaranteed independent copy that can be iterated separately. The classic
example would be that a struct copies its members, whereas a class is just a
reference. So with structs, you generally (but not always) have a value
type, whereas with classes, you have a reference type, and copies of each
act completely differently. One is independent, whereas the other is the
same as the original aside from assignment. To make matters worse, a range
could be a pseudo-reference type in that copying it doesn't provide a
completely independent copy, and it doesn't act like a reference type
either.

Dynamic arrays are one example of that. In their case, the result is a range
that's independent for iteration purposes but where mutating any elements
affects the original. Similarly, any time that a range is defined with
multiple members and they're not all value types, you get a pseudo-reference
type. At that point, how mutating the copy affects the original depends on
what each of the members does. It could act like a dynamic array, or you
could end up in a situation where mutating the copy only partially affects
the state of the iteration in the original, in which case, trying to use the
original after the copy has been made would be buggy regardless of whether
the code is generic or not. e.g. there's a real risk of a situation like
that when you do something like have a range wrapping a socket - especially
if it does something like put the bytes it grabs in a buffer in the range
when it reads them. If the state of the iteration is shared between value
types and reference types, you can get all kinds of weird behaviors if you
try to use the original after mutating the copy.

The result of all of this is that in generic code, you have no clue
whatsoever what the semantics are of copying a range around, because it
depends entirely on the copying semantics of whatever range that code is
currently being instantiated with. As such, generic code basically has to
assume that once you copy a range, you can't mutate the original (since it
may or may not be affected by mutating the copy and may or may not be
cleanly affected by mutating the copy), and as such, if you want an
independent copy to iterate over, you need a specific way to get one other
than just copying. That's what save does.

save was introduced to deal with ranges that were classes (though since it
generally requires allocating a new object with every call to save, it's not
necessrily a great idea to implement such a range). However, the issues
related to copying ranges in general weren't as clear at that point in time.
It was just clear that if you wanted to be able to generically get an
independent copy, a solution was needed.

But of course, since _most_ ranges act just like save when they're copied,
save gets frequently forgotten, making it a bit of a thorn in our side. And
the fact that generic code can't ever use a range after copying it (even
after passing it to foreach, since that does a copy) is something that's
frequently missed, causing subtle bugs. So, the fact that ranges weren't
better thought through in the beginning with regards to copying means that
we have something that works quite well in typical use cases but gets fairly
thorny when you're writing generic code (especially if you're then
distributing it to other folks to use with ranges that you've never seen or
heard of).

A lot of the problems that we have with ranges stem from the fact that
they're basically designed around being an abstraction of dynamic arrays,
and any time that a range doesn't act like a dynamic array does, we start
ending up with cracks in how things work. In general, we've gotten it to
work fairly well with just front, popFront, and empty if you can assume that
ranges copy like dynamic arrays do, but of course, many don't, and there,
things start to fall apart. Adding save helped, but it didn't really fix the
problem, since it can so often be ignored and still have code work, and it
leaves the problem of basic input ranges, which by definition, can't be
copied like dynamic arrays are. If a range can't have save, then it's
basically because there's something about its iteration state that either
can't be saved or can't be saved efficiently.

Ideally, we'd be able to do something like require that basic input ranges
be reference types (since they obviously can't be value types, or they'd be
forward ranges) and require that forward ranges act like value types (at
least insofar as copying them copies the iteration state and thus acts like
save). And in that case, if we could have a clean way to differentiate
between input ranges and forward ranges, we could drop the whole save thing
and even rely on how ranges behave when they're copied, but it would make
certain types of ranges that work right now not work (e.g. classes couldn't
forward be ranges without being wrapped in a struct with a postblit
constructor, and basic input ranges would either have to be on the heap or
involve a pointer to the stack). It would probably also require that we
treat input ranges and forward ranges as incompatible, which would be great
in the case where we want to deal with a forward range but not so great in
the case where we want to be able to just iterate without ever having to
look at any part of the range multiple times, since then we'd basically have
to have separate code for input ranges and forward ranges.

So, some redesign of ranges would be great. However, they're highly
integrated into Phobos and heavily used in existing code, making it very
difficult to have any kind of clean deprecation path so that we could
transition from the current state of things to wherever we want to be. We
could do it if we were willing to pretty much outright fork Phobos, but I'm
not sure that it can be done otherwise. And there's still the issue of
foreach in that if we want to change its semantics, that would be a breaking
change in the language itself, meaning that forking the standard library
wouldn't fix that one. If we want to change foreach at all, we'd have to
figure out how to do so in place with a clean deprecation path.

 foreach(e; range)
 {
 }

 is lowered to something like

 for(auto __range = range; !__range.empty; __range.popFront())
 {

      auto e = __range.front;

 }

 And that indeed generates a lot of confusion regarding what running two
 foreaches in a row does. This is a bug that already affects more than
 just uncopyable objects.

It was the behavior that that was required in order to have ranges behave
the same as dynamic arrays with foreach without changing how dynamic arrays
work with foreach. Yes, that becomes a problem once you start having ranges
that are reference types or have any range type where copying it is not
equivalent to calling save, but given that ranges were added on top of the
existing behavior, nothing else would have made sense.

Maybe foreach should have been changed for dynamic arrays, but I don't think
that the implications of copying ranges in generic code had been
particularly thought through at the point in time that ranges were added to
foreach, and I don't know how easy it would have been to talk Walter into
changing the behavior of foreach for dynamic arrays at the time. At this
point in time, it would require a way to cleanly transition from the old
behavior to the new without immediately breaking existing code. If someone
can come up with that, and the arguments for changing foreach are compelling
enough, then maybe it can be changed, but as it stands, it's just one of
those annoyances that stems from the fact that ranges tried to emulate
dynamic arrays without requiring that they act like dynamic arrays.

 Now, generic range-based code really shouldn't ever use a range after
 it's been copied, since whether mutating the copy affects the original
 depends on the implementation of the range (and thus generic code
 should assume that foreach may have consumed the range and call save if
 it doesn't want that to happen), but lots of code does it anyway,

 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like a
 great reason to allow it, to me.

Ranges have to be copyable. If they're not, you have to use ref all over the
place, and it becomes impossible to create a function that returns a new
range without putting it on the heap, completely destroying function
chaining unless we want to start allocating a whole bunch of ranges on the
heap, which would destroy performance. It would be great if we could clean
up the semantics of copying ranges, but making it so that you can't rely on
ranges being copied would destroy our ability to use ranges as we do now.

 I'm going to stop responding now, because, frankly, I think you and I
 are having very different discussions.

 You seem to be advocating against making *all* ranges support
 non-copyable types, while I'm merely asking that *any* range be allowed
 to support such types.

 Current situation is that a range with uncopyable types is not
 considered an input range, and cannot use any phobos range functions,
 including some that it should be able to use without a problem.

 There is no reason to block such ranges from using "map" or "any",
 except the fact they require that the type answer "isInputRange" with
 true.

If isInputRange accepted non-copyable element types, then all range-based
code would have to deal with non-copyable element types. Yes, in many cases,
that would mean adding a check to their template constraints which made them
require that the element type be copyable rather than refactoring them to
actually work with non-copyable element types, but all well-written,
range-based code would then have to take non-copyable types into
consideration. And any range-based code that was going to have to work with
non-copyable types would then have to be very particular about what it did.
We'd be complicating all range-based code to handle yet another corner case.
We have too many problems already because of ranges trying to be too many
things at once and not locking down their semantics in ways that either
require things that are not currently required or disallow things that are
currently allowed.

- Jonathan M Davis

Apr 15 2018

Shachar Shemesh <shachar weka.io> writes:

On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref with
 anything other than arrays a bit of an iffy proposition. It will compile
 regardless of whether front returns by ref or not (which arguably is a bug),
 but it will only affect the original elements if copying the range doesn't
 copy the elements

You keep referring to ranges that contain the data, but I am racking my 
mind and failing to find examples.

You can have ranges over arrays,linked lists, trees (though you'd 
probably use opApply there), and any other container, as well as 
external data sources (files, sockets, pipes). In none of those cases 
would your range contain actual data. I need to see the use case for 
ranges *with* embedded data.

Plus, as I've said before, if ranges are expected to contain data, 
copying them seems like an incredibly bad idea. The whole idea behind 
casually copying the range about is that it's cheap to do.

 which in the case of a basic input range rather than a
 forward range

I don't see how that statement is true.

 is generally true, otherwise, it would be a forward range.

A forward range is an input range.

 But as it stands, using ref with foreach in generic code is not going to go
 well. If anything, it's actually a code smell.

You are conflating ref with owning data, which I find unsubstantiated.

 If non-copyable types were allowed, then functions that accept a range
 would fail to compile on non-copyable types if they do a copy.

 But the flip side is that if non-copyable would be allowed, a lot of
 functions that current copy would not. There are far too many
 unnecessary copies in the code that serve no purpose, but they are
 invisible, so they are not fixed.

 
 And if a templated function fails to compile if it's given a particular
 argument, and that failure is not due to the template constraint, that
 usually means that the template constraint is incomplete.

I do not accept this argument outright. In fact, our experience at Weka 
has drove us to the exact opposite direction.

We deliberately set up functions to include invalid cases and then 
either let them fail because of lacking constraints, or even set up a 
static assert to fail the function generation rather than fail matching.

We found that the error messages produced by the alternative are simply 
too difficult to parse out and figure what went wrong.

With that said, there is some merit to setting up the function 
constraints so that it matches what the function actually need. It 
allows overloading the missing combinations by another module. I don't 
think it is a good argument in this particular case, however.

I think we should set up to document which phobos functions require 
copying. It will flesh out bugs and unnecessary copying in phobos, as 
well as force phobos developers to think about this scenario when they 
write code. It shouldn't even take that much time to do this run: just 
go over all UTs and add instantiation over a non-copyable type. The UTs 
that fail need work.

 it would mean that _every_ range-based function then has to worry about
 non-copyable elements

No, just those in libraries. The same way they have to worry about big O 
complexity, type safety,  nogc and a bunch of other things.

When you write a library, you have to think about your users. This 
doesn't change anything about that.

 So, allowing
 non-copyable types complicates range-based code across the board.

Like I said above, it is more likely to find bugs than to introduce 
needless complications. If you code cannot be easily implemented over 
non-copyable types, exclude them. Don't shut me out of the entire game.

 Not all algorithms can call front just once without
 copying it

Huh? Where did that come from?

Phobos already assumes that calling front needs to be an O(1) operation. 
You can call front as many times as you like. Why would you assume 
otherwise?

 and many ranges (e.g. map) calculate front on each call, meaning
 that repeated calls to front can be more expensive than just calling it once
 and copying the result.

Not by a lot, and definitely not in any way the generic code has any 
right to assume. I reject this argument outright.

 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has to
 assume that once you copy a range, you can't mutate the original (since it
 may or may not be affected by mutating the copy and may or may not be
 cleanly affected by mutating the copy),

Jonathan, I'm sorry, but what you are describing is called "a bug".

Input ranges provide no guarantees about copying the range itself. If 
you do it and expect *anything* (which it seems you do), you have a bug. 
If you need to copy the range itself, you absolutely need to require a 
forward range.

A forward range with ref front absolutely promises that both iterations 
reference the same elements.

I find myself sincerely hoping that you are making these examples up in 
order to win this discussion turned into an argument for some reason, 
instead of the way phobos is actually written, because if you're right 
phobos is *broken*.

The good news is that if Phobos is broken, this is a great way to flush 
out the places it's broken. Let's add UTs calling all Phobos functions 
accepting input ranges (as opposed to forward ranges) with non-copyable 
ranges (i.e. - the range itself is non-copyable) and fix the compilation 
errors. This shouldn't take more than a few days to do.

Hell, let's write UTs to pass forward ranges that are non-copyable (but 
return a copy via "save") and flush out those bugs as well.

The point of "development" is to make our code better, not defend the 
status-quo. Especially since we've just identified a fairly simple way 
to find the places that need work and easily work on them.

 But of course, since _most_ ranges act just like save when they're copied,
 save gets frequently forgotten, making it a bit of a thorn in our side.

Then rejoice: I just suggested a way to *easily* find and fix those places.

 require that forward ranges act like value types (at
 least insofar as copying them copies the iteration state and thus acts like
 save).

The brackets are important. Just because I'm returning a reference to 
elements does not mean I don't support the range itself restarting a scan.

 So, some redesign of ranges would be great.

Yes, but please note that I don't think it is *necessary*. Yes, save is 
annoying, but it *is* workable as is, and since we can harness the 
compiler to locate where we're deviating from the righteous way, not 
that expensively if we could just be bothered to do it.

 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like a
 great reason to allow it, to me.

 
 Ranges have to be copyable.

And why would a range iterating over uncopyable objects not be copyable? 
How are the two related?

Shachar

Apr 15 2018

ag0aep6g <anonymous example.com> writes:

On 04/16/2018 06:15 AM, Shachar Shemesh wrote:
 Input ranges provide no guarantees about copying the range itself. If 
 you do it and expect *anything* (which it seems you do), you have a bug. 
 If you need to copy the range itself, you absolutely need to require a 
 forward range.
 
 A forward range with ref front absolutely promises that both iterations 
 reference the same elements.
 
 I find myself sincerely hoping that you are making these examples up in 
 order to win this discussion turned into an argument for some reason, 
 instead of the way phobos is actually written, because if you're right 
 phobos is *broken*.
 
 The good news is that if Phobos is broken, this is a great way to flush 
 out the places it's broken. Let's add UTs calling all Phobos functions 
 accepting input ranges (as opposed to forward ranges) with non-copyable 
 ranges (i.e. - the range itself is non-copyable) and fix the compilation 
 errors. This shouldn't take more than a few days to do.

I'm currently attempting something similar: fix Phobos to work with 
RefRange. RefRange is copyable, but assignment is weird. Assignment 
isn't part of the range interface, so I'm replacing it with `move`:

https://issues.dlang.org/show_bug.cgi?id=18657
https://github.com/dlang/phobos/pull/6346

That PR doesn't fix all of Phobos, it's just a start. Going over all of 
Phobos (and getting it merged) might take a while.

Thinking about it, those `move`s probably fix the code for non-copyable 
ranges as well. There might be considerable overlap here.

Apr 15 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
 On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref
 with anything other than arrays a bit of an iffy proposition. It will
 compile regardless of whether front returns by ref or not (which
 arguably is a bug), but it will only affect the original elements if
 copying the range doesn't copy the elements

 You keep referring to ranges that contain the data, but I am racking my
 mind and failing to find examples.

Pretty much any generative range does. e.g. iota contains data, and each
element is then calculated from the current value. The same goes for stuff
like random number generators. It's quite common for ranges to have popFront
calculate the next front and store it as a member rather than have front do
the calculation, since then multiple calls to front don't incur the cost of
whatever has to be done to get the next element, and they can just return
the same value each time. In fact, that's generally considered to be best
practice. It's true that most ranges do not contain every element directly
in a struct, but some do (e.g. std.range.only does, which is a way to
generate a range from a list of elements without allocating anything on the
heap), and many ranges at least store the value for front. So, I'd say that
a fairly large percentage of ranges store at least one value directly in the
range object.

 Plus, as I've said before, if ranges are expected to contain data,
 copying them seems like an incredibly bad idea. The whole idea behind
 casually copying the range about is that it's cheap to do.

Yes, it's generally assumed that it's cheap to copy a range, but it's also
generally assumed by D code in general that copying is cheap. Relatively few
functions in Phobos accept arguments by ref. Some do accept by auto ref -
either to forward the refness or to avoid a copy - but since passing by ref
doesn't work with rvalues, most code in Phobos doesn't do so unless it's
intended to mutate the argument and give the caller access to the result. I
would strongly suggest that anyone who is looking to put an object that is
expensive to copy in a range consider making it so that that object is on
the heap rather than directly in the range.

 But as it stands, using ref with foreach in generic code is not going to
 go well. If anything, it's actually a code smell.

 You are conflating ref with owning data, which I find unsubstantiated.

My point is that if you have

foreach(ref e; range)
{
}

then unless range is a dynamic array, the odds are very high that it's a
bug, because then you risk problems like

foreach(ref e; range)
{
    e = foo;
}

Outside of dynamic arrays, that assignment is usually pointless. ref implies
that it's going to actually affect the range that you passed to foreach, but
in most cases, it won't. Because the range is copied, in order for ref to
affect elements of the range that is passed to foreach, the range must be a
reference type or must act like a dynamic array in the sense that each copy
refers to exactly the same objects - which is true for some ranges (e.g.
those over a container) - but for many ranges, it isn't (e.g. generative
ranges aren't usually backed by anything). Also, the front of most ranges
does not return by ref, which would be required for the ref on foreach to
actually allow you to mutate the element. As such, I'd argue that the fact
that ref is allowed on foreach for all ranges is a bug. At minimum, it
should require that front return by ref, and even then, depending on the
range, it won't necessary work to assign to the element and have it affect
the range that was copied when it was passed to foreach. So, while it will
work in _some_ cases to pass a range to foreach and then use ref, in
general, I would consider it a code smell, because the odds are high that it
does not do what the programmer who wrote it assumed that it would do.

 And if a templated function fails to compile if it's given a particular
 argument, and that failure is not due to the template constraint, that
 usually means that the template constraint is incomplete.

 I do not accept this argument outright. In fact, our experience at Weka
 has drove us to the exact opposite direction.

 We deliberately set up functions to include invalid cases and then
 either let them fail because of lacking constraints, or even set up a
 static assert to fail the function generation rather than fail matching.

 We found that the error messages produced by the alternative are simply
 too difficult to parse out and figure what went wrong.

For better or for worse, it's generally considered a bug in Phobos if it's
possible to write code that gets past the template constraint and then does
not compile. There are reasons why that can be a problematic approach, but
that is the approach that we've taken thus far. As such, if isInputRange
started allowing non-copyable elements, it would be expected that all
range-based functions in Phobos would then have to either require
copyability in their template constraint or actually work with non-copyable
types.

 it would mean that _every_ range-based function then has to worry about
 non-copyable elements

 No, just those in libraries. The same way they have to worry about big O
 complexity, type safety,  nogc and a bunch of other things.

 When you write a library, you have to think about your users. This
 doesn't change anything about that.

It changes it by adding yet another complication to worry about when
implementing ranges and range-based functions that no one implementing
range-based code currently has to worry about. And writing bug-free generic,
range-based code can already be surprisingly difficult without adding yet
another complication into the mix.

 Not all algorithms can call front just once without
 copying it

 Huh? Where did that come from?

 Phobos already assumes that calling front needs to be an O(1) operation.
 You can call front as many times as you like. Why would you assume
 otherwise?

Sure, you can call front as many times as you want. But in some cases, it's
more expensive to call front multiple times than it is to call it once and
store the result. Also, while the result of front may be equivalent every
time, it may not be exactly the same object every time. For instance,

auto r2 = range.map!(a => to!string())();

is going to to result in a range that allocates a string every time that
front is called. This sort of thing is not an uncommon thing to do with map,
and while many ranges are implemented to do most of their work in popFront
(thereby making calling front cheap), there's no requirement that they do
so. So, there are ranges that do their work in front rather than popFront,
and in the case of map, it actually has to be done in front, because map can
be random-access, and using popFront to calculate the value only works for
front, not arbitrary elements. In such cases, calling front multiple times
is definitely worse than just calling it once and storing the result. I
think that pretty much the only time that it's better to call front multiple
times rathre than calling it once and storing the result is when it returns
by ref, because if it doesn't return by ref, you're either going to get a
copy every time you call front, or the range is going to be creating a new
object every time you call front. In those cases, storing the result of
front and reusing it would be cheaper. And since most ranges do not return
by ref, most ranges are not avoiding copies or any expensive calculations by
calling front multiple times. If anything, they're incurring them.

Now, arguably, in the cases where a generic function needs to access front
multiple times, it would be best if it tested front for refness and then
used static ifs so that it stored the result of front when it didn't return
by ref and called it multiple times when it did, but that would get ugly
fast, and I'm not aware of any Phobos code that currently does that.

 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has
 to
 assume that once you copy a range, you can't mutate the original (since
 it may or may not be affected by mutating the copy and may or may not
 be cleanly affected by mutating the copy),

 Jonathan, I'm sorry, but what you are describing is called "a bug".

 Input ranges provide no guarantees about copying the range itself. If
 you do it and expect *anything* (which it seems you do), you have a bug.
 If you need to copy the range itself, you absolutely need to require a
 forward range.

 A forward range with ref front absolutely promises that both iterations
 reference the same elements.

 I find myself sincerely hoping that you are making these examples up in
 order to win this discussion turned into an argument for some reason,
 instead of the way phobos is actually written, because if you're right
 phobos is *broken*.

 The good news is that if Phobos is broken, this is a great way to flush
 out the places it's broken. Let's add UTs calling all Phobos functions
 accepting input ranges (as opposed to forward ranges) with non-copyable
 ranges (i.e. - the range itself is non-copyable) and fix the compilation
 errors. This shouldn't take more than a few days to do.

I'm not making anything up. Very little range-based code passes ranges by
reference. They're copied by value all the time, even if they're not forward
ranges. foreach itself copies a range before iterating over it. Yes, if you
want to iterate over two independent copies, you need a forward range, and
you need to call save, but ranges are copied constantly. It's just that it's
then a bug to do anything with the original after the copy has been made. As
it stands, a range that cannot be copied is going to work with almost
nothing in Phobos and would be borderline useless. And adding ref everywhere
to range-based functions to change that would destroy our ability to chain
function calls with ranges. Also, a large percentage of range-based
functions return a range that wraps another range, and that means copying
the range that's passed in. It simply wouldn't work with many functions to
accept a range by ref without then copying the range inside the function
anyway. The assumption that input ranges are copyable is built into almost
everything range-based in all of Phobos. The rare exceptions are functions
like std.conv.parse which are specifically intended to mutate the range
that's passed in rather than copying it.

 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like
 a
 great reason to allow it, to me.

 Ranges have to be copyable.

 And why would a range iterating over uncopyable objects not be copyable?
 How are the two related?

Because many ranges do stuff like store the value of front after calculating
it with popFront or store some set of values to calculate new values. No
such range would work with non-copyable element types even if isInputRange
allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to
work with non-copyable element types, but the fact that ranges must be
copyable means that any range that worked with non-copyable element types
could not be a struct that stored any of its elements as a direct member.

- Jonathan M Davis

Apr 16 2018

Shachar Shemesh <shachar weka.io> writes:

This is going nowhere.

Let's flesh this out face to face at dconf.

Shachar

On 16/04/18 12:01, Jonathan M Davis wrote:
 On Monday, April 16, 2018 07:15:36 Shachar Shemesh via Digitalmars-d wrote:
 On 16/04/18 01:28, Jonathan M Davis wrote:
 The fact that foreach copies the range that it's given makes using ref
 with anything other than arrays a bit of an iffy proposition. It will
 compile regardless of whether front returns by ref or not (which
 arguably is a bug), but it will only affect the original elements if
 copying the range doesn't copy the elements

 You keep referring to ranges that contain the data, but I am racking my
 mind and failing to find examples.

 
 Pretty much any generative range does. e.g. iota contains data, and each
 element is then calculated from the current value. The same goes for stuff
 like random number generators. It's quite common for ranges to have popFront
 calculate the next front and store it as a member rather than have front do
 the calculation, since then multiple calls to front don't incur the cost of
 whatever has to be done to get the next element, and they can just return
 the same value each time. In fact, that's generally considered to be best
 practice. It's true that most ranges do not contain every element directly
 in a struct, but some do (e.g. std.range.only does, which is a way to
 generate a range from a list of elements without allocating anything on the
 heap), and many ranges at least store the value for front. So, I'd say that
 a fairly large percentage of ranges store at least one value directly in the
 range object.
 
 Plus, as I've said before, if ranges are expected to contain data,
 copying them seems like an incredibly bad idea. The whole idea behind
 casually copying the range about is that it's cheap to do.

 
 Yes, it's generally assumed that it's cheap to copy a range, but it's also
 generally assumed by D code in general that copying is cheap. Relatively few
 functions in Phobos accept arguments by ref. Some do accept by auto ref -
 either to forward the refness or to avoid a copy - but since passing by ref
 doesn't work with rvalues, most code in Phobos doesn't do so unless it's
 intended to mutate the argument and give the caller access to the result. I
 would strongly suggest that anyone who is looking to put an object that is
 expensive to copy in a range consider making it so that that object is on
 the heap rather than directly in the range.
 
 But as it stands, using ref with foreach in generic code is not going to
 go well. If anything, it's actually a code smell.

 You are conflating ref with owning data, which I find unsubstantiated.

 
 My point is that if you have
 
 foreach(ref e; range)
 {
 }
 
 then unless range is a dynamic array, the odds are very high that it's a
 bug, because then you risk problems like
 
 foreach(ref e; range)
 {
      e = foo;
 }
 
 Outside of dynamic arrays, that assignment is usually pointless. ref implies
 that it's going to actually affect the range that you passed to foreach, but
 in most cases, it won't. Because the range is copied, in order for ref to
 affect elements of the range that is passed to foreach, the range must be a
 reference type or must act like a dynamic array in the sense that each copy
 refers to exactly the same objects - which is true for some ranges (e.g.
 those over a container) - but for many ranges, it isn't (e.g. generative
 ranges aren't usually backed by anything). Also, the front of most ranges
 does not return by ref, which would be required for the ref on foreach to
 actually allow you to mutate the element. As such, I'd argue that the fact
 that ref is allowed on foreach for all ranges is a bug. At minimum, it
 should require that front return by ref, and even then, depending on the
 range, it won't necessary work to assign to the element and have it affect
 the range that was copied when it was passed to foreach. So, while it will
 work in _some_ cases to pass a range to foreach and then use ref, in
 general, I would consider it a code smell, because the odds are high that it
 does not do what the programmer who wrote it assumed that it would do.
 
 And if a templated function fails to compile if it's given a particular
 argument, and that failure is not due to the template constraint, that
 usually means that the template constraint is incomplete.

 I do not accept this argument outright. In fact, our experience at Weka
 has drove us to the exact opposite direction.

 We deliberately set up functions to include invalid cases and then
 either let them fail because of lacking constraints, or even set up a
 static assert to fail the function generation rather than fail matching.

 We found that the error messages produced by the alternative are simply
 too difficult to parse out and figure what went wrong.

 
 For better or for worse, it's generally considered a bug in Phobos if it's
 possible to write code that gets past the template constraint and then does
 not compile. There are reasons why that can be a problematic approach, but
 that is the approach that we've taken thus far. As such, if isInputRange
 started allowing non-copyable elements, it would be expected that all
 range-based functions in Phobos would then have to either require
 copyability in their template constraint or actually work with non-copyable
 types.
 
 it would mean that _every_ range-based function then has to worry about
 non-copyable elements

 No, just those in libraries. The same way they have to worry about big O
 complexity, type safety,  nogc and a bunch of other things.

 When you write a library, you have to think about your users. This
 doesn't change anything about that.

 
 It changes it by adding yet another complication to worry about when
 implementing ranges and range-based functions that no one implementing
 range-based code currently has to worry about. And writing bug-free generic,
 range-based code can already be surprisingly difficult without adding yet
 another complication into the mix.
 
 Not all algorithms can call front just once without
 copying it

 Huh? Where did that come from?

 Phobos already assumes that calling front needs to be an O(1) operation.
 You can call front as many times as you like. Why would you assume
 otherwise?

 
 Sure, you can call front as many times as you want. But in some cases, it's
 more expensive to call front multiple times than it is to call it once and
 store the result. Also, while the result of front may be equivalent every
 time, it may not be exactly the same object every time. For instance,
 
 auto r2 = range.map!(a => to!string())();
 
 is going to to result in a range that allocates a string every time that
 front is called. This sort of thing is not an uncommon thing to do with map,
 and while many ranges are implemented to do most of their work in popFront
 (thereby making calling front cheap), there's no requirement that they do
 so. So, there are ranges that do their work in front rather than popFront,
 and in the case of map, it actually has to be done in front, because map can
 be random-access, and using popFront to calculate the value only works for
 front, not arbitrary elements. In such cases, calling front multiple times
 is definitely worse than just calling it once and storing the result. I
 think that pretty much the only time that it's better to call front multiple
 times rathre than calling it once and storing the result is when it returns
 by ref, because if it doesn't return by ref, you're either going to get a
 copy every time you call front, or the range is going to be creating a new
 object every time you call front. In those cases, storing the result of
 front and reusing it would be cheaper. And since most ranges do not return
 by ref, most ranges are not avoiding copies or any expensive calculations by
 calling front multiple times. If anything, they're incurring them.
 
 Now, arguably, in the cases where a generic function needs to access front
 multiple times, it would be best if it tested front for refness and then
 used static ifs so that it stored the result of front when it didn't return
 by ref and called it multiple times when it did, but that would get ugly
 fast, and I'm not aware of any Phobos code that currently does that.
 
 The result of all of this is that in generic code, you have no clue
 whatsoever what the semantics are of copying a range around, because it
 depends entirely on the copying semantics of whatever range that code is
 currently being instantiated with. As such, generic code basically has
 to
 assume that once you copy a range, you can't mutate the original (since
 it may or may not be affected by mutating the copy and may or may not
 be cleanly affected by mutating the copy),

 Jonathan, I'm sorry, but what you are describing is called "a bug".

 Input ranges provide no guarantees about copying the range itself. If
 you do it and expect *anything* (which it seems you do), you have a bug.
 If you need to copy the range itself, you absolutely need to require a
 forward range.

 A forward range with ref front absolutely promises that both iterations
 reference the same elements.

 I find myself sincerely hoping that you are making these examples up in
 order to win this discussion turned into an argument for some reason,
 instead of the way phobos is actually written, because if you're right
 phobos is *broken*.

 The good news is that if Phobos is broken, this is a great way to flush
 out the places it's broken. Let's add UTs calling all Phobos functions
 accepting input ranges (as opposed to forward ranges) with non-copyable
 ranges (i.e. - the range itself is non-copyable) and fix the compilation
 errors. This shouldn't take more than a few days to do.

 
 I'm not making anything up. Very little range-based code passes ranges by
 reference. They're copied by value all the time, even if they're not forward
 ranges. foreach itself copies a range before iterating over it. Yes, if you
 want to iterate over two independent copies, you need a forward range, and
 you need to call save, but ranges are copied constantly. It's just that it's
 then a bug to do anything with the original after the copy has been made. As
 it stands, a range that cannot be copied is going to work with almost
 nothing in Phobos and would be borderline useless. And adding ref everywhere
 to range-based functions to change that would destroy our ability to chain
 function calls with ranges. Also, a large percentage of range-based
 functions return a range that wraps another range, and that means copying
 the range that's passed in. It simply wouldn't work with many functions to
 accept a range by ref without then copying the range inside the function
 anyway. The assumption that input ranges are copyable is built into almost
 everything range-based in all of Phobos. The rare exceptions are functions
 like std.conv.parse which are specifically intended to mutate the range
 that's passed in rather than copying it.
 
 And allowing ranges with uncopyable members would turn this potential
 deadly run-time bugs into easy to find compile time errors. Sound like
 a
 great reason to allow it, to me.

 Ranges have to be copyable.

 And why would a range iterating over uncopyable objects not be copyable?
 How are the two related?

 
 Because many ranges do stuff like store the value of front after calculating
 it with popFront or store some set of values to calculate new values. No
 such range would work with non-copyable element types even if isInputRange
 allowed them. Now, that doesn't mean that _some_ ranges couldn't be made to
 work with non-copyable element types, but the fact that ranges must be
 copyable means that any range that worked with non-copyable element types
 could not be a struct that stored any of its elements as a direct member.
 
 - Jonathan M Davis

Apr 16 2018

Uranuz <neuranuz gmail.com> writes:

 foreach(ref e; range)
 {
 }

On idea is to have `ref` foreach to say that you would like to 
iterate your range without copying. The syntax could be:

foreach(e; ref range)
{
}

or:
ref foreach(e; range)
{
}
At least it will not break existing code. But this means that in 
each case you should make a decision about `ref`/non ref.

Apr 20 2018

D Programming

C/C++ Programming

Other

digitalmars.D - isInputRange is false for non-copyable types