www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Range Redesign: Copy Semantics

reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
I've been thinking about this for a while now, but with the next version of
Phobos which is in the early planning stages, we really should do some
redesigning of ranges. Most of their API is just fine how it is, but there
are some aspects of it which really should be changed if we want them to be
better (the most obvious one being the removal of auto-decoding). But what
I'd like to discuss specifically in this thread is fixing - and defining -
the semantics of copying and assigning to ranges. Specifically, the
semantics of stuff like

auto copy = orig;

and

range1 = range2;

should be defined (which then defines what happens when passing ranges to
functions and stuff like foreach).

Right now, the only semantics of copying a range which are defined by the
range API are what happens to the copy when you make a copy. Specifically,
the copy will have the exact same elements in the exact same order that the
original would have had if it had been iterated through prior to the copy
being made. If that weren't true, then we couldn't pass ranges around.
However, because ranges support a whole range of types with varying copy
semantics, you can't rely on a range's copy semantics beyond that
(particularly with regards to the state of the original range after the copy
has been made). Specifically, a range type can have four different types of
copy semantics:

1. The range is a value type, so copying it results in two completely
indepedent copies. Iterating through one has no effect on the other.

2. The range is a reference type, so copying it results in two completely
dependent copies. They both share all of their state, so iterating through
one iterates through the other, and they both always have the same state.

3. The range is a pseudo-reference type, and it has value semantics for its
iteration state but reference semantics for some portion of the rest of its
state. This means that if you iterate through a copy of the range, it has no
effect on the original, so the range is a value type in that respect, but
the rest of its state may or may not be copied by value (often - though not
always - this means that the iteration state is directly within the range
type, but the elements themselves are referred to via pointer, so mutating
the actual elements would affect all copies of that range, but each range
can be iterated independently). Dynamic arrays are the most common example
of this sort of range.

4. The range is a pseudo-reference type, but it does not have value
semantics with regards to its iteration state. So, when copying the range,
some portion of its state is copied by value, but you don't end up with
indendepent copies - and in fact, you don't get dependent copies either,
since some of the state is copied by value. A common example of this sort of
range which would be one which contains most of its state via a pointer or
reference but which caches its front as a member variable within the struct.
The result of this is that once you've mutated a copy of the range, you can
no longer use the original, because mutating the copy only partially mutates
the state of the original, leaving it in an inconsistent / invalid state.

Generic code must support ranges with any of these copy semantics, and since
these copy semantics are drastically different from one another, the result
is that in fully generic code, once you copy a range, you cannot ever use
the original again (since you can't depend on its state). And you can't even
assign it a new value and then use it, because the semantics of assignment
aren't defined for ranges either (though they're likely to be strongly tied
to the copy semantics). Once you copy a range, you have to assume that the
orginal range is in a potentially invalid state and can't do anything
further with it.

In addition to copying ranges to new variables and passing them to other
functions, this also means that you can't actually continue to use a range
once it's been passed to foreach, because the lowering that foreach uses
copies the range (and it has to or it would break the semantics of using
dynamic arrays with foreach). So, you can't do stuff like partially iterate
through a range, break out of the loop, and then continue to iterate through
it, because the state of the range is implementation-dependent and could be
invalid (on top of the fact that in the case of many forward ranges, copying
it is equivalent to save, so trying to continue the iteration on the
original won't work, whereas it would work with some basic input ranges, so
even without any ranges ending up in an invalid state, the semantics depend
on the type in a way that does not work with generic code). In fact, someone
necro-ed a thread in D.Learn on this issue with foreach just the other day,
since it's not obvious and can be pretty surprising.

Of course, in order to try to solve the problem with varying copy semantics,
the range API has save for forward ranges. This makes it possible to get an
independent copy of a forward range, and any generic code which needs an
independent copy needs to use it to get one. However, that does not help at
all with the inconsistent semantics of actually copying a range, and it's
very common for people to write code that works with dynamic arrays but then
doesn't work with other types of ranges, because other types of ranges have
differing copy semantics, and they didn't test their code with enough range
types to find where they accidentally relied on the copy semantics (or
assignment semantics) of dynamic arrays (or on the semantics of whatever
range type it was that they tested with).

If we really want to fix these classes of issues, we need ranges to have
consistent copy semantics, which means that we need to define what those
semantics are.

For forward ranges, this is quite straightfoward. We need to require that
copying a forward range results in two copies which can be independently
iterated - so essentially, the iteration state must be copied by value, or
another way to look at it is that copying a forward range must be equivalent
to calling save (though the situation is actually more nuanced than that).
We would be requiring that all forward ranges have the same copying
semantics as dynamic arrays with regards to their iteration state (but would
allow the elements themselves to be copied by value or by reference like we
do now). And of course, we would then get rid of save, since it would be
unnecessary.

What this would mean is that all forward ranges would be either dynamic
arrays or structs. Any code that then wanted a class or a pointer to a
struct to be a forward range would need to wrap it in a struct to give it
the correct copy semantics. A naive implementation would just duplicate the
class whenever the struct was copied, but a smarter implementation could do
something like use reference counting to keep track of how many ranges there
were pointing to that same class object and then duplicate it whenever a
function was called on it which would mutate its iteration state, and the
reference count was greater than 1 (and of course not bother to duplicate it
if the reference count was 1). It wouldn't need to do more memory management
than that (though it could), so that doesn't necessarily require anything
 system or  trusted as often gets discussed with reference counting. Of
course, allowing such an implementation does technically mean that copying a
forward range is not necessarily strictly equivalent to calling save, but
the overall iteration semantics would treat each copy as independent like
save is supposed to allow, so it wouldn't be a problem.

And with that change, we then have consistent copy semantics with forward
ranges. All code operating on forward ranges could treat them the same as
they do dynamic arrays when copying them and passing them around, and you
wouldn't have to worry about forgetting to call save. I think that we can
probably get general agreement on such a change, since it's been discussed
off-and-on for years now. However, the problem then is basic input ranges.

Basic input ranges cannot have the same copy semantics as dynamic arrays. If
they could, then they could be forward ranges. As things stand, they either
have reference semantics, or they have pseudo-reference semantics of the
variety where mutating a copy puts the original in an invalid state. The
only way that we could make their copy semantics consistent is if we then
require that they be reference types. The simplest way to do that would be
to require that they be classes or pointers to structs, though that would
then make it impossible to do anything like reference counting with basic
input ranges, which isn't ideal, but it would have the advantage of allowing
us to force reference semantics at compile time.

This would of course mean that basic input ranges and forward ranges would
then be defined to have different copy semantics, which would mean that
either generic code that relied on the copy semantics would have to work
with only one of them, or we would need to give basic input ranges a
different API. And if we allowed basic input ranges to be structs that
defined reference semantics internally (e.g. by implementing reference
counting on a pointer or class reference), then we would have to give basic
input ranges a different API, because we wouldn't have save to differentiate
them anymore, and if we allowed structs, then we couldn't differentiate them
based on a basic input range being a class reference or pointer to a struct,
whereas a forward range would be a struct or a dynamic array.

Ultimately, I'm inclined to argue that we should give basic input ranges a
new API - not just because it would allow them to use reference counting,
but also because the current input range API tends to force basic input
ranges to cache the value of front (which if anything, encourages basic
input ranges to be pseudo-reference types and could result in an extra layer
of indirection in some cases if they're forced to be reference types). It
would be annoying in some cases to require that different functions be used
for basic input ranges and forward ranges (though overloading could
obviously be used to avoid needing different names), but it's already often
the case that code isn't really designed to work with both, and overloading
on the category of range being used is already fairly common, since
different range capabilities allow for different implementations. So, given
that it would prevent whole classes of copying bugs as well as potentially
remove the requirement to cache front for basic input ranges, I think that a
separate API for basic input ranges is warranted. What I would propose for
that would be a single function

    auto next();

where next returns a nullable type where the value returned is the next
element in the range, with a null value being returned if the range is
empty. The return type would then need to emulate a pointer - specifically,
when casting it to bool, it would be true if it's non-null and false if it's
null, and dereferencing it would give you the actual value if it's non-null
(with it being undefined behavior if you dereference null). So, a basic
input range of ints might define next as

    int* next();

or alternatively, it could be something like

    Nullable!int next();

though that wouldn't work with Phobos' current Nullable type, since it
doesn't support either casting to bool or dereferencing (probably because it
originally used alias this). Either way, since we'd just require the
specific API for the return type rather than requirng a pointer, the range
type would have some flexibliity in what it used.

This would then mean that if you wanted to loop over a basic input range,
you'd do something like

    for(auto front = range.next; !front; front = range.next)
    {
        ...
    }

And if we go down this road, then we could also add this API to foreach,
allowing for code such as

    foreach(e; basicInputRange)
    {
        ...
    }

to work - and unlike now, you could rely on the copy semantics involved such
that you would know that you could then break out of the loop and continue
to use the range (whereas right now, you can't safely break out of a foreach
and then continue to use the range that you were iterating over). Of course,
for forward ranges, breaking out of a foreach still wouldn't let you
continue to iterate from where you were, since the foreach would be
iterating over a copy, but those would be the semantics for all forward
ranges, so you could rely on them.

In addition to these changes to the copy semantics, I propose that we then
require that assignment follow the same semantics as copying. So,

    range1 = range2;

on a forward range would result in two independent copies with the same the
same elements in the same order (regardless of what range1 pointed to
before), just like you'd get with

    auto copy = orig;

whereas for basic input ranges

    range1 = range2;

would result in range1 and range2 being fully dependent copies (regardless
of what range1 pointed to before) - just like what you'd expect when
assigning one pointer to another.

So, the assignment semantics would differ between basic input ranges and
forward ranges, but they would be consistent with their copy semantics, and
they would be consistent across all ranges of the same category (and of
course, assignment would only be expected to work if the two ranges were of
exactly the same type).

The result of all of this is that we could then have consistent, reliable
copy and assignment semantics for both basic input ranges and forward
ranges, and the two would be easily distinguishable so that you don't
accidentally use one as the other and end up with bugs due to the
differences in semantics between the two categories of ranges.

The other potential change related to this would be to rename basic input
ranges. Right now, forward ranges are treated as an extension of input
ranges, which makes sense if you're ignoring the issues surrounding copy
semantics, since a forward range is an input range with an extra function
that allows you to get an independent copy, but if we're giving them
different APIs so that the copy semantics can be cleanly defined and
separated, then arguably, it doesn't make sense to call them both ranges. As
such, we could switch to calling forward ranges simply ranges (which we
mostly do anyway) and give basic input ranges a new name. My initial
suggestion would be an input stream (or if we absolutely had to continue to
call them ranges, then maybe simple ranges), but there's probably a better
name to be had, and obviously, plenty of bikeshedding can be done on that
topic. Either way, the names used here aren't the primary concern.

What I'd like to get out of this thread is feedback on how much sense this
idea does or doesn't make and what problems I'm missing that someone else is
able to see. From what I can see, the main negative is simply that you can't
then write code that works on both a basic input range and a forward range
(though you can obviously still create function overloads so that the caller
can use either), but given the issues surrounding copy semantics, I think
that that's probably ultimately a good change (and the number of functions
that can operate on basic input ranges is already pretty limited anyway in
comparison to forward ranges - particularly with regards to generic
algorithms). It will also make it much easier to discuss the separation
between basic input ranges and forward ranges, which IMHO can be too easy to
lose or confuse as things stand.

- Jonathan M Davis
Jan 20 2024
next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the ...
Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Jan 21 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the ...
Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos. Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code). And if the new API for basic input ranges uses a pointer API to access its elements, then that's a very good reason to make Nullable support that. As I said in my post, I suspect that the main reason that the current version doesn't is because it originally used alias this - which we removed due to all of the bugs that that caused, but giving Nullable an opCast to bool and * for dereferencing would likely have been a terrible idea when Nullable used alias this, whereas without that, I can't think of any reason why it would be a bad idea - though I'd likely still choose to have get and isNull for those who wanted to make it clear that a Nullable was being used. But I think that it's better to use a pointer API for basic input ranges, because then they can actually use pointers in those cases where that makes sense instead of requiring a Nullable type, much as it also needs to work to use a Nullable type. - Jonathan M Davis
Jan 21 2024
next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via 
 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the 
 ...
Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos.
Imho, it must not be added, as this might conflate with contents of `Nullable!bool` types.
Jan 21 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 6:26:16 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via

 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 ...
Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos.
Imho, it must not be added, as this might conflate with contents of `Nullable!bool` types.
It doesn't conflate with Nullable!bool at all, because the alias this was removed from Nullable. The only way to get the value out of Nullable is to use its get member. If we added dereferencing to it, then dereferencing it would then also work to get its value, but if we added an opCast to bool, casting would never give the Nullable's value even if the type of its value were bool. It would just be equivalent to negating the result of isNull. I agree that if we still had alias this on Nullable, then adding opCast to bool to it would be problematic - and that's likely why it doesn't have that opCast, since its original API included alias this, but since the alias this is now gone, that's no longer a problem. There is no ambiguity. - Jonathan M Davis
Jan 21 2024
prev sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be reworking
 a variety of stuff anyway, and so Nullable would presumably be on the list
 (e.g. I think that it was a big mistake to make it a range, and I'd remove
 that functionality; slicing it to get a range is one thing, but the fact
 that it itself is a range definitely causes problems in generic code).
Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Jan 21 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).
Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like. If it's anything like std.sumtype, IMHO, it would be utterly terrible for when you want a Nullable type. For that, you want to be able to ask whether the object has a value, and if it does, you want to be able to fetch its value. So, while you could use different function names for it, you basically want the same kind of API that a pointer has. Something where you'd have to pass delegates in just to get the value out would be incredibly annoying and bloated in comparison (and personally, I hate std.sumtype in general because of that design, but it makes a lot more sense when you have a variety of possible types as opposed to just one which may or may not be there). But regardless of whether sum types would make sense in this case (which I very much question), we'd have to know that they were definitely being added to the language and what exactly that would look like - and they would have to function in a way that was clearly superior to have a dedicated Nullable type, or it would make more sense to add a Nullable type and use that for cases where you want a nullable object. Obviously, if sum types get added to the language soon enough, and they really do replace the need for Nullable, then Nullable probably won't be in Phobos v3, but as thing stand, I would fully expect it to be there. Either way, using the API of a pointer makes a lot of sense for the proposed basic input range API, in which case, a range that wanted to return something other than a pointer would need to return a type that used the same API. - Jonathan M Davis
Jan 21 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
 On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole
 via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).
Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like.
Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if. https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md
Jan 21 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 7:17:35 PM MST Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
 On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew
 Cattermole>
 via Digitalmars-d wrote:
 On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
 Either way, if we're doing a new major version of Phobos, we'll be
 reworking a variety of stuff anyway, and so Nullable would presumably be
 on the list (e.g. I think that it was a big mistake to make it a range,
 and I'd remove that functionality; slicing it to get a range is one
 thing, but the fact that it itself is a range definitely causes problems
 in generic code).
Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like.
Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if. https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md
Well, we'll see what we actually end up with, but I'd consider that proposal to be terrible if all you want is a nullable type, and restrictions such as "Members of sumtypes cannot have copy constructors, postblits, or destructors" would be far too much for plenty of use cases such that I would still be very much inclined to have an explicit Nullable type. In any case, whether Phobos v3 has a Nullable type and the question of the exact type returned by next in the proposed API don't really matter that much right now. What matters is the basics of the API, and some of the details can be ironed out later. IMHO, having next return a type that emulates a pointer is the best choice (especially when we don't even have sum types yet), but that can be revisited down the line, particularly since it's not like we're releasing major changes to ranges right now. - Jonathan M Davis
Jan 21 2024
prev sibling next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 1. The range is a value type, so copying it results in two 
 completely indepedent copies. Iterating through one has no 
 effect on the other.

 2. The range is a reference type, so copying it results in two 
 completely dependent copies. They both share all of their 
 state, so iterating through one iterates through the other, and 
 they both always have the same state.

 3. The range is a pseudo-reference type, and it has value 
 semantics for its iteration state but reference semantics for 
 some portion of the rest of its state. This means that if you 
 iterate through a copy of the range, it has no effect on the 
 original, so the range is a value type in that respect, but the 
 rest of its state may or may not be copied by value (often - 
 though not always - this means that the iteration state is 
 directly within the range type, but the elements themselves are 
 referred to via pointer, so mutating the actual elements would 
 affect all copies of that range, but each range can be iterated 
 independently). Dynamic arrays are the most common example of 
 this sort of range.

 4. The range is a pseudo-reference type, but it does not have 
 value semantics with regards to its iteration state. So, when 
 copying the range, some portion of its state is copied by 
 value, but you don't end up with indendepent copies - and in 
 fact, you don't get dependent copies either, since some of the 
 state is copied by value. A common example of this sort of 
 range which would be one which contains most of its state via a 
 pointer or reference but which caches its front as a member 
 variable within the struct. The result of this is that once 
 you've mutated a copy of the range, you can no longer use the 
 original, because mutating the copy only partially mutates the 
 state of the original, leaving it in an inconsistent / invalid 
 state.
For all these cases there is `.save` method which seems to be ignored in most of the code and compiler, when it should not be. It explicitly defines that range is copied irrelevant of the type of range you've mntioned. Since `.save` is defined on forward range, this means that pure input range must never be copyable, i.e. copy and post constructors must be disabled. If this rule is followed, issues with copying would dissappear. Even more drastic measure would be to disable copying on all types of ranges, and allow copying only through `.save` method, i.e. do explicit copying when it is intended to. Best regards, Alexandru.
Jan 21 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 6:22:26 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 For all these cases there is `.save` method which seems to be
 ignored in most of the code and compiler, when it should not be.
 It explicitly defines that range is copied irrelevant of the type
 of range you've mntioned. Since `.save` is defined on forward
 range, this means that pure input range must never be copyable,
 i.e. copy and post constructors must be disabled.

 If this rule is followed, issues with copying would dissappear.

 Even more drastic measure would be to disable copying on all
 types of ranges, and allow copying only through `.save` method,
 i.e. do explicit copying when it is intended to.
And how on earth would you use a range if it can't be copied? They're copied all over the place and have to be to even be used. You copy them when you pass them to any range-based function. You copy them when you return them from a range-based function. You copy them when they get wrapped by another range. You copy them when you pass them to foreach. Disabling copying on ranges would make them borderline useless. And you also can't disable copying on classes or dynamic arrays. We could disallow classes which are ranges (even for basic input ranges) if we really wanted to make it so that they couldn't be copied, but we couldn't do the same with dynamic arrays. And really, ranges need to be copyable, because we need to be able to pass them around around and store them. And while ref could theoretically be used in some of those cases, in many of them, it could not (e.g. you have to copy a range once you wrap it in another type, and the ability to do that is crucial for most range-based algorithms). Also, forward ranges are supposed to mimic dynamic arrays. C++ iterators are meant to mimic pointers, and D's ranges are meant to mimic dynamic arrays. The only reason that we even have save is to support ranges which are classes. A lot of the problems with range-based code come from code that assumes that a range behaves like a dynamic array when it doesn't. If we required that all forward ranges have the same copy semantics as dynamic arrays, then that problem is solved. Having save means treating forward ranges differently than we treat dynamic arrays, and it's completely unnecessary so long as we're willing to require that classes be wrapped in structs to be forward ranges. The ideal situation here is for ranges in general to have the same semantics as dynamic arrays with regards to copying, assignment, and iteration. So, we can't disable copying for ranges in general and have them work, and even if we all agreed that disabling copying for ranges in general was a good idea, we still couldn't do that for dynamic arrays. So, we'd continue to have the problem that code would be written to be range-based but only worked with arrays, because that was what it was tested with. By requiring that forward ranges have the same semantics as dynamic arrays and treating basic input ranges as a separate thing, it becomes possible to write code that assumes the same semantics as dynamic arrays and then works with all forward ranges, which would significantly reduce the number of bugs that we get in range-based code. - Jonathan M Davis
Jan 21 2024
parent reply Nick Treleaven <nick geany.org> writes:
On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis 
wrote:
 And how on earth would you use a range if it can't be copied? 
 They're copied all over the place and have to be to even be 
 used. You copy them when you pass them to any range-based 
 function. You copy them when you return them from a range-based 
 function. You copy them when they get wrapped by another range. 
 You copy them when you pass them to foreach. Disabling copying 
 on ranges would make them borderline useless.
Maybe they could be moved instead of copied when the original is not needed.
 And you also can't disable copying on classes or dynamic arrays.
Unless all ranges have to be structs, but then we probably need a function/syntax to ask for a range from a class or dynamic array.
Jan 21 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 11:13:02 AM MST Nick Treleaven via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis

 wrote:
 And how on earth would you use a range if it can't be copied?
 They're copied all over the place and have to be to even be
 used. You copy them when you pass them to any range-based
 function. You copy them when you return them from a range-based
 function. You copy them when they get wrapped by another range.
 You copy them when you pass them to foreach. Disabling copying
 on ranges would make them borderline useless.
Maybe they could be moved instead of copied when the original is not needed.
Do you really want to have to call move every time that you call a range-based function or want to wrap one range in another? Immovable types are incredibly un-user-friendly in general. It also would mean that ranges wouldn't work with foreach with how foreach currently works, since it copies the range. And if it doesn't copy the range, then ranges other than dynamic arrays would have different semantics when being iterated with foreach than dynamic arrays do, which would also be a big problem.
 And you also can't disable copying on classes or dynamic arrays.
Unless all ranges have to be structs, but then we probably need a function/syntax to ask for a range from a class or dynamic array.
Forward ranges are supposed to mimic dynamic arrays. Having to wrap dynamic arrays to use them as ranges would be very annoying. And for what? To just avoid copying? Yes, if copying ranges is illegal, then you've given them all the same copy semantics, but the result is incredibly unwieldy, and it means that you can't have dynamic arrays and forward ranges use the same code without wrapping dynamic arrays. And are people really going to want to be wrapping dynamic arrays in another type just so that they can make them uncopyable and the be forced to call move all over the place instead? The most friendly way for forward ranges to function is to make them more properly emulate dynamic arrays as they were originally intended to do. The reason that they don't already is so that we could support classes as ranges, which while it should work, is not the typical use case, and we can allow them to be forward ranges almost as easily with a wrapper struct as we can by having a save function. If forward ranges all have the same copy semantics as dynamic arrays, then they become easy to use correctly without having bugs due to stuff like forgetting to call save - and without a bunch of compiler errors, because you forgot to use move all over the place. For forward ranges, it would be a huge win, and it's exactly what we've talked about doing for years now. We just haven't done it, because doing it in the current version of Phobos would break existing code. The bigger problem is then what to do with basic input ranges, because they can't have the same copy semantics, but the fact that they can't implement something like save already hamstrings them enough that arguably they should be treated as something separate anyway. So, by just giving them a separate API, we simplify the situation with forward ranges considerably, and we avoid the classes of bugs that come with code being written to work with both basic input ranges and forward ranges. Of course, there are some cases where having code written to work with both works just fine, and having separate APIs could be annoying in those cases, but basic input ranges are limited enough that most code doesn't support them anyway. Obviously, the tradeoffs are debatable, but I think that it's clear that ideal situation for forward ranges is to have them all have the same copy semantics as dynamic arrays. - Jonathan M Davis
Jan 21 2024
prev sibling next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 Ultimately, I'm inclined to argue that we should give basic 
 input ranges a new API - not just because it would allow them 
 to use reference counting, but also because the current input 
 range API tends to force basic input ranges to cache the value 
 of front (which if anything, encourages basic input ranges to 
 be pseudo-reference types and could result in an extra layer of 
 indirection in some cases if they're forced to be reference 
 types). It would be annoying in some cases to require that 
 different functions be used for basic input ranges and forward 
 ranges (though overloading could obviously be used to avoid 
 needing different names), but it's already often the case that 
 code isn't really designed to work with both, and overloading 
 on the category of range being used is already fairly common, 
 since different range capabilities allow for different 
 implementations. So, given that it would prevent whole classes 
 of copying bugs as well as potentially remove the requirement 
 to cache front for basic input ranges, I think that a separate 
 API for basic input ranges is warranted. What I would propose 
 for that would be a single function

     auto next();
We could use java Iterator api in this case which has also `bool hasNext()` function. Then new input range api can also be propagated to other types of range such as forward range and further.
 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty. The return type would then need to 
 emulate a pointer - specifically, when casting it to bool, it 
 would be true if it's non-null and false if it's null, and 
 dereferencing it would give you the actual value if it's 
 non-null (with it being undefined behavior if you dereference 
 null). So, a basic input range of ints might define next as

     int* next();

 or alternatively, it could be something like

     Nullable!int next();
The returned type should be a tagged union, that can allow storing of actual value inside, or pointer to it, while it's interface will hide the details, i.e. get would look like this: `ref T get()`.
 though that wouldn't work with Phobos' current Nullable type, 
 since it doesn't support either casting to bool or 
 dereferencing (probably because it originally used alias this). 
 Either way, since we'd just require the specific API for the 
 return type rather than requirng a pointer, the range type 
 would have some flexibliity in what it used.

 This would then mean that if you wanted to loop over a basic 
 input range, you'd do something like

     for(auto front = range.next; !front; front = range.next)
     {
         ...
     }

 And if we go down this road, then we could also add this API to 
 foreach, allowing for code such as

     foreach(e; basicInputRange)
     {
         ...
     }

 to work - and unlike now, you could rely on the copy semantics 
 involved such that you would know that you could then break out 
 of the loop and continue to use the range (whereas right now, 
 you can't safely break out of a foreach and then continue to 
 use the range that you were iterating over). Of course, for
Input ranges could just be disallowed in foreach statements, that would solve different semantics between them and forward ranges, just like how in Java it is done with Stream api.
 What I'd like to get out of this thread is feedback on how much 
 sense this idea does or doesn't make and what problems I'm 
 missing that someone else is able to see. From what I can see, 
 the main negative is simply that you can't then write code that 
 works on both a basic input range and a forward range (though 
 you can obviously still create function overloads so that the 
 caller can use either), but given the issues surrounding copy 
 semantics, I think that that's probably ultimately a good 
 change (and the number of functions that can operate on basic 
 input ranges is already pretty limited anyway in comparison to 
 forward ranges - particularly with regards to generic 
 algorithms). It will also make it much easier to discuss the 
 separation between basic input ranges and forward ranges, which 
 IMHO can be too easy to lose or confuse as things stand.
Imho, this proposal is complicated, and unnecessarily complicates construction of ranges, making them less appealing to implement in user code. I'd opt for restricting copying completely, and allow copying through `.save` only. The `.next` method proposal is a good improvement though, with addition of `.hasNext` method at minimum.
Jan 21 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis
 wrote:
 We could use java Iterator api in this case which has also `bool
 hasNext()` function.
Well, the key change the API that would really matter IMHO would be to merge front and popFront, since that would eliminate the need for basic input ranges to cache front, which many of them are forced to do right now. I'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.
 Then new input range api can also be propagated to other types of
 range such as forward range and further.
Part of the point here is to _not_ do that, because they fundamentally have different semantics.
 where next returns a nullable type where the value returned is
 the next element in the range, with a null value being returned
 if the range is empty. The return type would then need to
 emulate a pointer - specifically, when casting it to bool, it
 would be true if it's non-null and false if it's null, and
 dereferencing it would give you the actual value if it's
 non-null (with it being undefined behavior if you dereference
 null). So, a basic input range of ints might define next as

     int* next();

 or alternatively, it could be something like

     Nullable!int next();
The returned type should be a tagged union, that can allow storing of actual value inside, or pointer to it, while it's interface will hide the details, i.e. get would look like this: `ref T get()`.
Requiring the pointer API already does that. It's just that checking whether the value is null means casting to bool, and you dereference the return value rather than calling get. The implementation would be quite free to then use a Nullable or some other type which does whatever it wants internally so long as casting to bool tells you whether the object contains a value, and dereferencing it gives you the value.
 Input ranges could just be disallowed in foreach statements, that
 would solve different semantics between them and forward ranges,
 just like how in Java it is done with Stream api.
They could be, but that wouldn't be very user-friendly. And it doesn't solve the different semantics of copying them.
 Imho, this proposal is complicated, and unnecessarily complicates
 construction of ranges, making them less appealing to implement
 in user code. I'd opt for restricting copying completely, and
 allow copying through `.save` only.

 The `.next` method proposal is a good improvement though, with
 addition of `.hasNext` method at minimum.
Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place. What we really want to be able to do in general is treat ranges like dynamic arrays (and part of the reason to give basic input ranges a different API is because the can't have the same copy semantics as dynamic arrays). The suggested API would actually simplify forward ranges considerably, because then most of them wouldn't have to implement save any longer, and those few that do would just implement the necessary semantics via their copy constructor (or a more advanced implementation might use reference-counting, which would be somewhat more complex, but the vast majority of ranges would not be in that situation). And the result would be that forward ranges in general could be treated just like dynamic arrays, which would simplify the vast majority of range-based code and allow range-based code to rely on the copy and assignment semantics instead of having to carefully avoid using ranges which have been copied and avoid assignment like they have to do now. So, as far as forward ranges go, this seems to me like it's very much a simplification, not something complicated. If this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges. The part that's arguably more complex is simply that the basic input ranges then have to have a separate API, but since they fundamentally have different cropy semantics and can't be used in most range-based functions anyway, I don't think that it complicates things much. And it makes the distinction and behavioral differences between basic input ranges and forward ranges much clearer than they typically are now, which would make understanding ranges easier. - Jonathan M Davis
Jan 21 2024
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi 
 via Digitalmars- d wrote:
 Then new input range api can also be propagated to other types 
 of range such as forward range and further.
Part of the point here is to _not_ do that, because they fundamentally have different semantics.
I think maybe we're losing sight of the big picture in this discussion. The big-picture, overarching goal of the range API is to provide a common interface between data structures and algorithms. You have N data structures and M algorithms, and by using this common interface you can implement all N*M combinations in O(N+M) code. Each data structure implements the range API once, and each algorithm is implemented once using the range API. If you split the API in two, by making input streams (basic input ranges) and forward ranges completely disjoint, you undermine this goal. Now, each data structure has to implement *two* APIs, and each algorithm has to be implemented *twice*, once for input streams and once for forward ranges. In practice, what's going to happen is library authors will simply not bother to implement both, and we will end up with gaps where many (data structure, algorithm) pairs are not covered. I appreciate the desire to provide consistent copying semantics, but if it comes to a choice between that and having a single unified API, I will choose the unified API, warts and all.
 Restricting copying would make ranges borderline unusable. They 
 have to be able to be passed around and be wrappable, which 
 means either copying them or moving them, and moving them would 
 be very un-user-friendly, since it would require explicit calls 
 to move calls all over the place.
Worth noting that generic range algorithms already have to account for the existence of non-copyable types, since even if a range itself is copyable, its elements may not be. Also, as I pointed out in an earlier reply to Walter [1], there are legitimate use-cases for non-copyable input streams, where making them copyable would incur a significant performance overhead. I think the correct solution to this is something like DIP 1040, which makes non-copyable types less of a pain to work with in general. [1] https://forum.dlang.org/post/sullaynnrjufookbuijt forum.dlang.org --- In light of the points above, my proposed copying semantics for input streams are: 1. Input streams MAY be non-copyable, but are not required to be. 2. If you copy an input stream and then call next() on the original, the behavior of both the original and the copy is unspecified. That is, I think we should give up on having 100% consistent copying semantics in this one case, in order to keep the overall API unified (by supporting UFCS next()) and avoid unnecessary pessimization of range algorithms. The good news is that with these semantics, if you write you generic code conservatively and treat all input streams as non-copyable, you'll get the right behavior in all cases. And if you don't, you'll get a compile-time error as soon as you actually try to pass in a non-copyable input stream, and you'll know exactly how to fix it. So this design is still an improvement on the status quo.
Jan 21 2024
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 10:22:44 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi

 via Digitalmars- d wrote:
 Then new input range api can also be propagated to other types
 of range such as forward range and further.
Part of the point here is to _not_ do that, because they fundamentally have different semantics.
 If you split the API in two, by making input streams (basic input
 ranges) and forward ranges completely disjoint, you undermine
 this goal. Now, each data structure has to implement *two* APIs,
 and each algorithm has to be implemented *twice*, once for input
 streams and once for forward ranges.

 In practice, what's going to happen is library authors will
 simply not bother to implement both, and we will end up with gaps
 where many (data structure, algorithm) pairs are not covered.
In practice, basic input ranges don't work with the vast majority of algorithms anyway. Some do (e.g. map), but pretty much anything that needs to do more than a simple transformation on the elements ends up needing a forward range. I really don't think that we're going to lose much by forcing basic input ranges to be separate.
 Restricting copying would make ranges borderline unusable. They
 have to be able to be passed around and be wrappable, which
 means either copying them or moving them, and moving them would
 be very un-user-friendly, since it would require explicit calls
 to move calls all over the place.
Worth noting that generic range algorithms already have to account for the existence of non-copyable types, since even if a range itself is copyable, its elements may not be.
Do they? Non-copyable types do not work with ranges right now. They don't work with foreach right now. Non-copyable types have their uses to be sure, but they're also fairly niche. Realistically, very little range-based code is going to work with them because of how restricted they are to deal with, and unless you're explicitly testing code with them, you're not going to end up writing code that works with them. We could bend over backwards to try to make them work in Phobos, but no one else is going to do that unless they're using non-copyable types in ranges themselves (which almost no one will be doing even if we support it), and trying to support non-copyable types will actively make range-base code worse in many cases, because it will require calling front multiple times, and many range-based algorithms do real work in front rather than simply returning a value (e.g. map works that way). Granted, range-based code in general doesn't tend to be very disciplined about whether it calls front once or several times, but forcing it to be several times in order to support non-copyable types will have a real cost for code that is _way_ more common than non-copyable types. Personally, I think that we should make it very explicit that non-copyable types are not supported by ranges. They don't work with them now, and I don't think that it's even vaguely worth it to try to make ranges work with them. The cost is too high for too little benefit.
 In light of the points above, my proposed copying semantics for
 input streams are:

 1. Input streams MAY be non-copyable, but are not required to be.
 2. If you copy an input stream and then call next() on the
 original, the behavior of both the original and the copy is
 unspecified.

 That is, I think we should give up on having 100% consistent
 copying semantics in this one case, in order to keep the overall
 API unified (by supporting UFCS next()) and avoid unnecessary
 pessimization of range algorithms.

 The good news is that with these semantics, if you write you
 generic code conservatively and treat all input streams as
 non-copyable, you'll get the right behavior in all cases. And if
 you don't, you'll get a compile-time error as soon as you
 actually try to pass in a non-copyable input stream, and you'll
 know exactly how to fix it. So this design is still an
 improvement on the status quo.
I really think that trying to support non-copyable types is not worth it - either for the ranges themselves or for their elements. It's not something that very many people are even going to think about, let alone do. So, if it were supported, realistically, it would just be with the algorithms in Phobos and with the code that the few people using such types wrote for themselves. And even in Phobos, even if we wanted to support it, realistically, it wouldn't work a lot of the time until the few people that cared complained about it, because very few programmers would even think about testisg with non-copyable types. It would be _far_ worse than the situation we have now with code needing to call save to work correctly and yet failing to call save all over the place, because most ranges don't need it. Even with Phobos, we have had many bugs over the years due to stuff like forgetting to call save or reusing a range, because you have to test with way too many range types to catch all of the edge cases. IMHO, we need to be trying to reduce the number of edge cases, not increasing them. And it's not like non-copyable types have ever worked properly with ranges - so not supporting them is not removing capabiltiies. It's just not adding capabilities that a select few would like to have. And while I feel for them, I really think that it's far too niche to support given how costly it is to support. - Jonathan M Davis
Jan 22 2024
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis 
wrote:
 In practice, basic input ranges don't work with the vast 
 majority of algorithms anyway. Some do (e.g. map), but pretty 
 much anything that needs to do more than a simple 
 transformation on the elements ends up needing a forward range. 
 I really don't think that we're going to lose much by forcing 
 basic input ranges to be separate.
Here's a very rough approximation of the number of algorithms in Phobos that work with basic input ranges: $ grep 'isInputRange' std/algorithm/*.d | wc -l 153 For comparison, here are the other range types: $ grep 'isForwardRange' std/algorithm/*.d | wc -l 141 $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l 42 $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l 62 So, I would say that your hypothesis is not supported by the data, in this case.
 Do they? Non-copyable types do not work with ranges right now. 
 They don't work with foreach right now. Non-copyable types have 
 their uses to be sure, but they're also fairly niche.
Work is being done to make ranges work with non-copyable types (e.g. [1]). Foreach works if the elements are rvalues, or the elements are lvalues and you use ref for the loop variable. This is the correct, expected behavior. I think a big part of the reason non-copyable types are not widely used is that a lot of existing library code was written without support for them in mind. Since we have an opportunity to start from scratch with Phobos v3, there is no reason to repeat this mistake. [1] https://github.com/dlang/phobos/pull/8721
 trying to support non-copyable types will actively make 
 range-base code worse in many cases, because it will require 
 calling front multiple times, and many range-based algorithms 
 do real work in front rather than simply returning a value 
 (e.g. map works that way).
Have you ever actually written generic code that works with non-copyable types? I have, and it mostly involves (a) passing things by ref, (b) using 'auto ref' and core.lifetime.forward a lot, and (c) checking isCopyable!T before trying to copy things. I have no idea why you think this would require calling front multiple times, or doing extra work. Perhaps you could try writing out some example code to clarify?
 Personally, I think that we should make it very explicit that 
 non-copyable types are not supported by ranges. They don't work 
 with them now, and I don't think that it's even vaguely worth 
 it to try to make ranges work with them. The cost is too high 
 for too little benefit.
First, as a general rule, I don't think that we, as standard-library authors, should be in the business of telling our users that their use-cases are illegitimate just because it makes our jobs more difficult. Second...how do you know the benefit is too little? You thought earlier that "the vast majority of algorithms" didn't work with basic input ranges, even though the actual data shows that isn't true. Unless you have concrete evidence, I think it is better to avoid making assumptions about which use-cases are important and which aren't.
Jan 22 2024
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 22, 2024 5:57:49 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis

 wrote:
 In practice, basic input ranges don't work with the vast
 majority of algorithms anyway. Some do (e.g. map), but pretty
 much anything that needs to do more than a simple
 transformation on the elements ends up needing a forward range.
 I really don't think that we're going to lose much by forcing
 basic input ranges to be separate.
Here's a very rough approximation of the number of algorithms in Phobos that work with basic input ranges: $ grep 'isInputRange' std/algorithm/*.d | wc -l 153 For comparison, here are the other range types: $ grep 'isForwardRange' std/algorithm/*.d | wc -l 141 $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l 42 $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l 62 So, I would say that your hypothesis is not supported by the data, in this case.
I will admit that the number is higher than I would have expected, but most of those algorithms are on the simple side where the algorithm itself doesn't care whether the range is consumed, whereas the caller very much cares. They were written to accept basic input ranges, because the algorithm itself will work with a basic input range, but that doesn't mean that it can actually be used with basic input ranges to do anything useful. For instance, you can use startsWith on a basic input range, but you pretty much never would, because then you've consumed it - and actually, it's not even safe to use the rest of the range in generic code after passing it to startsWith, because it was copied, and the original could be in an invalid state. The same goes with any function that takes an argument and gives you a result that doesn't wrap the input range and return it or return a partially iterated range. In my experience, it's very rare that it's possible to make anything that isn't incredibly simple work with a basic input range due to the fact that looking at it consumes it and makes it impossible to then do anything useful with it. And it's easy to write simple algorithms which will technically work with a basic input range but which aren't actually useful with a basic input range. So, sure, a surprising number of the algorithms in std.algorithm work with isInputRange, but a number of them really shouldn't be used with basic input ranges. Now, a few of them might be more useful if you could rely on a basic input range having reference semantics, but their composibility in general is pretty poor. Either way, I think that the gain from being able to require specific copy semantics for both forward ranges and basic input ranges is worth it. And since such a requirement couldn't actually be enforced at compile time unless we made it impossible to have basic input ranges that were structs (which would then make reference counting impossible), then a specific algorithm could choose to add a function to forward ranges via UFCS and support them if it didn't care about the copy semantics, but like with trusted code, I think that it should then be up to the programmer to make sure that it does the right thing so that basic input ranges can be required to have reference semantics.
 Foreach works if the elements are rvalues, or the elements are
 lvalues and you use ref for the loop variable. This is the
 correct, expected behavior.
Well, I couldn't get it to work when I tried, but either way, it means that you can't use foreach on a range of non-copyable types in generic code, because the way that you use foreach would have to be the same regardless of the elment type. Of course, you could use static if and branch the code based on whether the types are copyable or not, but that's yet another complication to range implementations which are already often far too complicated with too many static if branches.
 trying to support non-copyable types will actively make
 range-base code worse in many cases, because it will require
 calling front multiple times, and many range-based algorithms
 do real work in front rather than simply returning a value
 (e.g. map works that way).
Have you ever actually written generic code that works with non-copyable types? I have, and it mostly involves (a) passing things by ref, (b) using 'auto ref' and core.lifetime.forward a lot, and (c) checking isCopyable!T before trying to copy things. I have no idea why you think this would require calling front multiple times, or doing extra work. Perhaps you could try writing out some example code to clarify?
If you can't rely on front returning something that you can copy, then you're going to have to call front multiple times any time that you need to access front multiple times - either that or call move on it, which you can't necessarily do safely, because that would alter the range. Algorithms that don't have to look at front more than once don't have that problem, but plenty of algorithms need to do multiple things with front. And often, it's much better to just store the result of front rather than call front multiple times, because it's often the case that front does actual work that you'd rather not do over and over again (and it's particularly bad when you have stuff like the result of map where the transformation function it was given allocates on the heap). So the result of having to worry about non-copyable types is that algorithms that need to access front multiple times are either going to end up calling front multiple times, harming performance in many cases, or they're going to have static if branches to change what they're doing based on the return type of front, which is going to make them that much more complex to implement.
 Personally, I think that we should make it very explicit that
 non-copyable types are not supported by ranges. They don't work
 with them now, and I don't think that it's even vaguely worth
 it to try to make ranges work with them. The cost is too high
 for too little benefit.
First, as a general rule, I don't think that we, as standard-library authors, should be in the business of telling our users that their use-cases are illegitimate just because it makes our jobs more difficult. Second...how do you know the benefit is too little? You thought earlier that "the vast majority of algorithms" didn't work with basic input ranges, even though the actual data shows that isn't true. Unless you have concrete evidence, I think it is better to avoid making assumptions about which use-cases are important and which aren't.
IMHO, it's already way too complicated to write correct range-based code, and adding non-copyable types into the mix is very much a step too far. Obviously, there can be disagreement on that, but it's already very difficult to write range-based code that actually works properly with all types of ranges. You have to test it with a ridiculous number of types, and most people just don't do that. Phobos often doesn't do that. And every capability that we add to ranges means even more static if blocks and even more variations that need to be tested. The only way to fix that is to simplify things by putting greater restrictions and requirements on how ranges work, whereas making them work with a wider variety of things is going in the opposite direction of that. Realistically, the best that even might happen here for folks who want ranges to work with non-copyable types is that a subset of the algorithms in Phobos will be made to work with them, but third-party libraries are not going to go to that level of effort (and I don't think that it's all reasonable to tell them that they should). And Phobos' range-based code will become even more complicated and that much harder to maintain and make sure that it works correctly. And we already have too much trouble making sure that Phobos' range-based code works correctly. So, my take on it is that supporting non-copyable types is simply not worth the effort given that it's going to complicate the code and the tests that much more, and they're already too complex. And so, if it's my choice, then we simply won't support non-copyable types with ranges. Obviously, I'm not the sole voice in this, and ultimately, the choice is going to be up to Walter and Atila (probably Atila given that he's supposed to be the one in charge of Phobos), so I could definitely be overruled on this, but I very much want to see it become easier to write correct range-based code, not have it become harder. - Jonathan M Davis
Jan 22 2024
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Tuesday, 23 January 2024 at 00:57:49 UTC, Paul Backus wrote:
 
     $ grep 'isForwardRange' std/algorithm/*.d | wc -l
     141
     $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l
     42
     $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l
     62

 So, I would say that your hypothesis is not supported by the 
 data, in this case.
Oh, I hope someone pays attension and makes arrays more important then doubly linked lists
Jan 24 2024
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Monday, 22 January 2024 at 05:22:44 UTC, Paul Backus wrote:
 You have N data structures and M algorithms, and by using this 
 common interface you can implement all N*M combinations in 
 O(N+M) code. Each data structure implements the range API once, 
 and each algorithm is implemented once using the range API.

 If you split the API in two, by making input streams (basic 
 input ranges) and forward ranges completely disjoint, you 
 undermine this goal. Now, each data structure has to implement 
 *two* APIs, and each algorithm has to be implemented *twice*, 
 once for input streams and once for forward ranges.
Not necessarily. First, any algorithm requiring a forward range wouldn’t be implemented twice. And, most importantly, a data structure would not offer input range iteration if it can offer forward range. Why would it?
 In practice, what's going to happen is library authors will 
 simply not bother to implement both, and we will end up with 
 gaps where many (data structure, algorithm) pairs are not 
 covered.
The only realistic concern I see with the proposal is algorithms being implemented requiring forward ranges when they could (also) work with basic input ranges. Overconstraining input is not specific to ranges, though. What Jonathan never mentioned in his proposal, but something that, as far as I see, is true: One can wrap any forward range to be an input range. Heck, the pointer to any forward range is an input range, assuming there is a global `next(ForwardRange*)` function utilizing the forward range primitives. If you pass `&forwardRange` to an input range algorithm, it may copy it around as it pleases, it will change the underlying range just like an input range should; and you’d have the `&` for an indication that it does. A problem I see with this is composition. Requiring the lifting to a pointer requires lvalues, and composition lives on not having those. The issue will generally be negligible, as most range algorithms returning a range will for the most part want to preserve as many of the abilities of the original range as possible. The difference in the algorithm would be that, instead of: ```d // implement input range static if (/* base range is forward range */) // implement forward range static if (/* base range is bidi range */ // implement bidi range // etc. ``` You would have: ```d static if (/* base range is input range */) // implement input range else: static if (/* base range is forward range */) // implement forward range static if (/* base range is bidi range */) // implement bidi range // etc. ```
Jan 23 2024
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis 
wrote:
 I'd have to think about it more to see whether having hasNext 
 instead of returning a nullable type from next would be an 
 improvement, make it worse, or be more or less equal. A lot of 
 that depends on how it would affect the typical implementation 
 of a basic input range.
Well one advantage of `hasNext` is that you can check in advance whether range is empty or not, without the need to pop front element.
 Part of the point here is to _not_ do that, because they 
 fundamentally have different semantics.
That would mean that you need to overload a method to accept both input, and forward ranges when foreach is not used, right?
 If this is coming across as complicated, then it's probably 
 because of all of the explanatory text I had to give as to why 
 these changes should be made. But the changes themselves 
 simplify things for forward ranges.
Could be the case :). Perhaps it would be nice to see it as a code example.
Jan 22 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 22, 2024 2:17:08 AM MST Alexandru Ermicioi via Digitalmars-
d wrote:
 On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis

 wrote:
 I'd have to think about it more to see whether having hasNext
 instead of returning a nullable type from next would be an
 improvement, make it worse, or be more or less equal. A lot of
 that depends on how it would affect the typical implementation
 of a basic input range.
Well one advantage of `hasNext` is that you can check in advance whether range is empty or not, without the need to pop front element.
true
 Part of the point here is to _not_ do that, because they
 fundamentally have different semantics.
That would mean that you need to overload a method to accept both input, and forward ranges when foreach is not used, right?
Yes, though in practice, most functions tend to require forward ranges anyway. So, while the separation will be annoying in some cases, in general, I wouldn't expect it to be a big problem, and if anything, it will help make the distinction between basic input ranges and forward ranges clearer, which will probably make it easier to reason about the code. Obviously, we'll have to actually do it to see what the impact is, but realistically, about all you can do with basic input ranges is iterate through them once and possibly do transformations on their elements as you do it. The fact that they can be iterated separately from a foreach loop is beneficial, and some range-based functions are useful with them (e.g. map), but most range-based algorithms need forward ranges.
 If this is coming across as complicated, then it's probably
 because of all of the explanatory text I had to give as to why
 these changes should be made. But the changes themselves
 simplify things for forward ranges.
Could be the case :). Perhaps it would be nice to see it as a code example.
Well for forward ranges, you'd basically be writing what you write now except that you wouldn't have to worry about calling save anywhere, which would signifcantly reduce the number of bugs that crop into range-based code. So, about all I could really show you would be normal range-based code that just doesn't call save, because it doesn't need to. It would also allow you to assign one range to another (assuming that they're the same type), which you can't do right now without relying on the specific copy semantics of a range. But for forward ranges, the main change here would simply be that you could assume that copying a range was equivalent to save, which is purely a simplification. If there's a complication, it's only that ranges which are currently reference types would have to do something smarter than saving in their copy constructor if they wanted to be efficient. So, they'd work with something like struct Range(T) { T front() { return _range.front; } void popFront() { _range.popFront(); } bool empty() { return _range.empty; } // or copy constructor this(this) { _range = _range.save; } private static class RefRange { ... } private RefRange _range; } but if you wanted them to be more efficient, you'd need something like struct Range(T) { T front() { return _range.front; } void popFront() { if(_range._refCount != 1) _range = _range.save; // ref count now 1 _range.popFront(); } bool empty() { return _range.empty; } // or copy constructor this(this) { _range.incRef(); } ~this() { _range.decRef(); } private static class RefRange { ... } private RefRange _range; } So, there would be some extra complication for reference type forward ranges, but most forward ranges already do the equivalent of calling save when they're copied, which is a large part of why you frequently end up with bugs when you try to use a range that needs you to call save explicitly. - Jonathan M Davis
Jan 22 2024
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 From what I can see, the main negative is simply that you can't 
 then write code that works on both a basic input range and a 
 forward range (though you can obviously still create function 
 overloads so that the caller can use either)
With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Jan 21 2024
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:
 With the proposed design, it would be possible to implement 
 next() for forward ranges as a UFCS function:

     auto next(FR)(FR fr)
Correction: fr should be passed by ref (or maybe auto ref).
Jan 21 2024
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 From what I can see, the main negative is simply that you can't
 then write code that works on both a basic input range and a
 forward range (though you can obviously still create function
 overloads so that the caller can use either)
With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that. So, while someone could do something like this, I question that we should encourage it or support it with anything in Phobos. - Jonathan M Davis
Jan 21 2024
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via 
 Digitalmars-d wrote:
 With the proposed design, it would be possible to implement 
 next() for forward ranges as a UFCS function:
[...]
 True, but then you have a function with the same copy semantics 
 problem that we have now, since the forward range wouldn't have 
 reference semantics. It would need to be fully wrapped in 
 another type which gave it reference semantics to do that.
You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)
Jan 21 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 7:30:40 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis

 wrote:
 On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via

 Digitalmars-d wrote:
 With the proposed design, it would be possible to implement
 next() for forward ranges as a UFCS function:
[...]
 True, but then you have a function with the same copy semantics
 problem that we have now, since the forward range wouldn't have
 reference semantics. It would need to be fully wrapped in
 another type which gave it reference semantics to do that.
You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)
Well, even if the parameter is ref (which it would need to be), the core problem is that the range-based function then has to deal with passing around an object that could have value semantics even though basic input ranges would be required to have reference semantics with the proposed changes. Obviously, range-based code _can_ work without caring about the copy semantics of the range type, but the main thrust of the proposed changes is to try to require specific copy semantics from different categories of ranges so that we can avoid the bugs that come with different range types having different copy semantics (as well as making it possible to write code that takes advantage of the specific copy semantics of a particular category of range; even simply being able to assign one range to another in generic code would be a big help to some code). Ideally, code would be able to rely on the copy and assignment semantics of ranges - and that means not trying to treat basic input ranges and forward ranges as the same thing without actually wrapping one in a way that forces it to be the same thing (be it by turning a forward range into a reference type with the basic input range API or by wrapping a basic input range in a struct which caches the data such that it can implement the forward range API). Simply slapping the functions for the other API on one of the two types will result in ranges that won't behave properly, which will work with some functions and fail miserably with others. - Jonathan M Davis
Jan 21 2024
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 From what I can see, the main negative is simply that you can't
 then write code that works on both a basic input range and a
 forward range (though you can obviously still create function
 overloads so that the caller can use either)
With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Actually. What occurs to me is that if we made this function take the range by pointer, then it would be possible to give a forward range the correct copy semantics. You'd then have to take its address (or put it on the heap) to make it work, but then you'd get reference semantics even though it's a forward range. Obviously, that's less than ideal if you want a function to accept both basic input ranges and forward ranges, but if it's a function that really only needs basic input ranges and doesn't benefit from operating on a forward range, then it would be simple enough to create an overload which takes the address - and if you really want to be returning the range from the function, then you wouldn't want it to be operating on both basic input ranges and forward ranges anyway, since you'd usually be looking to wrap the result, in which case, you'd need to do all of the annoying static ifs to determine which functions from the range API you could slap on the wrapper range based on the capabilities of the one being passed in. - Jonathan M Davis
Jan 22 2024
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
 wrote:
 From what I can see, the main negative is simply that you 
 can't then write code that works on both a basic input range 
 and a forward range (though you can obviously still create 
 function overloads so that the caller can use either)
With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
It wouldn’t work. That’s the point. It would compile, probably, but by the proposal, an input range would have guaranteed reference semantics, whereas a forward range has value semantics (as far as iteration state is concerned). That means, if you call `next` on a copy of an input range, *both* ranges, the old one and the copy, are on the following element, but doing that on a forward range, only the copy would be on the next element and the original would not. Both guaranteed. I’d rather say that a forward range must have a ` disable`d `next`, i.e. `isForwardRange` should test if `next` is present and if it is, refuse to acknowledge the forward range. By the proposal, input and forward ranges are fundamentally incommensurate concepts on the usage side; granted, only when it comes to copying, but that’s enough.
Jan 23 2024
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thanks for bringing this up. Improving range semantics is something we should 
not pass up the opportunity for with Phobos3. (I consider Phobos1 the D1 
version, Phobos2 the one we have currently.)

Another thing that should be visited is the ability to lock/unlock a range. For 
example, a range that represents stdout needs to be lockable.
Jan 21 2024
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Sunday, 21 January 2024 at 17:15:58 UTC, Walter Bright wrote:
 Another thing that should be visited is the ability to 
 lock/unlock a range. For example, a range that represents 
 stdout needs to be lockable.
Surely this would be better handled outside the range API. E.g., you call a library function that locks stdout and returns a range, and then when you're done with the range, it unlocks stdout again in its destructor. But the functions you pass the range to (map, filter, whatever) don't need to know anything about locking. This *would* be a good use-case for a non-copyable range, since you only want to unlock once (and reference counting would add significant overhead). So even if we don't *require* ranges to be non-copyable, we probably want to *allow* then to be.
Jan 21 2024
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 10:15:58 AM MST Walter Bright via Digitalmars-d 
wrote:
 Thanks for bringing this up. Improving range semantics is something we
 should not pass up the opportunity for with Phobos3. (I consider Phobos1
 the D1 version, Phobos2 the one we have currently.)

 Another thing that should be visited is the ability to lock/unlock a range.
 For example, a range that represents stdout needs to be lockable.
I will have to study that more to see what's needed there. If it's the kind of thing that requires higer level locking than simply locking within the implementations of existing range-based functions (e.g. if it had to be locked across multiple calls), then that likely either merits a non-standard extension that the user has to worry about that's specific to stdio stuff, or we'll need to come up with some sort of higher level addition to handle it. I very much doubt that it's the sort of thing that's going to make much sense to worry about within most range-based functions though. I will add it to the list of things to look into though. There are several other minor changes that I think are going to be needed to the range API (e.g. requiring that random-access ranges implement opDollar rather than letting it be optional, since it's useless if it's optional like it is now), but the core point of this thread was to try to sort out the issue of the copy and assignment semantics for ranges, and the best solution for that seems to be to get rid of save (which would allow all forward ranges to behave like dynamic arrays as far as their copy, assignment, and iteration semantics go) and to make basic input ranges distinct from forward ranges, since they fundamentally can't have the same semantics. The issue of tail-const is going to be another big one, though that could result in us having to switch to a head/tail model for forward ranges instead of front/popFront - either that, or we're going to need a language improvement of some kind to sort it out, since we need to be able to generically get a tail-const copy of a forward range, whereas we currently don't have a way to convert const(Range!Element) to Range!(const Element). We can't even assume that const(Range!Element) is the same as const(Range!(const Element)). So, there are definitely further issues that will need to be hashed out beyond the copy semantics. I just decided to start with that. - Jonathan M Davis
Jan 21 2024
prev sibling next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the 
 next version of Phobos which is in the early planning stages, 
 we really should do some redesigning of ranges.
Thanks for the write up. Another issue I have encountered is around priming. Sometimes it happens in `empty`, sometimes in a private helper function and sometimes even in the constructor. If priming happens in `empty`, then it can't be `const`. Which is strange because you would expect `empty` not to mutate. I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration. The benefit is that implementers have a clear place where to prime the range.
 What I would propose for that would be a single function

    auto next();

 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty.
What about: ``` bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } ``` It has the benefit of not needing to unbox a Nullable/Pointer.
Jan 21 2024
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 21 January 2024 at 18:26:37 UTC, Sebastiaan Koppe 
wrote:
 What about:

     bool next(Callable)(Callable c) {
         if (empty)
             return false;
         c(front());
         popFront();
         return true;
     }

 It has the benefit of not needing to unbox a Nullable/Pointer.
If your concern is performance, an indirect function call is going to hurt a lot more than a pointer dereference. If your concern is aesthetics, dereferencing a pointer or unboxing a Nullable is far less ugly than passing a callback. // With pointer while (auto p = r.next) doStuffWith(*p); // With callback while (r.next((e) { doStuffWith(e); })) {}
Jan 21 2024
parent reply Ben Jones <fake fake.fake> writes:
On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:
 If your concern is aesthetics, dereferencing a pointer or 
 unboxing a Nullable is far less ugly than passing a callback.
I've been writing a fair amount of Kotlin (an a tiny bit of Swift) and the trailing lambda syntax is very nice for making these APIs look nice. Bascially, if the last argument to a function is a function, you can put it's curly braces after the parentheses: `range.map{ it*2}.filter{ it < 10}` The sauce that Kotlin has that D doesn't is that 1-arg lambdas get an implicit parameter called `it` and explicit parameters come after the opening curly brace: `{ p1, p2 -> body }`. I'm also not sure how it would work with templates. Generally, I think it's much easier to think about lifetime guarantees with functional APIs, so something like that might make them more palatable syntactically.
Jan 21 2024
parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Monday, 22 January 2024 at 05:54:59 UTC, Ben Jones wrote:
 On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:
 If your concern is aesthetics, dereferencing a pointer or 
 unboxing a Nullable is far less ugly than passing a callback.
I've been writing a fair amount of Kotlin (an a tiny bit of Swift) and the trailing lambda syntax is very nice for making these APIs look nice. Bascially, if the last argument to a function is a function, you can put it's curly braces after the parentheses: `range.map{ it*2}.filter{ it < 10}` The sauce that Kotlin has that D doesn't is that 1-arg lambdas get an implicit parameter called `it` and explicit parameters come after the opening curly brace: `{ p1, p2 -> body }`. I'm also not sure how it would work with templates. Generally, I think it's much easier to think about lifetime guarantees with functional APIs, so something like that might make them more palatable syntactically.
This is off-topic, but I want to mention that the implicit `it` confused the hell out of me when I learned Kotlin. The feature literally saves measly 6 keystrokes for `it -> `, 4 if you ignore spaces. In D, `map!(it => it * 2)` IMO, is pretty good (e.g. compared to C++). D’s lambdas have their flaws, but lack of brevity isn’t one.
Jan 23 2024
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 next version of Phobos which is in the early planning stages,
 we really should do some redesigning of ranges.
Thanks for the write up. Another issue I have encountered is around priming. Sometimes it happens in `empty`, sometimes in a private helper function and sometimes even in the constructor. If priming happens in `empty`, then it can't be `const`. Which is strange because you would expect `empty` not to mutate. I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration. The benefit is that implementers have a clear place where to prime the range.
Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way. Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else. And whether empty mutates really shouldn't matter so long as it's not actually const (it has to be logically const to be sure, but that doesn't mean that it needs to actually be const), though obviously, it can become annoying to wrap a range when you want to make your empty const, and you can't rely on the range your wrapping having empty be const. For better or worse though, the current range API makes no requirements about const (and I'm not sure if we want to add such requirements given how restrictive D's const is). If we did require const anywhere on the range API though, empty and length would be the obvious places. That being said, I'm not sure that I've ever written a range that primes anything in empty. Usually, the question is between what goes in front and and what goes in popFront and how much has to go in the constructor for that to work cleanly. And the answer is not always the same. We may also want to rework how front and popFront work for forward ranges, which could simplify the priming issue depending on what we did. Regardless though, the priming issue is certainly something that we should be thinking about. Even if we don't add anything to the API for it, it will be good to keep it in mind for how it would affect the other range API functions for any reworking of them that we might do. Regardless, the big issues that were the point of this thread were the copy and assignment semantics for ranges and what we would need to do fix those. There are definitely other issues that we're going to have to sort out (e.g. the tail-const issue).
 What I would propose for that would be a single function

    auto next();

 where next returns a nullable type where the value returned is
 the next element in the range, with a null value being returned
 if the range is empty.
What about: ``` bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } ``` It has the benefit of not needing to unbox a Nullable/Pointer.
Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives. Getting a nullable type of some kind back is far easier to read and understand, and the only real downside I see to it is that you get bugs if someone tries to access the value when there isn't one. But ranges already have that in general with front and empty, and I don't think that it's been much of an issue in practice. We could also go with next and hasNext like Alexandru suggested, in which case, we wouldn't be returning a pointer or nullable type if that's the concern. I'm not sure if that's ultimately better or worse though. In some respects, it would be better to be able to put all of the range logic in a single function, but it would also be nice in some situations to be able to ask whether a basic input range has any elements without having to grab the first one if it's there. Another consideration that comes to mind is that we might want to add a function that explicitly skips elements (since in some cases, you can skip work if you don't actually need the next element), and since next both pops the element and returns it, we wouldn't have to worry about the state of front in the process (whereas adding some kind of skip function to the current range API would result in an invalid front). - Jonathan M Davis
Jan 21 2024
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Monday, 22 January 2024 at 04:38:13 UTC, Jonathan M Davis 
wrote:
 On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe
 I have been thinking about having an explicit build and 
 iteration phase, where priming happens when you switch from 
 build to iteration.

 The benefit is that implementers have a clear place where to 
 prime the range.
Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way. Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else.
Yes exactly. My point is that if there is an explicit place to do priming, that uncertainty would go away.
 That being said, I'm not sure that I've ever written a range 
 that primes anything in empty. Usually, the question is between 
 what goes in front and and what goes in popFront and how much 
 has to go in the constructor for that to work cleanly. And the 
 answer is not always the same.
With 2-phase ranges it would always be the same. Anyway, here are some Phobos examples: - `filter` does its priming in empty, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1381 - `filterByDirectional` does a while loop in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1581 - `chunkBy` calls empty in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1931 - `substitute` calls empty and popFront in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L6909 Taken together, pretty much anything can happen just by constructing a range. You don't even need to iterate it!
 Regardless, the big issues that were the point of this thread 
 were the copy and assignment semantics for ranges and what we 
 would need to do fix those. There are definitely other issues 
 that we're going to have to sort out (e.g. the tail-const 
 issue).
Ultimately it all comes together or it doesn't.
 What about:

 ```
      bool next(Callable)(Callable c) {
          if (empty)
              return false;
          c(front());
          popFront();
          return true;
      }
 ```

 It has the benefit of not needing to unbox a Nullable/Pointer.
Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives.
Actual usual would just use `foreach` of course. But in cases where you want to iterate manually you have to deal with the added ugliness/complexity, fair. I do think its easier to access the item by `ref` this way, instead of having to do a pointer in some nullable wrapper.
 Getting a nullable type of some kind back is far easier to read 
 and understand, and the only real downside I see to it is that 
 you get bugs if someone tries to access the value when there 
 isn't one. But ranges already have that in general with front 
 and empty, and I don't think that it's been much of an issue in 
 practice.
I don't particularly like APIs that have a requirement to call methods in a particular order, I much rather have APIs that can't be used wrong. Pit of success and all that. In practice though, I messed up only a few times, and then added a condition on `empty` and continued on.
 We could also go with next and hasNext like Alexandru 
 suggested, in which case, we wouldn't be returning a pointer or 
 nullable type if that's the concern. I'm not sure if that's 
 ultimately better or worse though. In some respects, it would 
 be better to be able to put all of the range logic in a single 
 function, but it would also be nice in some situations to be 
 able to ask whether a basic input range has any elements 
 without having to grab the first one if it's there.
With some sources, determining whether there is a next item actually involves *getting* the next item. Instead, like `empty`, I think `hasNext` ought to be `const`.
Jan 22 2024
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 Ultimately, I'm inclined to argue that we should give basic 
 input ranges a new API - not just because it would allow them 
 to use reference counting, but also because the current input 
 range API tends to force basic input ranges to cache the value 
 of front (which if anything, encourages basic input ranges to 
 be pseudo-reference types and could result in an extra layer of 
 indirection in some cases if they're forced to be reference 
 types). It would be annoying in some cases to require that 
 different functions be used for basic input ranges and forward 
 ranges (though overloading could obviously be used to avoid 
 needing different names), but it's already often the case that 
 code isn't really designed to work with both, and overloading 
 on the category of range being used is already fairly common, 
 since different range capabilities allow for different 
 implementations. So, given that it would prevent whole classes 
 of copying bugs as well as potentially remove the requirement 
 to cache front for basic input ranges, I think that a separate 
 API for basic input ranges is warranted. What I would propose 
 for that would be a single function

     auto next();

 where next returns a nullable type where the value returned is 
 the next element in the range, with a null value being returned 
 if the range is empty. The return type would then need to 
 emulate a pointer - specifically, when casting it to bool, it 
 would be true if it's non-null and false if it's null, and 
 dereferencing it would give you the actual value if it's 
 non-null (with it being undefined behavior if you dereference 
 null). So, a basic input range of ints might define next as

     int* next();
While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code. And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage. -Steve
Jan 21 2024
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer 
wrote:
 While I agree with most of what you wrote, there is one very 
 big problem with this. Namely, current input ranges are 
 suddenly considered to be forward ranges. Changing the 
 semantics of what it means to only define `front`, `popFront`, 
 and `empty` changes the semantics of existing code.

 And this is not something you can solve with some kind of 
 versioning. The only way I can think of is to alter the name of 
 one or more primitives to prevent accidental usage.
One possibility is to change the forward range interface to something like bool empty(); Element head(); typeof(this) tail(); In addition to distinguishing new-style forward ranges from old-style ones, this interface would also allow forward ranges to be `const` (although iterating a `const` forward range would only be possible with recursion). `popFront` could then be implemented as a UFCS function: void popFront(R)(ref R r) if (std.v2.range.isForwardRange!R && isAssignable!R) { r = r.tail; }
Jan 21 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 3:46:06 PM MST Paul Backus via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer

 wrote:
 While I agree with most of what you wrote, there is one very
 big problem with this. Namely, current input ranges are
 suddenly considered to be forward ranges. Changing the
 semantics of what it means to only define `front`, `popFront`,
 and `empty` changes the semantics of existing code.

 And this is not something you can solve with some kind of
 versioning. The only way I can think of is to alter the name of
 one or more primitives to prevent accidental usage.
One possibility is to change the forward range interface to something like bool empty(); Element head(); typeof(this) tail(); In addition to distinguishing new-style forward ranges from old-style ones, this interface would also allow forward ranges to be `const` (although iterating a `const` forward range would only be possible with recursion). `popFront` could then be implemented as a UFCS function: void popFront(R)(ref R r) if (std.v2.range.isForwardRange!R && isAssignable!R) { r = r.tail; }
Well, I think that the reality of the matter here is that if we want to deal with const and immutable properly (which is another issue that we need to be tackling with the range API redesign), we're either going to need to switch to a more functional-style API like this, or we're going to need to add some sort of opTailConst to the language, or we're going to need to add some other sort of language improvement that would allow use to convert const(Range!Element) to Range!(const Element) generically. As things stand, const and ranges really don't interact well together outside of fairly limited situations (e.g. using dynamic arrays for everything), and containers in particular get screwed by it. So, we may very well decide that we need to move forward ranges to something like head and tail, though I'm not terribly happy with the idea, and I'm not sure what the performance implications are (though LDC at least could probably make it not matter). It would also require another change to foreach, but pretty much _anything_ we do here to fix the tail-const problem with ranges is going to require some kind of language change. Another problem with the head/tail approach is that we then have to come up with names for bidirectional ranges, and all of that could get pretty confusing. But ultimately, that can be solved. Either way, the main things that I think needs to happen to fix the copy semantics problems are to require that forward ranges have the same copy semantics as dynamic arrays (at least with regards to their iteration state), which means that save is no longer a thing, and we need basic input ranges to be reference types, which means that they should probably have a different API from forward ranges. I suppose that we could then change the forward range API and leave the basic input range API alone, but given that the current API requires caching front, which causes some problems for basic input ranges, I'm inclined to give basic input ranges a different API. If we then also need to change the API of forward ranges beyond removing save, then so be it, though that will result in a lot more code having to be changed when it's updated to use or be in Phobos v3. - Jonathan M Davis
Jan 21 2024
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, January 21, 2024 12:24:20 PM MST Steven Schveighoffer via 
Digitalmars-d wrote:
 While I agree with most of what you wrote, there is one very big
 problem with this. Namely, current input ranges are suddenly
 considered to be forward ranges. Changing the semantics of what
 it means to only define `front`, `popFront`, and `empty` changes
 the semantics of existing code.

 And this is not something you can solve with some kind of
 versioning. The only way I can think of is to alter the name of
 one or more primitives to prevent accidental usage.
Yeah, I was orignally hoping that this sort of thing could just be solved with editions, but regardless of what we do with editions and Phobos, third party code won't necessarily be updated to match, and it won't even necessarily be obvious whether it has been. To an extent, that doesn't really matter, and we could theoretically punt on the issue based on the idea that any basic input ranges written to follow the Phobos v2 API would just behave incorrectly with functions written to follow the Phobos v3 API for forward ranges, so testing would catch it, but that's obviously not ideal. It _could_ be caught if we required all basic input ranges to be either class references or pointers to structs, while requiring that all forward ranges be structs or dynamic arrays (and we will have to do the latter), but that would arguably restrict the implementation of basic input ranges too much (e.g. reference counting wouldn't be possible). For better or worse, it's definitely an argument for switching to a more functional-style API like Paul mentioned and like has occasionally been discussed in the past (usually as a potentially solution to the tail-const and tail-immutable issue). That would actually allow us to solve other problems rather than simply being a minor API change just to make it so that we could detect the difference, though I'm not sure what all of the other consequences of such a change would be (particularly with regards to performance). - Jonathan M Davis
Jan 21 2024
prev sibling next sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 I've been thinking about this for a while now, but with the 
 next version of Phobos which is in the early planning stages, 
 we really should do some redesigning of ranges. Most of their 
 API is just fine how it is, but there are some aspects of it 
 which really should be changed if we want them to be better 
 (the most obvious one being the removal of auto-decoding). But 
 what I'd like to discuss specifically in this thread is fixing 
 - and defining - the semantics of copying and assigning to 
 ranges. Specifically, the semantics of stuff like

 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors. Has anyone here used the class-based ranges?
Jan 22 2024
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 23/01/2024 4:41 AM, Atila Neves wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 I've been thinking about this for a while now, but with the next 
 version of Phobos which is in the early planning stages, we really 
 should do some redesigning of ranges. Most of their API is just fine 
 how it is, but there are some aspects of it which really should be 
 changed if we want them to be better (the most obvious one being the 
 removal of auto-decoding). But what I'd like to discuss specifically 
 in this thread is fixing - and defining - the semantics of copying and 
 assigning to ranges. Specifically, the semantics of stuff like

 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors. Has anyone here used the class-based ranges?
I did experiment with them when I first learned ranges. Nothing is set up for them. But they do two things extremely well: 1. Create confusion, they are not the definition of what a range is, nor are they used anywhere. 2. Of course since then I've also learned that they offer opportunity to prevent optimizations, so not a great solution to be tied to. I want signatures. I've told Adam Wilson that if you want me involved with ranges for PhobosV3 at the very least I want the compile time aspect. The current situation with ranges not being properly documented in API is just not good enough. Signatures do have a runtime aspect, they can represent ranges if you need a reference, we can also implement something like save via that (even if the implementation doesn't have it).
Jan 22 2024
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 I've been thinking about this for a while now, but with the next
 version of Phobos which is in the early planning stages, we really
 should do some redesigning of ranges. Most of their API is just fine
 how it is, but there are some aspects of it which really should be
 changed if we want them to be better (the most obvious one being the
 removal of auto-decoding). But what I'd like to discuss specifically
 in this thread is fixing - and defining - the semantics of copying
 and assigning to ranges. Specifically, the semantics of stuff like
 
 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.
It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.
 I think that `.save` was a historical mistake, and that ranges that
 can be copied are forward ranges. Something like a range reading from
 stdin or a socket would/should disable the copy/postblit constructors.
Meaning they can only be passed by reference?
 Has anyone here used the class-based ranges?
Yes. They are useful when you have a range whose type varies at runtime. For example: auto myFunc(R)(R range, int type) { switch (type) { case 1: return range.map!(a => a*2); case 2: auto auxRange = helperFunc(range); return range.filter!(a => a % 2) .chain(auxRange); case 3: return range.chain(recurrence!(...)); } assert(0); } This code will not compile because there is no common return type. To fix it, wrap each return in out of the class-based range objects: auto myFunc(R)(R range, int type) { switch (type) { case 1: return range.map!(a => a*2) .inputRangeObject; case 2: auto auxRange = helperFunc(range); return range.filter!(a => a % 2) .chain(auxRange) .inputRangeObject; case 3: return range.chain(recurrence!(...)) .inputRangeObject; } assert(0); } T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Jan 22 2024
parent reply Atila Neves <atila.neves gmail.com> writes:
On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:
 On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via 
 Digitalmars-d wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
 wrote:
 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.
It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.
Why can't this be done with value ranges?
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.
Meaning they can only be passed by reference?
Or moved.
Jan 23 2024
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 23, 2024 at 09:19:24AM +0000, Atila Neves via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:
 On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d
 wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.
It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.
Why can't this be done with value ranges?
[...] Because the recursive call advances a value copy of the passed range, and when it returns, the range in the caller is still in the old position. T -- "Hi." "'Lo."
Jan 23 2024
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.
This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable, then types which are inherently copyable (in particular, `T[]`) will be exclude from being basic input ranges. In order to use a `T[]` with an algorithm that expects a basic input range, you would have to wrap it in a `struct` that has copying disabled. It's also a big problem for composability. If half the algorithms require your range to be copyable, and the other half require it to be non-copyable, you have to keep track of [what color each algorithm is][1] whenever you write a range pipeline, or write a new algorithm that calls an existing one internally. [1]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
Jan 22 2024
next sibling parent Atila Neves <atila.neves gmail.com> writes:
On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:
 On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.
This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable,
I don't think we should that because algorithms that don't care if a range is copiable (i.e. forward) or not should be usable with any input range.
 then types which are inherently copyable (in particular, `T[]`) 
 will be exclude from being basic input ranges. In order to use 
 a `T[]` with an algorithm that expects a basic input range, you 
 would have to wrap it in a `struct` that has copying disabled.
That would indeed be a mistake, but the forward range concept/template constraint is a... err... "subtype" of input ranges, so it *should* work. My idea here is changing the concept of forward ranges from ones that can have `.save` called on them to ones that can be copied. If that works, of course. The case against the idea is that right now, range authors have to opt-in to "forwardness", whereas copies have to be disabled to opt-out.
 It's also a big problem for composability. If half the 
 algorithms require your range to be copyable, and the other 
 half require it to be non-copyable, you have to keep track of 
 [what color each algorithm is][1] whenever you write a range 
 pipeline, or write a new algorithm that calls an existing one 
 internally.
This is the case already (sort of) with functions that expected isForwardRange!R vs isInputRange!R, the difference being a relationship of subtyping that makes it that all forward ranges are also input ranges.
Jan 23 2024
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:
 On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:
 I think that `.save` was a historical mistake, and that ranges 
 that can be copied are forward ranges. Something like a range 
 reading from stdin or a socket would/should disable the 
 copy/postblit constructors.
This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable, then types which are inherently copyable (in particular, `T[]`) will be exclude from being basic input ranges.
You are thinking about this the wrong way. Copyability becomes a trait of forward ranges, instead of input ranges. An input range would not be copyable, therefore code which takes input ranges would have to take them by reference, or the caller would have to ensure no other copies exist. But you can still take forward ranges this way. It is not hard to do -- you just remove the requirement for copying from `isInputRange`, and add it to `isForwardRange`.
 In order to use a `T[]` with an algorithm that expects a basic 
 input range, you would have to wrap it in a `struct` that has 
 copying disabled.
A `T[]` can be used without copying it. So why can't you pass that into an algorithm that takes an input range?
 It's also a big problem for composability. If half the 
 algorithms require your range to be copyable, and the other 
 half require it to be non-copyable, you have to keep track of 
 [what color each algorithm is][1] whenever you write a range 
 pipeline, or write a new algorithm that calls an existing one 
 internally.
The assertion is that if the range is copyable, it's now a forward range and copying is allowed. It is *on you* as the range developer to ensure copying does not work if it's an input range. The algorithm that says `isInputRange` would have to avoid copying, but that also works for forward ranges as well. There would be no need for multiple colors. Now, in *practical* terms, algorithms that are input-range compatible likely would need more careful attention to avoid copies than they currently do. -Steve
Jan 23 2024
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven Schveighoffer 
wrote:
 You are thinking about this the wrong way. Copyability becomes 
 a trait of forward ranges, instead of input ranges. An input 
 range would not be copyable, therefore code which takes input 
 ranges would have to take them by reference, or the caller 
 would have to ensure no other copies exist. But you can still 
 take forward ranges this way.

 It is not hard to do -- you just remove the requirement for 
 copying from `isInputRange`, and add it to `isForwardRange`.

 [...]

 The assertion is that if the range is copyable, it's now a 
 forward range and copying is allowed. It is *on you* as the 
 range developer to ensure copying does not work if it's an 
 input range.
The only problem with this is that what you are asking the range developer to do is impossible. :) Consider what would happen if the user creates a pointer to a non-copyable input range. The pointer (a) is inherently copyable and (b) implements the range interface (via implicit dereferencing), but (c) the copies are not independent. (If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?) So, there has to be some other way of distinguishing forward ranges from input ranges, beyond just checking whether they can be copied. For example, by using `next()` for input ranges vs. `head()`/`tail()` for forward ranges. (As [you pointed out earlier][1], reusing the `empty`/`front`/`popFront` API would not be a great idea.) Once you have this other method of distinguishing input and forward ranges, requiring forward ranges to be copyable no longer implies that pure input ranges MUST be non-copyable. So we can allow "pointer-to-input-range" to be an input range itself, even though it supports copying, and nothing will break. [1]: https://forum.dlang.org/post/yppqryhpumcucqspkqsx forum.dlang.org
Jan 23 2024
next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)
`isForwardRange` could require the range *type* to have `empty, front, popFront` defined. Not sure if that solves your `alias this`/`opDispatch` points. This would mean arrays would need the range methods built-in rather than using UFCS.
Jan 24 2024
parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 24 January 2024 at 12:26:43 UTC, Nick Treleaven 
wrote:
 On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)
`isForwardRange` could require the range *type* to have `empty, front, popFront` defined. Not sure if that solves your `alias this`/`opDispatch` points. This would mean arrays would need the range methods built-in rather than using UFCS.
It could, but I don't think it's a great idea, because it would make ranges a special case. Along with all of the usual reasons special cases are bad, ranges are one of the first things new D programmers are going to encounter, and are pretty much the canonical example of introspection and structural typing in D. If we make ranges behave differently, users who learn from them are going to have a much harder time understanding every *other* use of introspection.
Jan 24 2024
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven 
 Schveighoffer wrote:
 You are thinking about this the wrong way. Copyability becomes 
 a trait of forward ranges, instead of input ranges. An input 
 range would not be copyable, therefore code which takes input 
 ranges would have to take them by reference, or the caller 
 would have to ensure no other copies exist. But you can still 
 take forward ranges this way.

 It is not hard to do -- you just remove the requirement for 
 copying from `isInputRange`, and add it to `isForwardRange`.

 [...]

 The assertion is that if the range is copyable, it's now a 
 forward range and copying is allowed. It is *on you* as the 
 range developer to ensure copying does not work if it's an 
 input range.
The only problem with this is that what you are asking the range developer to do is impossible. :) Consider what would happen if the user creates a pointer to a non-copyable input range. The pointer (a) is inherently copyable and (b) implements the range interface (via implicit dereferencing), but (c) the copies are not independent. (If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?)
Wouldn't that be on the person who wrapped the type? I wouldn't expect `RefCounted` to care that it's wrapping a range for instance. We could explicitly forbid pointers and classes, because you can't obviously implement value semantics using those. Indeed, since copying is defined by default, it's more onorous to properly create an input range. But so what? Can I tell you how many times people copy an input range without realizing the consequences today? At least if it's *explicit* that input ranges should not be copyable, then you will be more aware of the issue as the compiler complains about it.
 So, there has to be some other way of distinguishing forward 
 ranges from input ranges, beyond just checking whether they can 
 be copied. For example, by using `next()` for input ranges vs. 
 `head()`/`tail()` for forward ranges. (As [you pointed out 
 earlier][1], reusing the `empty`/`front`/`popFront` API would 
 not be a great idea.)
Yeah, I agree that if we change semantics, we have to change the naming, for the sake of all existing code. But the naming could be consistent between input/forward ranges as it is today. I'm not keen on the head/tail idea, as it seems way more clunky than I would like, and I'm not sold on the compiler making sure the `x = x.tail` operation is going to be efficient. I don't know how bad it would be to deal with, it's not something I have put a lot of consideration into, but I had thought that this mechanism of preventing copying for things that shouldn't be used from multiple copies sounds like the proper way to prevent problems by default. Sometimes it's good to start with what is simple and correct, even if it's not what you want, and see if you can massage it into what makes sense. -Steve
Jan 24 2024
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 24 January 2024 at 16:18:11 UTC, Steven 
Schveighoffer wrote:
 On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:
 (If your response to this is "ok, we'll just have 
 isForwardRange explicitly reject pointers", keep in mind that 
 this problem also applies to generic wrapper types that use 
 `alias this` or `opDispatch`. Are we really going to tell our 
 users that it's their fault if their code breaks because they 
 put a range pointer inside one of these types?)
Wouldn't that be on the person who wrapped the type? I wouldn't expect `RefCounted` to care that it's wrapping a range for instance.
Yes, it would--which is exactly what I'd like to avoid. I don't want us to put ourselves in a position where we have to tell end users "you're holding it wrong" if they try to combine two generic library features like this. In general, we should not make assumptions in our library code unless we are capable of enforcing them. Since we are, unfortunately, not capable of enforcing the assumption that all copyable ranges have forward-range semantics, we should not allow ourselves make it. (Perhaps there's an opportunity for a language feature here? A `unique(T)` qualifier would help with this, for example.)
 Indeed, since copying is defined by default, it's more onorous 
 to properly create an input range. But so what? Can I tell you 
 how many times people copy an input range without realizing the 
 consequences today? At least if it's *explicit* that input 
 ranges should not be copyable, then you will be more aware of 
 the issue as the compiler complains about it.
I agree that permitting input ranges to be copyable puts more of a burden on the authors of range algorithms. However, given that there are edge cases where the burden of preventing copies *must* fall on someone other than the range's author, I would rather that person be the author of the generic algorithm than the end user. My hope is that in practice, if Phobos itself contains some prominent examples of non-copyable ranges, these bugs will be found quickly during testing. That's obviously not as good as a reliable compile-time error, but it's not nothing.
 I don't know how bad it would be to deal with, it's not 
 something I have put a lot of consideration into, but I had 
 thought that this mechanism of preventing copying for things 
 that shouldn't be used from multiple copies sounds like the 
 proper way to prevent problems by default.
For what it's worth, I agree that in practice, most input ranges should be non-copyable. [That `readf` bug][1] we discussed a little while ago would never have existed if `LockingTextReader` had been non-copyable from the start, for example. [1]: https://github.com/dlang/phobos/pull/8826
Jan 24 2024
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:
 In general, we should not make assumptions in our library code 
 unless we are capable of enforcing them. Since we are, 
 unfortunately, not capable of enforcing the assumption that all 
 copyable ranges have forward-range semantics, we should not 
 allow ourselves make it.
Having thought about it some more, this is a bad argument, and I withdraw it. The entire range API is based on convention to begin with. Of course there's no enforcement mechanism!
Jan 24 2024
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, January 24, 2024 11:20:38 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:
 In general, we should not make assumptions in our library code
 unless we are capable of enforcing them. Since we are,
 unfortunately, not capable of enforcing the assumption that all
 copyable ranges have forward-range semantics, we should not
 allow ourselves make it.
Having thought about it some more, this is a bad argument, and I withdraw it. The entire range API is based on convention to begin with. Of course there's no enforcement mechanism!
Yeah. We can enforce some stuff statically (e.g. we can require that a forward range be a dynamic array or a struct, since if it's a class or a pointer, we know for sure that it doesn't have the right copy semantics), but we can't actually enforce the correct copy semantics statically - just like we can't enforce the correct copy semantics with save right now. We can't even enforce that popFront removes an element from the range or that empty has anything to do with how many elements the range has. We can improve the situation with the API (e.g. getting rid of the requirement for save), and we can try to make it easier to use correctly and harder to use incorrectly, but unfortunately, there's no way to not rely on convention on some basis. - Jonathan M Davis
Jan 24 2024
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:
 The entire range API is based on convention to begin with. Of 
 course there's no enforcement mechanism!
Isn't `isInputRange`, `isForwardRange`, and the rest, kind of enforcement machanism?
Jan 24 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, January 24, 2024 12:23:45 PM MST Alexandru Ermicioi via 
Digitalmars-d wrote:
 On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:
 The entire range API is based on convention to begin with. Of
 course there's no enforcement mechanism!
Isn't `isInputRange`, `isForwardRange`, and the rest, kind of enforcement machanism?
Traits like that can test (and thus be used to enforce) _some_ of the semantics, but there are plenty of things that can't be tested statically (like what popFront actually does). They are able to test that certain code will compile with the range API (or that certain code won't compile) using the type that they're instantiated with, and they can test certain explicit stuff about the type (e.g. enforcing that it's a struct if we want to do that), but they can't actually test what the functions do. So, while we can enforce that a forward range is copyable, we can't enforce what its copy semantics are beyond disallowing stuff like classes or pointers to structs, since those are clearly reference types. But the struct itself could have a variety of copy semantics depending on its implementation, and there's no way to determine that statically. Ultimately, for a range to have the correct copy semantics, we have to rely on the programmer to implement them correctly, just like we have to rely on them implementing what front, popFront, and empty do correctly. - Jonathan M Davis
Jan 24 2024
next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis 
wrote:

 Traits like that can test (and thus be used to enforce) _some_ 
 of the semantics, but there are plenty of things that can't be 
 tested statically (like what popFront actually does). They are 
 able to test that certain code will compile with the range API 
 (or that certain code won't compile) using the type that 
 they're instantiated with, and they can test certain explicit 
 stuff about the type (e.g. enforcing that it's a struct if we 
 want to do that), but they can't actually test what the 
 functions do.

 So, while we can enforce that a forward range is copyable, we 
 can't enforce what its copy semantics are beyond disallowing 
 stuff like classes or pointers to structs, since those are 
 clearly reference types. But the struct itself could have a 
 variety of copy semantics depending on its implementation, and 
 there's no way to determine that statically. Ultimately, for a 
 range to have the correct copy semantics, we have to rely on 
 the programmer to implement them correctly, just like we have 
 to rely on them implementing what front, popFront, and empty do 
 correctly.
That's what unit tests are for, while those traits act like pseudo interfaces. Maybe there could be a test suite provided in Phobos for user to run over his own range implementations, in his own unit tests. This would make easy to spot whether user's range does or does not follow proper logic.
Jan 24 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, January 24, 2024 2:31:19 PM MST Alexandru Ermicioi via 
Digitalmars-d wrote:
 On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis

 wrote:
 Traits like that can test (and thus be used to enforce) _some_
 of the semantics, but there are plenty of things that can't be
 tested statically (like what popFront actually does). They are
 able to test that certain code will compile with the range API
 (or that certain code won't compile) using the type that
 they're instantiated with, and they can test certain explicit
 stuff about the type (e.g. enforcing that it's a struct if we
 want to do that), but they can't actually test what the
 functions do.

 So, while we can enforce that a forward range is copyable, we
 can't enforce what its copy semantics are beyond disallowing
 stuff like classes or pointers to structs, since those are
 clearly reference types. But the struct itself could have a
 variety of copy semantics depending on its implementation, and
 there's no way to determine that statically. Ultimately, for a
 range to have the correct copy semantics, we have to rely on
 the programmer to implement them correctly, just like we have
 to rely on them implementing what front, popFront, and empty do
 correctly.
That's what unit tests are for, while those traits act like pseudo interfaces.
Of course that's what unit tests are for, but that doesn't enforce anything. That's the programmers making sure that their code behaves the way that it should - that their code follows the conventions required by the range API. On the other hand, what Paul was talking about was actually statically enforcing the semantics via template constraints rather than relying on coventions, which is something that we can do with some portions of the semantics of the range API but can't do for a good chunk of it.
 Maybe there could be a test suite provided in Phobos for user to run over
 his own range implementations, in his own unit tests. This would make easy
 to spot whether user's range does or does not follow proper logic.
Well, we can provide test ranges to use to test a function with, but not much can be automated there, since the results would depend on what the function actually did. However, if there were set of standard range types to test with, then it would at least be easier to test a proper range of range types. Phobos itself doesn't always do a good job with that though. As for testing that a range behaves properly, that might be possible to automate, but since it's usually the case that a range comes from a function, that's not always straightforward. And while how the range should behave is largely defined by the range API, not all ranges are generic with regards to what they work with (e.g. what their element type is) or how they're constructed, so writing a generic set of tests could be challenging. Off the top of my head, I don't know how possible it would be, though I'm sure that it could be made to work with a subset of ranges. All in all, providing better testing tools is a good idea, but a lot of the question here is what we can statically enforce, since the more that we can statically enforce, the harder it will be to write range-based code that does the wrong thing, and the harder it will be to pass types to a range-based function which then won't behave properly even if the function is written properly. - Jonathan M Davis
Jan 24 2024
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis 
wrote:
 Traits like that can test (and thus be used to enforce) _some_ 
 of the semantics, but there are plenty of things that can't be 
 tested statically (like what popFront actually does).
Even if you could, what would you even test? Especially for `popFront`, what does it mean to “pop front”? A no-op could be a reasonable `popFront` implementation for a range that represents a value infinitely repeated, i.e. something like `cycle(1)`.
Feb 01 2024
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 22, 2024 8:41:35 AM MST Atila Neves via Digitalmars-d 
wrote:
 On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis

 wrote:
 I've been thinking about this for a while now, but with the
 next version of Phobos which is in the early planning stages,
 we really should do some redesigning of ranges. Most of their
 API is just fine how it is, but there are some aspects of it
 which really should be changed if we want them to be better
 (the most obvious one being the removal of auto-decoding). But
 what I'd like to discuss specifically in this thread is fixing
 - and defining - the semantics of copying and assigning to
 ranges. Specifically, the semantics of stuff like

 [...]
I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.
Once you disable the copy/postblit constructor, you can't generally pass the range around or wrap it in another range - at least not without explicit calls to move. ref parameters and return values would solve some of that, but it wouldn't solve all of it (wrapping in particular, which ranges do heavily, won't work with ref). I would expect a range that can't be copied to be too annoying to even bother with. And in fact, they can't even work with foreach, because foreach copies the range that it iterates over (and has to in order to have the semantics match what hapens with dynamic arrays, though if we separate basic input ranges from forward ranges, then we can change what happens with basic input ranges). IMHO, if we want to go the route of trying to make them uncopyable, we pretty much might as well just get rid of the concept of basic input ranges and require that they use opApply (which isn't an entirely bad idea given how limited basic input ranges are in practice anyway). Basic input ranges are inherently either reference types or pseudo-reference types, and they can be made to work cleanly if we just require that they be reference types. Those are very different semantics from forward ranges, which are value types with regards to their iteration state (assuming that we require that copying them results in copies which can be independently iterated rather than having save). And that's why I'm arguing that we should split the two concepts rather than trying to treat forward ranges like an extension of basic input ranges - especially since basic input ranges are extremely hamstrung anyway, because you can't actually get an independent copy of them, and in practice, most stuff needs that. And if we did go with a solution for basic input ranges which made them non-copyable, then they couldn't be used with the same code that uses forward ranges anyway, so we might as well just give them a different API and stop treating them like the same thing when they really aren't.
 Has anyone here used the class-based ranges?
Yes. Symmetry uses them. In general, it's best not to use them for forward ranges, but when you're dealing with code where you can't determine the type at compile time (like with an interpreter), then you need some some kind of runtime polymorphism, and you really don't have much choice. That being said, we can support that by wrapping such classes in structs and ensuring that the structs have the correct copy semantics rather than allowing forward ranges that are classes. So, it should be quite possible to get rid of save without losing any functionality, and we'd get much cleaner copy semantics in the process, making it so that forward ranges in general behave more like dynamic arrays. For basic input ranges, it should be perfectly fine for them to be classes - desirable even - simply because you can't actually get independent copies of them, and using a struct with pseudo-reference copy semantics is just begging for bugs, because it means that mutating the copy puts the original in an invalid state, whereas if we can require that all basic input ranges be full-on reference types (which could include structs and pointers to structs so long as they have the correct semantics), then we can actually rely on sane, consistent copy semantics for basic input ranges. It's just that they would be different from the copy semantics of forward ranges - which would also be true if we were somehow able to make it work to make basic input ranges noncopyable instead, though I really don't think that that will work. - Jonathan M Davis
Jan 22 2024
prev sibling next sibling parent victoroak <victoroak victor.oak> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
  [...]
I think one problem with just "next" as the api for an Input Range is that currently I don't know any safe way to return a reference to the item. If "next" returns a pointer we can't be sure the backing data structure (a growable array like std::vector for example) did not freed that memory yet but I guess this is the same for any range right now. Any way, it would be good to have an optimization for Nullable!T or a different type that when T is nullable (e.g. classes or pointers) it doesn't need a boolean flag but also loses the ability to store null inside it. That's one of the optimizations of "Option" in Rust, although in Rust references can't be null so it's ok to do this.
Jan 22 2024
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis 
wrote:
 wall-o-text
Can this be summarized as ```d enum isRangeCopible(R)= is( (R r){ R rdup=r; r=rdup; } ); ```
 What I'd like to get out of this thread is feedback on how much 
 sense this idea does
I think someone formal should collect all changes people want to make to ranges see my thread for .index and feature sets theres something about immutablity I would also suggest arrays should be more important then doubly linked lists
Jan 22 2024