digitalmars.D - Proposed Changes to the Range API for Phobos v3

Jonathan M Davis (547/547) May 16 2024 Okay. This is my proposal for an updated input range API for Phobos v3. ...

Richard (Rikki) Andrew Cattermole (6/6) May 16 2024 Two things I suggest we consider, given the write up:

Jonathan M Davis (46/53) May 16 2024 What would that buy us? I don't see how it adds any value.

Richard (Rikki) Andrew Cattermole (24/28) May 16 2024 Required, yes, automatic yes.

Jonathan M Davis (57/85) May 16 2024 I see nothing positive about having to wrap arrays. Treating arrays as

Richard (Rikki) Andrew Cattermole (7/9) May 16 2024 Given the problems listed it is a possible solution.

Jonathan M Davis (5/15) May 17 2024 I have no problem with it being presented and evaluated, but I don't see...

H. S. Teoh (7/11) May 17 2024 [...]

Richard (Rikki) Andrew Cattermole (11/20) May 17 2024 There would be no friction for users of ranges in PhobosV3 they would

monkyyy (17/26) May 16 2024 ????

Jonathan M Davis (58/74) May 16 2024 The new range API is quite explictly _not_ trying to do any explicit Uni...

monkyyy (33/44) May 16 2024 my proposal of adding .key to the range api isnt unicode spefic

Jonathan M Davis (24/43) May 16 2024 Essentially, that requires that opDollar define the common arithmetic

monkyyy (9/12) May 16 2024 I disagree; slicing should be separate from indexing (and keys)

Ogi (9/12) May 17 2024 Actually, this option worth exploring.

H. S. Teoh (7/22) May 17 2024 [...]
Jonathan M Davis (59/78) May 17 2024 It wouldn't fix the import problem. Realistically, it's not going to wor...

Dukc (41/97) May 18 2024 Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a proposal for

Jonathan M Davis (106/179) May 18 2024 It is something that has proven to be a problem. Many range-based functi...
Quirin Schroll (4/15) Jun 04 2024 The range API is baked into the language through `foreach`

Steven Schveighoffer (13/20) May 27 2024 So something I just thought of as a drawback here -- `save` gives
Quirin Schroll (31/32) Jul 03 2024 Maybe good new names could be `empty`, `front`, and `opPopFront`
Imperatorn (87/91) Jul 03 2024 Tales from the AI

Eyal Lotem (70/70) Sep 23 2024 Hey,

monkyyy (10/15) Sep 23 2024 Is opApply even an iterator interface? Per the talk comparing the

Eyal Lotem (9/24) Sep 23 2024 It's more restrictive because it can't do zip or some other

Steven Schveighoffer (3/13) Sep 24 2024 Thank you for posting this!
Steven Schveighoffer (157/160) Sep 24 2024 I implemented a quick-and-dirty "getNext" style range as outlined

monkyyy (3/5) Sep 24 2024 1000000 lines of code, contracts, bandaids instead of rewrites
Steven Schveighoffer (12/23) Sep 24 2024 Some more info, my simple filter range pre-computes the first

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

Okay. This is my proposal for an updated input range API for Phobos v3. I
previously sent it out to some or the core D folks, and the feedback has
largely been positive, but I'm now posting it to the newsgroup for wider
feedback.

And yes, this is long, but there's a lot to cover, and I want both what's
changing and why it's changing to be clear. Not that anyone expects me to be
short when writing something like this up anyway...
===

This is a proposal for an updated input range API for Phobos v3.

Input ranges as defined in Phobos v2 have worked extremely well for us thus
far, but experience has shown that there are some problems with them, and we
would like to address and fix them with Phobos v3. So, this proposal
attempts to leave input ranges the same as much as reasonable while making
the necessary changes to fix the known problems.

Output ranges will be covered in a separate proposal.

A Note on the Enforcement of Range API Requirements
---------------------------------------------------

As a general note with regards to any requirements that the range API has
(be it the current version or the new one), some requirements can be and are
enforced with the traits for ranges (e.g. isInputRange requires that front
result in a value and will be false if front is void), whereas other
requirements cannot actually be enforced (e.g. that front returns the same
value each time that it's evaluated until popFront is called). As such, the
range API statically enforces what it can but can't enforce everything.

This means that when the range API places a requirement on ranges and their
behavior, and that particular requirement cannot be statically enforced, it
_is_ possible to write a type that violates the range API but passes all of
the associated traits and template constraints. Such a type is considered to
be in violation of the range API, and there are no promises that any
range-based function will behave correctly with such a range (since the
range itself does not have the required behavior). Range-based code in
general is free to assume that any ranges that it's given correctly
implement the range API, and it really doesn't make sense for the situation
to be otherwise.

As such, when this proposal says that a range is required to behave in a
certain way, that does not necessarily mean that that behavior is statically
enforced (just like not all requirements of the current range API are
statically enforced). They will be enforced where practicable, but in some
cases, it simply means that it's a requirement that any ranges must meet if
they're going to work correctly with normal range-based code, and anyone
writing range-based code can assume that those requirements have been met.
As is the case with many things, testing will then be required to make sure
that the code is working correctly with regards to anything that couldn't be
statically enforced.

Most of the changes in this proposal revolve around being stricter with the
behavioral requirements for the range API so that range-based code can
actually rely on certain behaviors of ranges (like their copy semantics),
whereas the current range API has been lax on a number of aspects of the
semantics of ranges, making it impossible for generic range-based code to
rely on those semantics.

Now, on to discussing what actually needs to be changed...

Some of the Problems with Input Ranges
--------------------------------------

1. Auto-decoding has not solved the problems that it was intended to solve
(if nothing else, because graphemes are a thing, and auto-decoding does not
take that into account), and it has caused both performance problems and a
proliferation of complications (in particular with all of the code that
attempts to work around auto-decoding to regain performance and avoid having
strings be wrapped by other range types where possible). So, we clearly
would like to get rid of it.

2. Requiring save for forward ranges has been a source of bugs. Most forward
ranges do not require it, so it is routinely forgotten, causing bugs for any
ranges that do require it. Ideally, we would be able to just rely on copies
of forward ranges being independent and thus not need save at all. It would
also bring the semantics of ranges that much closer to dynamic arrays /
slices, which ranges are effectively trying to be a generalization of, just
like iterators in C++ are effectively a generalization of pointers.

3. The semantics of copying from and assigning to ranges are not defined by
the current range API, and because ranges allow such a variety of types with
such a variety of semantics for copying and assignment, we cannot actually
define the semantics of copying or assignment for ranges with the current
API. This puts some annoying restrictions on writing correct generic code
(e.g. not being able to use a range once it's been copied), and ideally, we
would be able to actually define the semantics of copying and assignment so
that generic code can rely on it.

4. There is no guarantee that the init value of a range is even valid - or
that a range can be default-initialized. This causes problems in a number of
cases but in particular when it's necessary to store a range as a member
variable. One recent example of this is
https://issues.dlang.org/show_bug.cgi?id=24202 where a project incorrectly
relied on std.range.chain's init value being both valid and empty (which it
was previously but then wasn't after some changes were made to chain to
improve its performance). As things stand, code that wants to be able to
store a range as a member variable and to do so completely correctly needs
to do something like wrap it in a std.typecons.Nullable to ensure that the
init value isn't actually used, and of course, no one actually does this or
even really thinks about it. In practice, they end up relying on the
behavior of whatever ranges their code currently happens to use, and they
almost certainly don't realize that they're relying on unspecified behavior
that could change in the future. Ideally, we would be able to rely on the
init value being valid - and preferrably empty (at least for finite ranges)
- which is not possible with the current range API.

5. The range API does not provide any way to get an empty range from ranges
that can't be sliced other than iterating through the entire range (ranges
that can be sliced allow it by using the slice operator with indices that
make the length 0). takeExactly allows us to make an empty range from any
range, but the type changes in the process, which does not work in any
situation where code needs to make a range empty but cannot change the type
(e.g. if it's dealing with a ref parameter or a member variable and needs to
set that variable to an empty range).

6. The range API does not currently specify what happens if you copy front
and then call popFront. For the vast majority of code, it is both assumed
that you can copy front and that the copy of front will be unaffected by
calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
which will do things like use a dynamic array as a buffer which has its
elements overwritten in popFront, resulting in the elements of the copy
being mutated, since it's a slice of the same data. Such ranges are
sometimes referred to as having a transient front, and they do not work with
a lot of range-based code. As such, they either need to be disallowed, or we
need a way to detect such behavior so that range-based code can check for
it.

7. Related to the issue of transient front is the fact that range-based code
in general tends to assume that front is copyable, meaning that it doesn't
work with non-copyable types. This isn't a problem with the range API per
se, but we probably do need to better formalize what you can assume you can
do with front vs what you have to test for.

8. $ cannot be used with random-access ranges in generic code, because the
range API does not require that a random-access range define opDollar.

9. const makes ranges useless. Realistically, a const range is not a range,
because you can't call popFront, and you can't iterate through it
(technically, it _is_ possible to have popFront be const in cases where the
iteration state doesn't actually live in the range, and popFront isn't pure,
but normal ranges do not work that way).

Most range-based code doesn't actually need to worry much about const, since
it's just iterating through the range, and the constness of the elements
often doesn't even matter, since the code is templated and works regardless
of the mutability of the elements. However, once a range is stored in
another type, constness can quickly become a problem. This not a problem
with dynamic arrays / slices, because they can be sliced to get a tail-const
slice of the original, and templates are actually instantiated with the
tail-const type for dynamic arrays, but other range types do not have such
compiler magic at present.

10. While it's not exactly a problem with the range API, it would be nice if
we didn't have to import std.range / std.range.primitives (or the Phobos v3
equivalent) to use arrays as ranges. In theory, we could add the range API
functions for arrays to object.d, but doing so would conflict with the
existing functions in std.range (and bugs with selective imports actually
make it somewhat problematic to resolve those conflicts - see
https://issues.dlang.org/show_bug.cgi?id=24451). And if we're getting rid of
auto-decoding, the range API functions for arrays of characters would
change, meaning that we can't solve the problem by putting the functions in
object.d and then aliasing them in std.range.primitives to avoid problems
with selective imports from std.range or std.range.primitives.

So, the symbol conflicts which would result from putting the range API
functions for arrays in object.d do potentially place some restrictions on
what we do with the new range API.

Overview of Proposed Changes
----------------------------

1. The easy one is that the range API functions for dynamic arrays will not
treat arrays of characters as special. A dynamic array of char will be a
range of char, and a dynamic array of wchar will be a range of wchar.

Any code that needs to decode will need to use the phobos v3 replacement for
std.utf's decode or decodeFront - or use foreach - to decode the code units
to code points (and if it needs to switch encodings, then there will be
whatever versions of byUTF, byChar, etc. that the replacement for std.utf
will have). And if code needs to worry about normalization or other grapheme
stuff, then it'll need to use that functionality from the replacement of
std.uni (which is basically what ranges have to do right now anyway). We may
choose to still have UTFException, but it will be up to the programmer to
choose whether they want to use the replacement character or to throw
exceptions when decoding, because the decoding will no longer be automatic.
As such, using the range API on dynamic arrays will work with  nogc, and we
won't need a bunch of extra code just to avoid the auto-decoding.

This may result in more bugs where programmers slice into the middle of code
points, but auto-decoding just makes that harder rather than actually
preventing it, and there's really no fix for that without doing something
like operating at the grapheme level by default, which would be horribly
inefficient. We probably should have better documentation and tutorials for
how to write range-based code which deals with Unicode correctly, but
ultimately, education is the solution here, not trying to make everything
work at the code point or grapheme level by default. It may make sense to
add a string type of some kind to Phobos which operates at the grapheme
level with its API but which tries to be efficient internally, but that's
really not a question for the range API. It seems to be generally agreed
upon that arrays of code units should be treated as ranges of code units, so
that's what the new range API will do.

2. All ranges which can be default-initialized must have a valid init value
(that is, their range API functions must behave according to the normal
rules for those functions and can't do something like have empty be false
and then have front fail to return a valid value; the init value of all
ranges that can be default-initialized must work properly as ranges).

By requiring that the init value be valid, we avoid all of the problems that
we've been having with ranges which behave badly when they're
default-initialized. Instead, generic code will be able to rely on a
default-initialized range behaving sanely, and the default behavior will be
what most programmers expect, whereas right now, some ranges will actually
segfault if their init value is used (e.g. if the range type is a class).

Of course, this does make it so that classes, interfaces, and pointers can't
be ranges, but there are ways to work around that (as will be discussed
later).

Ideally, we would also disallow ranges from disabling default initialization
so that generic range-based code can rely on it, but that causes problems
with ranges that need to be constructed at runtime to function properly.
That problem can be worked around fairly easily with finite ranges, since
worst case, you can always just add a flag to them which indicates whether
they were default-initialized and then have empty return true in that case
(without having to care about what the range normally does). It's also
likely that such ranges are rare, since it's almost always the case that you
can rework the logic of a finite range to have a valid init value.

However, with infinite ranges, there is no such solution. If they cannot be
default-initialized, then they either can't be ranges, or they would have to
be finite ranges which would just never be empty if they're constructed at
runtime (while doing something like the flag trick to make their init value
empty). And it's certainly true that the range API doesn't (and can't)
guarantee that finite ranges are truly finite, but it's still better if we
can define infinite ranges that need to be constructed at runtime as
infinite ranges, since then we can get the normal benefits that come from
statically knowing that a range is infinite.

Alternatively, we could disallow finite ranges from disabling default
initialization (since they have a workaround) while allowing infinite ranges
to disable it, but that seems like it's too much special-casing and like it
would add to the confusion overall.

And maybe it would be better to just force infinite ranges which can't work
properly unless they're constructed at runtime to be finite ranges given
that they're likely to be pretty rare, but ultimately, I think that it's
more straightforward to just require that the init value be valid if a range
can be default-initialized. The issues with types that can't be
default-initialized already are something which causes problems for generic
code in general, and it would not require any special explanation with the
range API.

It would also fit in with the language better if we ever add default
constructors with a future Edition (not that I expect that to happen, but
forcing range types to have a valid init value would make that harder).

The only real problem I see is that there will likely be some generic code
which will be written to assume that default initialization works for all
ranges, but again, that's a general problem with types that disallow default
initialization, and the bug that you then get is a compilation error,
whereas forgetting something like save results in code that compiles but
does the wrong thing. So, when we do run into such bugs, they won't silently
cause problems.

Weighing the pros and cons here, I think that we're better off just letting
ranges disable default initialization (much as it's annoying in general when
types do that). Simply requiring that the init value of a range that can be
default-initialized be valid fixes the core problem, and then the
initialization problems that we're left with are not range-specific.

3. If a range is finite (that is, it's not statically known to be empty),
and it can be default-initialized, then its init value must be empty. If it
cannot be default-initialized, then it must define a function called
emptyRange which returns an empty range of the same type. Phobos will then
define a function called emptyRange which takes a finite range which can be
default-initialized and returns the init value of that range.

The result of this is that we have a reliable way to get an empty range from
any finite range without changing the type of the range. If we could require
that finite ranges be default-initializable, then we wouldn't need
emptyRange in the same way, but even then, such code would have to check
whether the range was infinite to see if assuming that init was empty was
correct, whereas this way, code can just call emptyRange, and it'll get a
compilation error if it's given an infinite range. So, code that needs an
empty range will get compiler checks which verify that the range is finite,
and it'll work whether the range can be default-initialized or not (though
it'll only work with CTFE if the range can be default-initialized).

Of course, this entry doesn't apply to infinite ranges at all, since whether
they can be default-initialized or not, their empty is always false. The
fact that that can be determined statically is what makes them infinite
ranges.

4. All ranges must be either dynamic arrays or structs (and not pointers to
structs either).

This allows us to have consistent requirements with regards to copying,
assignment, and initialization (e.g. it's not possible to require that the
init value be valid if classes can be ranges). Most ranges are already
dynamic arrays or structs anyway, so for _most_ range-based code, this
requirement is a complete non-issue.

However, the main problem that we then have is the ranges that actually need
to be classes or interfaces. This happens when you need to do stuff like
wrap arbitrary ranges with a single range type at runtime (which
std.range.interfaces currently provides the functionality to do). The
solution to this problem then is to provide struct wrappers. That way, a
range that needs to be a class / interface will instead be a struct wrapping
a class / interface reference. The struct can then ensure that its init
value is valid, and it can ensure that the range has the correct copying and
assignment semantics. So, the replacement for std.range.interfaces will then
provide not only classes and interfaces, but it will provide the appropriate
struct wrappers as well.

So, a function like inputRangeObject might return something like
RangeWrapper!(InputRangeObject!Foo) instead of InputRangeObject!Foo like it
would currently do. The exact details still need to be sorted out, but there
should be no technical barrier to replacing all ranges which are currently
classes or interfaces with ranges that are structs wrapping classes or
interfaces, which would allow us to make it illegal for classes or
interfaces to be ranges without losing the actual functionality that such
ranges currently provide.

5. Forward ranges will no longer have save. Rather, it will be required that
copying a forward range will result in an independent copy. Both the
original and the copy must then have the same elements in the same order,
and iterating one will not affect the other. So, in essence, copying a
forward range must be equivalent to calling save (though the actual
implementation could be more clever than that).

So, for most forward ranges, this will simply mean that they won't need a
save function that just returns the this reference, and range-based code
will not need to call save.

For the ranges that would currently require that save be called, they will
have to be implemented in a way that save is no longer required (which is
made possible by disallowing pointers and references from being ranges). The
naive implementation would therefore just have the copy constructor be
equivalent to save, whereas a more sophisticated implementation would likely
use reference counting and only duplicate the internal state when the
reference count isn't 1, and a function is called which would mutate the
state (e.g. doing the equivalent to calling save when popFront is called,
and the reference count isn't 1). Obviously, the struct wrappers that the
replacement for std.range.interfaces would provide would implement the
correct copying functionality so that save would not be required, and anyone
using a class or interface as a range wouldn't actually have to implement
that behavior either.

6. Basic input ranges will be non-copyable.

Forward ranges can be either value types or pseudo-reference types which are
value types with regards to their iteration state (that is, their actual
elements may be outside of the range type itself, but the iteration state is
copied when the range is copied). The fact that their iteration state can be
copied is what makes it so that they can be forward ranges.

On the other hand, basic input ranges can't have their iteration state
copied; otherwise, they could be forward ranges. So, currently, they are
either reference types, or they are pseudo-reference types where a copy does
not copy the full iteration state, leaving the original in an invalid state
if anything is done to the copy (e.g. front is stored directly in the
struct, but the rest of the elements are on the heap or come from somewhere
else outside the struct like a socket, meaning that front will be stale on
the original after popFront is called on the copy).

So, fundamentally, basic input ranges have different copy semantics than
forward ranges. However, we _can_ have consistent copy semantics for all
basic input ranges and consistent copy semantics for all forward ranges -
just not consistent semantics between the two. The difference in their copy
semantics is fundamentally the difference between those two kinds of ranges.

So, for forward ranges, as described in the previous entry, we fix the
problem of inconsistent copy semantics by requiring that copying them result
in independent copies, whereas we have two options for basic input ranges:

1) Either we need to require that basic input ranges be reference types
(thereby avoiding screwed up copies, because the copying is basically just
copying a pointer)

2) or we need to make them non-copyable - in which case, they can still be
moved, but because they're non-copyable, you don't get situations where
iterating through one copy partially mutates the other, since there's only
ever one copy.

In order to fix the init problem, we already need to disallow classes and
pointers, which would mean that if we required that basic input ranges be
reference types, they would have to be structs wrapping pointers or
references, and it would be difficult to statically determine the difference
between a basic input range and a forward range (since there will no longer
be a save function to do that). So, we'd be forced to give basic input
ranges a different API in order to distinguish them - which isn't all bad,
since it's not always a good idea to have basic input ranges and forward
ranges share the same code given that they have different copy semantics,
but it would be nice to still be able to share code in the cases where the
copy semantics don't matter. Also, requiring that basic input ranges be
reference types would likely require some additional indirections in some
cases, since you then can't put anything directly in the range itself
(because all of the actual state would need to be in the type pointed to by
the wrapper struct in order for it to be a reference type).

On the other hand, if we just make basic input ranges non-copyable, then the
difference between them and forward ranges is then trivial to detect with
__traits(isCopyable, ...), and you can put as much state directly in the
basic input range as you want. The downside, of course, is then that you
can't copy the range. However, it then makes the semantics of what's going
on _very_ clear. A forward range is copyable, and a basic input range is
not.

This should make the fundamental difference between a basic input range and
a forward range stark in a way that it arguably isn't with save, even though
the difference is still fundamentally the same. Forward ranges can have
their iteration state copied, whereas basic input ranges can't. But instead
of having that be determined by a special range API function that most folks
forget to call, it will instead be a clear difference in copy semantics that
will actually result in compilation errors if you get it wrong.

Of course, the major downside here is simply that code will be forced to use
explicit moves in a bunch of cases. But it's certainly better than copying a
range and then using the partially mutated (and thus invalid) original like
you can easily do now. And DIP 1040 should improve the situation
considerably by adding implicit moves for the last use of a variable.

It will also help that most ranges are forward ranges anyway, so while this
will be annoying in some code (probably most commonly in the functions in
Phobos that are written to work with both basic input ranges and forward
ranges), most code probably won't have to deal with it, and the code that
does will then be avoiding subtle copying bugs like it would potentially
have now. So, it's one of those cases where the compilation errors could get
annoying, but they're also clearly preventing bugs.

As an aside, having non-copyable basic input ranges will be beneficial for
the replacement for std.random. It's historically been a problem that
accidentally copying random number generator ranges results in numbers being
repeated, and by making them non-copyable, that won't be a problem anymore.
We can then add a separate function to them (e.g. dup) to explicitly to get
a copy from them when we actually want to.

7. Assigning a forward range to an lvalue of the same type should have the
equivalent semantics as copying a forward range, with the difference of
course being that the lvalue is then completely replaced with the new state
as opposed to generating a completely new copy.

These should be the obvious semantics for assignment, but it's not something
that we can currently rely on for all of the same reasons that we can't
currently rely on the semantics of copying a range.

8. Assigning an basic input range to an lvalue of the same type should only
work if it's a move rather than a copy, since basic input ranges are
non-copyable.

Again, these should be the obvious semantics for assignment given what the
copy semantics are.

9. In order to deal with transient front, we will explicitly disallow it. It
will be a requirement that if front is copied, the copy will be unaffected
by popFront. Of course, this doesn't prevent other references to the same
data from mutating it (e.g. if it's a reference type), but calling popFront
will not mutate it.

Therefore, code which wishes do something like std.stdio's ByLine needs to
use a different solution, e.g. it can use opApply instead, or it can make
front non-copyable, since if it's non-copyable, then popFront can mutate it
without worrying about any copies being mutated.

10. With regards to ranges of non-copyable elements, there will be a clear
way to test for whether a range has copyable elements, and range-based code
that won't work with non-copyable elements should disallow them. This may be
as simple as __traits(isCopyable, ElementType!R), or it may involve a new
trait such as hasCopyableElements!R. It's undecided yet whether an
additional trait will be necessary, but the range documentation will be
clear on the matter, and range-based Phobos code that needs to copy elements
will test for copyability.

11. Finite random-access ranges are required to implement opDollar, and
their opIndex must work with $. Similarly, any ranges which implement
slicing must implement opDollar, and slicing must work with $.

In most cases, this will just be an alias to length, but barring a language
change that automatically treats length as opDollar (which has been
discussed before but has never materialized and is somewhat controversial
given types where it wouldn't make sense to treat length as opDollar), we
have to require that opDollar be defined, or generic code won't be able to
use $ with indexing or slicing. We probably would have required it years ago
except that it would have broken code to add the requirement.

12. We're going to rely on some form of language improvement (such as
Walter's recent Implicit Conversion of Template Instantiations DIP) to solve
the tail-const problem.

As an alternate solution, we could rework the range API to do be more like a
list in a functional language - e.g. head returns the first element, and
tail returns a range containing all of the elements except for the first
one. This would allow us to define tail's return type as being tail-const,
but that could get interesting with assignment (e.g. you can't assign
Range!(const Foo) to const!(Range!Foo)), and basic iteration would require
assignment with such an API, e.g.

    for(auto r = range; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

So, we'd probably be forced to add a separate function just to get a
tail-const copy (since tail wouldn't necessarily return the exact same type
as the range, because tail would be returning a tail-const copy minus head),
e.g.

    for(auto r = range.tailConstCopy; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

And at that point, the actual solution to the tail-const issue isn't really
changing to an API that copies the range so much as providing a function
that returns a tail-const copy and routinely using that, which means that we
could fix the tail-const problem with the current API simply by adding a
function which gives you a tail-const copy. As such, switching to an API
that copies ranges to iterate through them rather than popping off their
elements in-place doesn't really buy us anything.

Regardless, adding a function to get a tail-const copy would be putting us
in pretty much the same boat that we're in now with save - only probably
worse. It would really only be needed in situations where a range has become
const somehow, meaning that most range-based code would not be tested with
such ranges, and the new function would be forgotten just like save (only it
would probably be forgotten even more). So, I'm inclined to think that a
language solution is a much better solution.

Also, with regards to a range API that copies for iteration, concerns have
been raised about its performance vs popFront. I ran some basic tests to see
if that was a problem, and it didn't seem to make any difference, but it
wouldn't surprise me if it did with more complex types (which basically
would have required duplicating a bunch of std.algorithm with a different
range API to compare the performance, which I didn't take the time to do).
So, I don't know how valid those concerns are, but I do agree that it would
be giving the optimizer more to optimize out.

Regardless, I decided that it would just be too large a change to the range
API with too little benifit, and if we can just implement tail-const
conversions in the language (which we arguably should be doing for more than
just ranges), then that solves the problem. And if we decide that we really
do need a new range API function to fix the problem, we can just add it
(though if we do, hopefully we do it before the initial release of Phobos v3
so that we can require it without breaking code).

13. The range API functions for basic input ranges and forward ranges will
be changed from front, popFront, and empty to first, popFirst, and isEmpty.
And the range API functions for bidirectional ranges will be renamed last
and popLast.

I'm not a big fan of this, since it would be nice to keep the same function
names, and making this change requires a language change for foreach.
However, it solves two problems:

First, if we remove save from forward ranges, then the old basic input
ranges look like the new forward ranges so long as they're structs and
copyable. So, if we don't change the forward range API in some fashion, then
it will be easy to pass ranges that aren't really forward ranges to code
that requires forward ranges, and bugs will ensue.

Second, if we want to make it so that dynamic arrays / slices are forward
ranges without requiring an import, then we're going to get a _lot_ of
symbol conflicts. Almost all of the range-based code currently in existence
imports either std.range or std.range.primitives, and _all_ of that code
would get symbol conflicts if we added the range API functions for arrays to
object.d. That can be managed with an Edition, but it would mean that tons
of code would have to be updated with the new Edition. On the other hand, if
we use a different set of function names for the range API functions that
need to be added to object.d for arrays, then we don't get all of those
symbol conflicts, and only new code which is written to use the new range
API will need to worry about the name changes.

Of course, if we add the functions to object.d, we arguably should still do
it with an Edition if we want to minimize the risk of symbol conflicts
(since there could be code in the wild with conflicting symbols; but if we
rename the functions, then we're talking about _possible_ symbol conflicts
rather than the guarantee of a ton of them like we'll get if we don't rename
them).

And https://issues.dlang.org/show_bug.cgi?id=24451 makes putting the current
range API functions in object.d even worse, since selective imports would
only work with std.range.primitives and not std.range, meaning that a lot of
the code that currently uses selective imports would still break, though
if/when that bug is fixed, then that problem goes away.

Of course, if we decide that it's too much trouble to put the range API
functions for arrays in object.d, we could solve the problem of the old
basic input ranges looking like the new forward ranges by renaming only a
single function - e.g. empty to isEmpty - since even a single difference in
the API would be enough to make them different.

Alternatively, we could add an enum of some kind to the new range API to
make it different. It would be kind of ugly, but it would allow us to keep
all of the current function names - though we'd still have to import a
module to use arrays as ranges, so that problem wouldn't be fixed.

Of course, as annoying as it would be to rename the range API functions, it
_would_ have the advantage that you could see at a glance that code was
using the new range API rather than the old one. And that could definitely
be valuable. It also means that if you accidentally use the range traits
from the wrong version of the API, you'll get errors _very_ quickly, since
rather than being partially compatible, they won't be compatible at all.

So, while I'm definitely not happy about renaming the range API functions,
after having considered the options, I think that it's the best path
forward.

Of course, if we rename them, that unfortunately also opens up the door for
bikeshedding, but I think that first, popFirst, isEmpty, last, and popLast
are pretty obvious choices, and they correspond well to the old names.

Either way, it's my intention that Phobos v3 will have wrappers such that
you can easily convert a range from the v2 API to a range that works with
the v3 API and vice versa. In all likelihood, the main annoyance with the
new function names will simply be getting used to them. It'll be easy enough
to update the names if code is ported from v2 to v3, and any such ranges
will need to be looked over to make sure that they match the new
requirements anyway (though most ranges probably already do fulfill most of
the added requirements in this proposal).

Conclusion
----------

So, that covers all of the major points that I can think of right now. Of
course, there will be other minor changes (e.g. I'll probably rename
isForwardRange to isCopyableRange), but those are the core changes, and this
document is already ridiculously long as it is. However, I think that it's
covered what really needs to be covered.

I do need to put together a specification which lists the range API in its
entirety rather than as a proposal for what to change and why to change it
(if nothing else, because some form of that needs to be part of the
documentation), but this document is already ridiculously long without that,
so I'm leaving it as just the list of problems and proposed changes rather
than including a full spec.

In any case, have fun picking apart and complaining about the proposal. :)

- Jonathan M Davis

May 16 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

Two things I suggest we consider, given the write up:

1. Automatic boxing of slices into a struct, this was something I was 
considering for signatures specifically for ranges.
2. Enable passing the this pointer on a class explicitly, coupled with a 
few final methods you could handle when the class is null and then 
forward to the implementation.

May 16 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, May 16, 2024 9:33:26 AM MDT Richard (Rikki) Andrew Cattermole via 
Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:

 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.

What would that buy us? I don't see how it adds any value.

I guess that having a struct would negate the need to import the range API
functions for arrays, but instead, we'd have to import the struct (or put it
in object.d), and then we'd have to explicitly wrap arrays, whereas right
now, all you need is an import to treat arrays as ranges, and with my
proposed changes, you wouldn't even need that anymore.

Also, considering that ranges are basically supposed to be an abstraction of
dynamic arrays / slices, similar to how C++ iterators are an abstraction of
pointers, having to wrap dynamic arrays in structs is arguably taking things
in the wrong direction. It would be making it so that dynamic arrays aren't
ranges instead of making ranges in general more like dynamic arrays.

It would also get really annoying to have to constantly wrap arrays just to
pass them to range-based code - and it would get extra awkward in the cases
where a function that used to operate on an array was changed to operate on
a range instead, since then you'd either be forced to treat arrays
differently than normal with such a function or break code that used to pass
arrays, forcing it to wrap them in a struct. And since strings are dynamic
arrays, all of this would apply to strings too.

From what I can see, trying to stop dynamic arrays / slices as ranges just

makes things worse.

 2. Enable passing the this pointer on a class explicitly, coupled with a
 few final methods you could handle when the class is null and then
 forward to the implementation.

That would destroy our ability to rely on the copy or assignment semantics
of ranges. The whole point of disallowing pointers or classes as ranges
would be so that we can rely on forward ranges being copied without needing
a function like save - and so that we can rely on the init value being
valid. Having free functions to handle the case where a class reference was
null would fix the init problem (at the cost of needing to import those
functions), but it doesn't solve the problem that we cannot have reliable
copy or assignment semantics if we allow reference types to be ranges.

Requiring that class references be wrapped in structs allows us to have
reliable copy and assignment semantics without losing any functionality, and
it makes the semantics of ranges in general more consistent - and closer to
that of dynamic arrays / slices. It's already the case that ranges which are
classes are pretty rare. Some use cases require them, but the vast majority
of range-based code uses structs. I really don't see any value in trying to
support naked classes as ranges when it's fairly straightforward to just
wrap them in structs and get rid of any special cases that they would
otherwise cause. And since you typically use the functions in
std.range.interfaces to generate class ranges anyway, the resulting code
would be pretty similar to what anyone using class ranges would be doing
now.

It seems like you're proposing that we start requiring wrapper structs for
some of the most common kinds of ranges, whereas my proposal would be
requiring wrapper structs only for rare ranges - ones which are already
typically generated using helper functions.

- Jonathan M Davis

May 16 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
 It seems like you're proposing that we start requiring wrapper structs for
 some of the most common kinds of ranges, whereas my proposal would be
 requiring wrapper structs only for rare ranges - ones which are already
 typically generated using helper functions.

Required, yes, automatic yes.

The language could help here, not a pleasant proposal but it would be 
do-able to have the language auto-wrap into a boxed type. Given ranges 
are first class in D, it'll be worth it.

```d
import core.attributes : autobox;

struct Map( autobox T) {
	this( autobox T val);
}
```

```d
module core.boxing.slice;

struct BoxedSlice(T) {
	T[] slice;
	alias slice this;

	bool empty() {
		return slice.length == 0;
	}
}
```

The reason I am suggesting this is we clearly don't like the solutions 
being proposed for the range functions for slices. So lets change it to 
a problem we do like and can solve.

May 16 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, May 16, 2024 10:48:26 PM MDT Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
 It seems like you're proposing that we start requiring wrapper structs for
 some of the most common kinds of ranges, whereas my proposal would be
 requiring wrapper structs only for rare ranges - ones which are already
 typically generated using helper functions.

 Required, yes, automatic yes.

 The language could help here, not a pleasant proposal but it would be
 do-able to have the language auto-wrap into a boxed type. Given ranges
 are first class in D, it'll be worth it.

 ```d
 import core.attributes : autobox;

 struct Map( autobox T) {
   this( autobox T val);
 }
 ```

 ```d
 module core.boxing.slice;

 struct BoxedSlice(T) {
   T[] slice;
   alias slice this;

   bool empty() {
       return slice.length == 0;
   }
 }
 ```

 The reason I am suggesting this is we clearly don't like the solutions
 being proposed for the range functions for slices. So lets change it to
 a problem we do like and can solve.

I see nothing positive about having to wrap arrays. Treating arrays as
ranges is far more user-friendly than that and has worked very well for us
overall. It's just annoying to have to import a module in order to be able
to use the range API with arrays, and auto-decoding was a mistake. Other
than that, it really hasn't been an issue.

In order to get rid of save, the range API has to change enough that the old
basic input ranges don't look like the new forward ranges, so it can't stay
identical to what it is now regardless. Changing the names of the range API
functions takes care of that, _and_ it allows us to put the new range API
functions for dynamic arrays in object.d without breaking a ton of code.

Yes, having to rename the functions is annoying, but it also has the
advantage of making it clear whether code is using the updated range API
with the adjusted semantics or whether the code was written for the old
range API and does not necessarily conform to the new requirements. So,
renaming the range API functions fixes multiple problems.

Making it so that arrays must be wrapped to be ranges does nothing to fix
the need to at least partially change the range API so that the old basic
input ranges don't look like the new forward ranges. So, it doesn't fix the
need to either rename the functions or add an otherwise pointless enum to
the range API just to indicate that it's not the old range API. And if we're
going to rename them, we might as well just rename them all so that the
difference is clear, and it gives us an easy way to fix the import problem
for treating arrays as ranges.

So, I don't see anything here that wrapping arrays improves. It just causes
additional friction, and if you have to import anything to create the
wrapper, we're right back to where we are now with having to import
std.range.primitives, just with a different module.

I'm also highly skeptical that it would work to convince Walter to add an
auto-boxing feature for this. He specifically complained in a recent DLF
meeting about how we can't just use arrays as ranges with no need to do
anything like import a module. He wants to be able to use the range API with
arrays with no friction, and renaming the range API functions gives us a way
to do that, whereas wrapping them creates additional friction.

On top of that, alias this is almost always a mistake with generic code.
Most generic code tests for specific types, not implicit conversions, and
testing for implicit conversions is error-prone, because the implicit
conversion doesn't actually happen just because it's tested for, and even if
it does happen, because the code forced the conversion, it happens inside
the function rather than at the call site, which risks all kinds of issues
(not as many with dynamic arrays as with some other types, but implicit
conversions and template constraints are typically a very bad combination).
So, adding a wrapper type is going to cause all kinds of problems with
template constraints and which overloads get used. The whole situation is
just much simpler if we don't wrap dynamic arrays.

All in all, I do not understand why you think that wrapping arrays improves
anything. From what I can see, it's just purely worse.

Yes, Walter has to be convinced to add first/popFirst/isEmpty to foreach in
addition to front/popFront/empty for the proposed range API to work with
foreach, but given that it explicitly fixes a problem that he was
complaining about, and it's a very simple change, I expect that he'll
approve it. But even if he doesn't, and we're forced to do something like
put an otherwise pointless enum on the new range API to distinguish new
ranges from the old ones, I'd still rather be importing the replacement for
std.range.primitives than have to deal with a wrapper type.

- Jonathan M Davis

May 16 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
 All in all, I do not understand why you think that wrapping arrays improves
 anything. From what I can see, it's just purely worse.

Given the problems listed it is a possible solution.

That doesn't mean it's better. But it is a common enough solution in PL 
design space that it is worth considering.

For stuff like this I want to evaluate ideas not just to get to the 
first good one that could work, but also to make sure it actually is the 
best candidate.

May 16 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, May 17, 2024 12:53:07 AM MDT Richard (Rikki) Andrew Cattermole via 
Digitalmars-d wrote:
 On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
 All in all, I do not understand why you think that wrapping arrays
 improves
 anything. From what I can see, it's just purely worse.

 Given the problems listed it is a possible solution.

 That doesn't mean it's better. But it is a common enough solution in PL
 design space that it is worth considering.

 For stuff like this I want to evaluate ideas not just to get to the
 first good one that could work, but also to make sure it actually is the
 best candidate.

I have no problem with it being presented and evaluated, but I don't see any
reason why it would be better. It seems like it's just worse.

- Jonathan M Davis

May 17 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Fri, May 17, 2024 at 03:33:26AM +1200, Richard (Rikki) Andrew Cattermole via
Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:
 
 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.

[...]

What's the purpose of this, other than to add friction to use built-in
arrays as ranges?  What do we get in return for paying this price?


T

-- 
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG

May 17 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 18/05/2024 2:28 AM, H. S. Teoh wrote:
 On Fri, May 17, 2024 at 03:33:26AM +1200, Richard (Rikki) Andrew Cattermole
via Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:

 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.

 [...]
 
 What's the purpose of this, other than to add friction to use built-in
 arrays as ranges?  What do we get in return for paying this price?

There would be no friction for users of ranges in PhobosV3 they would 
all be annotated to turn on the boxing automatically.

User made ones would need annotating however, that could be a bit of a pain.

It is a pain for user made ones already in v2, since you have to make 
sure you have the right import and don't use it in a way that UFCS 
cannot work.

So it solves a number of issues like conflicting definitions as 
described by J.M.D.

However he isn't keen on it, so as a proposal it'll probably end at my 
suggestion to consider it.

May 17 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 wall-o-text
 no function header lists or patterns

????

 no reference at all to keyed ranges

????

 explict range starters for unicode instead of autodecoding

ok, thats *half* the problem; how does `hello 😀 
world`.byUnicode(flag.charIndex).indexOf('w')` count correctly? 
The existing range api discards information that is otherwise 
trivial to have

the old solution of dchars and autodecoding failed; whats your 
proposal for the unicode problem on the *dchar* side of the 
problem where it was believed that dchar would simplify all use 
cases of unicode into simple indexing

 There is no guarantee that the init value of a range is even 
 valid

Thats not what init means in this language; I believe it should 
be `.zero`, optional and part of a larger change were float.zero 
is 0.0; see my dip

 8. $ cannot be used with random-access ranges in generic code, 
 because the range API does not require that a random-access 
 range define opDollar

I believe you should be much much much more spefic

is [min(i,$-1)] supported? is [$..0] supported?

the slicing api is a rabbit hole that needs allot of care

May 16 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, May 16, 2024 11:05:56 AM MDT monkyyy via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 explict range starters for unicode instead of autodecoding

 ok, thats *half* the problem; how does `hello 😀
 world`.byUnicode(flag.charIndex).indexOf('w')` count correctly?
 The existing range api discards information that is otherwise
 trivial to have

 the old solution of dchars and autodecoding failed; whats your
 proposal for the unicode problem on the *dchar* side of the
 problem where it was believed that dchar would simplify all use
 cases of unicode into simple indexing

The new range API is quite explictly _not_ trying to do any explicit Unicode
handling. It's doing what has been discussed for years, which is to leave
that up to the programmer and whatever algorithms they choose to use.
Whether it's best to operate at the code unit, code point, or grapheme level
depends on the operation, and issues such as normalization complicate the
situation considerably with regards to any attempt to provide a one size
fits all solution for indexing into graphemes. It's also not efficient to
operate at the grapheme level by default.

The replacement to std.utf will provide the tools to convert from code units
to code points, just like std.utf does now, with the difference being that
the programmer will no longer have to work around the range API functions to
try to prevent decoding from happening automatically.

Code that then needs to operate at the grapheme level will need to use the
replacement for std.uni, and that can include a random-access range of
graphemes. But it won't be the default way to operate, because that's
inefficient, and no normalization scheme actually works in all cases, making
selecting a default problematic.

We may choose to include a string type in Phobos which provides an API that
allows you to operate at the grapheme level by default, but dynamics arrays
of code units will be treated as ranges of code units. The problems that
we've had with auto-decoding have stemmed from trying to treat them as
anything else.

So, we can build whatever useful algorithms or types we want on top of the
built-in strings, and we may be able to provide some better solutions than
we currently have for dealing with graphemes, but the built-in strings will
not have any kind of special-casing as part of the range API. Any ranges of
graphemes will be types of their own and will not be the built-in strings
even if they may wrap the built-in strings.

 8. $ cannot be used with random-access ranges in generic code,
 because the range API does not require that a random-access
 range define opDollar

 I believe you should be much much much more spefic

 is [min(i,$-1)] supported? is [$..0] supported?

 the slicing api is a rabbit hole that needs allot of care

Finite random-access ranges will need to support the same operations with $
that dynamic arrays do, since the whole point here is to access them in the
same manner as dynamic arrays, and operations such as [$ .. 0] never make
sense, so no, that won't be supported. And no additional syntax for $ is
being added. It's purely for indexing and slicing. The current situation
makes it so that we can't use $ at all with ranges in generic code, because
there is no requirement that it be supported, whereas ideally, we'd be able
to use it in the same way that you would with dynamic arrays. So,
isRandomAccessRange will test to make sure that [$ - 1] compiles and returns
the correct type, and hasSlicing will test to make sure that [0 .. $] and
[0 .. $ - 1] compile and return the correct type. It will be an additional
requirement that $ be equivalent to length, but we can't statically test for
that.

Infinite random-access ranges will not support $ for opIndex, because they
can't and be infinite. They will support it for slicing, but they won't be
able to support doing arithmetic on $, because that makes no sense with an
infinite range. $ will merely be used to indicate that slice is supposed to
go to the end of the range (and therefore that the result will be infinite)
rather than resulting in a finite slice as would occur if only indices are
used.

$ is actually already partially tested for with hasSlicing and infinite
ranges where it requires that opSlice return the same range type if opDollar
is used, whereas it's expected to return a finite range if only indices are
used. However, since there's no requirement that $ work at all, it can't
actually be used in generic code even if you know that the range is
infinite. It's just that hasSlicing requires that the result be the same if
$ is implemented to work with slicing, which is therefore kind of a
pointless test as things stand.

- Jonathan M Davis

May 16 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Thursday, 16 May 2024 at 22:21:54 UTC, Jonathan M Davis wrote:
 The new range API is quite explictly _not_ trying to do any 
 explicit Unicode handling.

my proposal of adding .key to the range api isnt unicode spefic 
either

 It's also not efficient to operate at the grapheme level by 
 default.

Anything lazy will be fine; the dstring conversion was slow 
because it had to allocate eagerly

 and no normalization scheme actually works in all cases, making 
 selecting a default problematic.

Im very much for "range starters"; my example code is wildly 
verbose for me and had one

but I view it as *only half the problem*; strings index weird, 
this is trivial to deal with if you have two counters in the same 
abstraction; I believe if effencly impossible to handle by piling 
on more complexity

 So, we can build ***whatever useful algorithms*** or types we 
 want on top of the built-in strings


*I disagree*; I believe the majority of searching should be 
replaced with an "indexes" function

`auto indexes(alias F,R)(R r)=>r.filter!F.swapkeyvalue;`

such a function depends on having a .key concept in the api; and 
then you have the unicode range have apis for selecting the right 
kind of key-value for your range and it will all just work

 [$ .. 0] never make sense, so no, that won't be supported.

if I define a complex opDollar, I can easily imagine it being 
seen as the "indexing" type, when in fact im using it it to slice 
on a bidirectional datastruct that maybe knows how to handle $/2

```d
struct mydata{
   auto opDollar(){
     struct dollar{
       opBinary!"-"
       opBinary!"/"
```

if the range api makes an assumption that 
`range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))` 
this may break unnecessarily; where opDollar is used in opslicing 
matters, also like add $/2 to the wishlist

is [min(i,$-1)] supported?

_____

while I know such slicing probably isnt on your rader for being 
to magic I unironically like how well it works for the base slice

May 16 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, May 16, 2024 7:30:06 PM MDT monkyyy via Digitalmars-d wrote:
 if I define a complex opDollar, I can easily imagine it being
 seen as the "indexing" type, when in fact im using it it to slice
 on a bidirectional datastruct that maybe knows how to handle $/2

 ```d
 struct mydata{
    auto opDollar(){
      struct dollar{
        opBinary!"-"
        opBinary!"/"
 ```

 if the range api makes an assumption that
 `range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))`
 this may break unnecessarily; where opDollar is used in opslicing
 matters, also like add $/2 to the wishlist

Essentially, that requires that opDollar define the common arithmetic
operations (except maybe multiplication). I'll have to think about the edge
cases to decide how reasonable that is (though the fact that you can often
use a single opBinary for all of the arithmetic operations helps). Whatever
the operations are, _every_ finite range that defines indexing or slicing
would then have to define all of them in order for generic code to be able
to rely on them.

Fortunately, in most cases, opDollar should just be an alias to length, so
it may be reasonable, much as it could be annoying for the corner cases.

is [min(i,$-1)] supported?

_____

 while I know such slicing probably isnt on your rader for being
 to magic I unironically like how well it works for the base slice

I was not aware that $ could be used anywhere except directly within the
indexing or slicing expression (or if I knew, I forgot). I guess that the
compiler lets it be used anywhere between the brackets, and since the result
is just size_t with arrays, it works just fine with min. A user-defined type
would have to implement opCmp for that to work. So having min work would
mean requiring opCmp on top of the arithmetic operations. That might be
reasonable, but it definitely adds to the complexity of implementing
opDollar for types that can't just alias length, though realistically, there
aren't going to be many finite ranges which implement opDollar another way.

So, I'll have to think about it further, but whatever the operations are
that ranges will be able to use with $ in generic code will be required by
the API and documented accordingly (if not, there would be no way to rely on
being able to use them in generic code).

- Jonathan M Davis

May 16 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 17 May 2024 at 04:48:24 UTC, Jonathan M Davis wrote:
 Whatever the operations are, _every_ finite range that defines 
 indexing or slicing would then have to define all of them in 
 order for generic code to be able to rely on them.

I disagree; slicing should be separate from indexing (and keys)

a hashmap can trivially provide a key, and reaccess the value, 
but it probaly cant slice.

randomAccessRanges is a copy and paste with a clean up of c++ 
with its pointers, we have base types that airnt pointers.

I prefer just a free-for-all of optional functions, pre-emptive 
compromise: bidirectionaly, key and indexing(by key), numeric 
slicing; should be independent feature sets

May 16 2024

Ogi <ogion.art gmail.com> writes:

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new 
 range API to make it different. It would be kind of ugly, but 
 it would allow us to keep all of the current function names

Actually, this option worth exploring.

We can see now that implicit interfaces have downsides. When the 
interface changes, we are forced to jump through hoops to avoid 
problems. And what would will do if at some point in future we 
want to change API again? Come up with another set of names? 
Versioning would solve this once and for all.

If you think that enum is ugly, we can use an UDA instead. This 
way, ranges that use the new API will be instantly recognizable.

May 17 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new range
 API to make it different. It would be kind of ugly, but it would
 allow us to keep all of the current function names

 
 Actually, this option worth exploring.
 
 We can see now that implicit interfaces have downsides. When the
 interface changes, we are forced to jump through hoops to avoid
 problems. And what would will do if at some point in future we want to
 change API again? Come up with another set of names? Versioning would
 solve this once and for all.
 
 If you think that enum is ugly, we can use an UDA instead. This way,
 ranges that use the new API will be instantly recognizable.

[...]

+1, I like the UDA idea.  It will be much less effort to migrate to the
new range API to slap a UDA on, than to rename every range method.


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.

May 17 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, May 17, 2024 8:30:47 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new range
 API to make it different. It would be kind of ugly, but it would
 allow us to keep all of the current function names

 Actually, this option worth exploring.

 We can see now that implicit interfaces have downsides. When the
 interface changes, we are forced to jump through hoops to avoid
 problems. And what would will do if at some point in future we want to
 change API again? Come up with another set of names? Versioning would
 solve this once and for all.

 If you think that enum is ugly, we can use an UDA instead. This way,
 ranges that use the new API will be instantly recognizable.

 [...]

 +1, I like the UDA idea.  It will be much less effort to migrate to the
 new range API to slap a UDA on, than to rename every range method.

It wouldn't fix the import problem. Realistically, it's not going to work to
retain the same function names and have arrays work as ranges without
needing an explicit import. Adding the functions to object.d breaks pretty
much all range-based code in existence due to symbol conflicts, and due to
an issue with how selective imports work, you can't even use selective
imports with std.range to fix it; you have to use selective imports with
std.range.primitives - and of course, you'd have to use selective imports
for every range API function, which would be tedious. Using new function
names makes it so that we don't have any symbol conflicts - in addition to
allowing us to statically distinguish between the old and new APIs. It would
also make it easy to see at a glance which API a piece of code is using.

Also, having the main difference that can be statically checked between the
old and new API be just a UDA makes it much more likely that code will
accidentally be set up for the wrong range API version, because someone
forgot the UDA, and it will make it much riskier to try to write code that's
overloaded on the old and new APIs. It would not surprise me if a number of
library writers will want to make it so that their code works with both APIs
by creating overloads that call the new Phobos functions to wrap an old
range so that it works with the new API.

Some ranges will be easily distinguishable due to a lack of save, but if the
code is written to work on basic input ranges (or it overloads on the
category of the range), then if the primary difference is whether the
programmer remembered to put a UDA on the type or not, the odds are pretty
high that the wrong overload will be used some percentage of the time.
Whether that outright makes the code behave incorrectly or just degrades its
performance is going to be highly dependent on what it's doing, but I'm
afraid that if the main difference between the old and new APIs is a UDA,
it's going to be missed in many cases and cause issues, whereas a different
set of function names makes the difference crystal clear and completely
obviates the risk of a range from one version of the API being mistaken as a
range from the other version.

And maybe it won't be at all common to try to overload code based on which
range API it's using, but I don't think that there's much question that it
will be less error-prone if the two APIs are distinct.

But regardless of how risky it is to have the primary difference simply be a
UDA, I don't see how the import problem can be solved without changing the
names. And Walter has explicitly requested that we fix it so that we no
longer need to import anything for arrays to be treated as ranges.

We could force them to not be ranges at all and use wrapper types like
Rikki is suggesting, but I can't imagine that Walter would be any happier
about that, and having wrappers like that typically causes issues
(especially with type introspection and any code that actually needs to
operate on dynamic arrays), whereas the primary issues with having new names
is that you have to remember to use the new ones when writing code that uses
the new API, and when updating code, you'll have to do a search and replace.

All in all, while renaming the functions is annoying, it definitely will
solve some of the issues that we have, and I don't expect that it will
ultimately be a big issue. And the more I thought about it while working out
the proposal, the more I came to the conclusion that having the difference
between the two API versions be visible at a glance would reduce the number
of problems that we'd have. It makes the introspection situation cleaner, it
makes it much more likely that making a mistake will result in a compilation
error, and it makes it easier to understand what's going on when you read
the code instead of having to dig around to figure out which version of the
API is actually being used. And it allows us to get rid of needing to import
anything to use arrays as ranges, which I don't think that we can actually
do with the current function names.

- Jonathan M Davis

May 17 2024

Dukc <ajieskola gmail.com> writes:

Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a proposal for 
an updated input range API for Phobos v3.

Good effort writing all this up! I certainly like most of this. However, 
some questions / proposals I do have.

 
 Some of the Problems with Input Ranges
 --------------------------------------
 
 6. The range API does not currently specify what happens if you copy front
 and then call popFront. For the vast majority of code, it is both assumed
 that you can copy front and that the copy of front will be unaffected by
 calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
 which will do things like use a dynamic array as a buffer which has its
 elements overwritten in popFront, resulting in the elements of the copy
 being mutated, since it's a slice of the same data. Such ranges are
 sometimes referred to as having a transient front, and they do not work with
 a lot of range-based code. As such, they either need to be disallowed, or we
 need a way to detect such behavior so that range-based code can check for
 it.

I think this one is beyond the scope of the range spec. Whether the 
buffer of `ByLine` gets copied on element copy is an issue about the 
element of the range, not about the range itself. If this needs fixing, 
the element type of `byLine` needs to be changed, not the range spec.

Otherwise `front` would need to always deep copy the element on each 
invocation, which would make any range of ranges horribly inefficient!
 
 9. const makes ranges useless. Realistically, a const range is not a range,
 because you can't call popFront, and you can't iterate through it
 (technically, it _is_ possible to have popFront be const in cases where the
 iteration state doesn't actually live in the range, and popFront isn't pure,
 but normal ranges do not work that way).

In principle it goes like: elements being const is a non-issue, it's 
just another element type. Range itself can be const, but has to be cast 
to non-const with const elements to be actually iterable.

In practice, no way! With some care, you can write a range that copes 
with const elements, but even that is surprisingly thorny since a struct 
having a const field runs into all sorts of issues.

But just try designing a range that can be cast between const and tail 
const! Bar for some niche ranges like TakeNone, it would be extremely 
hard if not impossible with the current language.

 
 10. While it's not exactly a problem with the range API, it would be nice if
 we didn't have to import std.range / std.range.primitives (or the Phobos v3
 equivalent) to use arrays as ranges.

Respectful disagree on this one. Ranges are (bar for the foreach range 
API) a Phobos concept, not a language concept. It's good to be able to 
pick whether functions in your module will digest ranges, and if, 
whether it's V2, V3 or some other API. I wish for no changes here.


 
 Overview of Proposed Changes
 ----------------------------
 
 3. If a range is finite (that is, it's not statically known to be empty),
 and it can be default-initialized, then its init value must be empty. If it
 cannot be default-initialized, then it must define a function called
 emptyRange which returns an empty range of the same type. Phobos will then
 define a function called emptyRange which takes a finite range which can be
 default-initialized and returns the init value of that range.

I suggest a slight relaxation. A finite range must have an empty init 
value if it does not provide an explicit `emptyRange`. That is, any 
range, finite or infinite, may have any valid `.init` value, as long as 
it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.


 4. All ranges must be either dynamic arrays or structs (and not pointers to
 structs either).

Maybe unions too? Not that it's often needed.


 9. In order to deal with transient front, we will explicitly disallow it. It
 will be a requirement that if front is copied, the copy will be unaffected
 by popFront. Of course, this doesn't prevent other references to the same
 data from mutating it (e.g. if it's a reference type), but calling popFront
 will not mutate it.

I'd rather have this one as a recommendation as opposed to something 
other ranges are allowed to assume. That would probably turn some 
current uses of `recurrence` or `generate` to unspecified behaviour, 
silently. In the case of `ByLine`, I agree the proposed design is 
superior but I wouldn't want to declare the present design illegal either.

 
 11. Finite random-access ranges are required to implement opDollar, and
 their opIndex must work with $. Similarly, any ranges which implement
 slicing must implement opDollar, and slicing must work with $.
 
 In most cases, this will just be an alias to length, but barring a language
 change that automatically treats length as opDollar (which has been
 discussed before but has never materialized and is somewhat controversial
 given types where it wouldn't make sense to treat length as opDollar), we
 have to require that opDollar be defined, or generic code won't be able to
 use $ with indexing or slicing. We probably would have required it years ago
 except that it would have broken code to add the requirement.

Why would any language changes be needed for that? Can't `.length` be a 
Phobos function for types that have `opDollar` but not their own `.length`?


 Of course, there will be other minor changes (e.g. I'll probably rename
 isForwardRange to isCopyableRange),

I would have somewhat high bar for renaming that but leaving the other 
range names alone. I understand the names input range, forward range, 
bidirectional range and random access range all come from C++ iterators 
that are named exactly the same. So maybe best to either stick with the 
standard we follow, or break away in full and reconsider the names of 
all of them.

May 18 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, May 18, 2024 2:48:51 PM MDT Dukc via Digitalmars-d wrote:
 6. The range API does not currently specify what happens if you copy front
 and then call popFront. For the vast majority of code, it is both assumed
 that you can copy front and that the copy of front will be unaffected by
 calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
 which will do things like use a dynamic array as a buffer which has its
 elements overwritten in popFront, resulting in the elements of the copy
 being mutated, since it's a slice of the same data. Such ranges are
 sometimes referred to as having a transient front, and they do not work
 with a lot of range-based code. As such, they either need to be
 disallowed, or we need a way to detect such behavior so that range-based
 code can check for it.

 I think this one is beyond the scope of the range spec. Whether the
 buffer of `ByLine` gets copied on element copy is an issue about the
 element of the range, not about the range itself. If this needs fixing,
 the element type of `byLine` needs to be changed, not the range spec.

 Otherwise `front` would need to always deep copy the element on each
 invocation, which would make any range of ranges horribly inefficient!

It is something that has proven to be a problem. Many range-based functions
will behave incorrectly if they copy front and then popFront mutates the
copy. The range API needs to lock down the semantics involved for
range-based code to actually be able to rely on them. Any range-based code
which makes assumptions which aren't guaranteed by the range API risks
breaking with types that do not follow those assumptions, and that's
something that has been causing us problems for years, because not enough
was locked down. And the possibility of front being transient is one of
those issues that was missed when the range API was originally designed.

If deep-copying elements is an issue, then pointers or reference types
should be used - or the elements should be non-copyable. We are improving
the support for non-copyable types as part of this (whereas right now, most
range-based code assumes that the elements are copyable), but experience has
shown that allowing popFront to mutate the copy of front causes problems,
and the rare case where something like that is actually desirable has
alternate solutions.

In the case of byLine, it can use non-copyable elements, or it can use
opApply, or it could use some sort of reference-counted type so that it can
detect whether there are currently any copies which would be affected if the
buffer was reused. There is no reason why it has to use its current
solution.

Ultimately though, IMHO, it would be better for such a type to not even be a
range than to make it so that range-based code has to worry about this
issue. Experience has shown that it causes problems for most range-based
code, and it really doesn't add much value.

 9. const makes ranges useless. Realistically, a const range is not a
 range,
 because you can't call popFront, and you can't iterate through it
 (technically, it _is_ possible to have popFront be const in cases where
 the
 iteration state doesn't actually live in the range, and popFront isn't
 pure, but normal ranges do not work that way).

 In principle it goes like: elements being const is a non-issue, it's
 just another element type. Range itself can be const, but has to be cast
 to non-const with const elements to be actually iterable.

Casting away const and then mutating the result violates D's type system.
The only cases where it doesn't is when the cast actually copies the type.
So, as soon as you have something like a range of const ranges, you're
screwed.

 In practice, no way! With some care, you can write a range that copes
 with const elements, but even that is surprisingly thorny since a struct
 having a const field runs into all sorts of issues.

 But just try designing a range that can be cast between const and tail
 const! Bar for some niche ranges like TakeNone, it would be extremely
 hard if not impossible with the current language.

Fixing the issue of tail-const realistically requires a language DIP, and
Walter's DIP should make it so that a const range will be able to be
converted to a range of const elements, whereas right now, that only works
with dynamic arrays, because technically, const(Range!Foo) and
Range!(const Foo) are completely different and unrelated types as far as the
type system is concerned (technically, they could even have completely
different implementations in spite of the fact that they come from the same
template).

 10. While it's not exactly a problem with the range API, it would be nice
 if we didn't have to import std.range / std.range.primitives (or the
 Phobos v3 equivalent) to use arrays as ranges.

 Respectful disagree on this one. Ranges are (bar for the foreach range
 API) a Phobos concept, not a language concept. It's good to be able to
 pick whether functions in your module will digest ranges, and if,
 whether it's V2, V3 or some other API. I wish for no changes here.

Well, Walter has specifically requested that we fix it so that the import is
no longer required, and a lot of us would love to get rid of it. Either way,
this will have no effect on the v2 API, because the functions for the v3 API
will be different. And since they only affect ranges, the only code that it
would potentially cause problems for is code that wants to name one or more
free functions which take dynamic arrays with exactly the same names.

Anyone who wants to create their own range API will still be free to do so.

 Overview of Proposed Changes
 ----------------------------

 3. If a range is finite (that is, it's not statically known to be empty),
 and it can be default-initialized, then its init value must be empty. If
 it
 cannot be default-initialized, then it must define a function called
 emptyRange which returns an empty range of the same type. Phobos will then
 define a function called emptyRange which takes a finite range which can
 be
 default-initialized and returns the init value of that range.

 I suggest a slight relaxation. A finite range must have an empty init
 value if it does not provide an explicit `emptyRange`. That is, any
 range, finite or infinite, may have any valid `.init` value, as long as
 it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.

That means that you cannot rely on a default-initialized finite range being
valid and empty, which is part of the whole point here. Having that
requirement will reduce bugs, and the lack of that requirement has caused
bugs, particularly since plenty of folks end up assuming that the init value
is empty even when there is no such guarantee. Right now, there isn't even a
guarantee that it's valid, let alone empty.

At least when a range cannot be default-initialized, you get an error when
you attempt instead of just getting buggy behavior. Requiring emptyRange in
that case then allows us a way to get an empty range in spite of the fact
that we can't rely on the range being default-initializable, but by
requiring that finite ranges that can be default-initialized have an init
value that's both valid and empty will prevent bugs due to folks assuming
that the range is valid and empty like has already been happening. And since
the vast majority of ranges can be default-initialized, emptyRange will only
need to be declared in rare cases, and the free function, emptyRange, will
take care of the cases where the range can be default-initialized.

 4. All ranges must be either dynamic arrays or structs (and not pointers
 to
 structs either).

 Maybe unions too? Not that it's often needed.

Unions can't have functions, so they couldn't directly be ranges anyway. But
even if they could, passing around unions is just begging for trouble. They
don't have postblit or copy constructors even when their members would need
it (since the union doesn't keep track of which of its members is the valid
one), and for code to work properly, it really needs to access whichever the
currently valid member of the union is anyway. Trying to do anything with
unions and ranges would just be begging for bugs.

 11. Finite random-access ranges are required to implement opDollar, and
 their opIndex must work with $. Similarly, any ranges which implement
 slicing must implement opDollar, and slicing must work with $.

 In most cases, this will just be an alias to length, but barring a
 language
 change that automatically treats length as opDollar (which has been
 discussed before but has never materialized and is somewhat controversial
 given types where it wouldn't make sense to treat length as opDollar), we
 have to require that opDollar be defined, or generic code won't be able to
 use $ with indexing or slicing. We probably would have required it years
 ago except that it would have broken code to add the requirement.

 Why would any language changes be needed for that? Can't `.length` be a
 Phobos function for types that have `opDollar` but not their own `.length`?

Right now, generic code canot use $ with ranges, because the range API does
not require that it be there, and there are many, many ranges which do not
implement it even if they're random-access and/or have slicing. So, generic
code cannot do something like

    auto slice = range[5 .. $ - 1];

or

    auto value = range[$ - 7];

For that to work, we have to actually require that $ work properly for
ranges with slicing and/or random-access. Right now, you're forced to do
stuff like

    auto slice = range[5 .. range.length - 1];

or

    auto value = range[range.length - 7];

which is far more verbose and does not play well with converting code that
operates on dynamic arrays to code that operates on random-access ranges
instead. hasSlicing and isRandomAccessRange likely would have reqired
opDollar years ago, but it wasn't there initially, and adding it later would
have broken too much code.

With the current language, for $ to work with a user-defined type for either
indexing or slicing, that type must implement opDollar, because that's what
the language replaces $ with. There is no other way to have $.

It has been suggested in the past that we make it so that the language
treats length as opDollar if it's present and opDollar is not present, which
would make it so that all random-access ranges automatically got a working
$. However, it has also been brought up that there are types which have both
indexing and length but where it would be inappropriate for length to be
treated as opDollar - the prime example being hash tables / maps where $
wouldn't make any sense at all.

So, barring a language change, the most straightforward way for a finite
range to implement opDollar is just to alias length to opDollar, and finite
ranges without length shouldn't have opDollar anyway. But infinite ranges
will have to define their own opDollar regardless, since they need one
that's a type that just marks the end when slicing, and it makes no sense
for their opDollar to be size_t, since size_t is not infinite.

The proposal here is basically just to require opDollar with slicing and
indexing like we should have originally - and would have if it hadn't been
forgotten.

- Jonathan M Davis

May 18 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Saturday, 18 May 2024 at 20:48:51 UTC, Dukc wrote:
 Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a
 10. While it's not exactly a problem with the range API, it 
 would be nice if
 we didn't have to import std.range / std.range.primitives (or 
 the Phobos v3
 equivalent) to use arrays as ranges.

 Respectful disagree on this one. Ranges are (bar for the 
 foreach range API) a Phobos concept, not a language concept. 
 It's good to be able to pick whether functions in your module 
 will digest ranges, and if, whether it's V2, V3 or some other 
 API. I wish for no changes here.

The range API is baked into the language through `foreach` 
semantics. It’s not a Phobos feature, it’s a core language 
feature.

Jun 04 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:

 5. Forward ranges will no longer have save. Rather, it will be 
 required that copying a forward range will result in an 
 independent copy. Both the original and the copy must then have 
 the same elements in the same order, and iterating one will not 
 affect the other. So, in essence, copying a forward range must 
 be equivalent to calling save (though the actual implementation 
 could be more clever than that).

So something I just thought of as a drawback here -- `save` gives 
a nice explicit mechanism to copy when passing to a function that 
takes a value by `auto ref`. In other words it *forces* non-ref 
semantics so you do not affect the original.

Since passing to an `auto ref` function lets the compiler decide 
to not make a copy, code that currently uses `save` for this 
purpose doesn't have an equivalent in the new regime.

If we aren't going to have it, we might want to add a specialized 
do-nothing UFCS function which just takes an lvalue and turns it 
into an rvalue. Maybe we just call it `save`! Or maybe just a 
boring `asRvalue`...

-Steve

May 27 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 […]

Maybe good new names could be `empty`, `front`, and `opPopFront` 
for forward ranges, and `empty` and `opFrontThenPop` (or 
something).

Using `op…` makes sense as those functions get used by 
core-language constructs such as `foreach`. The reason `empty` 
and `front` aren’t `op…` is because those would often be called 
by programmers directly, and people would just define an alias 
anyway. One `op…` function suffices for a new interface.

As a matter of fact, `++range` could lower to 
`range.opPopFront()` or, if that is `void`, `(range.opPopFront(), 
range)` (comma operator). And `*range++` could lower to 
`range.opFrontThenPop`.

Another idea: For a forward range, `front` can be a field or 
property with a setter. The same can be true for input ranges’ 
`opFrontThenPop` which could be overloaded: If `opFrontThenPop()` 
is a getter for `front`, it’s an input range, and if there is 
`opFrontThenPop(rhs)` it’s an output range. Expressions of the 
form `*range++ = rhs` could lower to `opFrontThenPop(rhs)`.

If the names are too range-specific, they could be generalized to 
`opDereferenceUnary`, such that `*obj++` and `*obj--` lower to 
`opDereferenceUnary!"++"` and `opDereferenceUnary!"--"`, 
respectively.

If we go that path, a forward range could be defined by having 
`empty`, `front`, and supporting `++range` (which implements 
`popFront`); an input range be defined by having `empty` and 
supporting `*range++` (possibly by `opDereferenceUnary!"++"` that 
can be called with no arguments), and an output range could be 
defined by having `empty` and supporting `*range++ = rhs` 
(possibly by `opDereferenceUnary!"++"` that can be called with 1 
argument).

Jul 03 2024

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Okay. This is my proposal for an updated input range API for 
 Phobos v3. I previously sent it out to some or the core D 
 folks, and the feedback has largely been positive, but I'm now 
 posting it to the newsgroup for wider feedback.

Tales from the AI
-----------------

General Observations:
Length and Clarity:

The proposal is very long and dense, which might deter some 
readers from providing thorough feedback. Consider breaking it 
into more manageable sections or providing a summarized version 
at the beginning.

Some sections could benefit from more concise language. Removing 
redundant phrases and focusing on the core points could enhance 
readability.
Implementation Details:

You mention several times that the exact details need to be 
sorted out. It would be beneficial to provide more concrete 
examples or pseudo-code where possible to clarify your ideas.

Specific Observations and Suggestions:

Auto-decoding:

Your point about auto-decoding is clear, but ensure you emphasize 
the impact on backward compatibility. This change might break a 
lot of existing code, so highlighting strategies for a smooth 
transition would be beneficial.
Save Requirement for Forward Ranges:

You propose removing save and instead relying on copy semantics. 
While this can simplify the API, it's important to provide more 
detailed guidelines on how to implement forward ranges under this 
new requirement. Examples of how to handle common use cases 
without save would help clarify this change.
Copy Semantics for Basic Input Ranges:

Making basic input ranges non-copyable is a significant change. 
Ensure you clearly articulate the benefits versus the drawbacks. 
You might also want to provide migration strategies or helper 
functions for common patterns that currently rely on copying 
input ranges.

Default Initialization:

Requiring all default-initializable ranges to have a valid init 
value is a good move towards safety. However, the discussion on 
whether to disallow finite ranges from disabling default 
initialization seems unresolved. A more definitive stance would 
strengthen the proposal. Consider detailing specific examples of 
how this requirement improves robustness and how to handle 
exceptions.

Transient Front:

Explicitly disallowing transient fronts is a significant change. 
It might help to provide more context or examples of how common 
use cases will be handled under the new rules. For instance, what 
are the alternatives for ranges that currently rely on transient 
fronts?

Const Ranges:

The problem with const ranges is well-articulated, but the 
solution proposed (relying on language improvements) seems 
uncertain. It might be useful to outline potential workarounds or 
interim solutions until language changes are implemented.
Renaming Range API Functions:

Renaming functions like front to first and empty to isEmpty could 
introduce a significant learning curve. Ensure you provide strong 
justifications for these changes and perhaps include a transition 
plan or tools to help developers update their codebases.

Backward Compatibility and Transition Plan:

The proposal introduces many breaking changes. A detailed 
transition plan, including deprecation strategies, versioning 
schemes, and tooling support, would be crucial. Consider 
providing scripts or automated tools to help developers migrate 
their code.

Structural Suggestions:

Summary Section:

Add a summary section at the beginning that lists the key changes 
and their benefits. This helps readers quickly grasp the main 
points before diving into the details.
Separate Concerns:

Consider separating the proposal into multiple documents or 
sections: one for identifying problems with the current API, one 
for the proposed changes, and one for implementation strategies 
and examples.

Examples and Use Cases:

Throughout the document, add more concrete examples and use 
cases. This helps in understanding the practical implications of 
the changes and provides a clearer picture of the benefits and 
challenges.

Community Feedback:

Encourage community feedback by posing specific questions or 
areas where input is particularly needed. This can help focus the 
discussion and gather more targeted feedback.
Overall, your proposal is comprehensive and addresses many 
crucial aspects of the range API. By refining the structure and 
providing more concrete details, you can enhance its clarity and 
make it easier for the community to provide valuable feedback.

Jul 03 2024

Eyal Lotem <eyal weka.io> writes:

Hey,

In continuation to the discussion about how 3-method ranges do 
not apply at the conference, I wanted to refer you to Teodora 
Serbanescu's work on ranges, showing how a single-method range 
protocol is better optimized[1] than the 3 method range protocol.

Also, here's some benchmark code[2] of a repeated composition of 
map/filter, and if you look at the generated assembly, you see 
how well opApply is optimized vs. the phobos range which even in 
-release -O4 remains non-inlined.

Results on my laptop:

**opApplyRange** took **13637** hnsecs
**phobosRange** took **207873** hnsecs

[1]. 
https://www.dropbox.com/scl/fi/bo9kj2jhd209ie503s00e/Teodora-Single-method-range-protocol.pdf?rlkey=p5qnai91xpsw1kme7im16zogy&st=l1phb9k5&dl=0

[2].
```
import std: iota, unaryFun, ForeachType;

struct MapResult(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     int opApply(scope int delegate(ref 
typeof(Func(ForeachType!R.init))) dg) {
         foreach(ref x; underlying) {
             auto res = Func(x);
             if(int rc = dg(res)) return rc;
         }
         return 0;
     }
}

struct FilterResult(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     int opApply(scope int delegate(ref ForeachType!R) dg) {
         foreach(ref x; underlying) {
             if(Func(x)) if(int rc = dg(x)) return rc;
         }
         return 0;
     }
}

auto map(alias Func, R)(R rng) { return MapResult!(Func, R)(rng); 
}
auto filter(alias Func, R)(R rng) { return FilterResult!(Func, 
R)(rng); }

auto benchmark(alias F, Args...)(auto ref Args args) {
     import std: forward, writefln;
     import std.datetime.stopwatch: StopWatch, AutoStart;
     auto sw = StopWatch(AutoStart.yes);
     scope(exit) {
         sw.stop();
         writefln("%s took %s hnsecs", __traits(identifier, F), 
sw.peek.total!"hnsecs");
     }
     return F(forward!args);
}

auto opApplyRange(ref uint res) {
     foreach(x; 
iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map
"a*2".filter!"a%3") res += x;
}

auto phobosRange(ref uint res) {
     import std: map, filter;
     foreach(x; 
iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map
"a*2".filter!"a%3") res += x;
}

unittest {
     import std: writeln;
     uint res = 0;
     benchmark!opApplyRange(res);
     benchmark!phobosRange(res);
}
```

Sep 23 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:
 showing how a single-method range protocol is better 
 optimized[1] than the 3 method range protocol.
 **opApplyRange**

Is opApply even an iterator interface? Per the talk comparing the 
function count of iterators

https://youtu.be/d3qY4dZ2r4w?si=sRWCtba_njjqrOCP

 phobosRange
 import std: map, filter;

How would you know the difference between phoboes complexity 
confusing the optimizer vs the 3-ness?

----

3 function ranges are good for sane api reasons even if slower, 
much less optional  indexing allowing for binary search, allowing 
in-place sorts etc.

Sep 23 2024

Eyal Lotem <eyal weka.io> writes:

On Monday, 23 September 2024 at 08:56:20 UTC, monkyyy wrote:
 On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:
 showing how a single-method range protocol is better 
 optimized[1] than the 3 method range protocol.
 **opApplyRange**

 Is opApply even an iterator interface? Per the talk comparing 
 the function count of iterators

 https://youtu.be/d3qY4dZ2r4w?si=sRWCtba_njjqrOCP

It's more restrictive because it can't do zip or some other 
things. I don't suggest using opApply as a replacement. It's 
merely a proof that Phobos ranges are not optimizing as possible

 phobosRange
 import std: map, filter;

 How would you know the difference between phoboes complexity 
 confusing the optimizer vs the 3-ness?

It's my conclusion from looking at the cgast vs asm. You can take 
a look and I'm open to other reasons for this.

 ----

 3 function ranges are good for sane api reasons even if slower, 
 much less optional  indexing allowing for binary search, 
 allowing in-place sorts etc.

This is all possible if iteration uses one method. A single 
method for iteration doesn't preclude other methods for other 
purposes.

Sep 23 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:
 Hey,

 In continuation to the discussion about how 3-method ranges do 
 not apply at the conference, I wanted to refer you to Teodora 
 Serbanescu's work on ranges, showing how a single-method range 
 protocol is better optimized[1] than the 3 method range 
 protocol.

 Also, here's some benchmark code[2] of a repeated composition 
 of map/filter, and if you look at the generated assembly, you 
 see how well opApply is optimized vs. the phobos range which 
 even in -release -O4 remains non-inlined.

Thank you for posting this!

-Steve

Sep 24 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Monday, 23 September 2024 at 08:20:59 UTC, Eyal Lotem wrote:
 Results on my laptop:

 **opApplyRange** took **13637** hnsecs
 **phobosRange** took **207873** hnsecs

I implemented a quick-and-dirty "getNext" style range as outlined 
in the post, which is not the greatest, and doesn't support ref 
ranges.

I also just did a "simple implementation" of Map and Filter as 
ranges.

The results tell an interesting story.

```d
import std: iota, unaryFun, ForeachType;
import std.datetime.stopwatch : benchmark;

struct MapResult(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     auto getNext(ref bool valid) {
         auto x = underlying.getNext(valid);
         if(valid) return Func(x);
         return typeof(Func(x)).init;
     }
     int opApply(scope int delegate(ref 
typeof(Func(ForeachType!R.init))) dg) {
         foreach(ref x; underlying) {
             auto res = Func(x);
             if(int rc = dg(res)) return rc;
         }
         return 0;
     }
}

struct MapResult2(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     auto front() => Func(underlying.front);
     void popFront() => underlying.popFront;
     bool empty() => underlying.empty;
}

struct FilterResult(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     auto getNext(ref bool valid) {
         typeof(underlying.getNext(valid)) x;
         do {
             x = underlying.getNext(valid);
         } while(valid && !Func(x));
         return x;
     }
     int opApply(scope int delegate(ref ForeachType!R) dg) {
         foreach(ref x; underlying) {
             if(Func(x)) if(int rc = dg(x)) return rc;
         }
         return 0;
     }
}

struct FilterResult2(alias _Func, R) {
     alias Func = unaryFun!_Func;
     R underlying;
     this(R underlying) {
         while(!underlying.empty && !Func(underlying.front)) 
underlying.popFront;
         this.underlying = underlying;
     }

     auto front() => underlying.front;
     void popFront() {
         underlying.popFront;
         while(!underlying.empty && !Func(underlying.front)) 
underlying.popFront;
     }

     bool empty() => underlying.empty;
}

// strawman protocol, no ref support.
auto getNext(R)(ref R range, ref bool valid) {
     typeof(range.front()) x;
     if((valid = !range.empty) == true)
     {
         x = range.front;
         range.popFront;
     }
     return x;
}

auto map(alias Func, R)(R rng) { return MapResult!(Func, R)(rng); 
}
auto filter(alias Func, R)(R rng) { return FilterResult!(Func, 
R)(rng); }
auto map2(alias Func, R)(R rng) { return MapResult2!(Func, 
R)(rng); }
auto filter2(alias Func, R)(R rng) { return FilterResult2!(Func, 
R)(rng); }

auto opApplyRange() {
     uint res;
     foreach(x; 
iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map
"a*2".filter!"a%3") res += x;
     assert(res == 1890804904);
}

auto newRangeStyle() {
     uint res;
     auto r = 
iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3";
     bool valid = false;
     while(true) {
         auto x = r.getNext(valid);
         if(!valid) break;
         res += x;
     }
     assert(res == 1890804904);
}

auto phobosRange() {
     uint res;
     import std: map, filter;
     foreach(x; 
iota(1000000).filter!"a%2".map!"a*2".filter!"a%3".map!"a*2".filter!"a%3".map
"a*2".filter!"a%3") res += x;
     assert(res == 1890804904);
}

auto simpleRanges() {
     uint res;
     foreach(x; 
iota(1000000).filter2!"a%2".map2!"a*2".filter2!"a%3".map2!"a*2".filter2!"a%3".map2!
a*2".filter2!"a%3") res += x;
     assert(res == 1890804904);
}


unittest {
     import std: writeln;
     immutable runs = 100;
     auto data = benchmark!(opApplyRange, phobosRange, 
newRangeStyle, simpleRanges)(runs);
     writeln(i"opApplyRange took $(data[0]) for $(runs) runs");
     writeln(i"phobosRange took $(data[1]) for $(runs) runs");
     writeln(i"newRangeStyle took $(data[2]) for $(runs) runs");
     writeln(i"simpleRanges took $(data[3]) for $(runs) runs");
}
```

Some notes here:

1. I used phobos benchmark instead of the custom one. My results 
were very unstable, so I wanted something that was somewhat 
immune from random CPU usage on my laptop.
2. I added an assert to ensure the functions were actually 
working as expected!


Results:
```
opApplyRange took 186 ms, 123 μs, and 9 hnsecs for 100 runs
phobosRange took 1 sec, 345 ms, 67 μs, and 5 hnsecs for 100 runs
newRangeStyle took 83 ms, 706 μs, and 2 hnsecs for 100 runs
simpleRanges took 136 ms and 47 μs for 100 runs
```

Interestingly here, the simple ranges seem properly inlined. So 
somehow the optimizer could see through those despite having 3 
function API, but not the normal ranges? Is there something I did 
special in there that the normal functions prevent?

But even then, the performance of the `getNext` API destroys 
everything. So there is still something to this to be explored.

For sure we should figure out why phobos does so poorly here when 
clearly it's possible to do much better.

My laptop details:
```
LDC - the LLVM D compiler (1.39.0):
   based on DMD v2.109.1 and LLVM 17.0.6
   built with LDC - the LLVM D compiler (1.39.0)
   Default target: arm64-apple-darwin23.5.0
   Host CPU: apple-m1
```
-Steve

Sep 24 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Tuesday, 24 September 2024 at 14:36:05 UTC, Steven 
Schveighoffer wrote:
 For sure we should figure out why phobos does so poorly here 
 when clearly it's possible to do much better.

1000000 lines of code, contracts, bandaids instead of rewrites

Sep 24 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 24 September 2024 at 14:36:05 UTC, Steven 
Schveighoffer wrote:
 Results:
 ```
 opApplyRange took 186 ms, 123 μs, and 9 hnsecs for 100 runs
 phobosRange took 1 sec, 345 ms, 67 μs, and 5 hnsecs for 100 runs
 newRangeStyle took 83 ms, 706 μs, and 2 hnsecs for 100 runs
 simpleRanges took 136 ms and 47 μs for 100 runs
 ```

 Interestingly here, the simple ranges seem properly inlined. So 
 somehow the optimizer could see through those despite having 3 
 function API, but not the normal ranges? Is there something I 
 did special in there that the normal functions prevent?

Some more info, my simple filter range pre-computes the first 
element, and then avoids the whole "check if it's initialized" 
dance on `front` and `empty`.

If I add that "feature" to my range, it gets a lot slower, like 
5x slower. Still not as bad as Phobos, but it explains a bit of 
it.

This is the cost of "optimizing" ranges just in case someone does 
something unexpected.

Oh, and someone asked on discord, I am using `-O3` on the build.

-Steve

Sep 24 2024

D Programming

C/C++ Programming

Other

digitalmars.D - Proposed Changes to the Range API for Phobos v3