www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Proposed Changes to the Range API for Phobos v3

reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
Okay. This is my proposal for an updated input range API for Phobos v3. I
previously sent it out to some or the core D folks, and the feedback has
largely been positive, but I'm now posting it to the newsgroup for wider
feedback.

And yes, this is long, but there's a lot to cover, and I want both what's
changing and why it's changing to be clear. Not that anyone expects me to be
short when writing something like this up anyway...
===

This is a proposal for an updated input range API for Phobos v3.

Input ranges as defined in Phobos v2 have worked extremely well for us thus
far, but experience has shown that there are some problems with them, and we
would like to address and fix them with Phobos v3. So, this proposal
attempts to leave input ranges the same as much as reasonable while making
the necessary changes to fix the known problems.

Output ranges will be covered in a separate proposal.

A Note on the Enforcement of Range API Requirements
---------------------------------------------------

As a general note with regards to any requirements that the range API has
(be it the current version or the new one), some requirements can be and are
enforced with the traits for ranges (e.g. isInputRange requires that front
result in a value and will be false if front is void), whereas other
requirements cannot actually be enforced (e.g. that front returns the same
value each time that it's evaluated until popFront is called). As such, the
range API statically enforces what it can but can't enforce everything.

This means that when the range API places a requirement on ranges and their
behavior, and that particular requirement cannot be statically enforced, it
_is_ possible to write a type that violates the range API but passes all of
the associated traits and template constraints. Such a type is considered to
be in violation of the range API, and there are no promises that any
range-based function will behave correctly with such a range (since the
range itself does not have the required behavior). Range-based code in
general is free to assume that any ranges that it's given correctly
implement the range API, and it really doesn't make sense for the situation
to be otherwise.

As such, when this proposal says that a range is required to behave in a
certain way, that does not necessarily mean that that behavior is statically
enforced (just like not all requirements of the current range API are
statically enforced). They will be enforced where practicable, but in some
cases, it simply means that it's a requirement that any ranges must meet if
they're going to work correctly with normal range-based code, and anyone
writing range-based code can assume that those requirements have been met.
As is the case with many things, testing will then be required to make sure
that the code is working correctly with regards to anything that couldn't be
statically enforced.

Most of the changes in this proposal revolve around being stricter with the
behavioral requirements for the range API so that range-based code can
actually rely on certain behaviors of ranges (like their copy semantics),
whereas the current range API has been lax on a number of aspects of the
semantics of ranges, making it impossible for generic range-based code to
rely on those semantics.

Now, on to discussing what actually needs to be changed...

Some of the Problems with Input Ranges
--------------------------------------

1. Auto-decoding has not solved the problems that it was intended to solve
(if nothing else, because graphemes are a thing, and auto-decoding does not
take that into account), and it has caused both performance problems and a
proliferation of complications (in particular with all of the code that
attempts to work around auto-decoding to regain performance and avoid having
strings be wrapped by other range types where possible). So, we clearly
would like to get rid of it.

2. Requiring save for forward ranges has been a source of bugs. Most forward
ranges do not require it, so it is routinely forgotten, causing bugs for any
ranges that do require it. Ideally, we would be able to just rely on copies
of forward ranges being independent and thus not need save at all. It would
also bring the semantics of ranges that much closer to dynamic arrays /
slices, which ranges are effectively trying to be a generalization of, just
like iterators in C++ are effectively a generalization of pointers.

3. The semantics of copying from and assigning to ranges are not defined by
the current range API, and because ranges allow such a variety of types with
such a variety of semantics for copying and assignment, we cannot actually
define the semantics of copying or assignment for ranges with the current
API. This puts some annoying restrictions on writing correct generic code
(e.g. not being able to use a range once it's been copied), and ideally, we
would be able to actually define the semantics of copying and assignment so
that generic code can rely on it.

4. There is no guarantee that the init value of a range is even valid - or
that a range can be default-initialized. This causes problems in a number of
cases but in particular when it's necessary to store a range as a member
variable. One recent example of this is
https://issues.dlang.org/show_bug.cgi?id=24202 where a project incorrectly
relied on std.range.chain's init value being both valid and empty (which it
was previously but then wasn't after some changes were made to chain to
improve its performance). As things stand, code that wants to be able to
store a range as a member variable and to do so completely correctly needs
to do something like wrap it in a std.typecons.Nullable to ensure that the
init value isn't actually used, and of course, no one actually does this or
even really thinks about it. In practice, they end up relying on the
behavior of whatever ranges their code currently happens to use, and they
almost certainly don't realize that they're relying on unspecified behavior
that could change in the future. Ideally, we would be able to rely on the
init value being valid - and preferrably empty (at least for finite ranges)
- which is not possible with the current range API.

5. The range API does not provide any way to get an empty range from ranges
that can't be sliced other than iterating through the entire range (ranges
that can be sliced allow it by using the slice operator with indices that
make the length 0). takeExactly allows us to make an empty range from any
range, but the type changes in the process, which does not work in any
situation where code needs to make a range empty but cannot change the type
(e.g. if it's dealing with a ref parameter or a member variable and needs to
set that variable to an empty range).

6. The range API does not currently specify what happens if you copy front
and then call popFront. For the vast majority of code, it is both assumed
that you can copy front and that the copy of front will be unaffected by
calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
which will do things like use a dynamic array as a buffer which has its
elements overwritten in popFront, resulting in the elements of the copy
being mutated, since it's a slice of the same data. Such ranges are
sometimes referred to as having a transient front, and they do not work with
a lot of range-based code. As such, they either need to be disallowed, or we
need a way to detect such behavior so that range-based code can check for
it.

7. Related to the issue of transient front is the fact that range-based code
in general tends to assume that front is copyable, meaning that it doesn't
work with non-copyable types. This isn't a problem with the range API per
se, but we probably do need to better formalize what you can assume you can
do with front vs what you have to test for.

8. $ cannot be used with random-access ranges in generic code, because the
range API does not require that a random-access range define opDollar.

9. const makes ranges useless. Realistically, a const range is not a range,
because you can't call popFront, and you can't iterate through it
(technically, it _is_ possible to have popFront be const in cases where the
iteration state doesn't actually live in the range, and popFront isn't pure,
but normal ranges do not work that way).

Most range-based code doesn't actually need to worry much about const, since
it's just iterating through the range, and the constness of the elements
often doesn't even matter, since the code is templated and works regardless
of the mutability of the elements. However, once a range is stored in
another type, constness can quickly become a problem. This not a problem
with dynamic arrays / slices, because they can be sliced to get a tail-const
slice of the original, and templates are actually instantiated with the
tail-const type for dynamic arrays, but other range types do not have such
compiler magic at present.

10. While it's not exactly a problem with the range API, it would be nice if
we didn't have to import std.range / std.range.primitives (or the Phobos v3
equivalent) to use arrays as ranges. In theory, we could add the range API
functions for arrays to object.d, but doing so would conflict with the
existing functions in std.range (and bugs with selective imports actually
make it somewhat problematic to resolve those conflicts - see
https://issues.dlang.org/show_bug.cgi?id=24451). And if we're getting rid of
auto-decoding, the range API functions for arrays of characters would
change, meaning that we can't solve the problem by putting the functions in
object.d and then aliasing them in std.range.primitives to avoid problems
with selective imports from std.range or std.range.primitives.

So, the symbol conflicts which would result from putting the range API
functions for arrays in object.d do potentially place some restrictions on
what we do with the new range API.

Overview of Proposed Changes
----------------------------

1. The easy one is that the range API functions for dynamic arrays will not
treat arrays of characters as special. A dynamic array of char will be a
range of char, and a dynamic array of wchar will be a range of wchar.

Any code that needs to decode will need to use the phobos v3 replacement for
std.utf's decode or decodeFront - or use foreach - to decode the code units
to code points (and if it needs to switch encodings, then there will be
whatever versions of byUTF, byChar, etc. that the replacement for std.utf
will have). And if code needs to worry about normalization or other grapheme
stuff, then it'll need to use that functionality from the replacement of
std.uni (which is basically what ranges have to do right now anyway). We may
choose to still have UTFException, but it will be up to the programmer to
choose whether they want to use the replacement character or to throw
exceptions when decoding, because the decoding will no longer be automatic.
As such, using the range API on dynamic arrays will work with  nogc, and we
won't need a bunch of extra code just to avoid the auto-decoding.

This may result in more bugs where programmers slice into the middle of code
points, but auto-decoding just makes that harder rather than actually
preventing it, and there's really no fix for that without doing something
like operating at the grapheme level by default, which would be horribly
inefficient. We probably should have better documentation and tutorials for
how to write range-based code which deals with Unicode correctly, but
ultimately, education is the solution here, not trying to make everything
work at the code point or grapheme level by default. It may make sense to
add a string type of some kind to Phobos which operates at the grapheme
level with its API but which tries to be efficient internally, but that's
really not a question for the range API. It seems to be generally agreed
upon that arrays of code units should be treated as ranges of code units, so
that's what the new range API will do.

2. All ranges which can be default-initialized must have a valid init value
(that is, their range API functions must behave according to the normal
rules for those functions and can't do something like have empty be false
and then have front fail to return a valid value; the init value of all
ranges that can be default-initialized must work properly as ranges).

By requiring that the init value be valid, we avoid all of the problems that
we've been having with ranges which behave badly when they're
default-initialized. Instead, generic code will be able to rely on a
default-initialized range behaving sanely, and the default behavior will be
what most programmers expect, whereas right now, some ranges will actually
segfault if their init value is used (e.g. if the range type is a class).

Of course, this does make it so that classes, interfaces, and pointers can't
be ranges, but there are ways to work around that (as will be discussed
later).

Ideally, we would also disallow ranges from disabling default initialization
so that generic range-based code can rely on it, but that causes problems
with ranges that need to be constructed at runtime to function properly.
That problem can be worked around fairly easily with finite ranges, since
worst case, you can always just add a flag to them which indicates whether
they were default-initialized and then have empty return true in that case
(without having to care about what the range normally does). It's also
likely that such ranges are rare, since it's almost always the case that you
can rework the logic of a finite range to have a valid init value.

However, with infinite ranges, there is no such solution. If they cannot be
default-initialized, then they either can't be ranges, or they would have to
be finite ranges which would just never be empty if they're constructed at
runtime (while doing something like the flag trick to make their init value
empty). And it's certainly true that the range API doesn't (and can't)
guarantee that finite ranges are truly finite, but it's still better if we
can define infinite ranges that need to be constructed at runtime as
infinite ranges, since then we can get the normal benefits that come from
statically knowing that a range is infinite.

Alternatively, we could disallow finite ranges from disabling default
initialization (since they have a workaround) while allowing infinite ranges
to disable it, but that seems like it's too much special-casing and like it
would add to the confusion overall.

And maybe it would be better to just force infinite ranges which can't work
properly unless they're constructed at runtime to be finite ranges given
that they're likely to be pretty rare, but ultimately, I think that it's
more straightforward to just require that the init value be valid if a range
can be default-initialized. The issues with types that can't be
default-initialized already are something which causes problems for generic
code in general, and it would not require any special explanation with the
range API.

It would also fit in with the language better if we ever add default
constructors with a future Edition (not that I expect that to happen, but
forcing range types to have a valid init value would make that harder).

The only real problem I see is that there will likely be some generic code
which will be written to assume that default initialization works for all
ranges, but again, that's a general problem with types that disallow default
initialization, and the bug that you then get is a compilation error,
whereas forgetting something like save results in code that compiles but
does the wrong thing. So, when we do run into such bugs, they won't silently
cause problems.

Weighing the pros and cons here, I think that we're better off just letting
ranges disable default initialization (much as it's annoying in general when
types do that). Simply requiring that the init value of a range that can be
default-initialized be valid fixes the core problem, and then the
initialization problems that we're left with are not range-specific.

3. If a range is finite (that is, it's not statically known to be empty),
and it can be default-initialized, then its init value must be empty. If it
cannot be default-initialized, then it must define a function called
emptyRange which returns an empty range of the same type. Phobos will then
define a function called emptyRange which takes a finite range which can be
default-initialized and returns the init value of that range.

The result of this is that we have a reliable way to get an empty range from
any finite range without changing the type of the range. If we could require
that finite ranges be default-initializable, then we wouldn't need
emptyRange in the same way, but even then, such code would have to check
whether the range was infinite to see if assuming that init was empty was
correct, whereas this way, code can just call emptyRange, and it'll get a
compilation error if it's given an infinite range. So, code that needs an
empty range will get compiler checks which verify that the range is finite,
and it'll work whether the range can be default-initialized or not (though
it'll only work with CTFE if the range can be default-initialized).

Of course, this entry doesn't apply to infinite ranges at all, since whether
they can be default-initialized or not, their empty is always false. The
fact that that can be determined statically is what makes them infinite
ranges.

4. All ranges must be either dynamic arrays or structs (and not pointers to
structs either).

This allows us to have consistent requirements with regards to copying,
assignment, and initialization (e.g. it's not possible to require that the
init value be valid if classes can be ranges). Most ranges are already
dynamic arrays or structs anyway, so for _most_ range-based code, this
requirement is a complete non-issue.

However, the main problem that we then have is the ranges that actually need
to be classes or interfaces. This happens when you need to do stuff like
wrap arbitrary ranges with a single range type at runtime (which
std.range.interfaces currently provides the functionality to do). The
solution to this problem then is to provide struct wrappers. That way, a
range that needs to be a class / interface will instead be a struct wrapping
a class / interface reference. The struct can then ensure that its init
value is valid, and it can ensure that the range has the correct copying and
assignment semantics. So, the replacement for std.range.interfaces will then
provide not only classes and interfaces, but it will provide the appropriate
struct wrappers as well.

So, a function like inputRangeObject might return something like
RangeWrapper!(InputRangeObject!Foo) instead of InputRangeObject!Foo like it
would currently do. The exact details still need to be sorted out, but there
should be no technical barrier to replacing all ranges which are currently
classes or interfaces with ranges that are structs wrapping classes or
interfaces, which would allow us to make it illegal for classes or
interfaces to be ranges without losing the actual functionality that such
ranges currently provide.

5. Forward ranges will no longer have save. Rather, it will be required that
copying a forward range will result in an independent copy. Both the
original and the copy must then have the same elements in the same order,
and iterating one will not affect the other. So, in essence, copying a
forward range must be equivalent to calling save (though the actual
implementation could be more clever than that).

So, for most forward ranges, this will simply mean that they won't need a
save function that just returns the this reference, and range-based code
will not need to call save.

For the ranges that would currently require that save be called, they will
have to be implemented in a way that save is no longer required (which is
made possible by disallowing pointers and references from being ranges). The
naive implementation would therefore just have the copy constructor be
equivalent to save, whereas a more sophisticated implementation would likely
use reference counting and only duplicate the internal state when the
reference count isn't 1, and a function is called which would mutate the
state (e.g. doing the equivalent to calling save when popFront is called,
and the reference count isn't 1). Obviously, the struct wrappers that the
replacement for std.range.interfaces would provide would implement the
correct copying functionality so that save would not be required, and anyone
using a class or interface as a range wouldn't actually have to implement
that behavior either.

6. Basic input ranges will be non-copyable.

Forward ranges can be either value types or pseudo-reference types which are
value types with regards to their iteration state (that is, their actual
elements may be outside of the range type itself, but the iteration state is
copied when the range is copied). The fact that their iteration state can be
copied is what makes it so that they can be forward ranges.

On the other hand, basic input ranges can't have their iteration state
copied; otherwise, they could be forward ranges. So, currently, they are
either reference types, or they are pseudo-reference types where a copy does
not copy the full iteration state, leaving the original in an invalid state
if anything is done to the copy (e.g. front is stored directly in the
struct, but the rest of the elements are on the heap or come from somewhere
else outside the struct like a socket, meaning that front will be stale on
the original after popFront is called on the copy).

So, fundamentally, basic input ranges have different copy semantics than
forward ranges. However, we _can_ have consistent copy semantics for all
basic input ranges and consistent copy semantics for all forward ranges -
just not consistent semantics between the two. The difference in their copy
semantics is fundamentally the difference between those two kinds of ranges.

So, for forward ranges, as described in the previous entry, we fix the
problem of inconsistent copy semantics by requiring that copying them result
in independent copies, whereas we have two options for basic input ranges:

1) Either we need to require that basic input ranges be reference types
(thereby avoiding screwed up copies, because the copying is basically just
copying a pointer)

2) or we need to make them non-copyable - in which case, they can still be
moved, but because they're non-copyable, you don't get situations where
iterating through one copy partially mutates the other, since there's only
ever one copy.

In order to fix the init problem, we already need to disallow classes and
pointers, which would mean that if we required that basic input ranges be
reference types, they would have to be structs wrapping pointers or
references, and it would be difficult to statically determine the difference
between a basic input range and a forward range (since there will no longer
be a save function to do that). So, we'd be forced to give basic input
ranges a different API in order to distinguish them - which isn't all bad,
since it's not always a good idea to have basic input ranges and forward
ranges share the same code given that they have different copy semantics,
but it would be nice to still be able to share code in the cases where the
copy semantics don't matter. Also, requiring that basic input ranges be
reference types would likely require some additional indirections in some
cases, since you then can't put anything directly in the range itself
(because all of the actual state would need to be in the type pointed to by
the wrapper struct in order for it to be a reference type).

On the other hand, if we just make basic input ranges non-copyable, then the
difference between them and forward ranges is then trivial to detect with
__traits(isCopyable, ...), and you can put as much state directly in the
basic input range as you want. The downside, of course, is then that you
can't copy the range. However, it then makes the semantics of what's going
on _very_ clear. A forward range is copyable, and a basic input range is
not.

This should make the fundamental difference between a basic input range and
a forward range stark in a way that it arguably isn't with save, even though
the difference is still fundamentally the same. Forward ranges can have
their iteration state copied, whereas basic input ranges can't. But instead
of having that be determined by a special range API function that most folks
forget to call, it will instead be a clear difference in copy semantics that
will actually result in compilation errors if you get it wrong.

Of course, the major downside here is simply that code will be forced to use
explicit moves in a bunch of cases. But it's certainly better than copying a
range and then using the partially mutated (and thus invalid) original like
you can easily do now. And DIP 1040 should improve the situation
considerably by adding implicit moves for the last use of a variable.

It will also help that most ranges are forward ranges anyway, so while this
will be annoying in some code (probably most commonly in the functions in
Phobos that are written to work with both basic input ranges and forward
ranges), most code probably won't have to deal with it, and the code that
does will then be avoiding subtle copying bugs like it would potentially
have now. So, it's one of those cases where the compilation errors could get
annoying, but they're also clearly preventing bugs.

As an aside, having non-copyable basic input ranges will be beneficial for
the replacement for std.random. It's historically been a problem that
accidentally copying random number generator ranges results in numbers being
repeated, and by making them non-copyable, that won't be a problem anymore.
We can then add a separate function to them (e.g. dup) to explicitly to get
a copy from them when we actually want to.

7. Assigning a forward range to an lvalue of the same type should have the
equivalent semantics as copying a forward range, with the difference of
course being that the lvalue is then completely replaced with the new state
as opposed to generating a completely new copy.

These should be the obvious semantics for assignment, but it's not something
that we can currently rely on for all of the same reasons that we can't
currently rely on the semantics of copying a range.

8. Assigning an basic input range to an lvalue of the same type should only
work if it's a move rather than a copy, since basic input ranges are
non-copyable.

Again, these should be the obvious semantics for assignment given what the
copy semantics are.

9. In order to deal with transient front, we will explicitly disallow it. It
will be a requirement that if front is copied, the copy will be unaffected
by popFront. Of course, this doesn't prevent other references to the same
data from mutating it (e.g. if it's a reference type), but calling popFront
will not mutate it.

Therefore, code which wishes do something like std.stdio's ByLine needs to
use a different solution, e.g. it can use opApply instead, or it can make
front non-copyable, since if it's non-copyable, then popFront can mutate it
without worrying about any copies being mutated.

10. With regards to ranges of non-copyable elements, there will be a clear
way to test for whether a range has copyable elements, and range-based code
that won't work with non-copyable elements should disallow them. This may be
as simple as __traits(isCopyable, ElementType!R), or it may involve a new
trait such as hasCopyableElements!R. It's undecided yet whether an
additional trait will be necessary, but the range documentation will be
clear on the matter, and range-based Phobos code that needs to copy elements
will test for copyability.

11. Finite random-access ranges are required to implement opDollar, and
their opIndex must work with $. Similarly, any ranges which implement
slicing must implement opDollar, and slicing must work with $.

In most cases, this will just be an alias to length, but barring a language
change that automatically treats length as opDollar (which has been
discussed before but has never materialized and is somewhat controversial
given types where it wouldn't make sense to treat length as opDollar), we
have to require that opDollar be defined, or generic code won't be able to
use $ with indexing or slicing. We probably would have required it years ago
except that it would have broken code to add the requirement.

12. We're going to rely on some form of language improvement (such as
Walter's recent Implicit Conversion of Template Instantiations DIP) to solve
the tail-const problem.

As an alternate solution, we could rework the range API to do be more like a
list in a functional language - e.g. head returns the first element, and
tail returns a range containing all of the elements except for the first
one. This would allow us to define tail's return type as being tail-const,
but that could get interesting with assignment (e.g. you can't assign
Range!(const Foo) to const!(Range!Foo)), and basic iteration would require
assignment with such an API, e.g.

    for(auto r = range; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

So, we'd probably be forced to add a separate function just to get a
tail-const copy (since tail wouldn't necessarily return the exact same type
as the range, because tail would be returning a tail-const copy minus head),
e.g.

    for(auto r = range.tailConstCopy; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

And at that point, the actual solution to the tail-const issue isn't really
changing to an API that copies the range so much as providing a function
that returns a tail-const copy and routinely using that, which means that we
could fix the tail-const problem with the current API simply by adding a
function which gives you a tail-const copy. As such, switching to an API
that copies ranges to iterate through them rather than popping off their
elements in-place doesn't really buy us anything.

Regardless, adding a function to get a tail-const copy would be putting us
in pretty much the same boat that we're in now with save - only probably
worse. It would really only be needed in situations where a range has become
const somehow, meaning that most range-based code would not be tested with
such ranges, and the new function would be forgotten just like save (only it
would probably be forgotten even more). So, I'm inclined to think that a
language solution is a much better solution.

Also, with regards to a range API that copies for iteration, concerns have
been raised about its performance vs popFront. I ran some basic tests to see
if that was a problem, and it didn't seem to make any difference, but it
wouldn't surprise me if it did with more complex types (which basically
would have required duplicating a bunch of std.algorithm with a different
range API to compare the performance, which I didn't take the time to do).
So, I don't know how valid those concerns are, but I do agree that it would
be giving the optimizer more to optimize out.

Regardless, I decided that it would just be too large a change to the range
API with too little benifit, and if we can just implement tail-const
conversions in the language (which we arguably should be doing for more than
just ranges), then that solves the problem. And if we decide that we really
do need a new range API function to fix the problem, we can just add it
(though if we do, hopefully we do it before the initial release of Phobos v3
so that we can require it without breaking code).

13. The range API functions for basic input ranges and forward ranges will
be changed from front, popFront, and empty to first, popFirst, and isEmpty.
And the range API functions for bidirectional ranges will be renamed last
and popLast.

I'm not a big fan of this, since it would be nice to keep the same function
names, and making this change requires a language change for foreach.
However, it solves two problems:

First, if we remove save from forward ranges, then the old basic input
ranges look like the new forward ranges so long as they're structs and
copyable. So, if we don't change the forward range API in some fashion, then
it will be easy to pass ranges that aren't really forward ranges to code
that requires forward ranges, and bugs will ensue.

Second, if we want to make it so that dynamic arrays / slices are forward
ranges without requiring an import, then we're going to get a _lot_ of
symbol conflicts. Almost all of the range-based code currently in existence
imports either std.range or std.range.primitives, and _all_ of that code
would get symbol conflicts if we added the range API functions for arrays to
object.d. That can be managed with an Edition, but it would mean that tons
of code would have to be updated with the new Edition. On the other hand, if
we use a different set of function names for the range API functions that
need to be added to object.d for arrays, then we don't get all of those
symbol conflicts, and only new code which is written to use the new range
API will need to worry about the name changes.

Of course, if we add the functions to object.d, we arguably should still do
it with an Edition if we want to minimize the risk of symbol conflicts
(since there could be code in the wild with conflicting symbols; but if we
rename the functions, then we're talking about _possible_ symbol conflicts
rather than the guarantee of a ton of them like we'll get if we don't rename
them).

And https://issues.dlang.org/show_bug.cgi?id=24451 makes putting the current
range API functions in object.d even worse, since selective imports would
only work with std.range.primitives and not std.range, meaning that a lot of
the code that currently uses selective imports would still break, though
if/when that bug is fixed, then that problem goes away.

Of course, if we decide that it's too much trouble to put the range API
functions for arrays in object.d, we could solve the problem of the old
basic input ranges looking like the new forward ranges by renaming only a
single function - e.g. empty to isEmpty - since even a single difference in
the API would be enough to make them different.

Alternatively, we could add an enum of some kind to the new range API to
make it different. It would be kind of ugly, but it would allow us to keep
all of the current function names - though we'd still have to import a
module to use arrays as ranges, so that problem wouldn't be fixed.

Of course, as annoying as it would be to rename the range API functions, it
_would_ have the advantage that you could see at a glance that code was
using the new range API rather than the old one. And that could definitely
be valuable. It also means that if you accidentally use the range traits
from the wrong version of the API, you'll get errors _very_ quickly, since
rather than being partially compatible, they won't be compatible at all.

So, while I'm definitely not happy about renaming the range API functions,
after having considered the options, I think that it's the best path
forward.

Of course, if we rename them, that unfortunately also opens up the door for
bikeshedding, but I think that first, popFirst, isEmpty, last, and popLast
are pretty obvious choices, and they correspond well to the old names.

Either way, it's my intention that Phobos v3 will have wrappers such that
you can easily convert a range from the v2 API to a range that works with
the v3 API and vice versa. In all likelihood, the main annoyance with the
new function names will simply be getting used to them. It'll be easy enough
to update the names if code is ported from v2 to v3, and any such ranges
will need to be looked over to make sure that they match the new
requirements anyway (though most ranges probably already do fulfill most of
the added requirements in this proposal).

Conclusion
----------

So, that covers all of the major points that I can think of right now. Of
course, there will be other minor changes (e.g. I'll probably rename
isForwardRange to isCopyableRange), but those are the core changes, and this
document is already ridiculously long as it is. However, I think that it's
covered what really needs to be covered.

I do need to put together a specification which lists the range API in its
entirety rather than as a proposal for what to change and why to change it
(if nothing else, because some form of that needs to be part of the
documentation), but this document is already ridiculously long without that,
so I'm leaving it as just the list of problems and proposed changes rather
than including a full spec.

In any case, have fun picking apart and complaining about the proposal. :)

- Jonathan M Davis
May 16
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
Two things I suggest we consider, given the write up:

1. Automatic boxing of slices into a struct, this was something I was 
considering for signatures specifically for ranges.
2. Enable passing the this pointer on a class explicitly, coupled with a 
few final methods you could handle when the class is null and then 
forward to the implementation.
May 16
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, May 16, 2024 9:33:26 AM MDT Richard (Rikki) Andrew Cattermole via 
Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:

 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.
What would that buy us? I don't see how it adds any value. I guess that having a struct would negate the need to import the range API functions for arrays, but instead, we'd have to import the struct (or put it in object.d), and then we'd have to explicitly wrap arrays, whereas right now, all you need is an import to treat arrays as ranges, and with my proposed changes, you wouldn't even need that anymore. Also, considering that ranges are basically supposed to be an abstraction of dynamic arrays / slices, similar to how C++ iterators are an abstraction of pointers, having to wrap dynamic arrays in structs is arguably taking things in the wrong direction. It would be making it so that dynamic arrays aren't ranges instead of making ranges in general more like dynamic arrays. It would also get really annoying to have to constantly wrap arrays just to pass them to range-based code - and it would get extra awkward in the cases where a function that used to operate on an array was changed to operate on a range instead, since then you'd either be forced to treat arrays differently than normal with such a function or break code that used to pass arrays, forcing it to wrap them in a struct. And since strings are dynamic arrays, all of this would apply to strings too.
From what I can see, trying to stop dynamic arrays / slices as ranges just
makes things worse.
 2. Enable passing the this pointer on a class explicitly, coupled with a
 few final methods you could handle when the class is null and then
 forward to the implementation.
That would destroy our ability to rely on the copy or assignment semantics of ranges. The whole point of disallowing pointers or classes as ranges would be so that we can rely on forward ranges being copied without needing a function like save - and so that we can rely on the init value being valid. Having free functions to handle the case where a class reference was null would fix the init problem (at the cost of needing to import those functions), but it doesn't solve the problem that we cannot have reliable copy or assignment semantics if we allow reference types to be ranges. Requiring that class references be wrapped in structs allows us to have reliable copy and assignment semantics without losing any functionality, and it makes the semantics of ranges in general more consistent - and closer to that of dynamic arrays / slices. It's already the case that ranges which are classes are pretty rare. Some use cases require them, but the vast majority of range-based code uses structs. I really don't see any value in trying to support naked classes as ranges when it's fairly straightforward to just wrap them in structs and get rid of any special cases that they would otherwise cause. And since you typically use the functions in std.range.interfaces to generate class ranges anyway, the resulting code would be pretty similar to what anyone using class ranges would be doing now. It seems like you're proposing that we start requiring wrapper structs for some of the most common kinds of ranges, whereas my proposal would be requiring wrapper structs only for rare ranges - ones which are already typically generated using helper functions. - Jonathan M Davis
May 16
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
 It seems like you're proposing that we start requiring wrapper structs for
 some of the most common kinds of ranges, whereas my proposal would be
 requiring wrapper structs only for rare ranges - ones which are already
 typically generated using helper functions.
Required, yes, automatic yes. The language could help here, not a pleasant proposal but it would be do-able to have the language auto-wrap into a boxed type. Given ranges are first class in D, it'll be worth it. ```d import core.attributes : autobox; struct Map( autobox T) { this( autobox T val); } ``` ```d module core.boxing.slice; struct BoxedSlice(T) { T[] slice; alias slice this; bool empty() { return slice.length == 0; } } ``` The reason I am suggesting this is we clearly don't like the solutions being proposed for the range functions for slices. So lets change it to a problem we do like and can solve.
May 16
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, May 16, 2024 10:48:26 PM MDT Richard (Rikki) Andrew Cattermole 
via Digitalmars-d wrote:
 On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
 It seems like you're proposing that we start requiring wrapper structs for
 some of the most common kinds of ranges, whereas my proposal would be
 requiring wrapper structs only for rare ranges - ones which are already
 typically generated using helper functions.
Required, yes, automatic yes. The language could help here, not a pleasant proposal but it would be do-able to have the language auto-wrap into a boxed type. Given ranges are first class in D, it'll be worth it. ```d import core.attributes : autobox; struct Map( autobox T) { this( autobox T val); } ``` ```d module core.boxing.slice; struct BoxedSlice(T) { T[] slice; alias slice this; bool empty() { return slice.length == 0; } } ``` The reason I am suggesting this is we clearly don't like the solutions being proposed for the range functions for slices. So lets change it to a problem we do like and can solve.
I see nothing positive about having to wrap arrays. Treating arrays as ranges is far more user-friendly than that and has worked very well for us overall. It's just annoying to have to import a module in order to be able to use the range API with arrays, and auto-decoding was a mistake. Other than that, it really hasn't been an issue. In order to get rid of save, the range API has to change enough that the old basic input ranges don't look like the new forward ranges, so it can't stay identical to what it is now regardless. Changing the names of the range API functions takes care of that, _and_ it allows us to put the new range API functions for dynamic arrays in object.d without breaking a ton of code. Yes, having to rename the functions is annoying, but it also has the advantage of making it clear whether code is using the updated range API with the adjusted semantics or whether the code was written for the old range API and does not necessarily conform to the new requirements. So, renaming the range API functions fixes multiple problems. Making it so that arrays must be wrapped to be ranges does nothing to fix the need to at least partially change the range API so that the old basic input ranges don't look like the new forward ranges. So, it doesn't fix the need to either rename the functions or add an otherwise pointless enum to the range API just to indicate that it's not the old range API. And if we're going to rename them, we might as well just rename them all so that the difference is clear, and it gives us an easy way to fix the import problem for treating arrays as ranges. So, I don't see anything here that wrapping arrays improves. It just causes additional friction, and if you have to import anything to create the wrapper, we're right back to where we are now with having to import std.range.primitives, just with a different module. I'm also highly skeptical that it would work to convince Walter to add an auto-boxing feature for this. He specifically complained in a recent DLF meeting about how we can't just use arrays as ranges with no need to do anything like import a module. He wants to be able to use the range API with arrays with no friction, and renaming the range API functions gives us a way to do that, whereas wrapping them creates additional friction. On top of that, alias this is almost always a mistake with generic code. Most generic code tests for specific types, not implicit conversions, and testing for implicit conversions is error-prone, because the implicit conversion doesn't actually happen just because it's tested for, and even if it does happen, because the code forced the conversion, it happens inside the function rather than at the call site, which risks all kinds of issues (not as many with dynamic arrays as with some other types, but implicit conversions and template constraints are typically a very bad combination). So, adding a wrapper type is going to cause all kinds of problems with template constraints and which overloads get used. The whole situation is just much simpler if we don't wrap dynamic arrays. All in all, I do not understand why you think that wrapping arrays improves anything. From what I can see, it's just purely worse. Yes, Walter has to be convinced to add first/popFirst/isEmpty to foreach in addition to front/popFront/empty for the proposed range API to work with foreach, but given that it explicitly fixes a problem that he was complaining about, and it's a very simple change, I expect that he'll approve it. But even if he doesn't, and we're forced to do something like put an otherwise pointless enum on the new range API to distinguish new ranges from the old ones, I'd still rather be importing the replacement for std.range.primitives than have to deal with a wrapper type. - Jonathan M Davis
May 16
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
 All in all, I do not understand why you think that wrapping arrays improves
 anything. From what I can see, it's just purely worse.
Given the problems listed it is a possible solution. That doesn't mean it's better. But it is a common enough solution in PL design space that it is worth considering. For stuff like this I want to evaluate ideas not just to get to the first good one that could work, but also to make sure it actually is the best candidate.
May 16
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 17, 2024 12:53:07 AM MDT Richard (Rikki) Andrew Cattermole via 
Digitalmars-d wrote:
 On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
 All in all, I do not understand why you think that wrapping arrays
 improves
 anything. From what I can see, it's just purely worse.
Given the problems listed it is a possible solution. That doesn't mean it's better. But it is a common enough solution in PL design space that it is worth considering. For stuff like this I want to evaluate ideas not just to get to the first good one that could work, but also to make sure it actually is the best candidate.
I have no problem with it being presented and evaluated, but I don't see any reason why it would be better. It seems like it's just worse. - Jonathan M Davis
May 17
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, May 17, 2024 at 03:33:26AM +1200, Richard (Rikki) Andrew Cattermole via
Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:
 
 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.
[...] What's the purpose of this, other than to add friction to use built-in arrays as ranges? What do we get in return for paying this price? T -- Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG
May 17
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 18/05/2024 2:28 AM, H. S. Teoh wrote:
 On Fri, May 17, 2024 at 03:33:26AM +1200, Richard (Rikki) Andrew Cattermole
via Digitalmars-d wrote:
 Two things I suggest we consider, given the write up:

 1. Automatic boxing of slices into a struct, this was something I was
 considering for signatures specifically for ranges.
[...] What's the purpose of this, other than to add friction to use built-in arrays as ranges? What do we get in return for paying this price?
There would be no friction for users of ranges in PhobosV3 they would all be annotated to turn on the boxing automatically. User made ones would need annotating however, that could be a bit of a pain. It is a pain for user made ones already in v2, since you have to make sure you have the right import and don't use it in a way that UFCS cannot work. So it solves a number of issues like conflicting definitions as described by J.M.D. However he isn't keen on it, so as a proposal it'll probably end at my suggestion to consider it.
May 17
prev sibling next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 wall-o-text
 no function header lists or patterns
????
 no reference at all to keyed ranges
????
 explict range starters for unicode instead of autodecoding
ok, thats *half* the problem; how does `hello 😀 world`.byUnicode(flag.charIndex).indexOf('w')` count correctly? The existing range api discards information that is otherwise trivial to have the old solution of dchars and autodecoding failed; whats your proposal for the unicode problem on the *dchar* side of the problem where it was believed that dchar would simplify all use cases of unicode into simple indexing
 There is no guarantee that the init value of a range is even 
 valid
Thats not what init means in this language; I believe it should be `.zero`, optional and part of a larger change were float.zero is 0.0; see my dip
 8. $ cannot be used with random-access ranges in generic code, 
 because the range API does not require that a random-access 
 range define opDollar
I believe you should be much much much more spefic is [min(i,$-1)] supported? is [$..0] supported? the slicing api is a rabbit hole that needs allot of care
May 16
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, May 16, 2024 11:05:56 AM MDT monkyyy via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 explict range starters for unicode instead of autodecoding
ok, thats *half* the problem; how does `hello 😀 world`.byUnicode(flag.charIndex).indexOf('w')` count correctly? The existing range api discards information that is otherwise trivial to have the old solution of dchars and autodecoding failed; whats your proposal for the unicode problem on the *dchar* side of the problem where it was believed that dchar would simplify all use cases of unicode into simple indexing
The new range API is quite explictly _not_ trying to do any explicit Unicode handling. It's doing what has been discussed for years, which is to leave that up to the programmer and whatever algorithms they choose to use. Whether it's best to operate at the code unit, code point, or grapheme level depends on the operation, and issues such as normalization complicate the situation considerably with regards to any attempt to provide a one size fits all solution for indexing into graphemes. It's also not efficient to operate at the grapheme level by default. The replacement to std.utf will provide the tools to convert from code units to code points, just like std.utf does now, with the difference being that the programmer will no longer have to work around the range API functions to try to prevent decoding from happening automatically. Code that then needs to operate at the grapheme level will need to use the replacement for std.uni, and that can include a random-access range of graphemes. But it won't be the default way to operate, because that's inefficient, and no normalization scheme actually works in all cases, making selecting a default problematic. We may choose to include a string type in Phobos which provides an API that allows you to operate at the grapheme level by default, but dynamics arrays of code units will be treated as ranges of code units. The problems that we've had with auto-decoding have stemmed from trying to treat them as anything else. So, we can build whatever useful algorithms or types we want on top of the built-in strings, and we may be able to provide some better solutions than we currently have for dealing with graphemes, but the built-in strings will not have any kind of special-casing as part of the range API. Any ranges of graphemes will be types of their own and will not be the built-in strings even if they may wrap the built-in strings.
 8. $ cannot be used with random-access ranges in generic code,
 because the range API does not require that a random-access
 range define opDollar
I believe you should be much much much more spefic is [min(i,$-1)] supported? is [$..0] supported? the slicing api is a rabbit hole that needs allot of care
Finite random-access ranges will need to support the same operations with $ that dynamic arrays do, since the whole point here is to access them in the same manner as dynamic arrays, and operations such as [$ .. 0] never make sense, so no, that won't be supported. And no additional syntax for $ is being added. It's purely for indexing and slicing. The current situation makes it so that we can't use $ at all with ranges in generic code, because there is no requirement that it be supported, whereas ideally, we'd be able to use it in the same way that you would with dynamic arrays. So, isRandomAccessRange will test to make sure that [$ - 1] compiles and returns the correct type, and hasSlicing will test to make sure that [0 .. $] and [0 .. $ - 1] compile and return the correct type. It will be an additional requirement that $ be equivalent to length, but we can't statically test for that. Infinite random-access ranges will not support $ for opIndex, because they can't and be infinite. They will support it for slicing, but they won't be able to support doing arithmetic on $, because that makes no sense with an infinite range. $ will merely be used to indicate that slice is supposed to go to the end of the range (and therefore that the result will be infinite) rather than resulting in a finite slice as would occur if only indices are used. $ is actually already partially tested for with hasSlicing and infinite ranges where it requires that opSlice return the same range type if opDollar is used, whereas it's expected to return a finite range if only indices are used. However, since there's no requirement that $ work at all, it can't actually be used in generic code even if you know that the range is infinite. It's just that hasSlicing requires that the result be the same if $ is implemented to work with slicing, which is therefore kind of a pointless test as things stand. - Jonathan M Davis
May 16
parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 16 May 2024 at 22:21:54 UTC, Jonathan M Davis wrote:
 The new range API is quite explictly _not_ trying to do any 
 explicit Unicode handling.
my proposal of adding .key to the range api isnt unicode spefic either
 It's also not efficient to operate at the grapheme level by 
 default.
Anything lazy will be fine; the dstring conversion was slow because it had to allocate eagerly
 and no normalization scheme actually works in all cases, making 
 selecting a default problematic.
Im very much for "range starters"; my example code is wildly verbose for me and had one but I view it as *only half the problem*; strings index weird, this is trivial to deal with if you have two counters in the same abstraction; I believe if effencly impossible to handle by piling on more complexity
 So, we can build ***whatever useful algorithms*** or types we 
 want on top of the built-in strings
*I disagree*; I believe the majority of searching should be replaced with an "indexes" function `auto indexes(alias F,R)(R r)=>r.filter!F.swapkeyvalue;` such a function depends on having a .key concept in the api; and then you have the unicode range have apis for selecting the right kind of key-value for your range and it will all just work
 [$ .. 0] never make sense, so no, that won't be supported.
if I define a complex opDollar, I can easily imagine it being seen as the "indexing" type, when in fact im using it it to slice on a bidirectional datastruct that maybe knows how to handle $/2 ```d struct mydata{ auto opDollar(){ struct dollar{ opBinary!"-" opBinary!"/" ``` if the range api makes an assumption that `range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))` this may break unnecessarily; where opDollar is used in opslicing matters, also like add $/2 to the wishlist
is [min(i,$-1)] supported?
_____
while I know such slicing probably isnt on your rader for being to magic I unironically like how well it works for the base slice
May 16
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, May 16, 2024 7:30:06 PM MDT monkyyy via Digitalmars-d wrote:
 if I define a complex opDollar, I can easily imagine it being
 seen as the "indexing" type, when in fact im using it it to slice
 on a bidirectional datastruct that maybe knows how to handle $/2

 ```d
 struct mydata{
    auto opDollar(){
      struct dollar{
        opBinary!"-"
        opBinary!"/"
 ```

 if the range api makes an assumption that
 `range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))`
 this may break unnecessarily; where opDollar is used in opslicing
 matters, also like add $/2 to the wishlist
Essentially, that requires that opDollar define the common arithmetic operations (except maybe multiplication). I'll have to think about the edge cases to decide how reasonable that is (though the fact that you can often use a single opBinary for all of the arithmetic operations helps). Whatever the operations are, _every_ finite range that defines indexing or slicing would then have to define all of them in order for generic code to be able to rely on them. Fortunately, in most cases, opDollar should just be an alias to length, so it may be reasonable, much as it could be annoying for the corner cases.
is [min(i,$-1)] supported?
_____
while I know such slicing probably isnt on your rader for being to magic I unironically like how well it works for the base slice
I was not aware that $ could be used anywhere except directly within the indexing or slicing expression (or if I knew, I forgot). I guess that the compiler lets it be used anywhere between the brackets, and since the result is just size_t with arrays, it works just fine with min. A user-defined type would have to implement opCmp for that to work. So having min work would mean requiring opCmp on top of the arithmetic operations. That might be reasonable, but it definitely adds to the complexity of implementing opDollar for types that can't just alias length, though realistically, there aren't going to be many finite ranges which implement opDollar another way. So, I'll have to think about it further, but whatever the operations are that ranges will be able to use with $ in generic code will be required by the API and documented accordingly (if not, there would be no way to rely on being able to use them in generic code). - Jonathan M Davis
May 16
parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 17 May 2024 at 04:48:24 UTC, Jonathan M Davis wrote:
 Whatever the operations are, _every_ finite range that defines 
 indexing or slicing would then have to define all of them in 
 order for generic code to be able to rely on them.
I disagree; slicing should be separate from indexing (and keys) a hashmap can trivially provide a key, and reaccess the value, but it probaly cant slice. randomAccessRanges is a copy and paste with a clean up of c++ with its pointers, we have base types that airnt pointers. I prefer just a free-for-all of optional functions, pre-emptive compromise: bidirectionaly, key and indexing(by key), numeric slicing; should be independent feature sets
May 16
prev sibling next sibling parent reply Ogi <ogion.art gmail.com> writes:
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new 
 range API to make it different. It would be kind of ugly, but 
 it would allow us to keep all of the current function names
Actually, this option worth exploring. We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all. If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.
May 17
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new range
 API to make it different. It would be kind of ugly, but it would
 allow us to keep all of the current function names
Actually, this option worth exploring. We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all. If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.
[...] +1, I like the UDA idea. It will be much less effort to migrate to the new range API to slap a UDA on, than to rename every range method. T -- Bare foot: (n.) A device for locating thumb tacks on the floor.
May 17
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, May 17, 2024 8:30:47 AM MDT H. S. Teoh via Digitalmars-d wrote:
 On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
 On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Alternatively, we could add an enum of some kind to the new range
 API to make it different. It would be kind of ugly, but it would
 allow us to keep all of the current function names
Actually, this option worth exploring. We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all. If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.
[...] +1, I like the UDA idea. It will be much less effort to migrate to the new range API to slap a UDA on, than to rename every range method.
It wouldn't fix the import problem. Realistically, it's not going to work to retain the same function names and have arrays work as ranges without needing an explicit import. Adding the functions to object.d breaks pretty much all range-based code in existence due to symbol conflicts, and due to an issue with how selective imports work, you can't even use selective imports with std.range to fix it; you have to use selective imports with std.range.primitives - and of course, you'd have to use selective imports for every range API function, which would be tedious. Using new function names makes it so that we don't have any symbol conflicts - in addition to allowing us to statically distinguish between the old and new APIs. It would also make it easy to see at a glance which API a piece of code is using. Also, having the main difference that can be statically checked between the old and new API be just a UDA makes it much more likely that code will accidentally be set up for the wrong range API version, because someone forgot the UDA, and it will make it much riskier to try to write code that's overloaded on the old and new APIs. It would not surprise me if a number of library writers will want to make it so that their code works with both APIs by creating overloads that call the new Phobos functions to wrap an old range so that it works with the new API. Some ranges will be easily distinguishable due to a lack of save, but if the code is written to work on basic input ranges (or it overloads on the category of the range), then if the primary difference is whether the programmer remembered to put a UDA on the type or not, the odds are pretty high that the wrong overload will be used some percentage of the time. Whether that outright makes the code behave incorrectly or just degrades its performance is going to be highly dependent on what it's doing, but I'm afraid that if the main difference between the old and new APIs is a UDA, it's going to be missed in many cases and cause issues, whereas a different set of function names makes the difference crystal clear and completely obviates the risk of a range from one version of the API being mistaken as a range from the other version. And maybe it won't be at all common to try to overload code based on which range API it's using, but I don't think that there's much question that it will be less error-prone if the two APIs are distinct. But regardless of how risky it is to have the primary difference simply be a UDA, I don't see how the import problem can be solved without changing the names. And Walter has explicitly requested that we fix it so that we no longer need to import anything for arrays to be treated as ranges. We could force them to not be ranges at all and use wrapper types like Rikki is suggesting, but I can't imagine that Walter would be any happier about that, and having wrappers like that typically causes issues (especially with type introspection and any code that actually needs to operate on dynamic arrays), whereas the primary issues with having new names is that you have to remember to use the new ones when writing code that uses the new API, and when updating code, you'll have to do a search and replace. All in all, while renaming the functions is annoying, it definitely will solve some of the issues that we have, and I don't expect that it will ultimately be a big issue. And the more I thought about it while working out the proposal, the more I came to the conclusion that having the difference between the two API versions be visible at a glance would reduce the number of problems that we'd have. It makes the introspection situation cleaner, it makes it much more likely that making a mistake will result in a compilation error, and it makes it easier to understand what's going on when you read the code instead of having to dig around to figure out which version of the API is actually being used. And it allows us to get rid of needing to import anything to use arrays as ranges, which I don't think that we can actually do with the current function names. - Jonathan M Davis
May 17
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a proposal for 
an updated input range API for Phobos v3.

Good effort writing all this up! I certainly like most of this. However, 
some questions / proposals I do have.

 
 Some of the Problems with Input Ranges
 --------------------------------------
 
 6. The range API does not currently specify what happens if you copy front
 and then call popFront. For the vast majority of code, it is both assumed
 that you can copy front and that the copy of front will be unaffected by
 calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
 which will do things like use a dynamic array as a buffer which has its
 elements overwritten in popFront, resulting in the elements of the copy
 being mutated, since it's a slice of the same data. Such ranges are
 sometimes referred to as having a transient front, and they do not work with
 a lot of range-based code. As such, they either need to be disallowed, or we
 need a way to detect such behavior so that range-based code can check for
 it.
I think this one is beyond the scope of the range spec. Whether the buffer of `ByLine` gets copied on element copy is an issue about the element of the range, not about the range itself. If this needs fixing, the element type of `byLine` needs to be changed, not the range spec. Otherwise `front` would need to always deep copy the element on each invocation, which would make any range of ranges horribly inefficient!
 
 9. const makes ranges useless. Realistically, a const range is not a range,
 because you can't call popFront, and you can't iterate through it
 (technically, it _is_ possible to have popFront be const in cases where the
 iteration state doesn't actually live in the range, and popFront isn't pure,
 but normal ranges do not work that way).
In principle it goes like: elements being const is a non-issue, it's just another element type. Range itself can be const, but has to be cast to non-const with const elements to be actually iterable. In practice, no way! With some care, you can write a range that copes with const elements, but even that is surprisingly thorny since a struct having a const field runs into all sorts of issues. But just try designing a range that can be cast between const and tail const! Bar for some niche ranges like TakeNone, it would be extremely hard if not impossible with the current language.
 
 10. While it's not exactly a problem with the range API, it would be nice if
 we didn't have to import std.range / std.range.primitives (or the Phobos v3
 equivalent) to use arrays as ranges.
Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.
 
 Overview of Proposed Changes
 ----------------------------
 
 3. If a range is finite (that is, it's not statically known to be empty),
 and it can be default-initialized, then its init value must be empty. If it
 cannot be default-initialized, then it must define a function called
 emptyRange which returns an empty range of the same type. Phobos will then
 define a function called emptyRange which takes a finite range which can be
 default-initialized and returns the init value of that range.
I suggest a slight relaxation. A finite range must have an empty init value if it does not provide an explicit `emptyRange`. That is, any range, finite or infinite, may have any valid `.init` value, as long as it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.
 4. All ranges must be either dynamic arrays or structs (and not pointers to
 structs either).
Maybe unions too? Not that it's often needed.
 9. In order to deal with transient front, we will explicitly disallow it. It
 will be a requirement that if front is copied, the copy will be unaffected
 by popFront. Of course, this doesn't prevent other references to the same
 data from mutating it (e.g. if it's a reference type), but calling popFront
 will not mutate it.
I'd rather have this one as a recommendation as opposed to something other ranges are allowed to assume. That would probably turn some current uses of `recurrence` or `generate` to unspecified behaviour, silently. In the case of `ByLine`, I agree the proposed design is superior but I wouldn't want to declare the present design illegal either.
 
 11. Finite random-access ranges are required to implement opDollar, and
 their opIndex must work with $. Similarly, any ranges which implement
 slicing must implement opDollar, and slicing must work with $.
 
 In most cases, this will just be an alias to length, but barring a language
 change that automatically treats length as opDollar (which has been
 discussed before but has never materialized and is somewhat controversial
 given types where it wouldn't make sense to treat length as opDollar), we
 have to require that opDollar be defined, or generic code won't be able to
 use $ with indexing or slicing. We probably would have required it years ago
 except that it would have broken code to add the requirement.
Why would any language changes be needed for that? Can't `.length` be a Phobos function for types that have `opDollar` but not their own `.length`?
 Of course, there will be other minor changes (e.g. I'll probably rename
 isForwardRange to isCopyableRange),
I would have somewhat high bar for renaming that but leaving the other range names alone. I understand the names input range, forward range, bidirectional range and random access range all come from C++ iterators that are named exactly the same. So maybe best to either stick with the standard we follow, or break away in full and reconsider the names of all of them.
May 18
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, May 18, 2024 2:48:51 PM MDT Dukc via Digitalmars-d wrote:
 6. The range API does not currently specify what happens if you copy front
 and then call popFront. For the vast majority of code, it is both assumed
 that you can copy front and that the copy of front will be unaffected by
 calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
 which will do things like use a dynamic array as a buffer which has its
 elements overwritten in popFront, resulting in the elements of the copy
 being mutated, since it's a slice of the same data. Such ranges are
 sometimes referred to as having a transient front, and they do not work
 with a lot of range-based code. As such, they either need to be
 disallowed, or we need a way to detect such behavior so that range-based
 code can check for it.
I think this one is beyond the scope of the range spec. Whether the buffer of `ByLine` gets copied on element copy is an issue about the element of the range, not about the range itself. If this needs fixing, the element type of `byLine` needs to be changed, not the range spec. Otherwise `front` would need to always deep copy the element on each invocation, which would make any range of ranges horribly inefficient!
It is something that has proven to be a problem. Many range-based functions will behave incorrectly if they copy front and then popFront mutates the copy. The range API needs to lock down the semantics involved for range-based code to actually be able to rely on them. Any range-based code which makes assumptions which aren't guaranteed by the range API risks breaking with types that do not follow those assumptions, and that's something that has been causing us problems for years, because not enough was locked down. And the possibility of front being transient is one of those issues that was missed when the range API was originally designed. If deep-copying elements is an issue, then pointers or reference types should be used - or the elements should be non-copyable. We are improving the support for non-copyable types as part of this (whereas right now, most range-based code assumes that the elements are copyable), but experience has shown that allowing popFront to mutate the copy of front causes problems, and the rare case where something like that is actually desirable has alternate solutions. In the case of byLine, it can use non-copyable elements, or it can use opApply, or it could use some sort of reference-counted type so that it can detect whether there are currently any copies which would be affected if the buffer was reused. There is no reason why it has to use its current solution. Ultimately though, IMHO, it would be better for such a type to not even be a range than to make it so that range-based code has to worry about this issue. Experience has shown that it causes problems for most range-based code, and it really doesn't add much value.
 9. const makes ranges useless. Realistically, a const range is not a
 range,
 because you can't call popFront, and you can't iterate through it
 (technically, it _is_ possible to have popFront be const in cases where
 the
 iteration state doesn't actually live in the range, and popFront isn't
 pure, but normal ranges do not work that way).
In principle it goes like: elements being const is a non-issue, it's just another element type. Range itself can be const, but has to be cast to non-const with const elements to be actually iterable.
Casting away const and then mutating the result violates D's type system. The only cases where it doesn't is when the cast actually copies the type. So, as soon as you have something like a range of const ranges, you're screwed.
 In practice, no way! With some care, you can write a range that copes
 with const elements, but even that is surprisingly thorny since a struct
 having a const field runs into all sorts of issues.

 But just try designing a range that can be cast between const and tail
 const! Bar for some niche ranges like TakeNone, it would be extremely
 hard if not impossible with the current language.
Fixing the issue of tail-const realistically requires a language DIP, and Walter's DIP should make it so that a const range will be able to be converted to a range of const elements, whereas right now, that only works with dynamic arrays, because technically, const(Range!Foo) and Range!(const Foo) are completely different and unrelated types as far as the type system is concerned (technically, they could even have completely different implementations in spite of the fact that they come from the same template).
 10. While it's not exactly a problem with the range API, it would be nice
 if we didn't have to import std.range / std.range.primitives (or the
 Phobos v3 equivalent) to use arrays as ranges.
Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.
Well, Walter has specifically requested that we fix it so that the import is no longer required, and a lot of us would love to get rid of it. Either way, this will have no effect on the v2 API, because the functions for the v3 API will be different. And since they only affect ranges, the only code that it would potentially cause problems for is code that wants to name one or more free functions which take dynamic arrays with exactly the same names. Anyone who wants to create their own range API will still be free to do so.
 Overview of Proposed Changes
 ----------------------------

 3. If a range is finite (that is, it's not statically known to be empty),
 and it can be default-initialized, then its init value must be empty. If
 it
 cannot be default-initialized, then it must define a function called
 emptyRange which returns an empty range of the same type. Phobos will then
 define a function called emptyRange which takes a finite range which can
 be
 default-initialized and returns the init value of that range.
I suggest a slight relaxation. A finite range must have an empty init value if it does not provide an explicit `emptyRange`. That is, any range, finite or infinite, may have any valid `.init` value, as long as it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.
That means that you cannot rely on a default-initialized finite range being valid and empty, which is part of the whole point here. Having that requirement will reduce bugs, and the lack of that requirement has caused bugs, particularly since plenty of folks end up assuming that the init value is empty even when there is no such guarantee. Right now, there isn't even a guarantee that it's valid, let alone empty. At least when a range cannot be default-initialized, you get an error when you attempt instead of just getting buggy behavior. Requiring emptyRange in that case then allows us a way to get an empty range in spite of the fact that we can't rely on the range being default-initializable, but by requiring that finite ranges that can be default-initialized have an init value that's both valid and empty will prevent bugs due to folks assuming that the range is valid and empty like has already been happening. And since the vast majority of ranges can be default-initialized, emptyRange will only need to be declared in rare cases, and the free function, emptyRange, will take care of the cases where the range can be default-initialized.
 4. All ranges must be either dynamic arrays or structs (and not pointers
 to
 structs either).
Maybe unions too? Not that it's often needed.
Unions can't have functions, so they couldn't directly be ranges anyway. But even if they could, passing around unions is just begging for trouble. They don't have postblit or copy constructors even when their members would need it (since the union doesn't keep track of which of its members is the valid one), and for code to work properly, it really needs to access whichever the currently valid member of the union is anyway. Trying to do anything with unions and ranges would just be begging for bugs.
 11. Finite random-access ranges are required to implement opDollar, and
 their opIndex must work with $. Similarly, any ranges which implement
 slicing must implement opDollar, and slicing must work with $.

 In most cases, this will just be an alias to length, but barring a
 language
 change that automatically treats length as opDollar (which has been
 discussed before but has never materialized and is somewhat controversial
 given types where it wouldn't make sense to treat length as opDollar), we
 have to require that opDollar be defined, or generic code won't be able to
 use $ with indexing or slicing. We probably would have required it years
 ago except that it would have broken code to add the requirement.
Why would any language changes be needed for that? Can't `.length` be a Phobos function for types that have `opDollar` but not their own `.length`?
Right now, generic code canot use $ with ranges, because the range API does not require that it be there, and there are many, many ranges which do not implement it even if they're random-access and/or have slicing. So, generic code cannot do something like auto slice = range[5 .. $ - 1]; or auto value = range[$ - 7]; For that to work, we have to actually require that $ work properly for ranges with slicing and/or random-access. Right now, you're forced to do stuff like auto slice = range[5 .. range.length - 1]; or auto value = range[range.length - 7]; which is far more verbose and does not play well with converting code that operates on dynamic arrays to code that operates on random-access ranges instead. hasSlicing and isRandomAccessRange likely would have reqired opDollar years ago, but it wasn't there initially, and adding it later would have broken too much code. With the current language, for $ to work with a user-defined type for either indexing or slicing, that type must implement opDollar, because that's what the language replaces $ with. There is no other way to have $. It has been suggested in the past that we make it so that the language treats length as opDollar if it's present and opDollar is not present, which would make it so that all random-access ranges automatically got a working $. However, it has also been brought up that there are types which have both indexing and length but where it would be inappropriate for length to be treated as opDollar - the prime example being hash tables / maps where $ wouldn't make any sense at all. So, barring a language change, the most straightforward way for a finite range to implement opDollar is just to alias length to opDollar, and finite ranges without length shouldn't have opDollar anyway. But infinite ranges will have to define their own opDollar regardless, since they need one that's a type that just marks the end when slicing, and it makes no sense for their opDollar to be size_t, since size_t is not infinite. The proposal here is basically just to require opDollar with slicing and indexing like we should have originally - and would have if it hadn't been forgotten. - Jonathan M Davis
May 18
prev sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Saturday, 18 May 2024 at 20:48:51 UTC, Dukc wrote:
 Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a
 10. While it's not exactly a problem with the range API, it 
 would be nice if
 we didn't have to import std.range / std.range.primitives (or 
 the Phobos v3
 equivalent) to use arrays as ranges.
Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.
The range API is baked into the language through `foreach` semantics. It’s not a Phobos feature, it’s a core language feature.
Jun 04
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:

 5. Forward ranges will no longer have save. Rather, it will be 
 required that copying a forward range will result in an 
 independent copy. Both the original and the copy must then have 
 the same elements in the same order, and iterating one will not 
 affect the other. So, in essence, copying a forward range must 
 be equivalent to calling save (though the actual implementation 
 could be more clever than that).
So something I just thought of as a drawback here -- `save` gives a nice explicit mechanism to copy when passing to a function that takes a value by `auto ref`. In other words it *forces* non-ref semantics so you do not affect the original. Since passing to an `auto ref` function lets the compiler decide to not make a copy, code that currently uses `save` for this purpose doesn't have an equivalent in the new regime. If we aren't going to have it, we might want to add a specialized do-nothing UFCS function which just takes an lvalue and turns it into an rvalue. Maybe we just call it `save`! Or maybe just a boring `asRvalue`... -Steve
May 27
prev sibling next sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 […]
Maybe good new names could be `empty`, `front`, and `opPopFront` for forward ranges, and `empty` and `opFrontThenPop` (or something). Using `op…` makes sense as those functions get used by core-language constructs such as `foreach`. The reason `empty` and `front` aren’t `op…` is because those would often be called by programmers directly, and people would just define an alias anyway. One `op…` function suffices for a new interface. As a matter of fact, `++range` could lower to `range.opPopFront()` or, if that is `void`, `(range.opPopFront(), range)` (comma operator). And `*range++` could lower to `range.opFrontThenPop`. Another idea: For a forward range, `front` can be a field or property with a setter. The same can be true for input ranges’ `opFrontThenPop` which could be overloaded: If `opFrontThenPop()` is a getter for `front`, it’s an input range, and if there is `opFrontThenPop(rhs)` it’s an output range. Expressions of the form `*range++ = rhs` could lower to `opFrontThenPop(rhs)`. If the names are too range-specific, they could be generalized to `opDereferenceUnary`, such that `*obj++` and `*obj--` lower to `opDereferenceUnary!"++"` and `opDereferenceUnary!"--"`, respectively. If we go that path, a forward range could be defined by having `empty`, `front`, and supporting `++range` (which implements `popFront`); an input range be defined by having `empty` and supporting `*range++` (possibly by `opDereferenceUnary!"++"` that can be called with no arguments), and an output range could be defined by having `empty` and supporting `*range++ = rhs` (possibly by `opDereferenceUnary!"++"` that can be called with 1 argument).
Jul 03
prev sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
 Okay. This is my proposal for an updated input range API for 
 Phobos v3. I previously sent it out to some or the core D 
 folks, and the feedback has largely been positive, but I'm now 
 posting it to the newsgroup for wider feedback.
Tales from the AI ----------------- General Observations: Length and Clarity: The proposal is very long and dense, which might deter some readers from providing thorough feedback. Consider breaking it into more manageable sections or providing a summarized version at the beginning. Some sections could benefit from more concise language. Removing redundant phrases and focusing on the core points could enhance readability. Implementation Details: You mention several times that the exact details need to be sorted out. It would be beneficial to provide more concrete examples or pseudo-code where possible to clarify your ideas. Specific Observations and Suggestions: Auto-decoding: Your point about auto-decoding is clear, but ensure you emphasize the impact on backward compatibility. This change might break a lot of existing code, so highlighting strategies for a smooth transition would be beneficial. Save Requirement for Forward Ranges: You propose removing save and instead relying on copy semantics. While this can simplify the API, it's important to provide more detailed guidelines on how to implement forward ranges under this new requirement. Examples of how to handle common use cases without save would help clarify this change. Copy Semantics for Basic Input Ranges: Making basic input ranges non-copyable is a significant change. Ensure you clearly articulate the benefits versus the drawbacks. You might also want to provide migration strategies or helper functions for common patterns that currently rely on copying input ranges. Default Initialization: Requiring all default-initializable ranges to have a valid init value is a good move towards safety. However, the discussion on whether to disallow finite ranges from disabling default initialization seems unresolved. A more definitive stance would strengthen the proposal. Consider detailing specific examples of how this requirement improves robustness and how to handle exceptions. Transient Front: Explicitly disallowing transient fronts is a significant change. It might help to provide more context or examples of how common use cases will be handled under the new rules. For instance, what are the alternatives for ranges that currently rely on transient fronts? Const Ranges: The problem with const ranges is well-articulated, but the solution proposed (relying on language improvements) seems uncertain. It might be useful to outline potential workarounds or interim solutions until language changes are implemented. Renaming Range API Functions: Renaming functions like front to first and empty to isEmpty could introduce a significant learning curve. Ensure you provide strong justifications for these changes and perhaps include a transition plan or tools to help developers update their codebases. Backward Compatibility and Transition Plan: The proposal introduces many breaking changes. A detailed transition plan, including deprecation strategies, versioning schemes, and tooling support, would be crucial. Consider providing scripts or automated tools to help developers migrate their code. Structural Suggestions: Summary Section: Add a summary section at the beginning that lists the key changes and their benefits. This helps readers quickly grasp the main points before diving into the details. Separate Concerns: Consider separating the proposal into multiple documents or sections: one for identifying problems with the current API, one for the proposed changes, and one for implementation strategies and examples. Examples and Use Cases: Throughout the document, add more concrete examples and use cases. This helps in understanding the practical implications of the changes and provides a clearer picture of the benefits and challenges. Community Feedback: Encourage community feedback by posing specific questions or areas where input is particularly needed. This can help focus the discussion and gather more targeted feedback. Overall, your proposal is comprehensive and addresses many crucial aspects of the range API. By refining the structure and providing more concrete details, you can enhance its clarity and make it easier for the community to provide valuable feedback.
Jul 03