www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Range Redesign: Empty Ranges

reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
Okay. A few weeks ago, I made a post about potential design decisions when
updating the range API in the next version of Phobos, and this is a
continuation of that, albeit with a different set of problems. Specifically:

1. The range API does not currently - and cannot currently - require or
guarantee that the init value of a range is valid.

2. The range API provides no way (other than fully iterating through a
range) to get an empty range of the same type from a range unless the range
is a random-access range.

Fixing the first issue is straightforward, I think, though it potentially
affects the second, and of course, someone here may have thoughts or
insights which I've missed.

The issue with init is that of course code wants to use the init value, and
code often _does_ use the init value for specific ranges, but doing so when
that specific range type does not specify that the init value is valid (and
documentation almost never does) is relying on unspecified behavior - and in
the general case, using the init value of a range cannot be specified
behavior. The obvious case is ranges which are classes, because their init
value is going to be null, meaning that using the init value of a range
could result in a segfault. Of course, _most_ range-based code doesn't use
classes for ranges, so I expect that it's quite rare to hit that problem in
practice, but most range-based code uses Voldemort types and tends to ignore
the fact that the init value can still be used (either by using typeof on
the result to get its type or by getting init directly from the result -
e.g. range.init). And you actually need to do that when storing a range as a
member variable. E.G.

struct Foo
{
    ...
    auto _range = typeof(chain(func1(), func2(), func3())).init;
    ...
}

The more correct thing to do would be something like

struct Foo
{
    ...
    auto _range = Nullable!(typeof(chain(func1(), func2(), func3())));
    ...
}

so that you can avoid using the range's init value, but realistically, I
don't think that it's something that many folks think of. They use the
range's init value, and it works (since it often does work), and then they
end up relying on how that range type happens to work at the moment, which
could change in the future, since the range API does not guarantee that it
works, and the folks maintaining that range probably aren't thinking much
about the init value anyway - especially when it's a Voldemort type. The
typical expectation is that the user will be creating the range by calling
the function that returns it, not by using type introspection to get their
hands on the init value. But it is always possible to get the init value,
and code declaring member variables from ranges will use it unless it goes
to the extra effort of using something like Nullable so that it's possible
to store the range without using its init value as the default value.

Now, I think that the fix for this particular problem is pretty easy. We
just require that all ranges have a valid init value. This means disallowing
classes or pointers being used directly as ranges, but we were already
looking at doing that for forward ranges anyway in order to be able to get
rid of save. It's arguably more problematic for basic input ranges, but it's
quite feasible there as well. The previous thread discussed requiring that
they be reference types, which would lean towards them being pointers or
classes, but you can have structs which are reference types (e.g. they
contain a class reference but then have the logic around it to ensure that
it's not used when null). Also, it was proposed in that thread that we
require that basic input ranges be non-copyable, which would also make it
straightforward to require that the init value be valid, since in that case,
all basic input ranges would have to be structs regardless of what their
internals were doing with pointers or references. Either way, we can
disallow classes and pointers being used directly as ranges and require that
all ranges have a valid init value.

So, I'm proposing that we require that all ranges have a valid init value. I
suspect that the issue will still often be forgotten, particularly since
ranges are often Voldemort types, but if the range API explicitly requires
it, and the range documentation is clear on that point, then we can clearly
consider it a bug when a range has an invalid init value, and in most cases,
I wouldn't expect it to be a big deal to fix.

The second issue is then the ability to get an empty range from a range type
in O(1) which is the same type as the original range. In my experience, it's
sometimes the case that you really want to be able to do something like

range = Range.emptyRange;

or

range.makeEmpty();

in order to make a range empty without eating up CPU cycles calling popFront
until empty is true. This is possible with a random-access range by slicing
it with a zero length, but it's not possible with lower classes of ranges.
std.range.takeNone tries to do it, but it is forced to give you a new range
type if the range isn't a random-access range - specifically, it uses
takeExactly, but regardless, that doesn't work in cases where you need to
keep the range type the same but make it empty (e.g. because you're dealing
with a ref parameter). And right now, that means calling popFront over and
over again.

Of course, if you're dealing with an infinite range, then there can't be a
way to make it empty, but IMHO, we should have a way to get an empty range
from an arbitrary finite range type where the empty range is of the same
type. And for that, I see three possibly options:

1. Make it so that for finite ranges, the init value of a range is required
to not just be valid, but it's also required to be empty.

2. We add a static / enum member to the range API called something like
emptyRange where you can get an empty range from that range type where the
range is that range type.

3. Add a new member function to the range API called something like
makeEmpty which makes that particular instance empty.


functions, and in general, the init value probably should be empty. However,
my concern there is that there may be ranges where that's going to constrain
the implementation in annoying ways. All finite ranges have to have an empty
state, because you'll get to empty if you call popFront enough times, so it
should be possible for all ranges to have their init value be empty, but I
could also see it being the case where requiring that would mean that the
implementer of the range would have to jump through some annoying hoops to
make it work - though in most cases, if they have to jump through hoops, I
would expect that they'd have to jump through most (or all) of those hoops
to make the init value valid whether it's empty or not. But I would like
feedback on how reasonable folks think it would be to require that a range's
init value be empty.


get an empty range without having to create one with elements first, whereas
if we went with makeEmpty, then you'd need to construct a range first just
to make it empty. That being said, implementing emptyRange (or whatever we'd
call it) wouldn't be all that different from making the init value empty.
The main difference I see is that emptyRange wouldn't have to be statically
known, which might be useful in some cases, though the kinds of cases that I
can think of tend to be ones where making the init value valid would require
additional logic, in which case, in going to the effort of making the init
value valid, you might as well make it empty anyway.

I expect that for some ranges, makeEmpty would be easier to implement, and
it would probably also work better if we decide to make basic input ranges
non-copyable, since mutating a non-copyable range would be more
straightforward than moving an empty one on top of it. But I would think
that we could make the latter work, and in the general case, being able to
have an empty range for any finite range in O(1) would be nice to have
without having to create a range just to make it empty.

Either awy, one advantage to having emptyRange / makeEmpty over using the
init value as empty would be that while adding a new function would be
annoying, it would force anyone implementing a range to take into account
the fact that they need to provide a way to get an empty range. In many
cases, it may just be

    alias emptyRange = typeof(this).init;

but range implementers would be forced to think about it, whereas if we just
made it so that the documentation said that they had to make the init value
empty, they'd be less likely to notice. But it's still extra stuff that has
to implemented which would be completely unnecessary with ranges that would
have empty init value anyway.

A related issue is that if we're removing save from forward ranges, and we
keep the rest of the functions the same, some of the old basic input ranges
then look like the new forward ranges (since they wouldn't have save, and
forward ranges wouldn't require it), which would be bad, because they
wouldn't have the correct semantics. To fix that, we either need to add
something to the range API (be it a whole new function like emptyRange or
something like an enum that flagged a type as being a forward range), or
we'd have to change the existing functions (be it something as simple as
changing empty to isEmpty or something more complicated like switching from
front/popFront to head/tail, which would also allow us to get around the
tail-const problem but would be a pretty massive change to how ranges work).
Adding emptyRange or makeEmpty _almost_ fixes that problem, but since
infinite ranges wouldn't have have it (since they couldn't implement it), it
wouldn't actually fix the problem. Some old basic input ranges which are
infinite would still pass the new isForwardRange.

So, if it weren't for the infinite range issue, I'd see the fact that it
would change the API enough to distingush between the old and new APIs as a
good reason to go with emptyRange over requiring that the init value be
empty, but since it doesn't fix the issue with infinite ranges, I don't
think that it makes sense to go with emptyRange / makeEmpty just to
differentiate between the old and new range APIs (it would make it more
obvious to programmers using these ranges, which could be valuable - but
then again, having isEmpty instead of empty would do the same).

So, hopefully that makes the situation with those two issues clear. The
question then is what insights you folks might have on them and the proposed
solutions. It may even be that it's way too onerous to require that the init
value be valid, and someone has a use case to show that, though I very much
hope that that is not the case. I expect though that the bigger question is
whether init should be required to be empty for finite ranges or whether it
would be better to add a new function for that.

Thoughts?

- Jonathan M Davis
Mar 04 2024
next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 1. Make it so that for finite ranges, the init value of a range 
 is required to not just be valid, but it's also required to be 
 empty.
Makes a lot of sense, but I agree it will be very annoying for some ranges. I suspect it will result in extra state and/or checks for those cases. Would it be too crazy of an idea for those ranges to implement a static init function?
Mar 04 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, March 4, 2024 2:58:08 PM MST Sebastiaan Koppe via Digitalmars-d 
wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 1. Make it so that for finite ranges, the init value of a range
 is required to not just be valid, but it's also required to be
 empty.
Makes a lot of sense, but I agree it will be very annoying for some ranges. I suspect it will result in extra state and/or checks for those cases. Would it be too crazy of an idea for those ranges to implement a static init function?
You mean define a static function named init? I think that most of us agree at this point that it was a mistake to allow init to be anything other than the compiler-defined init, and as I understand it, it doesn't really work to redefine it in the way that Walter intended. Realistically, code tends to depend on the init value being the compiler-defined one, and trying to do anything else is asking for trouble. A redefined init value doesn't actually get used with default-initialization, so with a case like struct Foo { ... typeof(chain(func1(), func2(), func3())) _range; ... } any init that you declare will be ignored (even if it's a value and not a function), meaning that unless that range type has a valid init value, that variable won't work properly unless it happens to be assigned a value before it's used. We could of course add a function / enum to the range API which would be required to be used instead of the init value when you're looking for some kind of initialization, but that doesn't solve the problem of default-initialized values. For that, it really needs the init to be valid. Of course, we _could_ allow ranges to disable default initialization, and then the type could define init with a static function which would be used when init was used explicitly, but that would potentially cause problems if init were used explicitly at compile time. It's also on the list of things that I half expect to be disallowed at some point with a future edition, because allowing init to be anything other than the default one tends to cause issues. So, we _might_ be able to make it work, but my gut reaction is that we're playing with fire, and we'll be better off overall if we can just require that the compiler-defined init value be valid. Still, if we allowed it, at least it should result in compile time errors in generic code rather than weird runtime behavior. But it would mean that generic code couldn't rely on being able to default-initialize ranges, and it would have to give all ranges a value at runtime, which would be annoying, and it realistically wouldn't happen outside of something like Phobos (and it would likely only happen in Phobos if that specific case was tested for by Phobos functions in general). Code in general expects default initialization to work, and it's usually only code specifically written to work with types that don't allow it where it actually works to have a type that disables it. So, I don't know. I'd really rather not play games with trying to redefine the init value. I also suspect that most of the annoyance that ranges might have with the init value would be there simply by requiring that the init value be valid without requiring that it be empty, in which case, requiring that it be empty wouldn't be onerous. It's requiring that it be valid which would potentially be onerous. But I'm also not sure how much of an issue that that would be in practice. For most ranges, I expect that it will be pretty simple, and many of them already have a valid, empty init value. In the vast majority of cases, I expect that if a range's init value isn't valid, it's because the person who wrote it didn't think about the init value being used (since it's probably a Voldemort type) and didn't test that case. And if the issue of it being problematic to make the init value valid isn't common, I'm disinclined to make range-based code in general have to worry about it - especially if it's still possible to make such ranges work (even if it's more annoying than would be desirable). I can also think of several ways to make it easy right off the top of my head, though you wouldn't necessarily want the additional overhead (e.g. any range could just use is to compare itself to its init value in empty, or it could have a bool to indicate whether it had been initialized via a constructor and check that; it would be annoying but easy). But I'm also not entirely sure how onerous it will be in practice to make the init values of ranges valid (be it empty or otherwise) - though it's actually _really_ easy in the cases where it would segfault now, since if such types then have a wrapper struct, it can easily check whether the pointer or reference is null and return true for empty. - Jonathan M Davis
Mar 04 2024
prev sibling next sibling parent reply Dom DiSc <dominikus scherkl.de> writes:
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 Of course, if you're dealing with an infinite range, then there 
 can't be a way to make it empty, but IMHO, we should have a way 
 to get an empty range from an arbitrary finite range type where 
 the empty range is of the same type. And for that, I see three 
 possibly options:

 1. Make it so that for finite ranges, the init value of a range 
 is required to not just be valid, but it's also required to be 
 empty.
I think this is the best solution.
 A related issue is that if we're removing save from forward 
 ranges, [...] Adding emptyRange or makeEmpty [...] wouldn't 
 actually fix the problem. Some old basic input ranges which are 
 infinite would still pass the new isForwardRange.
Are there infinite ranges that are indeed not forward ranges? I mean - they produce their output somehow algorithmically. Should be possible to safe the state of such an algorithm, no? Ok, maybe someone has not implemented it, but as we are requiring things anyway, why not require an infinite range to implement the copy function?
Mar 05 2024
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
Mar 05 2024
next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi 
wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards. It is an infinitely forward moving stream, so you can have the concept of "zero" and "positive infinity", but there is no concept of "negative infinity". A negative number is utter non-sense for TRNG's, as whatever negative number you specify would become the new "zero". Practically there is a limit to how much entropy you can grab off the the chip in single chunk, so it is a bounded range, it's just that the boundary is somewhat fuzzy. In short you would probably end up implementing an entropy source as a forward range with some large upper limit.
Mar 05 2024
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 12:11:10AM +0000, Adam Wilson via Digitalmars-d wrote:
 On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards.
Nothing about an infinite range requires that it be infinite both ways, unless it's a bidirectional range, which an RNG is not. [...]
 Practically there is a limit to how much entropy you can grab off the
 the chip in single chunk, so it is a bounded range, it's just that the
 boundary is somewhat fuzzy.
[...] A periodically-reseeded RNG is indeed a practically infinite range, with no cycling. You don't have to grab every value from the hardware entropy source; it suffices to use a cryptographic hash function on a counter that's periodically reseeded from the hardware entropy. It can literally generate an endless stream of random numbers. T -- Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
Mar 05 2024
parent Adam Wilson <flyboynw gmail.com> writes:
On Wednesday, 6 March 2024 at 00:23:26 UTC, H. S. Teoh wrote:
 A periodically-reseeded RNG is indeed a practically infinite 
 range, with no cycling.  You don't have to grab every value 
 from the hardware entropy source; it suffices to use a 
 cryptographic hash function on a counter that's periodically 
 reseeded from the hardware entropy. It can literally generate 
 an endless stream of random numbers.


 T
That is true in theory, but in practice if you try it on real hardware, not only will you pay some pretty serious performance penalties as the CPU tries to dump all that entropy, it will be dumping it to memory, of which you will eventually run out. So yes, it's theoretically unlimited, but in practice, there is no valid reason to actually implement it that way, and to-date, no modern Operating System entropy source allows you to. For example, on Windows, you'll be passing a fixed size buffer to the entropy source. Same with OpenSSL. We do not design code for what is theoretically possible, only that which can actually be achieved. I know, because I wrote a Crypto library for D that specifically deals with this.
Mar 05 2024
prev sibling parent reply Arafel <er.krali gmail.com> writes:
On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
Or a range that returns captured data from a sensor.
Mar 06 2024
next sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:
 On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
Or a range that returns captured data from a sensor.
Nice example, Basically, any global resource (sensor, /dev/random, etc.), that can be representable as a range, and doesn't have a definitive end, is an infinite input range, since you can't save it at a specific point and then continue multiple times from that point. Best regards, Alexandru.
Mar 06 2024
prev sibling parent reply Dom DiSc <dominikus scherkl.de> writes:
On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:
 On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?
/dev/random could be such a range.
If you have a source of real entropy, ok. But pseudo-random-number-generators do have a seed and with same seed produces always the same numbers.
 Or a range that returns captured data from a sensor.
Ok, that's a more convincing example. This is really hard to copy. Requires perfect knowlegde in a deterministic universe :-)
Mar 06 2024
parent Arafel <er.krali gmail.com> writes:
On 06.03.24 11:56, Dom DiSc wrote:
 
 If you have a source of real entropy, ok. But 
 pseudo-random-number-generators do have a seed and with same seed 
 produces always the same numbers.
While technically true, you might not have access to the seed, so in practical terms it's a bit of distinction without a difference. For instance, in `/dev/[u]random` the seed is kept internally by the OS, and I don't think you have access to it.... and even if you had, you wouldn't want to re-implement the kernel's PRNG, right?
Mar 06 2024
prev sibling next sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 Okay. A few weeks ago, I made a post about potential design 
 decisions when updating the range API in the next version of 
 Phobos, and this is a continuation of that, albeit with a 
 different set of problems. Specifically:

 [...]
I like T.init being empty. I thought I'd found a clever way to make this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ``` This, although supposedly equivalent, works fine: ```d class Class {} bool empty(in Class c) safe nogc pure nothow { return c is null; } ``` Which might be a way out? I still think the first example should work.
Mar 06 2024
next sibling parent reply Ogi <ogion.art gmail.com> writes:
On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
I thought I'd found a clever way to
 make this work for classes but apparently this crashes, much to 
 my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```
I’m surprised that the equivalent C++ code *doesn’t* crash (at least on my machine): ``` Class* c = nullptr; assert(c->empty()); That’s still UB though.
Mar 06 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 6:09:52 AM MST Ogi via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I thought I'd found a clever way to

 make this work for classes but apparently this crashes, much to
 my surprise:

 ```d
 class Class {

     bool empty()  safe  nogc pure nothrow scope const {

         return this is null;

     }

 }

 Class c;
 assert(c.empty);
 ```
I’m surprised that the equivalent C++ code *doesn’t* crash (at least on my machine): ``` Class* c = nullptr; assert(c->empty()); That’s still UB though.
The issue is whether the function is virtual. In D, class member functions are virtual by default, so you get a crash when the program attempts to use the vtable to look the function up (and therefore attempts to dereference null), whereas in C++, you don't, because class member functions are non-virtual by default, and so it doesn't do anything with the vtable. If you mark the function as final in D (without it overriding anything), then there shouldn't be a crash, and if you mark the C++ function as virtual, then there will be. - Jonathan M Davis
Mar 06 2024
parent Ogi <ogion.art gmail.com> writes:
On Wednesday, 6 March 2024 at 19:03:39 UTC, Jonathan M Davis 
wrote:
 The issue is whether the function is virtual. In D, class 
 member functions are virtual by default, so you get a crash 
 when the program attempts to use the vtable to look the 
 function up (and therefore attempts to dereference null), 
 whereas in C++, you don't, because class member functions are 
 non-virtual by default, and so it doesn't do anything with the 
 vtable. If you mark the function as final in D (without it 
 overriding anything), then there shouldn't be a crash, and if 
 you mark the C++ function as virtual, then there will be.
Virtual or not, GCC assumes that `this` cannot be null: ```C++ #include <cassert> class C { public: bool empty() const { return this == nullptr; // warning: `nonnull` argument `this` compared to NULL } }; int main() { assert(((C*)nullptr)->empty()); // warning: `this` pointer is null } ```
Mar 07 2024
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I like T.init being empty. I thought I'd found a clever way to 
 make this work for classes but apparently this crashes, much to 
 my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```
It crashes because it's attempting to access Class's vtable. If you make the method final, it works: ```d class Class { final bool empty() { return this is null; } } void main() { Class c; assert(c.empty); // ok } ```
Mar 06 2024
parent reply Atila Neves <atila.neves gmail.com> writes:
On Wednesday, 6 March 2024 at 13:26:14 UTC, Paul Backus wrote:
 On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I like T.init being empty. I thought I'd found a clever way to 
 make this work for classes but apparently this crashes, much 
 to my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```
It crashes because it's attempting to access Class's vtable. If you make the method final, it works: ```d class Class { final bool empty() { return this is null; } } void main() { Class c; assert(c.empty); // ok } ```
Of course! I write classes so seldom I forget my own rule to add `final` unless I can't. That would severely restrict class ranges though, since presumably the whole point is to have dynamic dispatch.
Mar 07 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, March 7, 2024 2:11:29 AM MST Atila Neves via Digitalmars-d wrote:
 Of course! I write classes so seldom I forget my own rule to add
 `final` unless I can't.

 That would severely restrict class ranges though, since
 presumably the whole point is to have dynamic dispatch.
The solution that I'm currently planning to go with is to wrap them: https://forum.dlang.org/post/mailman.1091.1709755456.3719.digitalmars-d puremagic.com - Jonathan M Davis
Mar 07 2024
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from a 
 range unless the range is a random-access range.
Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
Mar 06 2024
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from a 
 range unless the range is a random-access range.
Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it? There is no formal definition for "create an empty range T". It's not just algorithms, it's regular functions or types with specific parameter requirements. What is a very logical mechanism is just to pass the `init` value, because 99% of the time, that is the same thing as an empty range. Except when it isn't. But if you aren't testing for that, then you don't realize it. Or if code changes slightly such that now you involve a range you didn't before which doesn't have that property, then all of a sudden code breaks. We can add a new requirement that ranges that can be initialized as empty have some `emptyRange` static member or something like that. But I expect this is going to go down the same route as `save`, where really you should be calling `save` every time you copy a forward range, but in practice nobody does, since regular copying is 99% of the time the same thing. Likewise people will pass `.init` instead of `.emptyRange` because it's almost always the same thing. This is why I think it should just be formally stated that the `init` value of a non-infinite range should be empty, bringing into standard what people *already do*. The only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness. But if people insist that we must have class ranges, I'd say an `emptyRange` property is in order. -Steve
Mar 06 2024
next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 The only tricky aspect is ranges that are references
 (classes/pointers).  Neither of those to me should be supported IMO,
 you can always wrap such a thing in a range harness.
[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } This generally will not compile because R and S are different types, even if both conform to a common underlying range API, like an input range. To remedy this situation, class-based range wrappers come to the rescue: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return inputRangeObject(range1); } else { return inputRangeObject(range2); } } Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work. // Also, class-based ranges are sometimes necessary for practical reasons, like in this one program I had, that has an UFCS chain consisting of hundreds of components (not all written out explicitly in the same function, of course, but that's what the underlying instantiation looks like). It generated ridiculously large symbols that crashed the compiler. Eventually the bug I filed prodded Rainer to implement symbol compression in DMD. But even then, the symbols were still ridiculously huge -- because the outermost template instantiation had to encode the types of every one of the hundreds of components. As a result, symbol compression or not, compile times were still ridiculously slow because the compiler still had to copy those symbols around -- their length grew quadratically with every component added to the chain. Inserting a call to .inputRangeObject in the middle of the chain dramatically cut down the size of the resulting symbols, because it effectively erased all the types preceding it, resulting in much saner codegen afterwards. T -- Leather is waterproof. Ever see a cow with an umbrella?
Mar 06 2024
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the 
 whipping boy of range design woes.  Actually, they serve an 
 extremely important role: type erasure, which is critical when 
 you have code like this:

 	auto myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}

 [...]

 Note that you can't use a struct wrapper here, because R and S 
 have different ABIs; the only way to correctly forward range 
 methods to R or S is to use overridden base class methods. IOW, 
 the type erasure of R and S is unavoidable for this code to 
 work.
This is what std.range.choose [1] is for. Internally, it's implemented as a tagged union of R and S. [1] https://dlang.org/library/std/range/choose.html
Mar 06 2024
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 05:43:05PM +0000, Paul Backus via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the
 whipping boy of range design woes.  Actually, they serve an
 extremely important role: type erasure, which is critical when you
 have code like this:
 
 	auto myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}
 
 [...]
 
 Note that you can't use a struct wrapper here, because R and S have
 different ABIs; the only way to correctly forward range methods to R
 or S is to use overridden base class methods. IOW, the type erasure
 of R and S is unavoidable for this code to work.
This is what std.range.choose [1] is for. Internally, it's implemented as a tagged union of R and S. [1] https://dlang.org/library/std/range/choose.html
Haha, I knew that one would be brought up. Unfortunately, it only works for a very limited subset of use cases. In the above simplified code you could replace the if-statement with .choose. However, this immediately falls flat if the decision is made by a switch statement, or nested if-blocks, or an early exit from a loop. The problem is that .choose requires that both ranges are constructible in the same scope that it is called in. In non-trivial code this is sometimes impossible, as you may be in a decision tree where certain subranges are only constructed if the condition is true, or may depend on value types that may not even exist if the condition is not satisfied. This makes .choose essentially useless outside of trivial cases. Besides, .choose does not type-erase, so in the cases where you want type erasure it's still no help. T -- English is useful because it is a mess. Since English is a mess, it maps well onto the problem space, which is also a mess, which we call reality. Similarly, Perl was designed to be a mess, though in the nicest of all possible ways. -- Larry Wall
Mar 06 2024
parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:
 The problem is that .choose requires that both ranges are 
 constructible in the same scope that it is called in.
It's not even about those ranges being constructible. It's about separation of concerns. Like you wrote in your earlier post, sometimes we want to work with an arbitrary range without templating all of our code. Maybe for compile time reasons, or maybe to write a library that can be separately compiled from the client code.
 This makes .choose essentially useless outside of trivial cases.
It's far from useless, it's only like taking an enum argument and `switch`ing on it instead of taking a delerate argument. Certainly works but means you have to list all your cases in the function, which may or may not be what you want. The gripe I have with `std.range.interface` is lack of attributes in the virtual functions. I don't think any simple attribute set alone would solve the problem well. Granted, there should rarely be reason to have a dynamic ` system` range type, but `pure` and `nothrow` are attributes you sometimes really want, sometimes you really don't. Therefore we need the ability to pick, meaning the interfaces will have to be templated, along the lines of: ```D interface ForwardRange(uint attrs) : InputRange!attrs { // ... } ``` Plus there should probably be function to convert, for example, a ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!pure_ | FunctionAttribute!nothrow_) ``` to ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!nothrow_) ``` .
Mar 06 2024
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 1:53:55 PM MST Dukc via Digitalmars-d wrote:
 The gripe I have with `std.range.interface` is lack of attributes
 in the virtual functions. I don't think any simple attribute set
 alone would solve the problem well. Granted, there should rarely
 be reason to have a dynamic ` system` range type, but `pure` and
 `nothrow` are attributes you sometimes really want, sometimes you
 really don't. Therefore we need the ability to pick, meaning the
 interfaces will have to be templated, along the lines of:
 ```D
 interface ForwardRange(uint attrs) : InputRange!attrs
 {
     // ...
 }
 ```

 Plus there should probably be function to convert, for example, a
 ```
 BidirectionalRange!(FunctionAttribute!safe |
 FunctionAttribute!pure_ | FunctionAttribute!nothrow_)
 ```
 to
 ```
 BidirectionalRange!(FunctionAttribute!safe |
 FunctionAttribute!nothrow_)
 ```
 .
This is a fundamental problem with classes in general. Attributes just do not play nicely with them. Either you're forced to require attributes that potentially won't work for some code, or you're forced to disallow attributes that some code might require. And what you're describing results in a proliferation of interfaces, which to an extent, goes against the point of using interfaces. So, we might do something like that, but it does cause a different set of problems. Regardless, we will have to look over exactly what we want to do with the replacement for std.range.interfaces. I think that it was mostly thrown together because it was thought that it was needed and not because it really had been needed much at that point, and we should definitely look at what needs reworking based on what we've learned in the interim. Classes in general are kind of terrible with generic code in D though, particularly because of the issues with attributes. - Jonathan M Davis
Mar 06 2024
parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis 
wrote:
 This is a fundamental problem with classes in general. 
 Attributes just do not play nicely with them. Either you're 
 forced to require attributes that potentially won't work for 
 some code, or you're forced to disallow attributes that some 
 code might require.
It's not a problem with classes, it's a decision one has to make when designing an interface with runtime variance regardless of the mechanism. You either have to allow the unknown code you're calling to execute side effects in which case it's impure, or you must execute the side effects it requests on it's behalf which means the unknown code must use your interfaces instead of doing their io themselves. It would be the same with any other mechanism.
 And what you're describing results in a proliferation of 
 interfaces, which to an extent, goes against the point of using 
 interfaces.
I don't think so. Say, you're designing an interpreter, and decide the user-defined functions can't execute io directly or break D type system, but can throw exceptions. Therefore the D type for a range in your client langauge will be `ForwardRange!(ClientLangValue, FunctionAttribute.safe | FunctionAttribute.pure_)`. You need only one template instance, same as presently. How is this exactly proliferation?
Mar 06 2024
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 3:54:54 PM MST Dukc via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis

 wrote:
 This is a fundamental problem with classes in general.
 Attributes just do not play nicely with them. Either you're
 forced to require attributes that potentially won't work for
 some code, or you're forced to disallow attributes that some
 code might require.
It's not a problem with classes, it's a decision one has to make when designing an interface with runtime variance regardless of the mechanism. You either have to allow the unknown code you're calling to execute side effects in which case it's impure, or you must execute the side effects it requests on it's behalf which means the unknown code must use your interfaces instead of doing their io themselves. It would be the same with any other mechanism.
Yes, it's a problem with non-templated code in general, but because you can't templatize virtual functions, it's a problem with classes that you can't really fix, whereas with most other code, you can usually fix it by templatizing the code and inferring the attributes based on the arguments.
 And what you're describing results in a proliferation of
 interfaces, which to an extent, goes against the point of using
 interfaces.
I don't think so. Say, you're designing an interpreter, and decide the user-defined functions can't execute io directly or break D type system, but can throw exceptions. Therefore the D type for a range in your client langauge will be `ForwardRange!(ClientLangValue, FunctionAttribute.safe | FunctionAttribute.pure_)`. You need only one template instance, same as presently. How is this exactly proliferation?
It's a proliferation, because you're declaring a bunch of different interfaces. Instead of having ForwardRange, you have a whole bunch of variants of ForwardRange. Now, that might not end up being a problem in practice (particularly since you do already have to templatize ForwardRange on the element type), because you may only need one variation in your codebase, but it does further complicate the interfaces, and it does mean that interfaces with the same element type won't necessarily be compatible, whereas with the current design, that isn't a problem. So, it may ultimately be what we want to do, but it's not without its downsides. - Jonathan M Davis
Mar 06 2024
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 08:53:55PM +0000, Dukc via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:
[...]
 This makes .choose essentially useless outside of trivial cases.
It's far from useless, it's only like taking an enum argument and `switch`ing on it instead of taking a delerate argument. Certainly works but means you have to list all your cases in the function, which may or may not be what you want.
Well yes, so it's very limited in what it can do. In the cases where you can't list all cases in the call to .choose, or where you don't want to for code cleanliness reasons, it's useless.
 The gripe I have with `std.range.interface` is lack of attributes in
 the virtual functions. I don't think any simple attribute set alone
 would solve the problem well.
[...] The problem is that the base class cannot predict which restrictive attributes will be applied to a derived class override of a method. You cannot override a safe base class method with a system derived class method, for example. So it must settle for the most permissive attributes. But that means the code that receives a base class reference must handle the most permissive attributes as well; a safe function would not be able to invoke a safe method in a derived class if the only reference it has to the derived object is a base class reference. There's no way around this except to get rid of (restrictive) attributes altogether. T -- EMACS = Extremely Massive And Cumbersome System
Mar 06 2024
prev sibling next sibling parent reply Meta <jared771 gmail.com> writes:
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the 
 whipping boy of range design woes.  Actually, they serve an 
 extremely important role: type erasure, which is critical when 
 you have code like this:
 ...etc.
This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
Mar 06 2024
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the
 whipping boy of range design woes.  Actually, they serve an
 extremely important role: type erasure, which is critical when you
 have code like this: ...etc.
This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not. It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping. You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties. Calling e.g. .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?) T -- It's bad luck to be superstitious. -- YHL
Mar 06 2024
parent reply Meta <jared771 gmail.com> writes:
On Wednesday, 6 March 2024 at 18:07:22 UTC, H. S. Teoh wrote:
 On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via 
 Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become 
 the whipping boy of range design woes.  Actually, they serve 
 an extremely important role: type erasure, which is critical 
 when you have code like this: ...etc.
This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not. It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping. You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties. Calling e.g. .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?) T
Ah, you're right. There is really no way to do this in a completely flexible way in current D, if you also want to have value range semantics. Couldn't we somehow fit RefRange into the current range hierarchy somewhere, and provide primitives to make it more difficult to shoot yourself in the foot?
Mar 06 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 1:23:52 PM MST Meta via Digitalmars-d wrote:
 Couldn't we somehow fit RefRange into the
 current range hierarchy somewhere, and provide primitives to make
 it more difficult to shoot yourself in the foot?
RefRange was a huge mistake (and it was my mistake). It's basically trying to force reference semantics on something that has value semantics, and it causes problems as a result. I don't think that it can actually be made to work properly - at least not with forward ranges. IMHO, we need to be able to rely on the semantics of the basic range operations, and most of the problems with the current range API come from when operations are under-specified or outright unspecified - be it because we're trying to support too much (e.g. both structs and classes for ranges), or because we just didn't actually think to specify what the behavior should be. And in some cases, we though of it but too late to actually require it (e.g. you can't actually rely on $ working with random-access ranges, because we didn't think to require it up front, and too many random-access ranges had been written that didn't support it by the time we realized our mistake). - Jonathan M Davis
Mar 06 2024
parent Meta <jared771 gmail.com> writes:
On Wednesday, 6 March 2024 at 21:17:37 UTC, Jonathan M Davis 
wrote:
 On Wednesday, March 6, 2024 1:23:52 PM MST Meta via 
 Digitalmars-d wrote:
 Couldn't we somehow fit RefRange into the
 current range hierarchy somewhere, and provide primitives to 
 make
 it more difficult to shoot yourself in the foot?
RefRange was a huge mistake (and it was my mistake). It's basically trying to force reference semantics on something that has value semantics, and it causes problems as a result. I don't think that it can actually be made to work properly - at least not with forward ranges. IMHO, we need to be able to rely on the semantics of the basic range operations, and most of the problems with the current range API come from when operations are under-specified or outright unspecified - be it because we're trying to support too much (e.g. both structs and classes for ranges), or because we just didn't actually think to specify what the behavior should be. And in some cases, we though of it but too late to actually require it (e.g. you can't actually rely on $ working with random-access ranges, because we didn't think to require it up front, and too many random-access ranges had been written that didn't support it by the time we realized our mistake). - Jonathan M Davis
I don't mean the type RefRange; I mean expanding the range hierarchy to include the concept of a "RefRange" (just like InputRange, BidirectionalRange, etc.) that explicitly has reference semantics instead of value semantics. Then you can specialize algorithms on whether they're a ref range or not (implying that ranges with reference semantics should no longer pass the isInputRange et al. checks - no idea how you'd do that OTOH), and then you can account for that fact in the implementation of the algorithm, rather than all range algorithms currently carrying an implicit assumption that they're working with a value range.
Mar 06 2024
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer 
 via Digitalmars-d wrote: [...]
 The only tricky aspect is ranges that are references 
 (classes/pointers).  Neither of those to me should be 
 supported IMO, you can always wrap such a thing in a range 
 harness.
[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this:
As Jonathan already responded, you can have a class range by wrapping it in a struct. Which is what I meant when I said: "you can always wrap such a thing in a range harness" -Steve
Mar 07 2024
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven Schveighoffer 
wrote:
 On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis 
 wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from 
 a range unless the range is a random-access range.
Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it?
By "specific range type", do you mean a specific category of range (input, forward, random-access, etc.), or a specific concrete type? If the former, this is already easy to do with existing language and library features (for example, in many cases you can use an empty slice). If the latter, then the question I'm asking is, why do you need to do that? Personally, I can think of plenty of times that I've needed to do the first thing, but I can't think of a single time that I've needed to do the second thing.
 The only tricky aspect is ranges that are references 
 (classes/pointers). Neither of those to me should be supported 
 IMO, you can always wrap such a thing in a range harness.
The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in `std.range.interfaces`, where `ForwardRange` inherits from `InputRange` and so on, cannot really be replicated using `structs` (`alias this` only goes so far).
Mar 06 2024
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 07/03/2024 6:38 AM, Paul Backus wrote:
     The only tricky aspect is ranges that are references
     (classes/pointers). Neither of those to me should be supported IMO,
     you can always wrap such a thing in a range harness.
 
 The main thing you lose by dropping support for reference-type ranges is 
 interfaces. In particular, the interface inheritance hierarchy in 
 |std.range.interfaces|, where |ForwardRange| inherits from |InputRange| 
 and so on, cannot really be replicated using |structs| (|alias this| 
 only goes so far).
All of this would ideally be solved with the introduction of signatures. But alas as we've covered in past conversations that is a big set of features to introduce.
Mar 06 2024
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 6 March 2024 at 17:38:56 UTC, Paul Backus wrote:
 On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven 
 Schveighoffer wrote:
 On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis 
 wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from 
 a range unless the range is a random-access range.
Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it?
By "specific range type", do you mean a specific category of range (input, forward, random-access, etc.), or a specific concrete type? If the former, this is already easy to do with existing language and library features (for example, in many cases you can use an empty slice). If the latter, then the question I'm asking is, why do you need to do that?
I meant the latter. A concrete range type. How does it happen? Consider that an array is a concrete type that people use all the time. Why should any other range be different? I've definitely stored ranges and other things as type members that were voldemort types. This ability is more of a question of "do we want to add this feature to ranges or not?" The feature doesn't currently exist -- you can't assume that an uninitialized range is empty. Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, *but still compiles*. A few examples I can think of: 1. copying an array is equivalent to `arr.save`, but may not be the case for other forward ranges. 2. character arrays have a mechanism to decode into dchar if you specify `dchar` as the loop variable type. 3. arrays have a default value of an empty array. Reducing surprises when you substitute what seems like a "compatible" type is desirable, but not strictly necessary. I.e. I'm also OK if we don't try and add these definitions. For the empty array case, I think the mechanism is trivial to add as a formal requirement, since nearly all ranges default to empty, and the ones that don't are probably easy to change. Maybe this isn't the case? I don't know. I think reducing friction when it's easy to do is something we should always be looking at.
 The only tricky aspect is ranges that are references 
 (classes/pointers). Neither of those to me should be supported 
 IMO, you can always wrap such a thing in a range harness.
The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in `std.range.interfaces`, where `ForwardRange` inherits from `InputRange` and so on, cannot really be replicated using `structs` (`alias this` only goes so far).
As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities. -Steve
Mar 07 2024
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Mar 07, 2024 at 06:32:50PM +0000, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 Sometimes, we are bitten by the fact that the array is the most common
 range, and behaves in a specific way. People depend on that mechanism
 without realizing it, and then sometime later, they decide to change
 the type to one that is very compatible with arrays, but offers some
 benefit (i.e. to remove an allocation). However, the new range type
 might behave in unexpected ways, *but still compiles*.
Over the past decade of working with D, this has repeatedly come up as the weakness of a signature-constraint based approach to ducktyping. The problem is that what the signature constraint requires (e.g., isInputRange) may only be a subset of what the function body assumes, and there is no way to check this mechanically (signature constraints are Turing-complete). What ideally should happen is that whatever the code assumes should also be declared in the function signature. So if a template takes a parameter t of generic type T, and then goes and performs t++, then the compiler should enforce that ++ is declared as part of the constraints on T. The C++ concepts approach is better in this respect, in that the compiler can typecheck the function body and emit an error if it tries to perform an operation on T that it didn't declare as part of its signature constraint. Barring implementing concepts in D, which is unlikely to happen, the next best alternative is for the compiler to enforce that any operation on T must have a corresponding check in the signature constraint, and when this is not the case, it should, based on the attempted operation, issue an error message with a suggested addition to the sig constraint that would satisfy this requirement. This doesn't fully solve the problem here, of course -- sometimes the difference is semantic rather than something easily checkable by the compiler, like subtle differences between built-in arrays and user-defined arrays; in that case you're still up the creek without a paddle. But it's a step forward. T -- The trouble with TCP jokes is that it's like hearing the same joke over and over.
Mar 07 2024
prev sibling parent Paul Backus <snarwin gmail.com> writes:
On Thursday, 7 March 2024 at 18:32:50 UTC, Steven Schveighoffer 
wrote:
 I meant the latter. A concrete range type.

 How does it happen? Consider that an array is a concrete type 
 that people use all the time. Why should any other range be 
 different? I've definitely stored ranges and other things as 
 type members that were voldemort types.
This doesn't answer my question. There are lots of ways in which (built-in) arrays are different from other types of ranges. Most notably, an array can be used as either a range or a container. There are plenty of use-cases for creating an empty container. What I am asking, specifically, is whether there is any use-case where *generic code*, given a range of some arbitrary type `R`, needs to create another range which both (a) has the exact concrete type `R`, and (b) is empty. Since that's the feature that's being proposed here. (Non-generic code does not need this feature to be part of the range API, because it can rely on specific features of whatever concrete type it's working with.)
 This ability is more of a question of "do we want to add this 
 feature to ranges or not?" The feature doesn't currently exist 
 -- you can't assume that an uninitialized range is empty.
If there are no use-cases for this feature, then the answer to "do we want to add it" ought to be "no." That's why I'm asking about use-cases.
 Sometimes, we are bitten by the fact that the array is the most 
 common range, and behaves in a specific way. People depend on 
 that mechanism without realizing it, and then sometime later, 
 they decide to change the type to one that is very compatible 
 with arrays, but offers some benefit (i.e. to remove an 
 allocation). However, the new range type might behave in 
 unexpected ways, *but still compiles*.
This is a general problem with templates/macros compared to typed generics. Even if we get rid of this particular edge case, there are still dozens more that users are going to run into if they only test with arrays. If we want to address this problem, I think the best thing we can do is to provide a standard suite of test ranges that users can plug into their code to uncover edge cases and bugs.
 The main thing you lose by dropping support for reference-type 
 ranges is interfaces. In particular, the interface inheritance 
 hierarchy in `std.range.interfaces`, where `ForwardRange` 
 inherits from `InputRange` and so on, cannot really be 
 replicated using `structs` (`alias this` only goes so far).
As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities.
How do you implement this with structs? interface ForwardAssignable : InputAssignable!E, ForwardRange!E Interfaces allow multiple inheritance. Structs can only have one `alias this` member. Maybe you're fine with giving up on this feature, but let's at least be honest that we *are* giving up features here.
Mar 07 2024
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 10:32:17 AM MST H. S. Teoh via Digitalmars-d 
wrote:
 On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via
 Digitalmars-d wrote: [...]

 The only tricky aspect is ranges that are references
 (classes/pointers).  Neither of those to me should be supported IMO,
 you can always wrap such a thing in a range harness.
[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } This generally will not compile because R and S are different types, even if both conform to a common underlying range API, like an input range. To remedy this situation, class-based range wrappers come to the rescue: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return inputRangeObject(range1); } else { return inputRangeObject(range2); } } Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work. // Also, class-based ranges are sometimes necessary for practical reasons, like in this one program I had, that has an UFCS chain consisting of hundreds of components (not all written out explicitly in the same function, of course, but that's what the underlying instantiation looks like). It generated ridiculously large symbols that crashed the compiler. Eventually the bug I filed prodded Rainer to implement symbol compression in DMD. But even then, the symbols were still ridiculously huge -- because the outermost template instantiation had to encode the types of every one of the hundreds of components. As a result, symbol compression or not, compile times were still ridiculously slow because the compiler still had to copy those symbols around -- their length grew quadratically with every component added to the chain. Inserting a call to .inputRangeObject in the middle of the chain dramatically cut down the size of the resulting symbols, because it effectively erased all the types preceding it, resulting in much saner codegen afterwards.
My current plan is to retain std.range.interfaces in some form or another so that we can have ranges as classes as far as their implementation goes (at my job, we actually have to use classes for ranges, because we have to be able to wrap arbitrary ranges at runtime, since we're dealing with an intepreter). However, they would not count as ranges by any range-based algorithms, because they would be classes, and so they would have to be wrapped in structs to actually be used as ranges. The wrapper structs would then do all of the work to ensure that the proper copying behavior was implemented. So, you'd do something like auto r = inputRangeObject(range1); and get a struct range wrapping a class from the replacement for std.range.interfaces. And if we need the ability to get at the underlying class for something, it's as simple as providing a member to the struct which returns it - e.g. source, like we do with several other range types. Then you could do something like auto r = inputRangeObject(range1); InputRange cls = r.source; Of course, you lose access to the class as soon as it's wrapped in another range type, but that happens right now when you pass ranges around which are classes. So, that shouldn't really change much from what I can see. It's basically just wrapping classes in structs so that the struct does the behavior that we currently do with save (and it could do it more intelligently by taking advantage of reference counting rather than dup-ing the class whenever the range is copied). So, unless I'm missing something here, we _should_ be able to just wrap classes in structs to allow ranges which are implemented as classes even if the classes themselves can't be ranges any longer. The net result should be that dealing with forward ranges is simpler, and we stop having save-related bugs, but we shouldn't actually lose any functionality. - Jonathan M Davis
Mar 06 2024
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Mar 06, 2024 at 01:03:35PM -0700, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 My current plan is to retain std.range.interfaces in some form or
 another so that we can have ranges as classes as far as their
 implementation goes (at my job, we actually have to use classes for
 ranges, because we have to be able to wrap arbitrary ranges at
 runtime, since we're dealing with an intepreter). However, they would
 not count as ranges by any range-based algorithms, because they would
 be classes, and so they would have to be wrapped in structs to
 actually be used as ranges. The wrapper structs would then do all of
 the work to ensure that the proper copying behavior was implemented.
[...]
 So, unless I'm missing something here, we _should_ be able to just
 wrap classes in structs to allow ranges which are implemented as
 classes even if the classes themselves can't be ranges any longer. The
 net result should be that dealing with forward ranges is simpler, and
 we stop having save-related bugs, but we shouldn't actually lose any
 functionality.
[...] This actually sounds like a reasonable approach. T -- Applying computer technology is simply finding the right wrench to pound in the correct screw.
Mar 06 2024
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, March 6, 2024 7:18:50 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating
 through a range) to get an empty range of the same type from a
 range unless the range is a random-access range.
Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
There are times where I've had to make a range empty in algorithms in order to indicate that it's done or force it to be done. This is particularly true with some algorithms I've written where the range was passed by ref. IIRC, I tried to add something to std.range for this years ago (either that, or I tried to make takeNone do it; it's been a while) using the init value, but Andrei nixed it on the grounds that we couldn't rely on it. The result is that while we have takeNone, you can't use it to actually make a range empty. You can just use it to get an empty range of a potentially different type. So, while this is not something that I've seen come up frequently, this is definitely something that I've seen come up in my own code. And in general, if we're able to require that the init value be empty for finite ranges, that will reduce the number of bugs surrounding member variables and the init value, since it's fairly common to end up using the init value of a range when you have to store it as a member variable (even if it was a Voldemort type to begin with), and the natural thing to do at that point is to assume that the init value is empty (even though in reality, for some range types right now, it isn't even in a valid). But yes, the desire to have a way to force a range to be empty is something that's come from actual experience. - Jonathan M Davis
Mar 06 2024