digitalmars.D - Range Redesign: Empty Ranges

Jonathan M Davis (172/172) Mar 04 2024 Okay. A few weeks ago, I made a post about potential design decisions wh...

Sebastiaan Koppe (7/10) Mar 04 2024 Makes a lot of sense, but I agree it will be very annoying for

Jonathan M Davis (69/79) Mar 04 2024 You mean define a static function named init? I think that most of us ag...

Dom DiSc (8/20) Mar 05 2024 Are there infinite ranges that are indeed not forward ranges?

Alexandru Ermicioi (2/3) Mar 05 2024 /dev/random could be such a range.

Adam Wilson (13/16) Mar 05 2024 In the case of /dev/random I don't really see how it's a true

H. S. Teoh (13/24) Mar 05 2024 Nothing about an infinite range requires that it be infinite both ways,

Adam Wilson (13/20) Mar 05 2024 That is true in theory, but in practice if you try it on real

Arafel (2/6) Mar 06 2024 Or a range that returns captured data from a sensor.

Alexandru Ermicioi (8/14) Mar 06 2024 Nice example,
Dom DiSc (6/12) Mar 06 2024 If you have a source of real entropy, ok. But

Arafel (6/10) Mar 06 2024 While technically true, you might not have access to the seed, so in

Atila Neves (22/27) Mar 06 2024 I like T.init being empty. I thought I'd found a clever way to

Ogi (8/19) Mar 06 2024 I’m surprised that the equivalent C++ code *doesn’t* crash (at

Jonathan M Davis (10/35) Mar 06 2024 The issue is whether the function is virtual. In D, class member functio...

Ogi (18/27) Mar 07 2024 Virtual or not, GCC assumes that `this` cannot be null:

Paul Backus (15/27) Mar 06 2024 It crashes because it's attempting to access Class's vtable. If

Atila Neves (5/34) Mar 07 2024 Of course! I write classes so seldom I forget my own rule to add

Jonathan M Davis (4/8) Mar 07 2024 The solution that I'm currently planning to go with is to wrap them:

Paul Backus (11/14) Mar 06 2024 Genuine question: what are the use-cases for this?

Steven Schveighoffer (28/43) Mar 06 2024 When you need to pass a specific range type, and you want to pass

H. S. Teoh (48/51) Mar 06 2024 [...]

Paul Backus (4/21) Mar 06 2024 This is what std.range.choose [1] is for. Internally, it's

H. S. Teoh (18/43) Mar 06 2024 Haha, I knew that one would be brought up. Unfortunately, it only works

Dukc (35/38) Mar 06 2024 It's not even about those ranges being constructible. It's about

Jonathan M Davis (16/40) Mar 06 2024 This is a fundamental problem with classes in general. Attributes just d...

Dukc (17/25) Mar 06 2024 It's not a problem with classes, it's a decision one has to make

Jonathan M Davis (16/41) Mar 06 2024 Yes, it's a problem with non-templated code in general, but because you

H. S. Teoh (19/29) Mar 06 2024 Well yes, so it's very limited in what it can do. In the cases where you

Meta (13/18) Mar 06 2024 This would be a complete non-issue if we had language-level

H. S. Teoh (22/41) Mar 06 2024 No, it doesn't work the way you think it would. The returned range must

Meta (6/50) Mar 06 2024 Ah, you're right. There is really no way to do this in a

Jonathan M Davis (16/19) Mar 06 2024 RefRange was a huge mistake (and it was my mistake). It's basically tryi...

Meta (13/36) Mar 06 2024 I don't mean the type RefRange; I mean expanding the range

Steven Schveighoffer (5/16) Mar 07 2024 As Jonathan already responded, you can have a class range by

Paul Backus (18/43) Mar 06 2024 By "specific range type", do you mean a specific category of

Richard (Rikki) Andrew Cattermole (4/13) Mar 06 2024 All of this would ideally be solved with the introduction of signatures.
Steven Schveighoffer (33/75) Mar 07 2024 I meant the latter. A concrete range type.

H. S. Teoh (29/35) Mar 07 2024 Over the past decade of working with D, this has repeatedly come up as
Paul Backus (32/54) Mar 07 2024 This doesn't answer my question.

Jonathan M Davis (33/81) Mar 06 2024 My current plan is to retain std.range.interfaces in some form or anothe...
H. S. Teoh (8/23) Mar 06 2024 [...]

Jonathan M Davis (23/37) Mar 06 2024 There are times where I've had to make a range empty in algorithms in or...

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

Okay. A few weeks ago, I made a post about potential design decisions when
updating the range API in the next version of Phobos, and this is a
continuation of that, albeit with a different set of problems. Specifically:

1. The range API does not currently - and cannot currently - require or
guarantee that the init value of a range is valid.

2. The range API provides no way (other than fully iterating through a
range) to get an empty range of the same type from a range unless the range
is a random-access range.

Fixing the first issue is straightforward, I think, though it potentially
affects the second, and of course, someone here may have thoughts or
insights which I've missed.

The issue with init is that of course code wants to use the init value, and
code often _does_ use the init value for specific ranges, but doing so when
that specific range type does not specify that the init value is valid (and
documentation almost never does) is relying on unspecified behavior - and in
the general case, using the init value of a range cannot be specified
behavior. The obvious case is ranges which are classes, because their init
value is going to be null, meaning that using the init value of a range
could result in a segfault. Of course, _most_ range-based code doesn't use
classes for ranges, so I expect that it's quite rare to hit that problem in
practice, but most range-based code uses Voldemort types and tends to ignore
the fact that the init value can still be used (either by using typeof on
the result to get its type or by getting init directly from the result -
e.g. range.init). And you actually need to do that when storing a range as a
member variable. E.G.

struct Foo
{
    ...
    auto _range = typeof(chain(func1(), func2(), func3())).init;
    ...
}

The more correct thing to do would be something like

struct Foo
{
    ...
    auto _range = Nullable!(typeof(chain(func1(), func2(), func3())));
    ...
}

so that you can avoid using the range's init value, but realistically, I
don't think that it's something that many folks think of. They use the
range's init value, and it works (since it often does work), and then they
end up relying on how that range type happens to work at the moment, which
could change in the future, since the range API does not guarantee that it
works, and the folks maintaining that range probably aren't thinking much
about the init value anyway - especially when it's a Voldemort type. The
typical expectation is that the user will be creating the range by calling
the function that returns it, not by using type introspection to get their
hands on the init value. But it is always possible to get the init value,
and code declaring member variables from ranges will use it unless it goes
to the extra effort of using something like Nullable so that it's possible
to store the range without using its init value as the default value.

Now, I think that the fix for this particular problem is pretty easy. We
just require that all ranges have a valid init value. This means disallowing
classes or pointers being used directly as ranges, but we were already
looking at doing that for forward ranges anyway in order to be able to get
rid of save. It's arguably more problematic for basic input ranges, but it's
quite feasible there as well. The previous thread discussed requiring that
they be reference types, which would lean towards them being pointers or
classes, but you can have structs which are reference types (e.g. they
contain a class reference but then have the logic around it to ensure that
it's not used when null). Also, it was proposed in that thread that we
require that basic input ranges be non-copyable, which would also make it
straightforward to require that the init value be valid, since in that case,
all basic input ranges would have to be structs regardless of what their
internals were doing with pointers or references. Either way, we can
disallow classes and pointers being used directly as ranges and require that
all ranges have a valid init value.

So, I'm proposing that we require that all ranges have a valid init value. I
suspect that the issue will still often be forgotten, particularly since
ranges are often Voldemort types, but if the range API explicitly requires
it, and the range documentation is clear on that point, then we can clearly
consider it a bug when a range has an invalid init value, and in most cases,
I wouldn't expect it to be a big deal to fix.

The second issue is then the ability to get an empty range from a range type
in O(1) which is the same type as the original range. In my experience, it's
sometimes the case that you really want to be able to do something like

range = Range.emptyRange;

or

range.makeEmpty();

in order to make a range empty without eating up CPU cycles calling popFront
until empty is true. This is possible with a random-access range by slicing
it with a zero length, but it's not possible with lower classes of ranges.
std.range.takeNone tries to do it, but it is forced to give you a new range
type if the range isn't a random-access range - specifically, it uses
takeExactly, but regardless, that doesn't work in cases where you need to
keep the range type the same but make it empty (e.g. because you're dealing
with a ref parameter). And right now, that means calling popFront over and
over again.

Of course, if you're dealing with an infinite range, then there can't be a
way to make it empty, but IMHO, we should have a way to get an empty range
from an arbitrary finite range type where the empty range is of the same
type. And for that, I see three possibly options:

1. Make it so that for finite ranges, the init value of a range is required
to not just be valid, but it's also required to be empty.

2. We add a static / enum member to the range API called something like
emptyRange where you can get an empty range from that range type where the
range is that range type.

3. Add a new member function to the range API called something like
makeEmpty which makes that particular instance empty.


functions, and in general, the init value probably should be empty. However,
my concern there is that there may be ranges where that's going to constrain
the implementation in annoying ways. All finite ranges have to have an empty
state, because you'll get to empty if you call popFront enough times, so it
should be possible for all ranges to have their init value be empty, but I
could also see it being the case where requiring that would mean that the
implementer of the range would have to jump through some annoying hoops to
make it work - though in most cases, if they have to jump through hoops, I
would expect that they'd have to jump through most (or all) of those hoops
to make the init value valid whether it's empty or not. But I would like
feedback on how reasonable folks think it would be to require that a range's
init value be empty.


get an empty range without having to create one with elements first, whereas
if we went with makeEmpty, then you'd need to construct a range first just
to make it empty. That being said, implementing emptyRange (or whatever we'd
call it) wouldn't be all that different from making the init value empty.
The main difference I see is that emptyRange wouldn't have to be statically
known, which might be useful in some cases, though the kinds of cases that I
can think of tend to be ones where making the init value valid would require
additional logic, in which case, in going to the effort of making the init
value valid, you might as well make it empty anyway.

I expect that for some ranges, makeEmpty would be easier to implement, and
it would probably also work better if we decide to make basic input ranges
non-copyable, since mutating a non-copyable range would be more
straightforward than moving an empty one on top of it. But I would think
that we could make the latter work, and in the general case, being able to
have an empty range for any finite range in O(1) would be nice to have
without having to create a range just to make it empty.

Either awy, one advantage to having emptyRange / makeEmpty over using the
init value as empty would be that while adding a new function would be
annoying, it would force anyone implementing a range to take into account
the fact that they need to provide a way to get an empty range. In many
cases, it may just be

    alias emptyRange = typeof(this).init;

but range implementers would be forced to think about it, whereas if we just
made it so that the documentation said that they had to make the init value
empty, they'd be less likely to notice. But it's still extra stuff that has
to implemented which would be completely unnecessary with ranges that would
have empty init value anyway.

A related issue is that if we're removing save from forward ranges, and we
keep the rest of the functions the same, some of the old basic input ranges
then look like the new forward ranges (since they wouldn't have save, and
forward ranges wouldn't require it), which would be bad, because they
wouldn't have the correct semantics. To fix that, we either need to add
something to the range API (be it a whole new function like emptyRange or
something like an enum that flagged a type as being a forward range), or
we'd have to change the existing functions (be it something as simple as
changing empty to isEmpty or something more complicated like switching from
front/popFront to head/tail, which would also allow us to get around the
tail-const problem but would be a pretty massive change to how ranges work).
Adding emptyRange or makeEmpty _almost_ fixes that problem, but since
infinite ranges wouldn't have have it (since they couldn't implement it), it
wouldn't actually fix the problem. Some old basic input ranges which are
infinite would still pass the new isForwardRange.

So, if it weren't for the infinite range issue, I'd see the fact that it
would change the API enough to distingush between the old and new APIs as a
good reason to go with emptyRange over requiring that the init value be
empty, but since it doesn't fix the issue with infinite ranges, I don't
think that it makes sense to go with emptyRange / makeEmpty just to
differentiate between the old and new range APIs (it would make it more
obvious to programmers using these ranges, which could be valuable - but
then again, having isEmpty instead of empty would do the same).

So, hopefully that makes the situation with those two issues clear. The
question then is what insights you folks might have on them and the proposed
solutions. It may even be that it's way too onerous to require that the init
value be valid, and someone has a use case to show that, though I very much
hope that that is not the case. I expect though that the bigger question is
whether init should be required to be empty for finite ranges or whether it
would be better to add a new function for that.

Thoughts?

- Jonathan M Davis

Mar 04 2024

Sebastiaan Koppe <mail skoppe.eu> writes:

On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 1. Make it so that for finite ranges, the init value of a range 
 is required to not just be valid, but it's also required to be 
 empty.

Makes a lot of sense, but I agree it will be very annoying for 
some ranges.

I suspect it will result in extra state and/or checks for those 
cases.

Would it be too crazy of an idea for those ranges to implement a 
static init function?

Mar 04 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, March 4, 2024 2:58:08 PM MST Sebastiaan Koppe via Digitalmars-d 
wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 1. Make it so that for finite ranges, the init value of a range
 is required to not just be valid, but it's also required to be
 empty.

 Makes a lot of sense, but I agree it will be very annoying for
 some ranges.

 I suspect it will result in extra state and/or checks for those
 cases.

 Would it be too crazy of an idea for those ranges to implement a
 static init function?

You mean define a static function named init? I think that most of us agree
at this point that it was a mistake to allow init to be anything other than
the compiler-defined init, and as I understand it, it doesn't really work to
redefine it in the way that Walter intended. Realistically, code tends to
depend on the init value being the compiler-defined one, and trying to do
anything else is asking for trouble. A redefined init value doesn't actually
get used with default-initialization, so with a case like

struct Foo
{
    ...
    typeof(chain(func1(), func2(), func3())) _range;
    ...
}

any init that you declare will be ignored (even if it's a value and not a
function), meaning that unless that range type has a valid init value, that
variable won't work properly unless it happens to be assigned a value before
it's used.

We could of course add a function / enum to the range API which would be
required to be used instead of the init value when you're looking for some
kind of initialization, but that doesn't solve the problem of
default-initialized values. For that, it really needs the init to be valid.

Of course, we _could_ allow ranges to  disable default initialization, and
then the type could define init with a static function which would be used
when init was used explicitly, but that would potentially cause problems if
init were used explicitly at compile time. It's also on the list of things
that I half expect to be disallowed at some point with a future edition,
because allowing init to be anything other than the default one tends to
cause issues. So, we _might_ be able to make it work, but my gut reaction is
that we're playing with fire, and we'll be better off overall if we can just
require that the compiler-defined init value be valid.

Still, if we allowed it, at least it should result in compile time errors in
generic code rather than weird runtime behavior. But it would mean that
generic code couldn't rely on being able to default-initialize ranges, and
it would have to give all ranges a value at runtime, which would be
annoying, and it realistically wouldn't happen outside of something like
Phobos (and it would likely only happen in Phobos if that specific case was
tested for by Phobos functions in general). Code in general expects default
initialization to work, and it's usually only code specifically written to
work with types that don't allow it where it actually works to have a type
that  disables it.

So, I don't know. I'd really rather not play games with trying to redefine
the init value. I also suspect that most of the annoyance that ranges might
have with the init value would be there simply by requiring that the init
value be valid without requiring that it be empty, in which case, requiring
that it be empty wouldn't be onerous. It's requiring that it be valid which
would potentially be onerous. But I'm also not sure how much of an issue
that that would be in practice. For most ranges, I expect that it will be
pretty simple, and many of them already have a valid, empty init value. In
the vast majority of cases, I expect that if a range's init value isn't
valid, it's because the person who wrote it didn't think about the init
value being used (since it's probably a Voldemort type) and didn't test that
case.

And if the issue of it being problematic to make the init value valid isn't
common, I'm disinclined to make range-based code in general have to worry
about it - especially if it's still possible to make such ranges work (even
if it's more annoying than would be desirable). I can also think of several
ways to make it easy right off the top of my head, though you wouldn't
necessarily want the additional overhead (e.g. any range could just use is
to compare itself to its init value in empty, or it could have a bool to
indicate whether it had been initialized via a constructor and check that;
it would be annoying but easy).

But I'm also not entirely sure how onerous it will be in practice to make
the init values of ranges valid (be it empty or otherwise) - though it's
actually _really_ easy in the cases where it would segfault now, since if
such types then have a wrapper struct, it can easily check whether the
pointer or reference is null and return true for empty.

- Jonathan M Davis

Mar 04 2024

Dom DiSc <dominikus scherkl.de> writes:

On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 Of course, if you're dealing with an infinite range, then there 
 can't be a way to make it empty, but IMHO, we should have a way 
 to get an empty range from an arbitrary finite range type where 
 the empty range is of the same type. And for that, I see three 
 possibly options:

 1. Make it so that for finite ranges, the init value of a range 
 is required to not just be valid, but it's also required to be 
 empty.

I think this is the best solution.

 A related issue is that if we're removing save from forward 
 ranges, [...] Adding emptyRange or makeEmpty [...] wouldn't 
 actually fix the problem. Some old basic input ranges which are 
 infinite would still pass the new isForwardRange.

Are there infinite ranges that are indeed not forward ranges?
I mean - they produce their output somehow algorithmically.
Should be possible to safe the state of such an algorithm, no?
Ok, maybe someone has not implemented it, but as we are requiring 
things anyway, why not require an infinite range to implement the 
copy function?

Mar 05 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

/dev/random could be such a range.

Mar 05 2024

Adam Wilson <flyboynw gmail.com> writes:

On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi 
wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

 /dev/random could be such a range.

In the case of /dev/random I don't really see how it's a true 
infinity, you can't go infinitely backwards. It is an infinitely 
forward moving stream, so you can have the concept of "zero" and 
"positive infinity", but there is no concept of "negative 
infinity". A negative number is utter non-sense for TRNG's, as 
whatever negative number you specify would become the new "zero".

Practically there is a limit to how much entropy you can grab off 
the the chip in single chunk, so it is a bounded range, it's just 
that the boundary is somewhat fuzzy.

In short you would probably end up implementing an entropy source 
as a forward range with some large upper limit.

Mar 05 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 12:11:10AM +0000, Adam Wilson via Digitalmars-d wrote:
 On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

 
 /dev/random could be such a range.

 
 In the case of /dev/random I don't really see how it's a true
 infinity, you can't go infinitely backwards.

Nothing about an infinite range requires that it be infinite both ways,
unless it's a bidirectional range, which an RNG is not.


[...]
 Practically there is a limit to how much entropy you can grab off the
 the chip in single chunk, so it is a bounded range, it's just that the
 boundary is somewhat fuzzy.

[...]

A periodically-reseeded RNG is indeed a practically infinite range, with
no cycling.  You don't have to grab every value from the hardware
entropy source; it suffices to use a cryptographic hash function on a
counter that's periodically reseeded from the hardware entropy. It can
literally generate an endless stream of random numbers.


T

-- 
Tell me and I forget. Teach me and I remember. Involve me and I understand. --
Benjamin Franklin

Mar 05 2024

Adam Wilson <flyboynw gmail.com> writes:

On Wednesday, 6 March 2024 at 00:23:26 UTC, H. S. Teoh wrote:
 A periodically-reseeded RNG is indeed a practically infinite 
 range, with no cycling.  You don't have to grab every value 
 from the hardware entropy source; it suffices to use a 
 cryptographic hash function on a counter that's periodically 
 reseeded from the hardware entropy. It can literally generate 
 an endless stream of random numbers.


 T

That is true in theory, but in practice if you try it on real 
hardware, not only will you pay some pretty serious performance 
penalties as the CPU tries to dump all that entropy, it will be 
dumping it to memory, of which you will eventually run out. So 
yes, it's theoretically unlimited, but in practice, there is no 
valid reason to actually implement it that way, and to-date, no 
modern Operating System entropy source allows you to. For 
example, on Windows, you'll be passing a fixed size buffer to the 
entropy source. Same with OpenSSL.

We do not design code for what is theoretically possible, only 
that which can actually be achieved. I know, because I wrote a 
Crypto library for D that specifically deals with this.

Mar 05 2024

Arafel <er.krali gmail.com> writes:

On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

 
 /dev/random could be such a range.

Or a range that returns captured data from a sensor.

Mar 06 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:
 On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

 
 /dev/random could be such a range.

 Or a range that returns captured data from a sensor.

Nice example,

Basically, any global resource (sensor, /dev/random, etc.), that 
can be representable as a range, and doesn't have a definitive 
end, is an infinite input range, since you can't save it at a 
specific point and then continue multiple times from that point.

Best regards,
Alexandru.

Mar 06 2024

Dom DiSc <dominikus scherkl.de> writes:

On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:
 On 05.03.24 17:41, Alexandru Ermicioi wrote:
 On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
 Are there infinite ranges that are indeed not forward ranges?

 
 /dev/random could be such a range.


If you have a source of real entropy, ok. But 
pseudo-random-number-generators do have a seed and with same seed 
produces always the same numbers.

 Or a range that returns captured data from a sensor.

Ok, that's a more convincing example. This is really hard to 
copy. Requires perfect knowlegde in a deterministic universe :-)

Mar 06 2024

Arafel <er.krali gmail.com> writes:

On 06.03.24 11:56, Dom DiSc wrote:
 
 If you have a source of real entropy, ok. But 
 pseudo-random-number-generators do have a seed and with same seed 
 produces always the same numbers.


While technically true, you might not have access to the seed, so in 
practical terms it's a bit of distinction without a difference.

For instance, in `/dev/[u]random` the seed is kept internally by the OS, 
and I don't think you have access to it.... and even if you had, you 
wouldn't want to re-implement the kernel's PRNG, right?

Mar 06 2024

Atila Neves <atila.neves gmail.com> writes:

On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 Okay. A few weeks ago, I made a post about potential design 
 decisions when updating the range API in the next version of 
 Phobos, and this is a continuation of that, albeit with a 
 different set of problems. Specifically:

 [...]

I like T.init being empty. I thought I'd found a clever way to 
make this work for classes but apparently this crashes, much to 
my surprise:

```d
class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
}

Class c;
assert(c.empty);
```

This, although supposedly equivalent, works fine:

```d
class Class {}
bool empty(in Class c)  safe  nogc pure nothow {
     return c is null;
}
```

Which might be a way out? I still think the first example should 
work.

Mar 06 2024

Ogi <ogion.art gmail.com> writes:

On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
I thought I'd found a clever way to
 make this work for classes but apparently this crashes, much to 
 my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```

I’m surprised that the equivalent C++ code *doesn’t* crash (at 
least on my machine):
```
Class* c = nullptr;
assert(c->empty());

That’s still UB though.

Mar 06 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 6:09:52 AM MST Ogi via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I thought I'd found a clever way to

 make this work for classes but apparently this crashes, much to
 my surprise:

 ```d
 class Class {

     bool empty()  safe  nogc pure nothrow scope const {

         return this is null;

     }

 }

 Class c;
 assert(c.empty);
 ```

 I’m surprised that the equivalent C++ code *doesn’t* crash (at
 least on my machine):
 ```
 Class* c = nullptr;
 assert(c->empty());

 That’s still UB though.

The issue is whether the function is virtual. In D, class member functions
are virtual by default, so you get a crash when the program attempts to use
the vtable to look the function up (and therefore attempts to dereference
null), whereas in C++, you don't, because class member functions are
non-virtual by default, and so it doesn't do anything with the vtable. If
you mark the function as final in D (without it overriding anything), then
there shouldn't be a crash, and if you mark the C++ function as virtual,
then there will be.

- Jonathan M Davis

Mar 06 2024

Ogi <ogion.art gmail.com> writes:

On Wednesday, 6 March 2024 at 19:03:39 UTC, Jonathan M Davis 
wrote:
 The issue is whether the function is virtual. In D, class 
 member functions are virtual by default, so you get a crash 
 when the program attempts to use the vtable to look the 
 function up (and therefore attempts to dereference null), 
 whereas in C++, you don't, because class member functions are 
 non-virtual by default, and so it doesn't do anything with the 
 vtable. If you mark the function as final in D (without it 
 overriding anything), then there shouldn't be a crash, and if 
 you mark the C++ function as virtual, then there will be.

Virtual or not, GCC assumes that `this` cannot be null:

```C++
#include <cassert>

class C {
public:
     bool empty() const {
          return this == nullptr; // warning: `nonnull` argument 
`this` compared to NULL
     }
};

int main()
{
     assert(((C*)nullptr)->empty()); // warning: `this` pointer is 
null
}
```

Mar 07 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I like T.init being empty. I thought I'd found a clever way to 
 make this work for classes but apparently this crashes, much to 
 my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```

It crashes because it's attempting to access Class's vtable. If 
you make the method final, it works:

```d
class Class {
     final bool empty() {
         return this is null;
     }
}

void main()
{
     Class c;
     assert(c.empty); // ok
}
```

Mar 06 2024

Atila Neves <atila.neves gmail.com> writes:

On Wednesday, 6 March 2024 at 13:26:14 UTC, Paul Backus wrote:
 On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:
 I like T.init being empty. I thought I'd found a clever way to 
 make this work for classes but apparently this crashes, much 
 to my surprise:

 ```d
 class Class {
     bool empty()  safe  nogc pure nothrow scope const {
         return this is null;
     }
 }

 Class c;
 assert(c.empty);
 ```

 It crashes because it's attempting to access Class's vtable. If 
 you make the method final, it works:

 ```d
 class Class {
     final bool empty() {
         return this is null;
     }
 }

 void main()
 {
     Class c;
     assert(c.empty); // ok
 }
 ```

Of course! I write classes so seldom I forget my own rule to add 
`final` unless I can't.

That would severely restrict class ranges though, since 
presumably the whole point is to have dynamic dispatch.

Mar 07 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, March 7, 2024 2:11:29 AM MST Atila Neves via Digitalmars-d wrote:
 Of course! I write classes so seldom I forget my own rule to add
 `final` unless I can't.

 That would severely restrict class ranges though, since
 presumably the whole point is to have dynamic dispatch.

The solution that I'm currently planning to go with is to wrap them:

https://forum.dlang.org/post/mailman.1091.1709755456.3719.digitalmars-d puremagic.com

- Jonathan M Davis

Mar 07 2024

Paul Backus <snarwin gmail.com> writes:

On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from a 
 range unless the range is a random-access range.

Genuine question: what are the use-cases for this?

In general, the capabilities of ranges are designed to serve the 
needs of algorithms. Input ranges exist because single-pass 
iteration is all that's needed to implement algorithms like map, 
filter, and reduce. Random-access ranges exist because they're 
needed for sorting algorithms. And so on.

I'm not aware of any algorithm, or class of algorithm, that needs 
this specific capability you're describing. If such algorithms 
exist, we should be using them to guide our design here. If they 
don't...then maybe this isn't really a problem at all.

Mar 06 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from a 
 range unless the range is a random-access range.

 Genuine question: what are the use-cases for this?

 In general, the capabilities of ranges are designed to serve 
 the needs of algorithms. Input ranges exist because single-pass 
 iteration is all that's needed to implement algorithms like 
 map, filter, and reduce. Random-access ranges exist because 
 they're needed for sorting algorithms. And so on.

 I'm not aware of any algorithm, or class of algorithm, that 
 needs this specific capability you're describing. If such 
 algorithms exist, we should be using them to guide our design 
 here. If they don't...then maybe this isn't really a problem at 
 all.

When you need to pass a specific range type, and you want to pass 
in an empty range of that type, how do you do it? There is no 
formal definition for "create an empty range T". It's not just 
algorithms, it's regular functions or types with specific 
parameter requirements.

What is a very logical mechanism is just to pass the `init` 
value, because 99% of the time, that is the same thing as an 
empty range. Except when it isn't. But if you aren't testing for 
that, then you don't realize it. Or if code changes slightly such 
that now you involve a range you didn't before which doesn't have 
that property, then all of a sudden code breaks.

We can add a new requirement that ranges that can be initialized 
as empty have some `emptyRange` static member or something like 
that. But I expect this is going to go down the same route as 
`save`, where really you should be calling `save` every time you 
copy a forward range, but in practice nobody does, since regular 
copying is 99% of the time the same thing. Likewise people will 
pass `.init` instead of `.emptyRange` because it's almost always 
the same thing. This is why I think it should just be formally 
stated that the `init` value of a non-infinite range should be 
empty, bringing into standard what people *already do*.

The only tricky aspect is ranges that are references 
(classes/pointers). Neither of those to me should be supported 
IMO, you can always wrap such a thing in a range harness.

But if people insist that we must have class ranges, I'd say an 
`emptyRange` property is in order.

-Steve

Mar 06 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 The only tricky aspect is ranges that are references
 (classes/pointers).  Neither of those to me should be supported IMO,
 you can always wrap such a thing in a range harness.

[...]

Every time this topic comes up, class-based ranges become the whipping
boy of range design woes.  Actually, they serve an extremely important
role: type erasure, which is critical when you have code like this:

	auto myRangeFunc(R,S)(R range1, S range2) {
		if (runtimeDecision()) {
			return range1;
		} else {
			return range2;
		}
	}

This generally will not compile because R and S are different types,
even if both conform to a common underlying range API, like an input
range. To remedy this situation, class-based range wrappers come to the
rescue:

	auto myRangeFunc(R,S)(R range1, S range2) {
		if (runtimeDecision()) {
			return inputRangeObject(range1);
		} else {
			return inputRangeObject(range2);
		}
	}

Note that you can't use a struct wrapper here, because R and S have
different ABIs; the only way to correctly forward range methods to R or
S is to use overridden base class methods. IOW, the type erasure of R
and S is unavoidable for this code to work.

//

Also, class-based ranges are sometimes necessary for practical reasons,
like in this one program I had, that has an UFCS chain consisting of
hundreds of components (not all written out explicitly in the same
function, of course, but that's what the underlying instantiation looks
like).  It generated ridiculously large symbols that crashed the
compiler. Eventually the bug I filed prodded Rainer to implement symbol
compression in DMD. But even then, the symbols were still ridiculously
huge -- because the outermost template instantiation had to encode the
types of every one of the hundreds of components.  As a result, symbol
compression or not, compile times were still ridiculously slow because
the compiler still had to copy those symbols around -- their length grew
quadratically with every component added to the chain.

Inserting a call to .inputRangeObject in the middle of the chain
dramatically cut down the size of the resulting symbols, because it
effectively erased all the types preceding it, resulting in much saner
codegen afterwards.


T

-- 
Leather is waterproof.  Ever see a cow with an umbrella?

Mar 06 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the 
 whipping boy of range design woes.  Actually, they serve an 
 extremely important role: type erasure, which is critical when 
 you have code like this:

 	auto myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}

 [...]

 Note that you can't use a struct wrapper here, because R and S 
 have different ABIs; the only way to correctly forward range 
 methods to R or S is to use overridden base class methods. IOW, 
 the type erasure of R and S is unavoidable for this code to 
 work.

This is what std.range.choose [1] is for. Internally, it's 
implemented as a tagged union of R and S.

[1] https://dlang.org/library/std/range/choose.html

Mar 06 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 05:43:05PM +0000, Paul Backus via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the
 whipping boy of range design woes.  Actually, they serve an
 extremely important role: type erasure, which is critical when you
 have code like this:
 
 	auto myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}
 
 [...]
 
 Note that you can't use a struct wrapper here, because R and S have
 different ABIs; the only way to correctly forward range methods to R
 or S is to use overridden base class methods. IOW, the type erasure
 of R and S is unavoidable for this code to work.

 
 This is what std.range.choose [1] is for. Internally, it's implemented
 as a tagged union of R and S.
 
 [1] https://dlang.org/library/std/range/choose.html

Haha, I knew that one would be brought up.  Unfortunately, it only works
for a very limited subset of use cases. In the above simplified code you
could replace the if-statement with .choose. However, this immediately
falls flat if the decision is made by a switch statement, or nested
if-blocks, or an early exit from a loop.

The problem is that .choose requires that both ranges are constructible
in the same scope that it is called in. In non-trivial code this is
sometimes impossible, as you may be in a decision tree where certain
subranges are only constructed if the condition is true, or may depend
on value types that may not even exist if the condition is not
satisfied.  This makes .choose essentially useless outside of trivial
cases.

Besides, .choose does not type-erase, so in the cases where you want
type erasure it's still no help.


T

-- 
English is useful because it is a mess. Since English is a mess, it maps well
onto the problem space, which is also a mess, which we call reality. Similarly,
Perl was designed to be a mess, though in the nicest of all possible ways. --
Larry Wall

Mar 06 2024

Dukc <ajieskola gmail.com> writes:

On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:
 The problem is that .choose requires that both ranges are 
 constructible in the same scope that it is called in.

It's not even about those ranges being constructible. It's about 
separation of concerns. Like you wrote in your earlier post, 
sometimes we want to work with an arbitrary range without 
templating all of our code. Maybe for compile time reasons, or 
maybe to write a library that can be separately compiled from the 
client code.

 This makes .choose essentially useless outside of trivial cases.

It's far from useless, it's only like taking an enum argument and 
`switch`ing on it instead of taking a delerate argument. 
Certainly works but means you have to list all your cases in the 
function, which may or may not be what you want.


The gripe I have with `std.range.interface` is lack of attributes 
in the virtual functions. I don't think any simple attribute set 
alone would solve the problem well. Granted, there should rarely 
be reason to have a dynamic ` system` range type, but `pure` and 
`nothrow` are attributes you sometimes really want, sometimes you 
really don't. Therefore we need the ability to pick, meaning the 
interfaces will have to be templated, along the lines of:
```D
interface ForwardRange(uint attrs) : InputRange!attrs
{
    // ...
}
```

Plus there should probably be function to convert, for example, a
```
BidirectionalRange!(FunctionAttribute!safe | 
FunctionAttribute!pure_ | FunctionAttribute!nothrow_)
```
to
```
BidirectionalRange!(FunctionAttribute!safe | 
FunctionAttribute!nothrow_)
```
.

Mar 06 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 1:53:55 PM MST Dukc via Digitalmars-d wrote:
 The gripe I have with `std.range.interface` is lack of attributes
 in the virtual functions. I don't think any simple attribute set
 alone would solve the problem well. Granted, there should rarely
 be reason to have a dynamic ` system` range type, but `pure` and
 `nothrow` are attributes you sometimes really want, sometimes you
 really don't. Therefore we need the ability to pick, meaning the
 interfaces will have to be templated, along the lines of:
 ```D
 interface ForwardRange(uint attrs) : InputRange!attrs
 {
     // ...
 }
 ```

 Plus there should probably be function to convert, for example, a
 ```
 BidirectionalRange!(FunctionAttribute!safe |
 FunctionAttribute!pure_ | FunctionAttribute!nothrow_)
 ```
 to
 ```
 BidirectionalRange!(FunctionAttribute!safe |
 FunctionAttribute!nothrow_)
 ```
 .

This is a fundamental problem with classes in general. Attributes just do
not play nicely with them. Either you're forced to require attributes that
potentially won't work for some code, or you're forced to disallow
attributes that some code might require.

And what you're describing results in a proliferation of interfaces, which
to an extent, goes against the point of using interfaces. So, we might do
something like that, but it does cause a different set of problems.

Regardless, we will have to look over exactly what we want to do with the
replacement for std.range.interfaces. I think that it was mostly thrown
together because it was thought that it was needed and not because it really
had been needed much at that point, and we should definitely look at what
needs reworking based on what we've learned in the interim. Classes in
general are kind of terrible with generic code in D though, particularly
because of the issues with attributes.

- Jonathan M Davis

Mar 06 2024

Dukc <ajieskola gmail.com> writes:

On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis 
wrote:
 This is a fundamental problem with classes in general. 
 Attributes just do not play nicely with them. Either you're 
 forced to require attributes that potentially won't work for 
 some code, or you're forced to disallow attributes that some 
 code might require.

It's not a problem with classes, it's a decision one has to make 
when designing an interface with runtime variance regardless of 
the mechanism. You either have to allow the unknown code you're 
calling to execute side effects in which case it's impure, or you 
must execute the side effects it requests on it's behalf which 
means the unknown code must use your interfaces instead of doing 
their io themselves. It would be the same with any other 
mechanism.

 And what you're describing results in a proliferation of 
 interfaces, which to an extent, goes against the point of using 
 interfaces.

I don't think so. Say, you're designing an interpreter, and 
decide the user-defined functions can't execute io directly or 
break D type system, but can throw exceptions. Therefore the D 
type for a range in your client langauge will be 
`ForwardRange!(ClientLangValue, FunctionAttribute.safe | 
FunctionAttribute.pure_)`. You need only one template instance, 
same as presently. How is this exactly proliferation?

Mar 06 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 3:54:54 PM MST Dukc via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis

 wrote:
 This is a fundamental problem with classes in general.
 Attributes just do not play nicely with them. Either you're
 forced to require attributes that potentially won't work for
 some code, or you're forced to disallow attributes that some
 code might require.

 It's not a problem with classes, it's a decision one has to make
 when designing an interface with runtime variance regardless of
 the mechanism. You either have to allow the unknown code you're
 calling to execute side effects in which case it's impure, or you
 must execute the side effects it requests on it's behalf which
 means the unknown code must use your interfaces instead of doing
 their io themselves. It would be the same with any other
 mechanism.

Yes, it's a problem with non-templated code in general, but because you
can't templatize virtual functions, it's a problem with classes that you
can't really fix, whereas with most other code, you can usually fix it by
templatizing the code and inferring the attributes based on the arguments.

 And what you're describing results in a proliferation of
 interfaces, which to an extent, goes against the point of using
 interfaces.

 I don't think so. Say, you're designing an interpreter, and
 decide the user-defined functions can't execute io directly or
 break D type system, but can throw exceptions. Therefore the D
 type for a range in your client langauge will be
 `ForwardRange!(ClientLangValue, FunctionAttribute.safe |
 FunctionAttribute.pure_)`. You need only one template instance,
 same as presently. How is this exactly proliferation?

It's a proliferation, because you're declaring a bunch of different
interfaces. Instead of having ForwardRange, you have a whole bunch of
variants of ForwardRange. Now, that might not end up being a problem in
practice (particularly since you do already have to templatize ForwardRange
on the element type), because you may only need one variation in your
codebase, but it does further complicate the interfaces, and it does mean
that interfaces with the same element type won't necessarily be compatible,
whereas with the current design, that isn't a problem.

So, it may ultimately be what we want to do, but it's not without its
downsides.

- Jonathan M Davis

Mar 06 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 08:53:55PM +0000, Dukc via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:

[...]
 This makes .choose essentially useless outside of trivial cases.

 
 It's far from useless, it's only like taking an enum argument and
 `switch`ing on it instead of taking a delerate argument. Certainly
 works but means you have to list all your cases in the function, which
 may or may not be what you want.

Well yes, so it's very limited in what it can do. In the cases where you
can't list all cases in the call to .choose, or where you don't want to
for code cleanliness reasons, it's useless.


 The gripe I have with `std.range.interface` is lack of attributes in
 the virtual functions. I don't think any simple attribute set alone
 would solve the problem well.

[...]

The problem is that the base class cannot predict which restrictive
attributes will be applied to a derived class override of a method. You
cannot override a  safe base class method with a  system derived class
method, for example. So it must settle for the most permissive
attributes.  But that means the code that receives a base class
reference must handle the most permissive attributes as well; a  safe
function would not be able to invoke a  safe method in a derived class
if the only reference it has to the derived object is a base class
reference.  There's no way around this except to get rid of
(restrictive) attributes altogether.


T

-- 
EMACS = Extremely Massive And Cumbersome System

Mar 06 2024

Meta <jared771 gmail.com> writes:

On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the 
 whipping boy of range design woes.  Actually, they serve an 
 extremely important role: type erasure, which is critical when 
 you have code like this:
 ...etc.

This would be a complete non-issue if we had language-level 
sumtypes:

```
	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
		if (runtimeDecision()) {
			return range1;
		} else {
			return range2;
		}
	}
```

std.sumtype can also do this, the ergonomics just aren't as nice.

Mar 06 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become the
 whipping boy of range design woes.  Actually, they serve an
 extremely important role: type erasure, which is critical when you
 have code like this: ...etc.

 
 This would be a complete non-issue if we had language-level sumtypes:
 
 ```
 	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}
 ```
 
 std.sumtype can also do this, the ergonomics just aren't as nice.

No, it doesn't work the way you think it would. The returned range must
behave like an input range (forward, etc.), as a single entity, because
the caller doesn't and shouldn't know or care whether the returned value
is a sumtype or not.  It must be able to just call .front, .empty,
.popFront on the result without any explicit unwrapping.

You can't do that with a sumtype, because you need to unwrap first. And
even if you make unwrapping implicit, so that a sumtype allows
operations common to all underlying types, there are implementational
difficulties.  Calling e.g.  .front on a sumtype potentially accessing
two completely incompatible functions. For example, if R.empty is a
function but S.empty is a field, there is no sane way to implement the
sumtype in such a way that the caller could just access .empty without
introducing a runtime switch into every range API call. (And even if you
were to implement things in this convoluted way, there are unaddressed
difficulties of what .empty on a sumtype might mean if it refers to two
completely incompatible things, like a 3-parameter function in R and an
alias to a symbol in S. What would mySumtype.empty actually compile to
in that case?)


T

-- 
It's bad luck to be superstitious. -- YHL

Mar 06 2024

Meta <jared771 gmail.com> writes:

On Wednesday, 6 March 2024 at 18:07:22 UTC, H. S. Teoh wrote:
 On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via 
 Digitalmars-d wrote:
 On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 Every time this topic comes up, class-based ranges become 
 the whipping boy of range design woes.  Actually, they serve 
 an extremely important role: type erasure, which is critical 
 when you have code like this: ...etc.

 
 This would be a complete non-issue if we had language-level 
 sumtypes:
 
 ```
 	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
 		if (runtimeDecision()) {
 			return range1;
 		} else {
 			return range2;
 		}
 	}
 ```
 
 std.sumtype can also do this, the ergonomics just aren't as 
 nice.

 No, it doesn't work the way you think it would. The returned 
 range must behave like an input range (forward, etc.), as a 
 single entity, because the caller doesn't and shouldn't know or 
 care whether the returned value is a sumtype or not.  It must 
 be able to just call .front, .empty, .popFront on the result 
 without any explicit unwrapping.

 You can't do that with a sumtype, because you need to unwrap 
 first. And even if you make unwrapping implicit, so that a 
 sumtype allows operations common to all underlying types, there 
 are implementational difficulties.  Calling e.g.  .front on a 
 sumtype potentially accessing two completely incompatible 
 functions. For example, if R.empty is a function but S.empty is 
 a field, there is no sane way to implement the sumtype in such 
 a way that the caller could just access .empty without 
 introducing a runtime switch into every range API call. (And 
 even if you were to implement things in this convoluted way, 
 there are unaddressed difficulties of what .empty on a sumtype 
 might mean if it refers to two completely incompatible things, 
 like a 3-parameter function in R and an alias to a symbol in S. 
 What would mySumtype.empty actually compile to in that case?)


 T

Ah, you're right. There is really no way to do this in a 
completely flexible way in current D, if you also want to have 
value range semantics. Couldn't we somehow fit RefRange into the 
current range hierarchy somewhere, and provide primitives to make 
it more difficult to shoot yourself in the foot?

Mar 06 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 1:23:52 PM MST Meta via Digitalmars-d wrote:
 Couldn't we somehow fit RefRange into the
 current range hierarchy somewhere, and provide primitives to make
 it more difficult to shoot yourself in the foot?

RefRange was a huge mistake (and it was my mistake). It's basically trying
to force reference semantics on something that has value semantics, and it
causes problems as a result. I don't think that it can actually be made to
work properly - at least not with forward ranges.

IMHO, we need to be able to rely on the semantics of the basic range
operations, and most of the problems with the current range API come from
when operations are under-specified or outright unspecified - be it because
we're trying to support too much (e.g. both structs and classes for ranges),
or because we just didn't actually think to specify what the behavior should
be. And in some cases, we though of it but too late to actually require it
(e.g. you can't actually rely on $ working with random-access ranges,
because we didn't think to require it up front, and too many random-access
ranges had been written that didn't support it by the time we realized our
mistake).

- Jonathan M Davis

Mar 06 2024

Meta <jared771 gmail.com> writes:

On Wednesday, 6 March 2024 at 21:17:37 UTC, Jonathan M Davis 
wrote:
 On Wednesday, March 6, 2024 1:23:52 PM MST Meta via 
 Digitalmars-d wrote:
 Couldn't we somehow fit RefRange into the
 current range hierarchy somewhere, and provide primitives to 
 make
 it more difficult to shoot yourself in the foot?

 RefRange was a huge mistake (and it was my mistake). It's 
 basically trying to force reference semantics on something that 
 has value semantics, and it causes problems as a result. I 
 don't think that it can actually be made to work properly - at 
 least not with forward ranges.

 IMHO, we need to be able to rely on the semantics of the basic 
 range operations, and most of the problems with the current 
 range API come from when operations are under-specified or 
 outright unspecified - be it because we're trying to support 
 too much (e.g. both structs and classes for ranges), or because 
 we just didn't actually think to specify what the behavior 
 should be. And in some cases, we though of it but too late to 
 actually require it (e.g. you can't actually rely on $ working 
 with random-access ranges, because we didn't think to require 
 it up front, and too many random-access ranges had been written 
 that didn't support it by the time we realized our mistake).

 - Jonathan M Davis

I don't mean the type RefRange; I mean expanding the range 
hierarchy to include the concept of a "RefRange" (just like 
InputRange, BidirectionalRange, etc.) that explicitly has 
reference semantics instead of value semantics. Then you can 
specialize algorithms on whether they're a ref range or not 
(implying that ranges with reference semantics should no longer 
pass the isInputRange et al. checks - no idea how you'd do that 
OTOH), and then you can account for that fact in the 
implementation of the algorithm, rather than all range algorithms 
currently carrying an implicit assumption that they're working 
with a value range.

Mar 06 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
 On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer 
 via Digitalmars-d wrote: [...]
 The only tricky aspect is ranges that are references 
 (classes/pointers).  Neither of those to me should be 
 supported IMO, you can always wrap such a thing in a range 
 harness.

 [...]

 Every time this topic comes up, class-based ranges become the 
 whipping boy of range design woes.  Actually, they serve an 
 extremely important role: type erasure, which is critical when 
 you have code like this:

As Jonathan already responded, you can have a class range by 
wrapping it in a struct. Which is what I meant when I said:

"you can always wrap such a thing in a range harness"

-Steve

Mar 07 2024

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven Schveighoffer 
wrote:
 On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis 
 wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from 
 a range unless the range is a random-access range.

 Genuine question: what are the use-cases for this?

 In general, the capabilities of ranges are designed to serve 
 the needs of algorithms. Input ranges exist because 
 single-pass iteration is all that's needed to implement 
 algorithms like map, filter, and reduce. Random-access ranges 
 exist because they're needed for sorting algorithms. And so on.

 I'm not aware of any algorithm, or class of algorithm, that 
 needs this specific capability you're describing. If such 
 algorithms exist, we should be using them to guide our design 
 here. If they don't...then maybe this isn't really a problem 
 at all.

 When you need to pass a specific range type, and you want to 
 pass in an empty range of that type, how do you do it?

By "specific range type", do you mean a specific category of 
range (input, forward, random-access, etc.), or a specific 
concrete type?

If the former, this is already easy to do with existing language 
and library features (for example, in many cases you can use an 
empty slice).

If the latter, then the question I'm asking is, why do you need 
to do that?

Personally, I can think of plenty of times that I've needed to do 
the first thing, but I can't think of a single time that I've 
needed to do the second thing.

 The only tricky aspect is ranges that are references 
 (classes/pointers). Neither of those to me should be supported 
 IMO, you can always wrap such a thing in a range harness.

The main thing you lose by dropping support for reference-type 
ranges is interfaces. In particular, the interface inheritance 
hierarchy in `std.range.interfaces`, where `ForwardRange` 
inherits from `InputRange` and so on, cannot really be replicated 
using `structs` (`alias this` only goes so far).

Mar 06 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 07/03/2024 6:38 AM, Paul Backus wrote:
     The only tricky aspect is ranges that are references
     (classes/pointers). Neither of those to me should be supported IMO,
     you can always wrap such a thing in a range harness.
 
 The main thing you lose by dropping support for reference-type ranges is 
 interfaces. In particular, the interface inheritance hierarchy in 
 |std.range.interfaces|, where |ForwardRange| inherits from |InputRange| 
 and so on, cannot really be replicated using |structs| (|alias this| 
 only goes so far).

All of this would ideally be solved with the introduction of signatures.

But alas as we've covered in past conversations that is a big set of 
features to introduce.

Mar 06 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Wednesday, 6 March 2024 at 17:38:56 UTC, Paul Backus wrote:
 On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven 
 Schveighoffer wrote:
 On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis 
 wrote:
 2. The range API provides no way (other than fully iterating 
 through a range) to get an empty range of the same type from 
 a range unless the range is a random-access range.

 Genuine question: what are the use-cases for this?

 In general, the capabilities of ranges are designed to serve 
 the needs of algorithms. Input ranges exist because 
 single-pass iteration is all that's needed to implement 
 algorithms like map, filter, and reduce. Random-access ranges 
 exist because they're needed for sorting algorithms. And so 
 on.

 I'm not aware of any algorithm, or class of algorithm, that 
 needs this specific capability you're describing. If such 
 algorithms exist, we should be using them to guide our design 
 here. If they don't...then maybe this isn't really a problem 
 at all.

 When you need to pass a specific range type, and you want to 
 pass in an empty range of that type, how do you do it?

 By "specific range type", do you mean a specific category of 
 range (input, forward, random-access, etc.), or a specific 
 concrete type?

 If the former, this is already easy to do with existing 
 language and library features (for example, in many cases you 
 can use an empty slice).

 If the latter, then the question I'm asking is, why do you need 
 to do that?

I meant the latter. A concrete range type.

How does it happen? Consider that an array is a concrete type 
that people use all the time. Why should any other range be 
different? I've definitely stored ranges and other things as type 
members that were voldemort types.

This ability is more of a question of "do we want to add this 
feature to ranges or not?" The feature doesn't currently exist -- 
you can't assume that an uninitialized range is empty.

Sometimes, we are bitten by the fact that the array is the most 
common range, and behaves in a specific way. People depend on 
that mechanism without realizing it, and then sometime later, 
they decide to change the type to one that is very compatible 
with arrays, but offers some benefit (i.e. to remove an 
allocation). However, the new range type might behave in 
unexpected ways, *but still compiles*.

A few examples I can think of:

1. copying an array is equivalent to `arr.save`, but may not be 
the case for other forward ranges.
2. character arrays have a mechanism to decode into dchar if you 
specify `dchar` as the loop variable type.
3. arrays have a default value of an empty array.

Reducing surprises when you substitute what seems like a 
"compatible" type is desirable, but not strictly necessary. I.e. 
I'm also OK if we don't try and add these definitions.

For the empty array case, I think the mechanism is trivial to add 
as a formal requirement, since nearly all ranges default to 
empty, and the ones that don't are probably easy to change. Maybe 
this isn't the case? I don't know. I think reducing friction when 
it's easy to do is something we should always be looking at.

 The only tricky aspect is ranges that are references 
 (classes/pointers). Neither of those to me should be supported 
 IMO, you can always wrap such a thing in a range harness.

 The main thing you lose by dropping support for reference-type 
 ranges is interfaces. In particular, the interface inheritance 
 hierarchy in `std.range.interfaces`, where `ForwardRange` 
 inherits from `InputRange` and so on, cannot really be 
 replicated using `structs` (`alias this` only goes so far).

As mentioned, you can wrap these interfaces into structs, which 
then have better lifetime tracking capabilities.

-Steve

Mar 07 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Thu, Mar 07, 2024 at 06:32:50PM +0000, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 Sometimes, we are bitten by the fact that the array is the most common
 range, and behaves in a specific way. People depend on that mechanism
 without realizing it, and then sometime later, they decide to change
 the type to one that is very compatible with arrays, but offers some
 benefit (i.e. to remove an allocation). However, the new range type
 might behave in unexpected ways, *but still compiles*.

Over the past decade of working with D, this has repeatedly come up as
the weakness of a signature-constraint based approach to ducktyping. The
problem is that what the signature constraint requires (e.g.,
isInputRange) may only be a subset of what the function body assumes,
and there is no way to check this mechanically (signature constraints
are Turing-complete).

What ideally should happen is that whatever the code assumes should also
be declared in the function signature. So if a template takes a
parameter t of generic type T, and then goes and performs t++, then the
compiler should enforce that ++ is declared as part of the constraints
on T.  The C++ concepts approach is better in this respect, in that the
compiler can typecheck the function body and emit an error if it tries
to perform an operation on T that it didn't declare as part of its
signature constraint.  Barring implementing concepts in D, which is
unlikely to happen, the next best alternative is for the compiler to
enforce that any operation on T must have a corresponding check in the
signature constraint, and when this is not the case, it should, based on
the attempted operation, issue an error message with a suggested
addition to the sig constraint that would satisfy this requirement.

This doesn't fully solve the problem here, of course -- sometimes the
difference is semantic rather than something easily checkable by the
compiler, like subtle differences between built-in arrays and
user-defined arrays; in that case you're still up the creek without a
paddle.  But it's a step forward.


T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and
over.

Mar 07 2024

Paul Backus <snarwin gmail.com> writes:

On Thursday, 7 March 2024 at 18:32:50 UTC, Steven Schveighoffer 
wrote:
 I meant the latter. A concrete range type.

 How does it happen? Consider that an array is a concrete type 
 that people use all the time. Why should any other range be 
 different? I've definitely stored ranges and other things as 
 type members that were voldemort types.

This doesn't answer my question.

There are lots of ways in which (built-in) arrays are different 
from other types of ranges. Most notably, an array can be used as 
either a range or a container. There are plenty of use-cases for 
creating an empty container.

What I am asking, specifically, is whether there is any use-case 
where *generic code*, given a range of some arbitrary type `R`, 
needs to create another range which both (a) has the exact 
concrete type `R`, and (b) is empty. Since that's the feature 
that's being proposed here.

(Non-generic code does not need this feature to be part of the 
range API, because it can rely on specific features of whatever 
concrete type it's working with.)

 This ability is more of a question of "do we want to add this 
 feature to ranges or not?" The feature doesn't currently exist 
 -- you can't assume that an uninitialized range is empty.

If there are no use-cases for this feature, then the answer to 
"do we want to add it" ought to be "no." That's why I'm asking 
about use-cases.

 Sometimes, we are bitten by the fact that the array is the most 
 common range, and behaves in a specific way. People depend on 
 that mechanism without realizing it, and then sometime later, 
 they decide to change the type to one that is very compatible 
 with arrays, but offers some benefit (i.e. to remove an 
 allocation). However, the new range type might behave in 
 unexpected ways, *but still compiles*.

This is a general problem with templates/macros compared to typed 
generics. Even if we get rid of this particular edge case, there 
are still dozens more that users are going to run into if they 
only test with arrays.

If we want to address this problem, I think the best thing we can 
do is to provide a standard suite of test ranges that users can 
plug into their code to uncover edge cases and bugs.

 The main thing you lose by dropping support for reference-type 
 ranges is interfaces. In particular, the interface inheritance 
 hierarchy in `std.range.interfaces`, where `ForwardRange` 
 inherits from `InputRange` and so on, cannot really be 
 replicated using `structs` (`alias this` only goes so far).

 As mentioned, you can wrap these interfaces into structs, which 
 then have better lifetime tracking capabilities.

How do you implement this with structs?

     interface ForwardAssignable : InputAssignable!E, 
ForwardRange!E

Interfaces allow multiple inheritance. Structs can only have one 
`alias this` member. Maybe you're fine with giving up on this 
feature, but let's at least be honest that we *are* giving up 
features here.

Mar 07 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 10:32:17 AM MST H. S. Teoh via Digitalmars-d 
wrote:
 On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via
 Digitalmars-d wrote: [...]

 The only tricky aspect is ranges that are references
 (classes/pointers).  Neither of those to me should be supported IMO,
 you can always wrap such a thing in a range harness.

 [...]

 Every time this topic comes up, class-based ranges become the whipping
 boy of range design woes.  Actually, they serve an extremely important
 role: type erasure, which is critical when you have code like this:

   auto myRangeFunc(R,S)(R range1, S range2) {
       if (runtimeDecision()) {
           return range1;
       } else {
           return range2;
       }
   }

 This generally will not compile because R and S are different types,
 even if both conform to a common underlying range API, like an input
 range. To remedy this situation, class-based range wrappers come to the
 rescue:

   auto myRangeFunc(R,S)(R range1, S range2) {
       if (runtimeDecision()) {
           return inputRangeObject(range1);
       } else {
           return inputRangeObject(range2);
       }
   }

 Note that you can't use a struct wrapper here, because R and S have
 different ABIs; the only way to correctly forward range methods to R or
 S is to use overridden base class methods. IOW, the type erasure of R
 and S is unavoidable for this code to work.

 //

 Also, class-based ranges are sometimes necessary for practical reasons,
 like in this one program I had, that has an UFCS chain consisting of
 hundreds of components (not all written out explicitly in the same
 function, of course, but that's what the underlying instantiation looks
 like).  It generated ridiculously large symbols that crashed the
 compiler. Eventually the bug I filed prodded Rainer to implement symbol
 compression in DMD. But even then, the symbols were still ridiculously
 huge -- because the outermost template instantiation had to encode the
 types of every one of the hundreds of components.  As a result, symbol
 compression or not, compile times were still ridiculously slow because
 the compiler still had to copy those symbols around -- their length grew
 quadratically with every component added to the chain.

 Inserting a call to .inputRangeObject in the middle of the chain
 dramatically cut down the size of the resulting symbols, because it
 effectively erased all the types preceding it, resulting in much saner
 codegen afterwards.

My current plan is to retain std.range.interfaces in some form or another so
that we can have ranges as classes as far as their implementation goes (at
my job, we actually have to use classes for ranges, because we have to be
able to wrap arbitrary ranges at runtime, since we're dealing with an
intepreter). However, they would not count as ranges by any range-based
algorithms, because they would be classes, and so they would have to be
wrapped in structs to actually be used as ranges. The wrapper structs would
then do all of the work to ensure that the proper copying behavior was
implemented.

So, you'd do something like

    auto r = inputRangeObject(range1);

and get a struct range wrapping a class from the replacement for
std.range.interfaces. And if we need the ability to get at the underlying
class for something, it's as simple as providing a member to the struct
which returns it - e.g. source, like we do with several other range types.
Then you could do something like

auto r = inputRangeObject(range1);
InputRange cls = r.source;

Of course, you lose access to the class as soon as it's wrapped in another
range type, but that happens right now when you pass ranges around which are
classes. So, that shouldn't really change much from what I can see. It's
basically just wrapping classes in structs so that the struct does the
behavior that we currently do with save (and it could do it more
intelligently by taking advantage of reference counting rather than dup-ing
the class whenever the range is copied).

So, unless I'm missing something here, we _should_ be able to just wrap
classes in structs to allow ranges which are implemented as classes even if
the classes themselves can't be ranges any longer. The net result should be
that dealing with forward ranges is simpler, and we stop having save-related
bugs, but we shouldn't actually lose any functionality.

- Jonathan M Davis

Mar 06 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Mar 06, 2024 at 01:03:35PM -0700, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 My current plan is to retain std.range.interfaces in some form or
 another so that we can have ranges as classes as far as their
 implementation goes (at my job, we actually have to use classes for
 ranges, because we have to be able to wrap arbitrary ranges at
 runtime, since we're dealing with an intepreter). However, they would
 not count as ranges by any range-based algorithms, because they would
 be classes, and so they would have to be wrapped in structs to
 actually be used as ranges. The wrapper structs would then do all of
 the work to ensure that the proper copying behavior was implemented.

[...]
 So, unless I'm missing something here, we _should_ be able to just
 wrap classes in structs to allow ranges which are implemented as
 classes even if the classes themselves can't be ranges any longer. The
 net result should be that dealing with forward ranges is simpler, and
 we stop having save-related bugs, but we shouldn't actually lose any
 functionality.

[...]

This actually sounds like a reasonable approach.


T

-- 
Applying computer technology is simply finding the right wrench to pound in the
correct screw.

Mar 06 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 6, 2024 7:18:50 AM MST Paul Backus via Digitalmars-d 
wrote:
 On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
 2. The range API provides no way (other than fully iterating
 through a range) to get an empty range of the same type from a
 range unless the range is a random-access range.

 Genuine question: what are the use-cases for this?

 In general, the capabilities of ranges are designed to serve the
 needs of algorithms. Input ranges exist because single-pass
 iteration is all that's needed to implement algorithms like map,
 filter, and reduce. Random-access ranges exist because they're
 needed for sorting algorithms. And so on.

 I'm not aware of any algorithm, or class of algorithm, that needs
 this specific capability you're describing. If such algorithms
 exist, we should be using them to guide our design here. If they
 don't...then maybe this isn't really a problem at all.

There are times where I've had to make a range empty in algorithms in order
to indicate that it's done or force it to be done. This is particularly true
with some algorithms I've written where the range was passed by ref. IIRC, I
tried to add something to std.range for this years ago (either that, or I
tried to make takeNone do it; it's been a while) using the init value, but
Andrei nixed it on the grounds that we couldn't rely on it. The result is
that while we have takeNone, you can't use it to actually make a range
empty. You can just use it to get an empty range of a potentially different
type.

So, while this is not something that I've seen come up frequently, this is
definitely something that I've seen come up in my own code.

And in general, if we're able to require that the init value be empty for
finite ranges, that will reduce the number of bugs surrounding member
variables and the init value, since it's fairly common to end up using the
init value of a range when you have to store it as a member variable (even
if it was a Voldemort type to begin with), and the natural thing to do at
that point is to assume that the init value is empty (even though in
reality, for some range types right now, it isn't even in a valid).

But yes, the desire to have a way to force a range to be empty is something
that's come from actual experience.

- Jonathan M Davis

Mar 06 2024

D Programming

C/C++ Programming

Other

digitalmars.D - Range Redesign: Empty Ranges