digitalmars.D - Range Redesign: Empty Ranges
- Jonathan M Davis (172/172) Mar 04 Okay. A few weeks ago, I made a post about potential design decisions wh...
- Sebastiaan Koppe (7/10) Mar 04 Makes a lot of sense, but I agree it will be very annoying for
- Jonathan M Davis (69/79) Mar 04 You mean define a static function named init? I think that most of us ag...
- Dom DiSc (8/20) Mar 05 Are there infinite ranges that are indeed not forward ranges?
- Alexandru Ermicioi (2/3) Mar 05 /dev/random could be such a range.
- Adam Wilson (13/16) Mar 05 In the case of /dev/random I don't really see how it's a true
- H. S. Teoh (13/24) Mar 05 Nothing about an infinite range requires that it be infinite both ways,
- Adam Wilson (13/20) Mar 05 That is true in theory, but in practice if you try it on real
- Arafel (2/6) Mar 06 Or a range that returns captured data from a sensor.
- Alexandru Ermicioi (8/14) Mar 06 Nice example,
- Dom DiSc (6/12) Mar 06 If you have a source of real entropy, ok. But
- Arafel (6/10) Mar 06 While technically true, you might not have access to the seed, so in
- Atila Neves (22/27) Mar 06 I like T.init being empty. I thought I'd found a clever way to
- Ogi (8/19) Mar 06 I’m surprised that the equivalent C++ code *doesn’t* crash (at
- Jonathan M Davis (10/35) Mar 06 The issue is whether the function is virtual. In D, class member functio...
- Ogi (18/27) Mar 07 Virtual or not, GCC assumes that `this` cannot be null:
- Paul Backus (15/27) Mar 06 It crashes because it's attempting to access Class's vtable. If
- Atila Neves (5/34) Mar 07 Of course! I write classes so seldom I forget my own rule to add
- Jonathan M Davis (4/8) Mar 07 The solution that I'm currently planning to go with is to wrap them:
- Paul Backus (11/14) Mar 06 Genuine question: what are the use-cases for this?
- Steven Schveighoffer (28/43) Mar 06 When you need to pass a specific range type, and you want to pass
- H. S. Teoh (48/51) Mar 06 [...]
- Paul Backus (4/21) Mar 06 This is what std.range.choose [1] is for. Internally, it's
- H. S. Teoh (18/43) Mar 06 Haha, I knew that one would be brought up. Unfortunately, it only works
- Dukc (35/38) Mar 06 It's not even about those ranges being constructible. It's about
- Jonathan M Davis (16/40) Mar 06 This is a fundamental problem with classes in general. Attributes just d...
- Dukc (17/25) Mar 06 It's not a problem with classes, it's a decision one has to make
- Jonathan M Davis (16/41) Mar 06 Yes, it's a problem with non-templated code in general, but because you
- H. S. Teoh (19/29) Mar 06 Well yes, so it's very limited in what it can do. In the cases where you
- Meta (13/18) Mar 06 This would be a complete non-issue if we had language-level
- H. S. Teoh (22/41) Mar 06 No, it doesn't work the way you think it would. The returned range must
- Meta (6/50) Mar 06 Ah, you're right. There is really no way to do this in a
- Jonathan M Davis (16/19) Mar 06 RefRange was a huge mistake (and it was my mistake). It's basically tryi...
- Meta (13/36) Mar 06 I don't mean the type RefRange; I mean expanding the range
- Steven Schveighoffer (5/16) Mar 07 As Jonathan already responded, you can have a class range by
- Paul Backus (18/43) Mar 06 By "specific range type", do you mean a specific category of
- Richard (Rikki) Andrew Cattermole (4/13) Mar 06 All of this would ideally be solved with the introduction of signatures.
- Steven Schveighoffer (33/75) Mar 07 I meant the latter. A concrete range type.
- H. S. Teoh (29/35) Mar 07 Over the past decade of working with D, this has repeatedly come up as
- Paul Backus (32/54) Mar 07 This doesn't answer my question.
- Jonathan M Davis (33/81) Mar 06 My current plan is to retain std.range.interfaces in some form or anothe...
- H. S. Teoh (8/23) Mar 06 [...]
- Jonathan M Davis (23/37) Mar 06 There are times where I've had to make a range empty in algorithms in or...
Okay. A few weeks ago, I made a post about potential design decisions when updating the range API in the next version of Phobos, and this is a continuation of that, albeit with a different set of problems. Specifically: 1. The range API does not currently - and cannot currently - require or guarantee that the init value of a range is valid. 2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range. Fixing the first issue is straightforward, I think, though it potentially affects the second, and of course, someone here may have thoughts or insights which I've missed. The issue with init is that of course code wants to use the init value, and code often _does_ use the init value for specific ranges, but doing so when that specific range type does not specify that the init value is valid (and documentation almost never does) is relying on unspecified behavior - and in the general case, using the init value of a range cannot be specified behavior. The obvious case is ranges which are classes, because their init value is going to be null, meaning that using the init value of a range could result in a segfault. Of course, _most_ range-based code doesn't use classes for ranges, so I expect that it's quite rare to hit that problem in practice, but most range-based code uses Voldemort types and tends to ignore the fact that the init value can still be used (either by using typeof on the result to get its type or by getting init directly from the result - e.g. range.init). And you actually need to do that when storing a range as a member variable. E.G. struct Foo { ... auto _range = typeof(chain(func1(), func2(), func3())).init; ... } The more correct thing to do would be something like struct Foo { ... auto _range = Nullable!(typeof(chain(func1(), func2(), func3()))); ... } so that you can avoid using the range's init value, but realistically, I don't think that it's something that many folks think of. They use the range's init value, and it works (since it often does work), and then they end up relying on how that range type happens to work at the moment, which could change in the future, since the range API does not guarantee that it works, and the folks maintaining that range probably aren't thinking much about the init value anyway - especially when it's a Voldemort type. The typical expectation is that the user will be creating the range by calling the function that returns it, not by using type introspection to get their hands on the init value. But it is always possible to get the init value, and code declaring member variables from ranges will use it unless it goes to the extra effort of using something like Nullable so that it's possible to store the range without using its init value as the default value. Now, I think that the fix for this particular problem is pretty easy. We just require that all ranges have a valid init value. This means disallowing classes or pointers being used directly as ranges, but we were already looking at doing that for forward ranges anyway in order to be able to get rid of save. It's arguably more problematic for basic input ranges, but it's quite feasible there as well. The previous thread discussed requiring that they be reference types, which would lean towards them being pointers or classes, but you can have structs which are reference types (e.g. they contain a class reference but then have the logic around it to ensure that it's not used when null). Also, it was proposed in that thread that we require that basic input ranges be non-copyable, which would also make it straightforward to require that the init value be valid, since in that case, all basic input ranges would have to be structs regardless of what their internals were doing with pointers or references. Either way, we can disallow classes and pointers being used directly as ranges and require that all ranges have a valid init value. So, I'm proposing that we require that all ranges have a valid init value. I suspect that the issue will still often be forgotten, particularly since ranges are often Voldemort types, but if the range API explicitly requires it, and the range documentation is clear on that point, then we can clearly consider it a bug when a range has an invalid init value, and in most cases, I wouldn't expect it to be a big deal to fix. The second issue is then the ability to get an empty range from a range type in O(1) which is the same type as the original range. In my experience, it's sometimes the case that you really want to be able to do something like range = Range.emptyRange; or range.makeEmpty(); in order to make a range empty without eating up CPU cycles calling popFront until empty is true. This is possible with a random-access range by slicing it with a zero length, but it's not possible with lower classes of ranges. std.range.takeNone tries to do it, but it is forced to give you a new range type if the range isn't a random-access range - specifically, it uses takeExactly, but regardless, that doesn't work in cases where you need to keep the range type the same but make it empty (e.g. because you're dealing with a ref parameter). And right now, that means calling popFront over and over again. Of course, if you're dealing with an infinite range, then there can't be a way to make it empty, but IMHO, we should have a way to get an empty range from an arbitrary finite range type where the empty range is of the same type. And for that, I see three possibly options: 1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty. 2. We add a static / enum member to the range API called something like emptyRange where you can get an empty range from that range type where the range is that range type. 3. Add a new member function to the range API called something like makeEmpty which makes that particular instance empty. functions, and in general, the init value probably should be empty. However, my concern there is that there may be ranges where that's going to constrain the implementation in annoying ways. All finite ranges have to have an empty state, because you'll get to empty if you call popFront enough times, so it should be possible for all ranges to have their init value be empty, but I could also see it being the case where requiring that would mean that the implementer of the range would have to jump through some annoying hoops to make it work - though in most cases, if they have to jump through hoops, I would expect that they'd have to jump through most (or all) of those hoops to make the init value valid whether it's empty or not. But I would like feedback on how reasonable folks think it would be to require that a range's init value be empty. get an empty range without having to create one with elements first, whereas if we went with makeEmpty, then you'd need to construct a range first just to make it empty. That being said, implementing emptyRange (or whatever we'd call it) wouldn't be all that different from making the init value empty. The main difference I see is that emptyRange wouldn't have to be statically known, which might be useful in some cases, though the kinds of cases that I can think of tend to be ones where making the init value valid would require additional logic, in which case, in going to the effort of making the init value valid, you might as well make it empty anyway. I expect that for some ranges, makeEmpty would be easier to implement, and it would probably also work better if we decide to make basic input ranges non-copyable, since mutating a non-copyable range would be more straightforward than moving an empty one on top of it. But I would think that we could make the latter work, and in the general case, being able to have an empty range for any finite range in O(1) would be nice to have without having to create a range just to make it empty. Either awy, one advantage to having emptyRange / makeEmpty over using the init value as empty would be that while adding a new function would be annoying, it would force anyone implementing a range to take into account the fact that they need to provide a way to get an empty range. In many cases, it may just be alias emptyRange = typeof(this).init; but range implementers would be forced to think about it, whereas if we just made it so that the documentation said that they had to make the init value empty, they'd be less likely to notice. But it's still extra stuff that has to implemented which would be completely unnecessary with ranges that would have empty init value anyway. A related issue is that if we're removing save from forward ranges, and we keep the rest of the functions the same, some of the old basic input ranges then look like the new forward ranges (since they wouldn't have save, and forward ranges wouldn't require it), which would be bad, because they wouldn't have the correct semantics. To fix that, we either need to add something to the range API (be it a whole new function like emptyRange or something like an enum that flagged a type as being a forward range), or we'd have to change the existing functions (be it something as simple as changing empty to isEmpty or something more complicated like switching from front/popFront to head/tail, which would also allow us to get around the tail-const problem but would be a pretty massive change to how ranges work). Adding emptyRange or makeEmpty _almost_ fixes that problem, but since infinite ranges wouldn't have have it (since they couldn't implement it), it wouldn't actually fix the problem. Some old basic input ranges which are infinite would still pass the new isForwardRange. So, if it weren't for the infinite range issue, I'd see the fact that it would change the API enough to distingush between the old and new APIs as a good reason to go with emptyRange over requiring that the init value be empty, but since it doesn't fix the issue with infinite ranges, I don't think that it makes sense to go with emptyRange / makeEmpty just to differentiate between the old and new range APIs (it would make it more obvious to programmers using these ranges, which could be valuable - but then again, having isEmpty instead of empty would do the same). So, hopefully that makes the situation with those two issues clear. The question then is what insights you folks might have on them and the proposed solutions. It may even be that it's way too onerous to require that the init value be valid, and someone has a use case to show that, though I very much hope that that is not the case. I expect though that the bigger question is whether init should be required to be empty for finite ranges or whether it would be better to add a new function for that. Thoughts? - Jonathan M Davis
Mar 04
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.Makes a lot of sense, but I agree it will be very annoying for some ranges. I suspect it will result in extra state and/or checks for those cases. Would it be too crazy of an idea for those ranges to implement a static init function?
Mar 04
On Monday, March 4, 2024 2:58:08 PM MST Sebastiaan Koppe via Digitalmars-d wrote:On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:You mean define a static function named init? I think that most of us agree at this point that it was a mistake to allow init to be anything other than the compiler-defined init, and as I understand it, it doesn't really work to redefine it in the way that Walter intended. Realistically, code tends to depend on the init value being the compiler-defined one, and trying to do anything else is asking for trouble. A redefined init value doesn't actually get used with default-initialization, so with a case like struct Foo { ... typeof(chain(func1(), func2(), func3())) _range; ... } any init that you declare will be ignored (even if it's a value and not a function), meaning that unless that range type has a valid init value, that variable won't work properly unless it happens to be assigned a value before it's used. We could of course add a function / enum to the range API which would be required to be used instead of the init value when you're looking for some kind of initialization, but that doesn't solve the problem of default-initialized values. For that, it really needs the init to be valid. Of course, we _could_ allow ranges to disable default initialization, and then the type could define init with a static function which would be used when init was used explicitly, but that would potentially cause problems if init were used explicitly at compile time. It's also on the list of things that I half expect to be disallowed at some point with a future edition, because allowing init to be anything other than the default one tends to cause issues. So, we _might_ be able to make it work, but my gut reaction is that we're playing with fire, and we'll be better off overall if we can just require that the compiler-defined init value be valid. Still, if we allowed it, at least it should result in compile time errors in generic code rather than weird runtime behavior. But it would mean that generic code couldn't rely on being able to default-initialize ranges, and it would have to give all ranges a value at runtime, which would be annoying, and it realistically wouldn't happen outside of something like Phobos (and it would likely only happen in Phobos if that specific case was tested for by Phobos functions in general). Code in general expects default initialization to work, and it's usually only code specifically written to work with types that don't allow it where it actually works to have a type that disables it. So, I don't know. I'd really rather not play games with trying to redefine the init value. I also suspect that most of the annoyance that ranges might have with the init value would be there simply by requiring that the init value be valid without requiring that it be empty, in which case, requiring that it be empty wouldn't be onerous. It's requiring that it be valid which would potentially be onerous. But I'm also not sure how much of an issue that that would be in practice. For most ranges, I expect that it will be pretty simple, and many of them already have a valid, empty init value. In the vast majority of cases, I expect that if a range's init value isn't valid, it's because the person who wrote it didn't think about the init value being used (since it's probably a Voldemort type) and didn't test that case. And if the issue of it being problematic to make the init value valid isn't common, I'm disinclined to make range-based code in general have to worry about it - especially if it's still possible to make such ranges work (even if it's more annoying than would be desirable). I can also think of several ways to make it easy right off the top of my head, though you wouldn't necessarily want the additional overhead (e.g. any range could just use is to compare itself to its init value in empty, or it could have a bool to indicate whether it had been initialized via a constructor and check that; it would be annoying but easy). But I'm also not entirely sure how onerous it will be in practice to make the init values of ranges valid (be it empty or otherwise) - though it's actually _really_ easy in the cases where it would segfault now, since if such types then have a wrapper struct, it can easily check whether the pointer or reference is null and return true for empty. - Jonathan M Davis1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.Makes a lot of sense, but I agree it will be very annoying for some ranges. I suspect it will result in extra state and/or checks for those cases. Would it be too crazy of an idea for those ranges to implement a static init function?
Mar 04
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:Of course, if you're dealing with an infinite range, then there can't be a way to make it empty, but IMHO, we should have a way to get an empty range from an arbitrary finite range type where the empty range is of the same type. And for that, I see three possibly options: 1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.I think this is the best solution.A related issue is that if we're removing save from forward ranges, [...] Adding emptyRange or makeEmpty [...] wouldn't actually fix the problem. Some old basic input ranges which are infinite would still pass the new isForwardRange.Are there infinite ranges that are indeed not forward ranges? I mean - they produce their output somehow algorithmically. Should be possible to safe the state of such an algorithm, no? Ok, maybe someone has not implemented it, but as we are requiring things anyway, why not require an infinite range to implement the copy function?
Mar 05
On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.
Mar 05
On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards. It is an infinitely forward moving stream, so you can have the concept of "zero" and "positive infinity", but there is no concept of "negative infinity". A negative number is utter non-sense for TRNG's, as whatever negative number you specify would become the new "zero". Practically there is a limit to how much entropy you can grab off the the chip in single chunk, so it is a bounded range, it's just that the boundary is somewhat fuzzy. In short you would probably end up implementing an entropy source as a forward range with some large upper limit.Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.
Mar 05
On Wed, Mar 06, 2024 at 12:11:10AM +0000, Adam Wilson via Digitalmars-d wrote:On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:Nothing about an infinite range requires that it be infinite both ways, unless it's a bidirectional range, which an RNG is not. [...]On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards.Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.Practically there is a limit to how much entropy you can grab off the the chip in single chunk, so it is a bounded range, it's just that the boundary is somewhat fuzzy.[...] A periodically-reseeded RNG is indeed a practically infinite range, with no cycling. You don't have to grab every value from the hardware entropy source; it suffices to use a cryptographic hash function on a counter that's periodically reseeded from the hardware entropy. It can literally generate an endless stream of random numbers. T -- Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
Mar 05
On Wednesday, 6 March 2024 at 00:23:26 UTC, H. S. Teoh wrote:A periodically-reseeded RNG is indeed a practically infinite range, with no cycling. You don't have to grab every value from the hardware entropy source; it suffices to use a cryptographic hash function on a counter that's periodically reseeded from the hardware entropy. It can literally generate an endless stream of random numbers. TThat is true in theory, but in practice if you try it on real hardware, not only will you pay some pretty serious performance penalties as the CPU tries to dump all that entropy, it will be dumping it to memory, of which you will eventually run out. So yes, it's theoretically unlimited, but in practice, there is no valid reason to actually implement it that way, and to-date, no modern Operating System entropy source allows you to. For example, on Windows, you'll be passing a fixed size buffer to the entropy source. Same with OpenSSL. We do not design code for what is theoretically possible, only that which can actually be achieved. I know, because I wrote a Crypto library for D that specifically deals with this.
Mar 05
On 05.03.24 17:41, Alexandru Ermicioi wrote:On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:Or a range that returns captured data from a sensor.Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.
Mar 06
On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:On 05.03.24 17:41, Alexandru Ermicioi wrote:Nice example, Basically, any global resource (sensor, /dev/random, etc.), that can be representable as a range, and doesn't have a definitive end, is an infinite input range, since you can't save it at a specific point and then continue multiple times from that point. Best regards, Alexandru.On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:Or a range that returns captured data from a sensor.Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.
Mar 06
On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:On 05.03.24 17:41, Alexandru Ermicioi wrote:If you have a source of real entropy, ok. But pseudo-random-number-generators do have a seed and with same seed produces always the same numbers.On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:Are there infinite ranges that are indeed not forward ranges?/dev/random could be such a range.Or a range that returns captured data from a sensor.Ok, that's a more convincing example. This is really hard to copy. Requires perfect knowlegde in a deterministic universe :-)
Mar 06
On 06.03.24 11:56, Dom DiSc wrote:If you have a source of real entropy, ok. But pseudo-random-number-generators do have a seed and with same seed produces always the same numbers.While technically true, you might not have access to the seed, so in practical terms it's a bit of distinction without a difference. For instance, in `/dev/[u]random` the seed is kept internally by the OS, and I don't think you have access to it.... and even if you had, you wouldn't want to re-implement the kernel's PRNG, right?
Mar 06
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:Okay. A few weeks ago, I made a post about potential design decisions when updating the range API in the next version of Phobos, and this is a continuation of that, albeit with a different set of problems. Specifically: [...]I like T.init being empty. I thought I'd found a clever way to make this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ``` This, although supposedly equivalent, works fine: ```d class Class {} bool empty(in Class c) safe nogc pure nothow { return c is null; } ``` Which might be a way out? I still think the first example should work.
Mar 06
On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote: I thought I'd found a clever way tomake this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ```I’m surprised that the equivalent C++ code *doesn’t* crash (at least on my machine): ``` Class* c = nullptr; assert(c->empty()); That’s still UB though.
Mar 06
On Wednesday, March 6, 2024 6:09:52 AM MST Ogi via Digitalmars-d wrote:On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote: I thought I'd found a clever way toThe issue is whether the function is virtual. In D, class member functions are virtual by default, so you get a crash when the program attempts to use the vtable to look the function up (and therefore attempts to dereference null), whereas in C++, you don't, because class member functions are non-virtual by default, and so it doesn't do anything with the vtable. If you mark the function as final in D (without it overriding anything), then there shouldn't be a crash, and if you mark the C++ function as virtual, then there will be. - Jonathan M Davismake this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ```I’m surprised that the equivalent C++ code *doesn’t* crash (at least on my machine): ``` Class* c = nullptr; assert(c->empty()); That’s still UB though.
Mar 06
On Wednesday, 6 March 2024 at 19:03:39 UTC, Jonathan M Davis wrote:The issue is whether the function is virtual. In D, class member functions are virtual by default, so you get a crash when the program attempts to use the vtable to look the function up (and therefore attempts to dereference null), whereas in C++, you don't, because class member functions are non-virtual by default, and so it doesn't do anything with the vtable. If you mark the function as final in D (without it overriding anything), then there shouldn't be a crash, and if you mark the C++ function as virtual, then there will be.Virtual or not, GCC assumes that `this` cannot be null: ```C++ #include <cassert> class C { public: bool empty() const { return this == nullptr; // warning: `nonnull` argument `this` compared to NULL } }; int main() { assert(((C*)nullptr)->empty()); // warning: `this` pointer is null } ```
Mar 07
On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:I like T.init being empty. I thought I'd found a clever way to make this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ```It crashes because it's attempting to access Class's vtable. If you make the method final, it works: ```d class Class { final bool empty() { return this is null; } } void main() { Class c; assert(c.empty); // ok } ```
Mar 06
On Wednesday, 6 March 2024 at 13:26:14 UTC, Paul Backus wrote:On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote:Of course! I write classes so seldom I forget my own rule to add `final` unless I can't. That would severely restrict class ranges though, since presumably the whole point is to have dynamic dispatch.I like T.init being empty. I thought I'd found a clever way to make this work for classes but apparently this crashes, much to my surprise: ```d class Class { bool empty() safe nogc pure nothrow scope const { return this is null; } } Class c; assert(c.empty); ```It crashes because it's attempting to access Class's vtable. If you make the method final, it works: ```d class Class { final bool empty() { return this is null; } } void main() { Class c; assert(c.empty); // ok } ```
Mar 07
On Thursday, March 7, 2024 2:11:29 AM MST Atila Neves via Digitalmars-d wrote:Of course! I write classes so seldom I forget my own rule to add `final` unless I can't. That would severely restrict class ranges though, since presumably the whole point is to have dynamic dispatch.The solution that I'm currently planning to go with is to wrap them: https://forum.dlang.org/post/mailman.1091.1709755456.3719.digitalmars-d puremagic.com - Jonathan M Davis
Mar 07
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
Mar 06
On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it? There is no formal definition for "create an empty range T". It's not just algorithms, it's regular functions or types with specific parameter requirements. What is a very logical mechanism is just to pass the `init` value, because 99% of the time, that is the same thing as an empty range. Except when it isn't. But if you aren't testing for that, then you don't realize it. Or if code changes slightly such that now you involve a range you didn't before which doesn't have that property, then all of a sudden code breaks. We can add a new requirement that ranges that can be initialized as empty have some `emptyRange` static member or something like that. But I expect this is going to go down the same route as `save`, where really you should be calling `save` every time you copy a forward range, but in practice nobody does, since regular copying is 99% of the time the same thing. Likewise people will pass `.init` instead of `.emptyRange` because it's almost always the same thing. This is why I think it should just be formally stated that the `init` value of a non-infinite range should be empty, bringing into standard what people *already do*. The only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness. But if people insist that we must have class ranges, I'd say an `emptyRange` property is in order. -Steve2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
Mar 06
On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via Digitalmars-d wrote: [...]The only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } This generally will not compile because R and S are different types, even if both conform to a common underlying range API, like an input range. To remedy this situation, class-based range wrappers come to the rescue: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return inputRangeObject(range1); } else { return inputRangeObject(range2); } } Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work. // Also, class-based ranges are sometimes necessary for practical reasons, like in this one program I had, that has an UFCS chain consisting of hundreds of components (not all written out explicitly in the same function, of course, but that's what the underlying instantiation looks like). It generated ridiculously large symbols that crashed the compiler. Eventually the bug I filed prodded Rainer to implement symbol compression in DMD. But even then, the symbols were still ridiculously huge -- because the outermost template instantiation had to encode the types of every one of the hundreds of components. As a result, symbol compression or not, compile times were still ridiculously slow because the compiler still had to copy those symbols around -- their length grew quadratically with every component added to the chain. Inserting a call to .inputRangeObject in the middle of the chain dramatically cut down the size of the resulting symbols, because it effectively erased all the types preceding it, resulting in much saner codegen afterwards. T -- Leather is waterproof. Ever see a cow with an umbrella?
Mar 06
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } [...] Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work.This is what std.range.choose [1] is for. Internally, it's implemented as a tagged union of R and S. [1] https://dlang.org/library/std/range/choose.html
Mar 06
On Wed, Mar 06, 2024 at 05:43:05PM +0000, Paul Backus via Digitalmars-d wrote:On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:Haha, I knew that one would be brought up. Unfortunately, it only works for a very limited subset of use cases. In the above simplified code you could replace the if-statement with .choose. However, this immediately falls flat if the decision is made by a switch statement, or nested if-blocks, or an early exit from a loop. The problem is that .choose requires that both ranges are constructible in the same scope that it is called in. In non-trivial code this is sometimes impossible, as you may be in a decision tree where certain subranges are only constructed if the condition is true, or may depend on value types that may not even exist if the condition is not satisfied. This makes .choose essentially useless outside of trivial cases. Besides, .choose does not type-erase, so in the cases where you want type erasure it's still no help. T -- English is useful because it is a mess. Since English is a mess, it maps well onto the problem space, which is also a mess, which we call reality. Similarly, Perl was designed to be a mess, though in the nicest of all possible ways. -- Larry WallEvery time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } [...] Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work.This is what std.range.choose [1] is for. Internally, it's implemented as a tagged union of R and S. [1] https://dlang.org/library/std/range/choose.html
Mar 06
On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:The problem is that .choose requires that both ranges are constructible in the same scope that it is called in.It's not even about those ranges being constructible. It's about separation of concerns. Like you wrote in your earlier post, sometimes we want to work with an arbitrary range without templating all of our code. Maybe for compile time reasons, or maybe to write a library that can be separately compiled from the client code.This makes .choose essentially useless outside of trivial cases.It's far from useless, it's only like taking an enum argument and `switch`ing on it instead of taking a delerate argument. Certainly works but means you have to list all your cases in the function, which may or may not be what you want. The gripe I have with `std.range.interface` is lack of attributes in the virtual functions. I don't think any simple attribute set alone would solve the problem well. Granted, there should rarely be reason to have a dynamic ` system` range type, but `pure` and `nothrow` are attributes you sometimes really want, sometimes you really don't. Therefore we need the ability to pick, meaning the interfaces will have to be templated, along the lines of: ```D interface ForwardRange(uint attrs) : InputRange!attrs { // ... } ``` Plus there should probably be function to convert, for example, a ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!pure_ | FunctionAttribute!nothrow_) ``` to ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!nothrow_) ``` .
Mar 06
On Wednesday, March 6, 2024 1:53:55 PM MST Dukc via Digitalmars-d wrote:The gripe I have with `std.range.interface` is lack of attributes in the virtual functions. I don't think any simple attribute set alone would solve the problem well. Granted, there should rarely be reason to have a dynamic ` system` range type, but `pure` and `nothrow` are attributes you sometimes really want, sometimes you really don't. Therefore we need the ability to pick, meaning the interfaces will have to be templated, along the lines of: ```D interface ForwardRange(uint attrs) : InputRange!attrs { // ... } ``` Plus there should probably be function to convert, for example, a ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!pure_ | FunctionAttribute!nothrow_) ``` to ``` BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!nothrow_) ``` .This is a fundamental problem with classes in general. Attributes just do not play nicely with them. Either you're forced to require attributes that potentially won't work for some code, or you're forced to disallow attributes that some code might require. And what you're describing results in a proliferation of interfaces, which to an extent, goes against the point of using interfaces. So, we might do something like that, but it does cause a different set of problems. Regardless, we will have to look over exactly what we want to do with the replacement for std.range.interfaces. I think that it was mostly thrown together because it was thought that it was needed and not because it really had been needed much at that point, and we should definitely look at what needs reworking based on what we've learned in the interim. Classes in general are kind of terrible with generic code in D though, particularly because of the issues with attributes. - Jonathan M Davis
Mar 06
On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis wrote:This is a fundamental problem with classes in general. Attributes just do not play nicely with them. Either you're forced to require attributes that potentially won't work for some code, or you're forced to disallow attributes that some code might require.It's not a problem with classes, it's a decision one has to make when designing an interface with runtime variance regardless of the mechanism. You either have to allow the unknown code you're calling to execute side effects in which case it's impure, or you must execute the side effects it requests on it's behalf which means the unknown code must use your interfaces instead of doing their io themselves. It would be the same with any other mechanism.And what you're describing results in a proliferation of interfaces, which to an extent, goes against the point of using interfaces.I don't think so. Say, you're designing an interpreter, and decide the user-defined functions can't execute io directly or break D type system, but can throw exceptions. Therefore the D type for a range in your client langauge will be `ForwardRange!(ClientLangValue, FunctionAttribute.safe | FunctionAttribute.pure_)`. You need only one template instance, same as presently. How is this exactly proliferation?
Mar 06
On Wednesday, March 6, 2024 3:54:54 PM MST Dukc via Digitalmars-d wrote:On Wednesday, 6 March 2024 at 21:24:59 UTC, Jonathan M Davis wrote:Yes, it's a problem with non-templated code in general, but because you can't templatize virtual functions, it's a problem with classes that you can't really fix, whereas with most other code, you can usually fix it by templatizing the code and inferring the attributes based on the arguments.This is a fundamental problem with classes in general. Attributes just do not play nicely with them. Either you're forced to require attributes that potentially won't work for some code, or you're forced to disallow attributes that some code might require.It's not a problem with classes, it's a decision one has to make when designing an interface with runtime variance regardless of the mechanism. You either have to allow the unknown code you're calling to execute side effects in which case it's impure, or you must execute the side effects it requests on it's behalf which means the unknown code must use your interfaces instead of doing their io themselves. It would be the same with any other mechanism.It's a proliferation, because you're declaring a bunch of different interfaces. Instead of having ForwardRange, you have a whole bunch of variants of ForwardRange. Now, that might not end up being a problem in practice (particularly since you do already have to templatize ForwardRange on the element type), because you may only need one variation in your codebase, but it does further complicate the interfaces, and it does mean that interfaces with the same element type won't necessarily be compatible, whereas with the current design, that isn't a problem. So, it may ultimately be what we want to do, but it's not without its downsides. - Jonathan M DavisAnd what you're describing results in a proliferation of interfaces, which to an extent, goes against the point of using interfaces.I don't think so. Say, you're designing an interpreter, and decide the user-defined functions can't execute io directly or break D type system, but can throw exceptions. Therefore the D type for a range in your client langauge will be `ForwardRange!(ClientLangValue, FunctionAttribute.safe | FunctionAttribute.pure_)`. You need only one template instance, same as presently. How is this exactly proliferation?
Mar 06
On Wed, Mar 06, 2024 at 08:53:55PM +0000, Dukc via Digitalmars-d wrote:On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:[...]Well yes, so it's very limited in what it can do. In the cases where you can't list all cases in the call to .choose, or where you don't want to for code cleanliness reasons, it's useless.This makes .choose essentially useless outside of trivial cases.It's far from useless, it's only like taking an enum argument and `switch`ing on it instead of taking a delerate argument. Certainly works but means you have to list all your cases in the function, which may or may not be what you want.The gripe I have with `std.range.interface` is lack of attributes in the virtual functions. I don't think any simple attribute set alone would solve the problem well.[...] The problem is that the base class cannot predict which restrictive attributes will be applied to a derived class override of a method. You cannot override a safe base class method with a system derived class method, for example. So it must settle for the most permissive attributes. But that means the code that receives a base class reference must handle the most permissive attributes as well; a safe function would not be able to invoke a safe method in a derived class if the only reference it has to the derived object is a base class reference. There's no way around this except to get rid of (restrictive) attributes altogether. T -- EMACS = Extremely Massive And Cumbersome System
Mar 06
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: ...etc.This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
Mar 06
On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not. It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping. You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties. Calling e.g. .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?) T -- It's bad luck to be superstitious. -- YHLEvery time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: ...etc.This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
Mar 06
On Wednesday, 6 March 2024 at 18:07:22 UTC, H. S. Teoh wrote:On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:Ah, you're right. There is really no way to do this in a completely flexible way in current D, if you also want to have value range semantics. Couldn't we somehow fit RefRange into the current range hierarchy somewhere, and provide primitives to make it more difficult to shoot yourself in the foot?On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not. It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping. You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties. Calling e.g. .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?) TEvery time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: ...etc.This would be a complete non-issue if we had language-level sumtypes: ``` sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } ``` std.sumtype can also do this, the ergonomics just aren't as nice.
Mar 06
On Wednesday, March 6, 2024 1:23:52 PM MST Meta via Digitalmars-d wrote:Couldn't we somehow fit RefRange into the current range hierarchy somewhere, and provide primitives to make it more difficult to shoot yourself in the foot?RefRange was a huge mistake (and it was my mistake). It's basically trying to force reference semantics on something that has value semantics, and it causes problems as a result. I don't think that it can actually be made to work properly - at least not with forward ranges. IMHO, we need to be able to rely on the semantics of the basic range operations, and most of the problems with the current range API come from when operations are under-specified or outright unspecified - be it because we're trying to support too much (e.g. both structs and classes for ranges), or because we just didn't actually think to specify what the behavior should be. And in some cases, we though of it but too late to actually require it (e.g. you can't actually rely on $ working with random-access ranges, because we didn't think to require it up front, and too many random-access ranges had been written that didn't support it by the time we realized our mistake). - Jonathan M Davis
Mar 06
On Wednesday, 6 March 2024 at 21:17:37 UTC, Jonathan M Davis wrote:On Wednesday, March 6, 2024 1:23:52 PM MST Meta via Digitalmars-d wrote:I don't mean the type RefRange; I mean expanding the range hierarchy to include the concept of a "RefRange" (just like InputRange, BidirectionalRange, etc.) that explicitly has reference semantics instead of value semantics. Then you can specialize algorithms on whether they're a ref range or not (implying that ranges with reference semantics should no longer pass the isInputRange et al. checks - no idea how you'd do that OTOH), and then you can account for that fact in the implementation of the algorithm, rather than all range algorithms currently carrying an implicit assumption that they're working with a value range.Couldn't we somehow fit RefRange into the current range hierarchy somewhere, and provide primitives to make it more difficult to shoot yourself in the foot?RefRange was a huge mistake (and it was my mistake). It's basically trying to force reference semantics on something that has value semantics, and it causes problems as a result. I don't think that it can actually be made to work properly - at least not with forward ranges. IMHO, we need to be able to rely on the semantics of the basic range operations, and most of the problems with the current range API come from when operations are under-specified or outright unspecified - be it because we're trying to support too much (e.g. both structs and classes for ranges), or because we just didn't actually think to specify what the behavior should be. And in some cases, we though of it but too late to actually require it (e.g. you can't actually rely on $ working with random-access ranges, because we didn't think to require it up front, and too many random-access ranges had been written that didn't support it by the time we realized our mistake). - Jonathan M Davis
Mar 06
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via Digitalmars-d wrote: [...]As Jonathan already responded, you can have a class range by wrapping it in a struct. Which is what I meant when I said: "you can always wrap such a thing in a range harness" -SteveThe only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this:
Mar 07
On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven Schveighoffer wrote:On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:By "specific range type", do you mean a specific category of range (input, forward, random-access, etc.), or a specific concrete type? If the former, this is already easy to do with existing language and library features (for example, in many cases you can use an empty slice). If the latter, then the question I'm asking is, why do you need to do that? Personally, I can think of plenty of times that I've needed to do the first thing, but I can't think of a single time that I've needed to do the second thing.On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it?2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.The only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in `std.range.interfaces`, where `ForwardRange` inherits from `InputRange` and so on, cannot really be replicated using `structs` (`alias this` only goes so far).
Mar 06
On 07/03/2024 6:38 AM, Paul Backus wrote:The only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness. The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in |std.range.interfaces|, where |ForwardRange| inherits from |InputRange| and so on, cannot really be replicated using |structs| (|alias this| only goes so far).All of this would ideally be solved with the introduction of signatures. But alas as we've covered in past conversations that is a big set of features to introduce.
Mar 06
On Wednesday, 6 March 2024 at 17:38:56 UTC, Paul Backus wrote:On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven Schveighoffer wrote:I meant the latter. A concrete range type. How does it happen? Consider that an array is a concrete type that people use all the time. Why should any other range be different? I've definitely stored ranges and other things as type members that were voldemort types. This ability is more of a question of "do we want to add this feature to ranges or not?" The feature doesn't currently exist -- you can't assume that an uninitialized range is empty. Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, *but still compiles*. A few examples I can think of: 1. copying an array is equivalent to `arr.save`, but may not be the case for other forward ranges. 2. character arrays have a mechanism to decode into dchar if you specify `dchar` as the loop variable type. 3. arrays have a default value of an empty array. Reducing surprises when you substitute what seems like a "compatible" type is desirable, but not strictly necessary. I.e. I'm also OK if we don't try and add these definitions. For the empty array case, I think the mechanism is trivial to add as a formal requirement, since nearly all ranges default to empty, and the ones that don't are probably easy to change. Maybe this isn't the case? I don't know. I think reducing friction when it's easy to do is something we should always be looking at.On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:By "specific range type", do you mean a specific category of range (input, forward, random-access, etc.), or a specific concrete type? If the former, this is already easy to do with existing language and library features (for example, in many cases you can use an empty slice). If the latter, then the question I'm asking is, why do you need to do that?On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it?2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities. -SteveThe only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in `std.range.interfaces`, where `ForwardRange` inherits from `InputRange` and so on, cannot really be replicated using `structs` (`alias this` only goes so far).
Mar 07
On Thu, Mar 07, 2024 at 06:32:50PM +0000, Steven Schveighoffer via Digitalmars-d wrote: [...]Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, *but still compiles*.Over the past decade of working with D, this has repeatedly come up as the weakness of a signature-constraint based approach to ducktyping. The problem is that what the signature constraint requires (e.g., isInputRange) may only be a subset of what the function body assumes, and there is no way to check this mechanically (signature constraints are Turing-complete). What ideally should happen is that whatever the code assumes should also be declared in the function signature. So if a template takes a parameter t of generic type T, and then goes and performs t++, then the compiler should enforce that ++ is declared as part of the constraints on T. The C++ concepts approach is better in this respect, in that the compiler can typecheck the function body and emit an error if it tries to perform an operation on T that it didn't declare as part of its signature constraint. Barring implementing concepts in D, which is unlikely to happen, the next best alternative is for the compiler to enforce that any operation on T must have a corresponding check in the signature constraint, and when this is not the case, it should, based on the attempted operation, issue an error message with a suggested addition to the sig constraint that would satisfy this requirement. This doesn't fully solve the problem here, of course -- sometimes the difference is semantic rather than something easily checkable by the compiler, like subtle differences between built-in arrays and user-defined arrays; in that case you're still up the creek without a paddle. But it's a step forward. T -- The trouble with TCP jokes is that it's like hearing the same joke over and over.
Mar 07
On Thursday, 7 March 2024 at 18:32:50 UTC, Steven Schveighoffer wrote:I meant the latter. A concrete range type. How does it happen? Consider that an array is a concrete type that people use all the time. Why should any other range be different? I've definitely stored ranges and other things as type members that were voldemort types.This doesn't answer my question. There are lots of ways in which (built-in) arrays are different from other types of ranges. Most notably, an array can be used as either a range or a container. There are plenty of use-cases for creating an empty container. What I am asking, specifically, is whether there is any use-case where *generic code*, given a range of some arbitrary type `R`, needs to create another range which both (a) has the exact concrete type `R`, and (b) is empty. Since that's the feature that's being proposed here. (Non-generic code does not need this feature to be part of the range API, because it can rely on specific features of whatever concrete type it's working with.)This ability is more of a question of "do we want to add this feature to ranges or not?" The feature doesn't currently exist -- you can't assume that an uninitialized range is empty.If there are no use-cases for this feature, then the answer to "do we want to add it" ought to be "no." That's why I'm asking about use-cases.Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, *but still compiles*.This is a general problem with templates/macros compared to typed generics. Even if we get rid of this particular edge case, there are still dozens more that users are going to run into if they only test with arrays. If we want to address this problem, I think the best thing we can do is to provide a standard suite of test ranges that users can plug into their code to uncover edge cases and bugs.How do you implement this with structs? interface ForwardAssignable : InputAssignable!E, ForwardRange!E Interfaces allow multiple inheritance. Structs can only have one `alias this` member. Maybe you're fine with giving up on this feature, but let's at least be honest that we *are* giving up features here.The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in `std.range.interfaces`, where `ForwardRange` inherits from `InputRange` and so on, cannot really be replicated using `structs` (`alias this` only goes so far).As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities.
Mar 07
On Wednesday, March 6, 2024 10:32:17 AM MST H. S. Teoh via Digitalmars-d wrote:On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via Digitalmars-d wrote: [...]My current plan is to retain std.range.interfaces in some form or another so that we can have ranges as classes as far as their implementation goes (at my job, we actually have to use classes for ranges, because we have to be able to wrap arbitrary ranges at runtime, since we're dealing with an intepreter). However, they would not count as ranges by any range-based algorithms, because they would be classes, and so they would have to be wrapped in structs to actually be used as ranges. The wrapper structs would then do all of the work to ensure that the proper copying behavior was implemented. So, you'd do something like auto r = inputRangeObject(range1); and get a struct range wrapping a class from the replacement for std.range.interfaces. And if we need the ability to get at the underlying class for something, it's as simple as providing a member to the struct which returns it - e.g. source, like we do with several other range types. Then you could do something like auto r = inputRangeObject(range1); InputRange cls = r.source; Of course, you lose access to the class as soon as it's wrapped in another range type, but that happens right now when you pass ranges around which are classes. So, that shouldn't really change much from what I can see. It's basically just wrapping classes in structs so that the struct does the behavior that we currently do with save (and it could do it more intelligently by taking advantage of reference counting rather than dup-ing the class whenever the range is copied). So, unless I'm missing something here, we _should_ be able to just wrap classes in structs to allow ranges which are implemented as classes even if the classes themselves can't be ranges any longer. The net result should be that dealing with forward ranges is simpler, and we stop having save-related bugs, but we shouldn't actually lose any functionality. - Jonathan M DavisThe only tricky aspect is ranges that are references (classes/pointers). Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.[...] Every time this topic comes up, class-based ranges become the whipping boy of range design woes. Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return range1; } else { return range2; } } This generally will not compile because R and S are different types, even if both conform to a common underlying range API, like an input range. To remedy this situation, class-based range wrappers come to the rescue: auto myRangeFunc(R,S)(R range1, S range2) { if (runtimeDecision()) { return inputRangeObject(range1); } else { return inputRangeObject(range2); } } Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work. // Also, class-based ranges are sometimes necessary for practical reasons, like in this one program I had, that has an UFCS chain consisting of hundreds of components (not all written out explicitly in the same function, of course, but that's what the underlying instantiation looks like). It generated ridiculously large symbols that crashed the compiler. Eventually the bug I filed prodded Rainer to implement symbol compression in DMD. But even then, the symbols were still ridiculously huge -- because the outermost template instantiation had to encode the types of every one of the hundreds of components. As a result, symbol compression or not, compile times were still ridiculously slow because the compiler still had to copy those symbols around -- their length grew quadratically with every component added to the chain. Inserting a call to .inputRangeObject in the middle of the chain dramatically cut down the size of the resulting symbols, because it effectively erased all the types preceding it, resulting in much saner codegen afterwards.
Mar 06
On Wed, Mar 06, 2024 at 01:03:35PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]My current plan is to retain std.range.interfaces in some form or another so that we can have ranges as classes as far as their implementation goes (at my job, we actually have to use classes for ranges, because we have to be able to wrap arbitrary ranges at runtime, since we're dealing with an intepreter). However, they would not count as ranges by any range-based algorithms, because they would be classes, and so they would have to be wrapped in structs to actually be used as ranges. The wrapper structs would then do all of the work to ensure that the proper copying behavior was implemented.[...]So, unless I'm missing something here, we _should_ be able to just wrap classes in structs to allow ranges which are implemented as classes even if the classes themselves can't be ranges any longer. The net result should be that dealing with forward ranges is simpler, and we stop having save-related bugs, but we shouldn't actually lose any functionality.[...] This actually sounds like a reasonable approach. T -- Applying computer technology is simply finding the right wrench to pound in the correct screw.
Mar 06
On Wednesday, March 6, 2024 7:18:50 AM MST Paul Backus via Digitalmars-d wrote:On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:There are times where I've had to make a range empty in algorithms in order to indicate that it's done or force it to be done. This is particularly true with some algorithms I've written where the range was passed by ref. IIRC, I tried to add something to std.range for this years ago (either that, or I tried to make takeNone do it; it's been a while) using the init value, but Andrei nixed it on the grounds that we couldn't rely on it. The result is that while we have takeNone, you can't use it to actually make a range empty. You can just use it to get an empty range of a potentially different type. So, while this is not something that I've seen come up frequently, this is definitely something that I've seen come up in my own code. And in general, if we're able to require that the init value be empty for finite ranges, that will reduce the number of bugs surrounding member variables and the init value, since it's fairly common to end up using the init value of a range when you have to store it as a member variable (even if it was a Voldemort type to begin with), and the natural thing to do at that point is to assume that the init value is empty (even though in reality, for some range types right now, it isn't even in a valid). But yes, the desire to have a way to force a range to be empty is something that's come from actual experience. - Jonathan M Davis2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.Genuine question: what are the use-cases for this? In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on. I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.
Mar 06