digitalmars.D - Range Redesign: Copy Semantics
- Jonathan M Davis (238/238) Jan 20 I've been thinking about this for a while now, but with the next version...
- Per =?UTF-8?B?Tm9yZGzDtnc=?= (6/7) Jan 21 Shall
- Jonathan M Davis (23/30) Jan 21 Maybe. It probably should be, though I don't know how many changes like ...
- Alexandru Ermicioi (4/21) Jan 21 Imho, it must not be added, as this might conflate with contents
- Jonathan M Davis (13/36) Jan 21 It doesn't conflate with Nullable!bool at all, because the alias this wa...
- Richard (Rikki) Andrew Cattermole (4/9) Jan 21 Umm, why would we bring back Nullable?
- Jonathan M Davis (27/37) Jan 21 Much as I know that you like the idea of sum types, AFAIK, it's never be...
- Richard (Rikki) Andrew Cattermole (4/21) Jan 21 Given that Walter has put together a DIP for them, I'm fairly confident
- Jonathan M Davis (15/38) Jan 21 Well, we'll see what we actually end up with, but I'd consider that prop...
- Alexandru Ermicioi (14/46) Jan 21 For all these cases there is `.save` method which seems to be
- Jonathan M Davis (39/49) Jan 21 And how on earth would you use a range if it can't be copied? They're co...
- Nick Treleaven (6/14) Jan 21 Maybe they could be moved instead of copied when the original is
- Jonathan M Davis (45/59) Jan 21 Do you really want to have to call move every time that you call a
- Alexandru Ermicioi (19/87) Jan 21 We could use java Iterator api in this case which has also `bool
- Jonathan M Davis (49/82) Jan 21 Well, the key change the API that would really matter IMHO would be to m...
- Paul Backus (51/62) Jan 21 I think maybe we're losing sight of the big picture in this
- Jonathan M Davis (48/91) Jan 22 In practice, basic input ranges don't work with the vast majority of
- Paul Backus (43/62) Jan 22 Here's a very rough approximation of the number of algorithms in
- Jonathan M Davis (89/140) Jan 22 I will admit that the number is higher than I would have expected, but m...
- monkyyy (3/12) Jan 24 Oh, I hope someone pays attension and makes arrays more important
- Quirin Schroll (44/57) Jan 23 Not necessarily. First, any algorithm requiring a forward range
- Alexandru Ermicioi (9/20) Jan 22 Well one advantage of `hasNext` is that you can check in advance
- Jonathan M Davis (68/88) Jan 22 true
- Paul Backus (19/23) Jan 21 With the proposed design, it would be possible to implement
- Paul Backus (2/5) Jan 21 Correction: fr should be passed by ref (or maybe auto ref).
- Jonathan M Davis (9/32) Jan 21 True, but then you have a function with the same copy semantics problem ...
- Paul Backus (6/14) Jan 21 [...]
- Jonathan M Davis (24/39) Jan 21 Well, even if the parameter is ref (which it would need to be), the core
- Jonathan M Davis (18/41) Jan 22 Actually. What occurs to me is that if we made this function take the ra...
- Quirin Schroll (15/38) Jan 23 It wouldn’t work. That’s the point. It would compile, probably,
- Walter Bright (5/5) Jan 21 Thanks for bringing this up. Improving range semantics is something we s...
- Paul Backus (11/14) Jan 21 Surely this would be better handled outside the range API.
- Jonathan M Davis (31/36) Jan 21 I will have to study that more to see what's needed there. If it's the k...
- Sebastiaan Koppe (24/32) Jan 21 Thanks for the write up.
- Paul Backus (11/20) Jan 21 If your concern is performance, an indirect function call is
- Ben Jones (13/15) Jan 21 I've been writing a fair amount of Kotlin (an a tiny bit of
- Quirin Schroll (7/22) Jan 23 This is off-topic, but I want to mention that the implicit `it`
- Jonathan M Davis (54/88) Jan 21 Well, arguably, that's really an implementation detail of the range, and...
- Sebastiaan Koppe (30/88) Jan 22 Yes exactly. My point is that if there is an explicit place to do
- Steven Schveighoffer (11/40) Jan 21 While I agree with most of what you wrote, there is one very big
- Paul Backus (17/25) Jan 21 One possibility is to change the forward range interface to
- Jonathan M Davis (34/60) Jan 21 Well, I think that the reality of the matter here is that if we want to ...
- Jonathan M Davis (24/32) Jan 21 Yeah, I was orignally hoping that this sort of thing could just be solve...
- Atila Neves (9/19) Jan 22 I don't think I've ever encountered a situation where reference
- Richard (Rikki) Andrew Cattermole (15/35) Jan 22 I did experiment with them when I first learned ranges.
- H. S. Teoh (40/58) Jan 22 It's useful in recursive-descent parsers where you expect the range to
- Atila Neves (3/20) Jan 23 Or moved.
- H. S. Teoh (8/21) Jan 23 [...]
- Paul Backus (15/19) Jan 22 This was my initial thought too, but it turns out it's not quite
- Atila Neves (16/34) Jan 23 I don't think we should that because algorithms that don't care
- Steven Schveighoffer (21/40) Jan 23 You are thinking about this the wrong way. Copyability becomes a
- Paul Backus (27/40) Jan 23 The only problem with this is that what you are asking the range
- Nick Treleaven (5/11) Jan 24 `isForwardRange` could require the range *type* to have `empty,
- Paul Backus (10/21) Jan 24 It could, but I don't think it's a great idea, because it would
- Steven Schveighoffer (26/62) Jan 24 Wouldn't that be on the person who wrapped the type? I wouldn't
- Paul Backus (28/49) Jan 24 Yes, it would--which is exactly what I'd like to avoid. I don't
- Paul Backus (5/10) Jan 24 Having thought about it some more, this is a bad argument, and I
- Jonathan M Davis (14/24) Jan 24 Yeah. We can enforce some stuff statically (e.g. we can require that a
- Alexandru Ermicioi (3/5) Jan 24 Isn't `isInputRange`, `isForwardRange`, and the rest, kind of
- Jonathan M Davis (18/23) Jan 24 Traits like that can test (and thus be used to enforce) _some_ of the
- Alexandru Ermicioi (7/26) Jan 24 That's what unit tests are for, while those traits act like
- Jonathan M Davis (29/56) Jan 24 Of course that's what unit tests are for, but that doesn't enforce anyth...
- Quirin Schroll (6/9) Feb 01 Even if you could, what would you even test? Especially for
- Jonathan M Davis (53/73) Jan 22 Once you disable the copy/postblit constructor, you can't generally pass...
- victoroak (14/15) Jan 22 I think one problem with just "next" as the api for an Input
- monkyyy (17/20) Jan 22 Can this be summarized as
I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like auto copy = orig; and range1 = range2; should be defined (which then defines what happens when passing ranges to functions and stuff like foreach). Right now, the only semantics of copying a range which are defined by the range API are what happens to the copy when you make a copy. Specifically, the copy will have the exact same elements in the exact same order that the original would have had if it had been iterated through prior to the copy being made. If that weren't true, then we couldn't pass ranges around. However, because ranges support a whole range of types with varying copy semantics, you can't rely on a range's copy semantics beyond that (particularly with regards to the state of the original range after the copy has been made). Specifically, a range type can have four different types of copy semantics: 1. The range is a value type, so copying it results in two completely indepedent copies. Iterating through one has no effect on the other. 2. The range is a reference type, so copying it results in two completely dependent copies. They both share all of their state, so iterating through one iterates through the other, and they both always have the same state. 3. The range is a pseudo-reference type, and it has value semantics for its iteration state but reference semantics for some portion of the rest of its state. This means that if you iterate through a copy of the range, it has no effect on the original, so the range is a value type in that respect, but the rest of its state may or may not be copied by value (often - though not always - this means that the iteration state is directly within the range type, but the elements themselves are referred to via pointer, so mutating the actual elements would affect all copies of that range, but each range can be iterated independently). Dynamic arrays are the most common example of this sort of range. 4. The range is a pseudo-reference type, but it does not have value semantics with regards to its iteration state. So, when copying the range, some portion of its state is copied by value, but you don't end up with indendepent copies - and in fact, you don't get dependent copies either, since some of the state is copied by value. A common example of this sort of range which would be one which contains most of its state via a pointer or reference but which caches its front as a member variable within the struct. The result of this is that once you've mutated a copy of the range, you can no longer use the original, because mutating the copy only partially mutates the state of the original, leaving it in an inconsistent / invalid state. Generic code must support ranges with any of these copy semantics, and since these copy semantics are drastically different from one another, the result is that in fully generic code, once you copy a range, you cannot ever use the original again (since you can't depend on its state). And you can't even assign it a new value and then use it, because the semantics of assignment aren't defined for ranges either (though they're likely to be strongly tied to the copy semantics). Once you copy a range, you have to assume that the orginal range is in a potentially invalid state and can't do anything further with it. In addition to copying ranges to new variables and passing them to other functions, this also means that you can't actually continue to use a range once it's been passed to foreach, because the lowering that foreach uses copies the range (and it has to or it would break the semantics of using dynamic arrays with foreach). So, you can't do stuff like partially iterate through a range, break out of the loop, and then continue to iterate through it, because the state of the range is implementation-dependent and could be invalid (on top of the fact that in the case of many forward ranges, copying it is equivalent to save, so trying to continue the iteration on the original won't work, whereas it would work with some basic input ranges, so even without any ranges ending up in an invalid state, the semantics depend on the type in a way that does not work with generic code). In fact, someone necro-ed a thread in D.Learn on this issue with foreach just the other day, since it's not obvious and can be pretty surprising. Of course, in order to try to solve the problem with varying copy semantics, the range API has save for forward ranges. This makes it possible to get an independent copy of a forward range, and any generic code which needs an independent copy needs to use it to get one. However, that does not help at all with the inconsistent semantics of actually copying a range, and it's very common for people to write code that works with dynamic arrays but then doesn't work with other types of ranges, because other types of ranges have differing copy semantics, and they didn't test their code with enough range types to find where they accidentally relied on the copy semantics (or assignment semantics) of dynamic arrays (or on the semantics of whatever range type it was that they tested with). If we really want to fix these classes of issues, we need ranges to have consistent copy semantics, which means that we need to define what those semantics are. For forward ranges, this is quite straightfoward. We need to require that copying a forward range results in two copies which can be independently iterated - so essentially, the iteration state must be copied by value, or another way to look at it is that copying a forward range must be equivalent to calling save (though the situation is actually more nuanced than that). We would be requiring that all forward ranges have the same copying semantics as dynamic arrays with regards to their iteration state (but would allow the elements themselves to be copied by value or by reference like we do now). And of course, we would then get rid of save, since it would be unnecessary. What this would mean is that all forward ranges would be either dynamic arrays or structs. Any code that then wanted a class or a pointer to a struct to be a forward range would need to wrap it in a struct to give it the correct copy semantics. A naive implementation would just duplicate the class whenever the struct was copied, but a smarter implementation could do something like use reference counting to keep track of how many ranges there were pointing to that same class object and then duplicate it whenever a function was called on it which would mutate its iteration state, and the reference count was greater than 1 (and of course not bother to duplicate it if the reference count was 1). It wouldn't need to do more memory management than that (though it could), so that doesn't necessarily require anything system or trusted as often gets discussed with reference counting. Of course, allowing such an implementation does technically mean that copying a forward range is not necessarily strictly equivalent to calling save, but the overall iteration semantics would treat each copy as independent like save is supposed to allow, so it wouldn't be a problem. And with that change, we then have consistent copy semantics with forward ranges. All code operating on forward ranges could treat them the same as they do dynamic arrays when copying them and passing them around, and you wouldn't have to worry about forgetting to call save. I think that we can probably get general agreement on such a change, since it's been discussed off-and-on for years now. However, the problem then is basic input ranges. Basic input ranges cannot have the same copy semantics as dynamic arrays. If they could, then they could be forward ranges. As things stand, they either have reference semantics, or they have pseudo-reference semantics of the variety where mutating a copy puts the original in an invalid state. The only way that we could make their copy semantics consistent is if we then require that they be reference types. The simplest way to do that would be to require that they be classes or pointers to structs, though that would then make it impossible to do anything like reference counting with basic input ranges, which isn't ideal, but it would have the advantage of allowing us to force reference semantics at compile time. This would of course mean that basic input ranges and forward ranges would then be defined to have different copy semantics, which would mean that either generic code that relied on the copy semantics would have to work with only one of them, or we would need to give basic input ranges a different API. And if we allowed basic input ranges to be structs that defined reference semantics internally (e.g. by implementing reference counting on a pointer or class reference), then we would have to give basic input ranges a different API, because we wouldn't have save to differentiate them anymore, and if we allowed structs, then we couldn't differentiate them based on a basic input range being a class reference or pointer to a struct, whereas a forward range would be a struct or a dynamic array. Ultimately, I'm inclined to argue that we should give basic input ranges a new API - not just because it would allow them to use reference counting, but also because the current input range API tends to force basic input ranges to cache the value of front (which if anything, encourages basic input ranges to be pseudo-reference types and could result in an extra layer of indirection in some cases if they're forced to be reference types). It would be annoying in some cases to require that different functions be used for basic input ranges and forward ranges (though overloading could obviously be used to avoid needing different names), but it's already often the case that code isn't really designed to work with both, and overloading on the category of range being used is already fairly common, since different range capabilities allow for different implementations. So, given that it would prevent whole classes of copying bugs as well as potentially remove the requirement to cache front for basic input ranges, I think that a separate API for basic input ranges is warranted. What I would propose for that would be a single function auto next(); where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty. The return type would then need to emulate a pointer - specifically, when casting it to bool, it would be true if it's non-null and false if it's null, and dereferencing it would give you the actual value if it's non-null (with it being undefined behavior if you dereference null). So, a basic input range of ints might define next as int* next(); or alternatively, it could be something like Nullable!int next(); though that wouldn't work with Phobos' current Nullable type, since it doesn't support either casting to bool or dereferencing (probably because it originally used alias this). Either way, since we'd just require the specific API for the return type rather than requirng a pointer, the range type would have some flexibliity in what it used. This would then mean that if you wanted to loop over a basic input range, you'd do something like for(auto front = range.next; !front; front = range.next) { ... } And if we go down this road, then we could also add this API to foreach, allowing for code such as foreach(e; basicInputRange) { ... } to work - and unlike now, you could rely on the copy semantics involved such that you would know that you could then break out of the loop and continue to use the range (whereas right now, you can't safely break out of a foreach and then continue to use the range that you were iterating over). Of course, for forward ranges, breaking out of a foreach still wouldn't let you continue to iterate from where you were, since the foreach would be iterating over a copy, but those would be the semantics for all forward ranges, so you could rely on them. In addition to these changes to the copy semantics, I propose that we then require that assignment follow the same semantics as copying. So, range1 = range2; on a forward range would result in two independent copies with the same the same elements in the same order (regardless of what range1 pointed to before), just like you'd get with auto copy = orig; whereas for basic input ranges range1 = range2; would result in range1 and range2 being fully dependent copies (regardless of what range1 pointed to before) - just like what you'd expect when assigning one pointer to another. So, the assignment semantics would differ between basic input ranges and forward ranges, but they would be consistent with their copy semantics, and they would be consistent across all ranges of the same category (and of course, assignment would only be expected to work if the two ranges were of exactly the same type). The result of all of this is that we could then have consistent, reliable copy and assignment semantics for both basic input ranges and forward ranges, and the two would be easily distinguishable so that you don't accidentally use one as the other and end up with bugs due to the differences in semantics between the two categories of ranges. The other potential change related to this would be to rename basic input ranges. Right now, forward ranges are treated as an extension of input ranges, which makes sense if you're ignoring the issues surrounding copy semantics, since a forward range is an input range with an extra function that allows you to get an independent copy, but if we're giving them different APIs so that the copy semantics can be cleanly defined and separated, then arguably, it doesn't make sense to call them both ranges. As such, we could switch to calling forward ranges simply ranges (which we mostly do anyway) and give basic input ranges a new name. My initial suggestion would be an input stream (or if we absolutely had to continue to call them ranges, then maybe simple ranges), but there's probably a better name to be had, and obviously, plenty of bikeshedding can be done on that topic. Either way, the names used here aren't the primary concern. What I'd like to get out of this thread is feedback on how much sense this idea does or doesn't make and what problems I'm missing that someone else is able to see. From what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either), but given the issues surrounding copy semantics, I think that that's probably ultimately a good change (and the number of functions that can operate on basic input ranges is already pretty limited anyway in comparison to forward ranges - particularly with regards to generic algorithms). It will also make it much easier to discuss the separation between basic input ranges and forward ranges, which IMHO can be too easy to lose or confuse as things stand. - Jonathan M Davis
Jan 20
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:I've been thinking about this for a while now, but with the ...Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Jan 21
On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos. Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code). And if the new API for basic input ranges uses a pointer API to access its elements, then that's a very good reason to make Nullable support that. As I said in my post, I suspect that the main reason that the current version doesn't is because it originally used alias this - which we removed due to all of the bugs that that caused, but giving Nullable an opCast to bool and * for dereferencing would likely have been a terrible idea when Nullable used alias this, whereas without that, I can't think of any reason why it would be a bad idea - though I'd likely still choose to have get and isNull for those who wanted to make it clear that a Nullable was being used. But I think that it's better to use a pointer API for basic input ranges, because then they can actually use pointers in those cases where that makes sense instead of requiring a Nullable type, much as it also needs to work to use a Nullable type. - Jonathan M DavisI've been thinking about this for a while now, but with the ...Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Jan 21
On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis wrote:On Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via Digitalmars-d wrote:Imho, it must not be added, as this might conflate with contents of `Nullable!bool` types.On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos.I've been thinking about this for a while now, but with the ...Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Jan 21
On Sunday, January 21, 2024 6:26:16 AM MST Alexandru Ermicioi via Digitalmars- d wrote:On Sunday, 21 January 2024 at 10:58:32 UTC, Jonathan M Davis wrote:It doesn't conflate with Nullable!bool at all, because the alias this was removed from Nullable. The only way to get the value out of Nullable is to use its get member. If we added dereferencing to it, then dereferencing it would then also work to get its value, but if we added an opCast to bool, casting would never give the Nullable's value even if the type of its value were bool. It would just be equivalent to negating the result of isNull. I agree that if we still had alias this on Nullable, then adding opCast to bool to it would be problematic - and that's likely why it doesn't have that opCast, since its original API included alias this, but since the alias this is now gone, that's no longer a problem. There is no ambiguity. - Jonathan M DavisOn Sunday, January 21, 2024 3:36:37 AM MST Per Nordlöw via Digitalmars-d wrote:Imho, it must not be added, as this might conflate with contents of `Nullable!bool` types.On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Maybe. It probably should be, though I don't know how many changes like that we're going to do to the current version of Phobos.I've been thinking about this for a while now, but with the ...Shall - `Nullable.opCast` to `bool` - `Nullable.opUnary` `*` be added to Phobos' `std.typecons`?
Jan 21
On 21/01/2024 11:58 PM, Jonathan M Davis wrote:Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code).Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Jan 21
On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:On 21/01/2024 11:58 PM, Jonathan M Davis wrote:Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like. If it's anything like std.sumtype, IMHO, it would be utterly terrible for when you want a Nullable type. For that, you want to be able to ask whether the object has a value, and if it does, you want to be able to fetch its value. So, while you could use different function names for it, you basically want the same kind of API that a pointer has. Something where you'd have to pass delegates in just to get the value out would be incredibly annoying and bloated in comparison (and personally, I hate std.sumtype in general because of that design, but it makes a lot more sense when you have a variety of possible types as opposed to just one which may or may not be there). But regardless of whether sum types would make sense in this case (which I very much question), we'd have to know that they were definitely being added to the language and what exactly that would look like - and they would have to function in a way that was clearly superior to have a dedicated Nullable type, or it would make more sense to add a Nullable type and use that for cases where you want a nullable object. Obviously, if sum types get added to the language soon enough, and they really do replace the need for Nullable, then Nullable probably won't be in Phobos v3, but as thing stand, I would fully expect it to be there. Either way, using the API of a pointer makes a lot of sense for the proposed basic input range API, in which case, a range that wanted to return something other than a pointer would need to return a type that used the same API. - Jonathan M DavisEither way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code).Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Jan 21
On 22/01/2024 3:08 PM, Jonathan M Davis wrote:On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if. https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).mdOn 21/01/2024 11:58 PM, Jonathan M Davis wrote:Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like.Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code).Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Jan 21
On Sunday, January 21, 2024 7:17:35 PM MST Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:On 22/01/2024 3:08 PM, Jonathan M Davis wrote:Well, we'll see what we actually end up with, but I'd consider that proposal to be terrible if all you want is a nullable type, and restrictions such as "Members of sumtypes cannot have copy constructors, postblits, or destructors" would be far too much for plenty of use cases such that I would still be very much inclined to have an explicit Nullable type. In any case, whether Phobos v3 has a Nullable type and the question of the exact type returned by next in the proposed API don't really matter that much right now. What matters is the basics of the API, and some of the details can be ironed out later. IMHO, having next return a type that emulates a pointer is the best choice (especially when we don't even have sum types yet), but that can be revisited down the line, particularly since it's not like we're releasing major changes to ranges right now. - Jonathan M DavisOn Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole> via Digitalmars-d wrote:Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if. https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).mdOn 21/01/2024 11:58 PM, Jonathan M Davis wrote:Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like.Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code).Umm, why would we bring back Nullable? By then we'd have sum types which can represent it in the language far better.
Jan 21
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:1. The range is a value type, so copying it results in two completely indepedent copies. Iterating through one has no effect on the other. 2. The range is a reference type, so copying it results in two completely dependent copies. They both share all of their state, so iterating through one iterates through the other, and they both always have the same state. 3. The range is a pseudo-reference type, and it has value semantics for its iteration state but reference semantics for some portion of the rest of its state. This means that if you iterate through a copy of the range, it has no effect on the original, so the range is a value type in that respect, but the rest of its state may or may not be copied by value (often - though not always - this means that the iteration state is directly within the range type, but the elements themselves are referred to via pointer, so mutating the actual elements would affect all copies of that range, but each range can be iterated independently). Dynamic arrays are the most common example of this sort of range. 4. The range is a pseudo-reference type, but it does not have value semantics with regards to its iteration state. So, when copying the range, some portion of its state is copied by value, but you don't end up with indendepent copies - and in fact, you don't get dependent copies either, since some of the state is copied by value. A common example of this sort of range which would be one which contains most of its state via a pointer or reference but which caches its front as a member variable within the struct. The result of this is that once you've mutated a copy of the range, you can no longer use the original, because mutating the copy only partially mutates the state of the original, leaving it in an inconsistent / invalid state.For all these cases there is `.save` method which seems to be ignored in most of the code and compiler, when it should not be. It explicitly defines that range is copied irrelevant of the type of range you've mntioned. Since `.save` is defined on forward range, this means that pure input range must never be copyable, i.e. copy and post constructors must be disabled. If this rule is followed, issues with copying would dissappear. Even more drastic measure would be to disable copying on all types of ranges, and allow copying only through `.save` method, i.e. do explicit copying when it is intended to. Best regards, Alexandru.
Jan 21
On Sunday, January 21, 2024 6:22:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:For all these cases there is `.save` method which seems to be ignored in most of the code and compiler, when it should not be. It explicitly defines that range is copied irrelevant of the type of range you've mntioned. Since `.save` is defined on forward range, this means that pure input range must never be copyable, i.e. copy and post constructors must be disabled. If this rule is followed, issues with copying would dissappear. Even more drastic measure would be to disable copying on all types of ranges, and allow copying only through `.save` method, i.e. do explicit copying when it is intended to.And how on earth would you use a range if it can't be copied? They're copied all over the place and have to be to even be used. You copy them when you pass them to any range-based function. You copy them when you return them from a range-based function. You copy them when they get wrapped by another range. You copy them when you pass them to foreach. Disabling copying on ranges would make them borderline useless. And you also can't disable copying on classes or dynamic arrays. We could disallow classes which are ranges (even for basic input ranges) if we really wanted to make it so that they couldn't be copied, but we couldn't do the same with dynamic arrays. And really, ranges need to be copyable, because we need to be able to pass them around around and store them. And while ref could theoretically be used in some of those cases, in many of them, it could not (e.g. you have to copy a range once you wrap it in another type, and the ability to do that is crucial for most range-based algorithms). Also, forward ranges are supposed to mimic dynamic arrays. C++ iterators are meant to mimic pointers, and D's ranges are meant to mimic dynamic arrays. The only reason that we even have save is to support ranges which are classes. A lot of the problems with range-based code come from code that assumes that a range behaves like a dynamic array when it doesn't. If we required that all forward ranges have the same copy semantics as dynamic arrays, then that problem is solved. Having save means treating forward ranges differently than we treat dynamic arrays, and it's completely unnecessary so long as we're willing to require that classes be wrapped in structs to be forward ranges. The ideal situation here is for ranges in general to have the same semantics as dynamic arrays with regards to copying, assignment, and iteration. So, we can't disable copying for ranges in general and have them work, and even if we all agreed that disabling copying for ranges in general was a good idea, we still couldn't do that for dynamic arrays. So, we'd continue to have the problem that code would be written to be range-based but only worked with arrays, because that was what it was tested with. By requiring that forward ranges have the same semantics as dynamic arrays and treating basic input ranges as a separate thing, it becomes possible to write code that assumes the same semantics as dynamic arrays and then works with all forward ranges, which would significantly reduce the number of bugs that we get in range-based code. - Jonathan M Davis
Jan 21
On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis wrote:And how on earth would you use a range if it can't be copied? They're copied all over the place and have to be to even be used. You copy them when you pass them to any range-based function. You copy them when you return them from a range-based function. You copy them when they get wrapped by another range. You copy them when you pass them to foreach. Disabling copying on ranges would make them borderline useless.Maybe they could be moved instead of copied when the original is not needed.And you also can't disable copying on classes or dynamic arrays.Unless all ranges have to be structs, but then we probably need a function/syntax to ask for a range from a class or dynamic array.
Jan 21
On Sunday, January 21, 2024 11:13:02 AM MST Nick Treleaven via Digitalmars-d wrote:On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis wrote:Do you really want to have to call move every time that you call a range-based function or want to wrap one range in another? Immovable types are incredibly un-user-friendly in general. It also would mean that ranges wouldn't work with foreach with how foreach currently works, since it copies the range. And if it doesn't copy the range, then ranges other than dynamic arrays would have different semantics when being iterated with foreach than dynamic arrays do, which would also be a big problem.And how on earth would you use a range if it can't be copied? They're copied all over the place and have to be to even be used. You copy them when you pass them to any range-based function. You copy them when you return them from a range-based function. You copy them when they get wrapped by another range. You copy them when you pass them to foreach. Disabling copying on ranges would make them borderline useless.Maybe they could be moved instead of copied when the original is not needed.Forward ranges are supposed to mimic dynamic arrays. Having to wrap dynamic arrays to use them as ranges would be very annoying. And for what? To just avoid copying? Yes, if copying ranges is illegal, then you've given them all the same copy semantics, but the result is incredibly unwieldy, and it means that you can't have dynamic arrays and forward ranges use the same code without wrapping dynamic arrays. And are people really going to want to be wrapping dynamic arrays in another type just so that they can make them uncopyable and the be forced to call move all over the place instead? The most friendly way for forward ranges to function is to make them more properly emulate dynamic arrays as they were originally intended to do. The reason that they don't already is so that we could support classes as ranges, which while it should work, is not the typical use case, and we can allow them to be forward ranges almost as easily with a wrapper struct as we can by having a save function. If forward ranges all have the same copy semantics as dynamic arrays, then they become easy to use correctly without having bugs due to stuff like forgetting to call save - and without a bunch of compiler errors, because you forgot to use move all over the place. For forward ranges, it would be a huge win, and it's exactly what we've talked about doing for years now. We just haven't done it, because doing it in the current version of Phobos would break existing code. The bigger problem is then what to do with basic input ranges, because they can't have the same copy semantics, but the fact that they can't implement something like save already hamstrings them enough that arguably they should be treated as something separate anyway. So, by just giving them a separate API, we simplify the situation with forward ranges considerably, and we avoid the classes of bugs that come with code being written to work with both basic input ranges and forward ranges. Of course, there are some cases where having code written to work with both works just fine, and having separate APIs could be annoying in those cases, but basic input ranges are limited enough that most code doesn't support them anyway. Obviously, the tradeoffs are debatable, but I think that it's clear that ideal situation for forward ranges is to have them all have the same copy semantics as dynamic arrays. - Jonathan M DavisAnd you also can't disable copying on classes or dynamic arrays.Unless all ranges have to be structs, but then we probably need a function/syntax to ask for a range from a class or dynamic array.
Jan 21
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Ultimately, I'm inclined to argue that we should give basic input ranges a new API - not just because it would allow them to use reference counting, but also because the current input range API tends to force basic input ranges to cache the value of front (which if anything, encourages basic input ranges to be pseudo-reference types and could result in an extra layer of indirection in some cases if they're forced to be reference types). It would be annoying in some cases to require that different functions be used for basic input ranges and forward ranges (though overloading could obviously be used to avoid needing different names), but it's already often the case that code isn't really designed to work with both, and overloading on the category of range being used is already fairly common, since different range capabilities allow for different implementations. So, given that it would prevent whole classes of copying bugs as well as potentially remove the requirement to cache front for basic input ranges, I think that a separate API for basic input ranges is warranted. What I would propose for that would be a single function auto next();We could use java Iterator api in this case which has also `bool hasNext()` function. Then new input range api can also be propagated to other types of range such as forward range and further.where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty. The return type would then need to emulate a pointer - specifically, when casting it to bool, it would be true if it's non-null and false if it's null, and dereferencing it would give you the actual value if it's non-null (with it being undefined behavior if you dereference null). So, a basic input range of ints might define next as int* next(); or alternatively, it could be something like Nullable!int next();The returned type should be a tagged union, that can allow storing of actual value inside, or pointer to it, while it's interface will hide the details, i.e. get would look like this: `ref T get()`.though that wouldn't work with Phobos' current Nullable type, since it doesn't support either casting to bool or dereferencing (probably because it originally used alias this). Either way, since we'd just require the specific API for the return type rather than requirng a pointer, the range type would have some flexibliity in what it used. This would then mean that if you wanted to loop over a basic input range, you'd do something like for(auto front = range.next; !front; front = range.next) { ... } And if we go down this road, then we could also add this API to foreach, allowing for code such as foreach(e; basicInputRange) { ... } to work - and unlike now, you could rely on the copy semantics involved such that you would know that you could then break out of the loop and continue to use the range (whereas right now, you can't safely break out of a foreach and then continue to use the range that you were iterating over). Of course, forInput ranges could just be disallowed in foreach statements, that would solve different semantics between them and forward ranges, just like how in Java it is done with Stream api.What I'd like to get out of this thread is feedback on how much sense this idea does or doesn't make and what problems I'm missing that someone else is able to see. From what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either), but given the issues surrounding copy semantics, I think that that's probably ultimately a good change (and the number of functions that can operate on basic input ranges is already pretty limited anyway in comparison to forward ranges - particularly with regards to generic algorithms). It will also make it much easier to discuss the separation between basic input ranges and forward ranges, which IMHO can be too easy to lose or confuse as things stand.Imho, this proposal is complicated, and unnecessarily complicates construction of ranges, making them less appealing to implement in user code. I'd opt for restricting copying completely, and allow copying through `.save` only. The `.next` method proposal is a good improvement though, with addition of `.hasNext` method at minimum.
Jan 21
On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote: We could use java Iterator api in this case which has also `bool hasNext()` function.Well, the key change the API that would really matter IMHO would be to merge front and popFront, since that would eliminate the need for basic input ranges to cache front, which many of them are forced to do right now. I'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.Then new input range api can also be propagated to other types of range such as forward range and further.Part of the point here is to _not_ do that, because they fundamentally have different semantics.Requiring the pointer API already does that. It's just that checking whether the value is null means casting to bool, and you dereference the return value rather than calling get. The implementation would be quite free to then use a Nullable or some other type which does whatever it wants internally so long as casting to bool tells you whether the object contains a value, and dereferencing it gives you the value.where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty. The return type would then need to emulate a pointer - specifically, when casting it to bool, it would be true if it's non-null and false if it's null, and dereferencing it would give you the actual value if it's non-null (with it being undefined behavior if you dereference null). So, a basic input range of ints might define next as int* next(); or alternatively, it could be something like Nullable!int next();The returned type should be a tagged union, that can allow storing of actual value inside, or pointer to it, while it's interface will hide the details, i.e. get would look like this: `ref T get()`.Input ranges could just be disallowed in foreach statements, that would solve different semantics between them and forward ranges, just like how in Java it is done with Stream api.They could be, but that wouldn't be very user-friendly. And it doesn't solve the different semantics of copying them.Imho, this proposal is complicated, and unnecessarily complicates construction of ranges, making them less appealing to implement in user code. I'd opt for restricting copying completely, and allow copying through `.save` only. The `.next` method proposal is a good improvement though, with addition of `.hasNext` method at minimum.Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place. What we really want to be able to do in general is treat ranges like dynamic arrays (and part of the reason to give basic input ranges a different API is because the can't have the same copy semantics as dynamic arrays). The suggested API would actually simplify forward ranges considerably, because then most of them wouldn't have to implement save any longer, and those few that do would just implement the necessary semantics via their copy constructor (or a more advanced implementation might use reference-counting, which would be somewhat more complex, but the vast majority of ranges would not be in that situation). And the result would be that forward ranges in general could be treated just like dynamic arrays, which would simplify the vast majority of range-based code and allow range-based code to rely on the copy and assignment semantics instead of having to carefully avoid using ranges which have been copied and avoid assignment like they have to do now. So, as far as forward ranges go, this seems to me like it's very much a simplification, not something complicated. If this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges. The part that's arguably more complex is simply that the basic input ranges then have to have a separate API, but since they fundamentally have different cropy semantics and can't be used in most range-based functions anyway, I don't think that it complicates things much. And it makes the distinction and behavioral differences between basic input ranges and forward ranges much clearer than they typically are now, which would make understanding ranges easier. - Jonathan M Davis
Jan 21
On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:I think maybe we're losing sight of the big picture in this discussion. The big-picture, overarching goal of the range API is to provide a common interface between data structures and algorithms. You have N data structures and M algorithms, and by using this common interface you can implement all N*M combinations in O(N+M) code. Each data structure implements the range API once, and each algorithm is implemented once using the range API. If you split the API in two, by making input streams (basic input ranges) and forward ranges completely disjoint, you undermine this goal. Now, each data structure has to implement *two* APIs, and each algorithm has to be implemented *twice*, once for input streams and once for forward ranges. In practice, what's going to happen is library authors will simply not bother to implement both, and we will end up with gaps where many (data structure, algorithm) pairs are not covered. I appreciate the desire to provide consistent copying semantics, but if it comes to a choice between that and having a single unified API, I will choose the unified API, warts and all.Then new input range api can also be propagated to other types of range such as forward range and further.Part of the point here is to _not_ do that, because they fundamentally have different semantics.Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place.Worth noting that generic range algorithms already have to account for the existence of non-copyable types, since even if a range itself is copyable, its elements may not be. Also, as I pointed out in an earlier reply to Walter [1], there are legitimate use-cases for non-copyable input streams, where making them copyable would incur a significant performance overhead. I think the correct solution to this is something like DIP 1040, which makes non-copyable types less of a pain to work with in general. [1] https://forum.dlang.org/post/sullaynnrjufookbuijt forum.dlang.org --- In light of the points above, my proposed copying semantics for input streams are: 1. Input streams MAY be non-copyable, but are not required to be. 2. If you copy an input stream and then call next() on the original, the behavior of both the original and the copy is unspecified. That is, I think we should give up on having 100% consistent copying semantics in this one case, in order to keep the overall API unified (by supporting UFCS next()) and avoid unnecessary pessimization of range algorithms. The good news is that with these semantics, if you write you generic code conservatively and treat all input streams as non-copyable, you'll get the right behavior in all cases. And if you don't, you'll get a compile-time error as soon as you actually try to pass in a non-copyable input stream, and you'll know exactly how to fix it. So this design is still an improvement on the status quo.
Jan 21
On Sunday, January 21, 2024 10:22:44 PM MST Paul Backus via Digitalmars-d wrote:On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:Then new input range api can also be propagated to other types of range such as forward range and further.Part of the point here is to _not_ do that, because they fundamentally have different semantics.If you split the API in two, by making input streams (basic input ranges) and forward ranges completely disjoint, you undermine this goal. Now, each data structure has to implement *two* APIs, and each algorithm has to be implemented *twice*, once for input streams and once for forward ranges. In practice, what's going to happen is library authors will simply not bother to implement both, and we will end up with gaps where many (data structure, algorithm) pairs are not covered.In practice, basic input ranges don't work with the vast majority of algorithms anyway. Some do (e.g. map), but pretty much anything that needs to do more than a simple transformation on the elements ends up needing a forward range. I really don't think that we're going to lose much by forcing basic input ranges to be separate.Do they? Non-copyable types do not work with ranges right now. They don't work with foreach right now. Non-copyable types have their uses to be sure, but they're also fairly niche. Realistically, very little range-based code is going to work with them because of how restricted they are to deal with, and unless you're explicitly testing code with them, you're not going to end up writing code that works with them. We could bend over backwards to try to make them work in Phobos, but no one else is going to do that unless they're using non-copyable types in ranges themselves (which almost no one will be doing even if we support it), and trying to support non-copyable types will actively make range-base code worse in many cases, because it will require calling front multiple times, and many range-based algorithms do real work in front rather than simply returning a value (e.g. map works that way). Granted, range-based code in general doesn't tend to be very disciplined about whether it calls front once or several times, but forcing it to be several times in order to support non-copyable types will have a real cost for code that is _way_ more common than non-copyable types. Personally, I think that we should make it very explicit that non-copyable types are not supported by ranges. They don't work with them now, and I don't think that it's even vaguely worth it to try to make ranges work with them. The cost is too high for too little benefit.Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place.Worth noting that generic range algorithms already have to account for the existence of non-copyable types, since even if a range itself is copyable, its elements may not be.In light of the points above, my proposed copying semantics for input streams are: 1. Input streams MAY be non-copyable, but are not required to be. 2. If you copy an input stream and then call next() on the original, the behavior of both the original and the copy is unspecified. That is, I think we should give up on having 100% consistent copying semantics in this one case, in order to keep the overall API unified (by supporting UFCS next()) and avoid unnecessary pessimization of range algorithms. The good news is that with these semantics, if you write you generic code conservatively and treat all input streams as non-copyable, you'll get the right behavior in all cases. And if you don't, you'll get a compile-time error as soon as you actually try to pass in a non-copyable input stream, and you'll know exactly how to fix it. So this design is still an improvement on the status quo.I really think that trying to support non-copyable types is not worth it - either for the ranges themselves or for their elements. It's not something that very many people are even going to think about, let alone do. So, if it were supported, realistically, it would just be with the algorithms in Phobos and with the code that the few people using such types wrote for themselves. And even in Phobos, even if we wanted to support it, realistically, it wouldn't work a lot of the time until the few people that cared complained about it, because very few programmers would even think about testisg with non-copyable types. It would be _far_ worse than the situation we have now with code needing to call save to work correctly and yet failing to call save all over the place, because most ranges don't need it. Even with Phobos, we have had many bugs over the years due to stuff like forgetting to call save or reusing a range, because you have to test with way too many range types to catch all of the edge cases. IMHO, we need to be trying to reduce the number of edge cases, not increasing them. And it's not like non-copyable types have ever worked properly with ranges - so not supporting them is not removing capabiltiies. It's just not adding capabilities that a select few would like to have. And while I feel for them, I really think that it's far too niche to support given how costly it is to support. - Jonathan M Davis
Jan 22
On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis wrote:In practice, basic input ranges don't work with the vast majority of algorithms anyway. Some do (e.g. map), but pretty much anything that needs to do more than a simple transformation on the elements ends up needing a forward range. I really don't think that we're going to lose much by forcing basic input ranges to be separate.Here's a very rough approximation of the number of algorithms in Phobos that work with basic input ranges: $ grep 'isInputRange' std/algorithm/*.d | wc -l 153 For comparison, here are the other range types: $ grep 'isForwardRange' std/algorithm/*.d | wc -l 141 $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l 42 $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l 62 So, I would say that your hypothesis is not supported by the data, in this case.Do they? Non-copyable types do not work with ranges right now. They don't work with foreach right now. Non-copyable types have their uses to be sure, but they're also fairly niche.Work is being done to make ranges work with non-copyable types (e.g. [1]). Foreach works if the elements are rvalues, or the elements are lvalues and you use ref for the loop variable. This is the correct, expected behavior. I think a big part of the reason non-copyable types are not widely used is that a lot of existing library code was written without support for them in mind. Since we have an opportunity to start from scratch with Phobos v3, there is no reason to repeat this mistake. [1] https://github.com/dlang/phobos/pull/8721trying to support non-copyable types will actively make range-base code worse in many cases, because it will require calling front multiple times, and many range-based algorithms do real work in front rather than simply returning a value (e.g. map works that way).Have you ever actually written generic code that works with non-copyable types? I have, and it mostly involves (a) passing things by ref, (b) using 'auto ref' and core.lifetime.forward a lot, and (c) checking isCopyable!T before trying to copy things. I have no idea why you think this would require calling front multiple times, or doing extra work. Perhaps you could try writing out some example code to clarify?Personally, I think that we should make it very explicit that non-copyable types are not supported by ranges. They don't work with them now, and I don't think that it's even vaguely worth it to try to make ranges work with them. The cost is too high for too little benefit.First, as a general rule, I don't think that we, as standard-library authors, should be in the business of telling our users that their use-cases are illegitimate just because it makes our jobs more difficult. Second...how do you know the benefit is too little? You thought earlier that "the vast majority of algorithms" didn't work with basic input ranges, even though the actual data shows that isn't true. Unless you have concrete evidence, I think it is better to avoid making assumptions about which use-cases are important and which aren't.
Jan 22
On Monday, January 22, 2024 5:57:49 PM MST Paul Backus via Digitalmars-d wrote:On Monday, 22 January 2024 at 23:07:28 UTC, Jonathan M Davis wrote:I will admit that the number is higher than I would have expected, but most of those algorithms are on the simple side where the algorithm itself doesn't care whether the range is consumed, whereas the caller very much cares. They were written to accept basic input ranges, because the algorithm itself will work with a basic input range, but that doesn't mean that it can actually be used with basic input ranges to do anything useful. For instance, you can use startsWith on a basic input range, but you pretty much never would, because then you've consumed it - and actually, it's not even safe to use the rest of the range in generic code after passing it to startsWith, because it was copied, and the original could be in an invalid state. The same goes with any function that takes an argument and gives you a result that doesn't wrap the input range and return it or return a partially iterated range. In my experience, it's very rare that it's possible to make anything that isn't incredibly simple work with a basic input range due to the fact that looking at it consumes it and makes it impossible to then do anything useful with it. And it's easy to write simple algorithms which will technically work with a basic input range but which aren't actually useful with a basic input range. So, sure, a surprising number of the algorithms in std.algorithm work with isInputRange, but a number of them really shouldn't be used with basic input ranges. Now, a few of them might be more useful if you could rely on a basic input range having reference semantics, but their composibility in general is pretty poor. Either way, I think that the gain from being able to require specific copy semantics for both forward ranges and basic input ranges is worth it. And since such a requirement couldn't actually be enforced at compile time unless we made it impossible to have basic input ranges that were structs (which would then make reference counting impossible), then a specific algorithm could choose to add a function to forward ranges via UFCS and support them if it didn't care about the copy semantics, but like with trusted code, I think that it should then be up to the programmer to make sure that it does the right thing so that basic input ranges can be required to have reference semantics.In practice, basic input ranges don't work with the vast majority of algorithms anyway. Some do (e.g. map), but pretty much anything that needs to do more than a simple transformation on the elements ends up needing a forward range. I really don't think that we're going to lose much by forcing basic input ranges to be separate.Here's a very rough approximation of the number of algorithms in Phobos that work with basic input ranges: $ grep 'isInputRange' std/algorithm/*.d | wc -l 153 For comparison, here are the other range types: $ grep 'isForwardRange' std/algorithm/*.d | wc -l 141 $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l 42 $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l 62 So, I would say that your hypothesis is not supported by the data, in this case.Foreach works if the elements are rvalues, or the elements are lvalues and you use ref for the loop variable. This is the correct, expected behavior.Well, I couldn't get it to work when I tried, but either way, it means that you can't use foreach on a range of non-copyable types in generic code, because the way that you use foreach would have to be the same regardless of the elment type. Of course, you could use static if and branch the code based on whether the types are copyable or not, but that's yet another complication to range implementations which are already often far too complicated with too many static if branches.If you can't rely on front returning something that you can copy, then you're going to have to call front multiple times any time that you need to access front multiple times - either that or call move on it, which you can't necessarily do safely, because that would alter the range. Algorithms that don't have to look at front more than once don't have that problem, but plenty of algorithms need to do multiple things with front. And often, it's much better to just store the result of front rather than call front multiple times, because it's often the case that front does actual work that you'd rather not do over and over again (and it's particularly bad when you have stuff like the result of map where the transformation function it was given allocates on the heap). So the result of having to worry about non-copyable types is that algorithms that need to access front multiple times are either going to end up calling front multiple times, harming performance in many cases, or they're going to have static if branches to change what they're doing based on the return type of front, which is going to make them that much more complex to implement.trying to support non-copyable types will actively make range-base code worse in many cases, because it will require calling front multiple times, and many range-based algorithms do real work in front rather than simply returning a value (e.g. map works that way).Have you ever actually written generic code that works with non-copyable types? I have, and it mostly involves (a) passing things by ref, (b) using 'auto ref' and core.lifetime.forward a lot, and (c) checking isCopyable!T before trying to copy things. I have no idea why you think this would require calling front multiple times, or doing extra work. Perhaps you could try writing out some example code to clarify?IMHO, it's already way too complicated to write correct range-based code, and adding non-copyable types into the mix is very much a step too far. Obviously, there can be disagreement on that, but it's already very difficult to write range-based code that actually works properly with all types of ranges. You have to test it with a ridiculous number of types, and most people just don't do that. Phobos often doesn't do that. And every capability that we add to ranges means even more static if blocks and even more variations that need to be tested. The only way to fix that is to simplify things by putting greater restrictions and requirements on how ranges work, whereas making them work with a wider variety of things is going in the opposite direction of that. Realistically, the best that even might happen here for folks who want ranges to work with non-copyable types is that a subset of the algorithms in Phobos will be made to work with them, but third-party libraries are not going to go to that level of effort (and I don't think that it's all reasonable to tell them that they should). And Phobos' range-based code will become even more complicated and that much harder to maintain and make sure that it works correctly. And we already have too much trouble making sure that Phobos' range-based code works correctly. So, my take on it is that supporting non-copyable types is simply not worth the effort given that it's going to complicate the code and the tests that much more, and they're already too complex. And so, if it's my choice, then we simply won't support non-copyable types with ranges. Obviously, I'm not the sole voice in this, and ultimately, the choice is going to be up to Walter and Atila (probably Atila given that he's supposed to be the one in charge of Phobos), so I could definitely be overruled on this, but I very much want to see it become easier to write correct range-based code, not have it become harder. - Jonathan M DavisPersonally, I think that we should make it very explicit that non-copyable types are not supported by ranges. They don't work with them now, and I don't think that it's even vaguely worth it to try to make ranges work with them. The cost is too high for too little benefit.First, as a general rule, I don't think that we, as standard-library authors, should be in the business of telling our users that their use-cases are illegitimate just because it makes our jobs more difficult. Second...how do you know the benefit is too little? You thought earlier that "the vast majority of algorithms" didn't work with basic input ranges, even though the actual data shows that isn't true. Unless you have concrete evidence, I think it is better to avoid making assumptions about which use-cases are important and which aren't.
Jan 22
On Tuesday, 23 January 2024 at 00:57:49 UTC, Paul Backus wrote:$ grep 'isForwardRange' std/algorithm/*.d | wc -l 141 $ grep 'isBidirectionalRange' std/algorithm/*.d | wc -l 42 $ grep 'isRandomAccessRange' std/algorithm/*.d | wc -l 62 So, I would say that your hypothesis is not supported by the data, in this case.Oh, I hope someone pays attension and makes arrays more important then doubly linked lists
Jan 24
On Monday, 22 January 2024 at 05:22:44 UTC, Paul Backus wrote:You have N data structures and M algorithms, and by using this common interface you can implement all N*M combinations in O(N+M) code. Each data structure implements the range API once, and each algorithm is implemented once using the range API. If you split the API in two, by making input streams (basic input ranges) and forward ranges completely disjoint, you undermine this goal. Now, each data structure has to implement *two* APIs, and each algorithm has to be implemented *twice*, once for input streams and once for forward ranges.Not necessarily. First, any algorithm requiring a forward range wouldn’t be implemented twice. And, most importantly, a data structure would not offer input range iteration if it can offer forward range. Why would it?In practice, what's going to happen is library authors will simply not bother to implement both, and we will end up with gaps where many (data structure, algorithm) pairs are not covered.The only realistic concern I see with the proposal is algorithms being implemented requiring forward ranges when they could (also) work with basic input ranges. Overconstraining input is not specific to ranges, though. What Jonathan never mentioned in his proposal, but something that, as far as I see, is true: One can wrap any forward range to be an input range. Heck, the pointer to any forward range is an input range, assuming there is a global `next(ForwardRange*)` function utilizing the forward range primitives. If you pass `&forwardRange` to an input range algorithm, it may copy it around as it pleases, it will change the underlying range just like an input range should; and you’d have the `&` for an indication that it does. A problem I see with this is composition. Requiring the lifting to a pointer requires lvalues, and composition lives on not having those. The issue will generally be negligible, as most range algorithms returning a range will for the most part want to preserve as many of the abilities of the original range as possible. The difference in the algorithm would be that, instead of: ```d // implement input range static if (/* base range is forward range */) // implement forward range static if (/* base range is bidi range */ // implement bidi range // etc. ``` You would have: ```d static if (/* base range is input range */) // implement input range else: static if (/* base range is forward range */) // implement forward range static if (/* base range is bidi range */) // implement bidi range // etc. ```
Jan 23
On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:I'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.Well one advantage of `hasNext` is that you can check in advance whether range is empty or not, without the need to pop front element.Part of the point here is to _not_ do that, because they fundamentally have different semantics.That would mean that you need to overload a method to accept both input, and forward ranges when foreach is not used, right?If this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges.Could be the case :). Perhaps it would be nice to see it as a code example.
Jan 22
On Monday, January 22, 2024 2:17:08 AM MST Alexandru Ermicioi via Digitalmars- d wrote:On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:trueI'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.Well one advantage of `hasNext` is that you can check in advance whether range is empty or not, without the need to pop front element.Yes, though in practice, most functions tend to require forward ranges anyway. So, while the separation will be annoying in some cases, in general, I wouldn't expect it to be a big problem, and if anything, it will help make the distinction between basic input ranges and forward ranges clearer, which will probably make it easier to reason about the code. Obviously, we'll have to actually do it to see what the impact is, but realistically, about all you can do with basic input ranges is iterate through them once and possibly do transformations on their elements as you do it. The fact that they can be iterated separately from a foreach loop is beneficial, and some range-based functions are useful with them (e.g. map), but most range-based algorithms need forward ranges.Part of the point here is to _not_ do that, because they fundamentally have different semantics.That would mean that you need to overload a method to accept both input, and forward ranges when foreach is not used, right?Well for forward ranges, you'd basically be writing what you write now except that you wouldn't have to worry about calling save anywhere, which would signifcantly reduce the number of bugs that crop into range-based code. So, about all I could really show you would be normal range-based code that just doesn't call save, because it doesn't need to. It would also allow you to assign one range to another (assuming that they're the same type), which you can't do right now without relying on the specific copy semantics of a range. But for forward ranges, the main change here would simply be that you could assume that copying a range was equivalent to save, which is purely a simplification. If there's a complication, it's only that ranges which are currently reference types would have to do something smarter than saving in their copy constructor if they wanted to be efficient. So, they'd work with something like struct Range(T) { T front() { return _range.front; } void popFront() { _range.popFront(); } bool empty() { return _range.empty; } // or copy constructor this(this) { _range = _range.save; } private static class RefRange { ... } private RefRange _range; } but if you wanted them to be more efficient, you'd need something like struct Range(T) { T front() { return _range.front; } void popFront() { if(_range._refCount != 1) _range = _range.save; // ref count now 1 _range.popFront(); } bool empty() { return _range.empty; } // or copy constructor this(this) { _range.incRef(); } ~this() { _range.decRef(); } private static class RefRange { ... } private RefRange _range; } So, there would be some extra complication for reference type forward ranges, but most forward ranges already do the equivalent of calling save when they're copied, which is a large part of why you frequently end up with bugs when you try to use a range that needs you to call save explicitly. - Jonathan M DavisIf this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges.Could be the case :). Perhaps it would be nice to see it as a code example.
Jan 22
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:From what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either)With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Jan 21
On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr)Correction: fr should be passed by ref (or maybe auto ref).
Jan 21
On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that. So, while someone could do something like this, I question that we should encourage it or support it with anything in Phobos. - Jonathan M DavisFrom what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either)With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Jan 21
On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis wrote:On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d wrote:[...]With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function:True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that.You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)
Jan 21
On Sunday, January 21, 2024 7:30:40 PM MST Paul Backus via Digitalmars-d wrote:On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis wrote:Well, even if the parameter is ref (which it would need to be), the core problem is that the range-based function then has to deal with passing around an object that could have value semantics even though basic input ranges would be required to have reference semantics with the proposed changes. Obviously, range-based code _can_ work without caring about the copy semantics of the range type, but the main thrust of the proposed changes is to try to require specific copy semantics from different categories of ranges so that we can avoid the bugs that come with different range types having different copy semantics (as well as making it possible to write code that takes advantage of the specific copy semantics of a particular category of range; even simply being able to assign one range to another in generic code would be a big help to some code). Ideally, code would be able to rely on the copy and assignment semantics of ranges - and that means not trying to treat basic input ranges and forward ranges as the same thing without actually wrapping one in a way that forces it to be the same thing (be it by turning a forward range into a reference type with the basic input range API or by wrapping a basic input range in a struct which caches the data such that it can implement the forward range API). Simply slapping the functions for the other API on one of the two types will result in ranges that won't behave properly, which will work with some functions and fail miserably with others. - Jonathan M DavisOn Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d wrote:With the proposed design, it would be possible to implement[...]next() for forward ranges as a UFCS function:True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that.You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)
Jan 21
On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Actually. What occurs to me is that if we made this function take the range by pointer, then it would be possible to give a forward range the correct copy semantics. You'd then have to take its address (or put it on the heap) to make it work, but then you'd get reference semantics even though it's a forward range. Obviously, that's less than ideal if you want a function to accept both basic input ranges and forward ranges, but if it's a function that really only needs basic input ranges and doesn't benefit from operating on a forward range, then it would be simple enough to create an overload which takes the address - and if you really want to be returning the range from the function, then you wouldn't want it to be operating on both basic input ranges and forward ranges anyway, since you'd usually be looking to wrap the result, in which case, you'd need to do all of the annoying static ifs to determine which functions from the range API you could slap on the wrapper range based on the capabilities of the one being passed in. - Jonathan M DavisFrom what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either)With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Jan 22
On Sunday, 21 January 2024 at 14:51:33 UTC, Paul Backus wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:It wouldn’t work. That’s the point. It would compile, probably, but by the proposal, an input range would have guaranteed reference semantics, whereas a forward range has value semantics (as far as iteration state is concerned). That means, if you call `next` on a copy of an input range, *both* ranges, the old one and the copy, are on the following element, but doing that on a forward range, only the copy would be on the next element and the original would not. Both guaranteed. I’d rather say that a forward range must have a ` disable`d `next`, i.e. `isForwardRange` should test if `next` is present and if it is, refuse to acknowledge the forward range. By the proposal, input and forward ranges are fundamentally incommensurate concepts on the usage side; granted, only when it comes to copying, but that’s enough.From what I can see, the main negative is simply that you can't then write code that works on both a basic input range and a forward range (though you can obviously still create function overloads so that the caller can use either)With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function: auto next(FR)(FR fr) if (std.v2.range.isForwardRange!FR) { if (fr.empty) { return Nullable!(ElementType!FR).init; } else { scope(success) fr.popFront; return nullable(fr.front); } } So, a function written to operate on basic input ranges would be able to accept any kind of range, same as today.
Jan 23
Thanks for bringing this up. Improving range semantics is something we should not pass up the opportunity for with Phobos3. (I consider Phobos1 the D1 version, Phobos2 the one we have currently.) Another thing that should be visited is the ability to lock/unlock a range. For example, a range that represents stdout needs to be lockable.
Jan 21
On Sunday, 21 January 2024 at 17:15:58 UTC, Walter Bright wrote:Another thing that should be visited is the ability to lock/unlock a range. For example, a range that represents stdout needs to be lockable.Surely this would be better handled outside the range API. E.g., you call a library function that locks stdout and returns a range, and then when you're done with the range, it unlocks stdout again in its destructor. But the functions you pass the range to (map, filter, whatever) don't need to know anything about locking. This *would* be a good use-case for a non-copyable range, since you only want to unlock once (and reference counting would add significant overhead). So even if we don't *require* ranges to be non-copyable, we probably want to *allow* then to be.
Jan 21
On Sunday, January 21, 2024 10:15:58 AM MST Walter Bright via Digitalmars-d wrote:Thanks for bringing this up. Improving range semantics is something we should not pass up the opportunity for with Phobos3. (I consider Phobos1 the D1 version, Phobos2 the one we have currently.) Another thing that should be visited is the ability to lock/unlock a range. For example, a range that represents stdout needs to be lockable.I will have to study that more to see what's needed there. If it's the kind of thing that requires higer level locking than simply locking within the implementations of existing range-based functions (e.g. if it had to be locked across multiple calls), then that likely either merits a non-standard extension that the user has to worry about that's specific to stdio stuff, or we'll need to come up with some sort of higher level addition to handle it. I very much doubt that it's the sort of thing that's going to make much sense to worry about within most range-based functions though. I will add it to the list of things to look into though. There are several other minor changes that I think are going to be needed to the range API (e.g. requiring that random-access ranges implement opDollar rather than letting it be optional, since it's useless if it's optional like it is now), but the core point of this thread was to try to sort out the issue of the copy and assignment semantics for ranges, and the best solution for that seems to be to get rid of save (which would allow all forward ranges to behave like dynamic arrays as far as their copy, assignment, and iteration semantics go) and to make basic input ranges distinct from forward ranges, since they fundamentally can't have the same semantics. The issue of tail-const is going to be another big one, though that could result in us having to switch to a head/tail model for forward ranges instead of front/popFront - either that, or we're going to need a language improvement of some kind to sort it out, since we need to be able to generically get a tail-const copy of a forward range, whereas we currently don't have a way to convert const(Range!Element) to Range!(const Element). We can't even assume that const(Range!Element) is the same as const(Range!(const Element)). So, there are definitely further issues that will need to be hashed out beyond the copy semantics. I just decided to start with that. - Jonathan M Davis
Jan 21
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges.Thanks for the write up. Another issue I have encountered is around priming. Sometimes it happens in `empty`, sometimes in a private helper function and sometimes even in the constructor. If priming happens in `empty`, then it can't be `const`. Which is strange because you would expect `empty` not to mutate. I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration. The benefit is that implementers have a clear place where to prime the range.What I would propose for that would be a single function auto next(); where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty.What about: ``` bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } ``` It has the benefit of not needing to unbox a Nullable/Pointer.
Jan 21
On Sunday, 21 January 2024 at 18:26:37 UTC, Sebastiaan Koppe wrote:What about: bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } It has the benefit of not needing to unbox a Nullable/Pointer.If your concern is performance, an indirect function call is going to hurt a lot more than a pointer dereference. If your concern is aesthetics, dereferencing a pointer or unboxing a Nullable is far less ugly than passing a callback. // With pointer while (auto p = r.next) doStuffWith(*p); // With callback while (r.next((e) { doStuffWith(e); })) {}
Jan 21
On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:If your concern is aesthetics, dereferencing a pointer or unboxing a Nullable is far less ugly than passing a callback.I've been writing a fair amount of Kotlin (an a tiny bit of Swift) and the trailing lambda syntax is very nice for making these APIs look nice. Bascially, if the last argument to a function is a function, you can put it's curly braces after the parentheses: `range.map{ it*2}.filter{ it < 10}` The sauce that Kotlin has that D doesn't is that 1-arg lambdas get an implicit parameter called `it` and explicit parameters come after the opening curly brace: `{ p1, p2 -> body }`. I'm also not sure how it would work with templates. Generally, I think it's much easier to think about lifetime guarantees with functional APIs, so something like that might make them more palatable syntactically.
Jan 21
On Monday, 22 January 2024 at 05:54:59 UTC, Ben Jones wrote:On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:This is off-topic, but I want to mention that the implicit `it` confused the hell out of me when I learned Kotlin. The feature literally saves measly 6 keystrokes for `it -> `, 4 if you ignore spaces. In D, `map!(it => it * 2)` IMO, is pretty good (e.g. compared to C++). D’s lambdas have their flaws, but lack of brevity isn’t one.If your concern is aesthetics, dereferencing a pointer or unboxing a Nullable is far less ugly than passing a callback.I've been writing a fair amount of Kotlin (an a tiny bit of Swift) and the trailing lambda syntax is very nice for making these APIs look nice. Bascially, if the last argument to a function is a function, you can put it's curly braces after the parentheses: `range.map{ it*2}.filter{ it < 10}` The sauce that Kotlin has that D doesn't is that 1-arg lambdas get an implicit parameter called `it` and explicit parameters come after the opening curly brace: `{ p1, p2 -> body }`. I'm also not sure how it would work with templates. Generally, I think it's much easier to think about lifetime guarantees with functional APIs, so something like that might make them more palatable syntactically.
Jan 23
On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way. Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else. And whether empty mutates really shouldn't matter so long as it's not actually const (it has to be logically const to be sure, but that doesn't mean that it needs to actually be const), though obviously, it can become annoying to wrap a range when you want to make your empty const, and you can't rely on the range your wrapping having empty be const. For better or worse though, the current range API makes no requirements about const (and I'm not sure if we want to add such requirements given how restrictive D's const is). If we did require const anywhere on the range API though, empty and length would be the obvious places. That being said, I'm not sure that I've ever written a range that primes anything in empty. Usually, the question is between what goes in front and and what goes in popFront and how much has to go in the constructor for that to work cleanly. And the answer is not always the same. We may also want to rework how front and popFront work for forward ranges, which could simplify the priming issue depending on what we did. Regardless though, the priming issue is certainly something that we should be thinking about. Even if we don't add anything to the API for it, it will be good to keep it in mind for how it would affect the other range API functions for any reworking of them that we might do. Regardless, the big issues that were the point of this thread were the copy and assignment semantics for ranges and what we would need to do fix those. There are definitely other issues that we're going to have to sort out (e.g. the tail-const issue).I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges.Thanks for the write up. Another issue I have encountered is around priming. Sometimes it happens in `empty`, sometimes in a private helper function and sometimes even in the constructor. If priming happens in `empty`, then it can't be `const`. Which is strange because you would expect `empty` not to mutate. I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration. The benefit is that implementers have a clear place where to prime the range.Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives. Getting a nullable type of some kind back is far easier to read and understand, and the only real downside I see to it is that you get bugs if someone tries to access the value when there isn't one. But ranges already have that in general with front and empty, and I don't think that it's been much of an issue in practice. We could also go with next and hasNext like Alexandru suggested, in which case, we wouldn't be returning a pointer or nullable type if that's the concern. I'm not sure if that's ultimately better or worse though. In some respects, it would be better to be able to put all of the range logic in a single function, but it would also be nice in some situations to be able to ask whether a basic input range has any elements without having to grab the first one if it's there. Another consideration that comes to mind is that we might want to add a function that explicitly skips elements (since in some cases, you can skip work if you don't actually need the next element), and since next both pops the element and returns it, we wouldn't have to worry about the state of front in the process (whereas adding some kind of skip function to the current range API would result in an invalid front). - Jonathan M DavisWhat I would propose for that would be a single function auto next(); where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty.What about: ``` bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } ``` It has the benefit of not needing to unbox a Nullable/Pointer.
Jan 21
On Monday, 22 January 2024 at 04:38:13 UTC, Jonathan M Davis wrote:On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan KoppeYes exactly. My point is that if there is an explicit place to do priming, that uncertainty would go away.I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration. The benefit is that implementers have a clear place where to prime the range.Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way. Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else.That being said, I'm not sure that I've ever written a range that primes anything in empty. Usually, the question is between what goes in front and and what goes in popFront and how much has to go in the constructor for that to work cleanly. And the answer is not always the same.With 2-phase ranges it would always be the same. Anyway, here are some Phobos examples: - `filter` does its priming in empty, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1381 - `filterByDirectional` does a while loop in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1581 - `chunkBy` calls empty in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1931 - `substitute` calls empty and popFront in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L6909 Taken together, pretty much anything can happen just by constructing a range. You don't even need to iterate it!Regardless, the big issues that were the point of this thread were the copy and assignment semantics for ranges and what we would need to do fix those. There are definitely other issues that we're going to have to sort out (e.g. the tail-const issue).Ultimately it all comes together or it doesn't.Actual usual would just use `foreach` of course. But in cases where you want to iterate manually you have to deal with the added ugliness/complexity, fair. I do think its easier to access the item by `ref` this way, instead of having to do a pointer in some nullable wrapper.What about: ``` bool next(Callable)(Callable c) { if (empty) return false; c(front()); popFront(); return true; } ``` It has the benefit of not needing to unbox a Nullable/Pointer.Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives.Getting a nullable type of some kind back is far easier to read and understand, and the only real downside I see to it is that you get bugs if someone tries to access the value when there isn't one. But ranges already have that in general with front and empty, and I don't think that it's been much of an issue in practice.I don't particularly like APIs that have a requirement to call methods in a particular order, I much rather have APIs that can't be used wrong. Pit of success and all that. In practice though, I messed up only a few times, and then added a condition on `empty` and continued on.We could also go with next and hasNext like Alexandru suggested, in which case, we wouldn't be returning a pointer or nullable type if that's the concern. I'm not sure if that's ultimately better or worse though. In some respects, it would be better to be able to put all of the range logic in a single function, but it would also be nice in some situations to be able to ask whether a basic input range has any elements without having to grab the first one if it's there.With some sources, determining whether there is a next item actually involves *getting* the next item. Instead, like `empty`, I think `hasNext` ought to be `const`.
Jan 22
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Ultimately, I'm inclined to argue that we should give basic input ranges a new API - not just because it would allow them to use reference counting, but also because the current input range API tends to force basic input ranges to cache the value of front (which if anything, encourages basic input ranges to be pseudo-reference types and could result in an extra layer of indirection in some cases if they're forced to be reference types). It would be annoying in some cases to require that different functions be used for basic input ranges and forward ranges (though overloading could obviously be used to avoid needing different names), but it's already often the case that code isn't really designed to work with both, and overloading on the category of range being used is already fairly common, since different range capabilities allow for different implementations. So, given that it would prevent whole classes of copying bugs as well as potentially remove the requirement to cache front for basic input ranges, I think that a separate API for basic input ranges is warranted. What I would propose for that would be a single function auto next(); where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty. The return type would then need to emulate a pointer - specifically, when casting it to bool, it would be true if it's non-null and false if it's null, and dereferencing it would give you the actual value if it's non-null (with it being undefined behavior if you dereference null). So, a basic input range of ints might define next as int* next();While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code. And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage. -Steve
Jan 21
On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer wrote:While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code. And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage.One possibility is to change the forward range interface to something like bool empty(); Element head(); typeof(this) tail(); In addition to distinguishing new-style forward ranges from old-style ones, this interface would also allow forward ranges to be `const` (although iterating a `const` forward range would only be possible with recursion). `popFront` could then be implemented as a UFCS function: void popFront(R)(ref R r) if (std.v2.range.isForwardRange!R && isAssignable!R) { r = r.tail; }
Jan 21
On Sunday, January 21, 2024 3:46:06 PM MST Paul Backus via Digitalmars-d wrote:On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer wrote:Well, I think that the reality of the matter here is that if we want to deal with const and immutable properly (which is another issue that we need to be tackling with the range API redesign), we're either going to need to switch to a more functional-style API like this, or we're going to need to add some sort of opTailConst to the language, or we're going to need to add some other sort of language improvement that would allow use to convert const(Range!Element) to Range!(const Element) generically. As things stand, const and ranges really don't interact well together outside of fairly limited situations (e.g. using dynamic arrays for everything), and containers in particular get screwed by it. So, we may very well decide that we need to move forward ranges to something like head and tail, though I'm not terribly happy with the idea, and I'm not sure what the performance implications are (though LDC at least could probably make it not matter). It would also require another change to foreach, but pretty much _anything_ we do here to fix the tail-const problem with ranges is going to require some kind of language change. Another problem with the head/tail approach is that we then have to come up with names for bidirectional ranges, and all of that could get pretty confusing. But ultimately, that can be solved. Either way, the main things that I think needs to happen to fix the copy semantics problems are to require that forward ranges have the same copy semantics as dynamic arrays (at least with regards to their iteration state), which means that save is no longer a thing, and we need basic input ranges to be reference types, which means that they should probably have a different API from forward ranges. I suppose that we could then change the forward range API and leave the basic input range API alone, but given that the current API requires caching front, which causes some problems for basic input ranges, I'm inclined to give basic input ranges a different API. If we then also need to change the API of forward ranges beyond removing save, then so be it, though that will result in a lot more code having to be changed when it's updated to use or be in Phobos v3. - Jonathan M DavisWhile I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code. And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage.One possibility is to change the forward range interface to something like bool empty(); Element head(); typeof(this) tail(); In addition to distinguishing new-style forward ranges from old-style ones, this interface would also allow forward ranges to be `const` (although iterating a `const` forward range would only be possible with recursion). `popFront` could then be implemented as a UFCS function: void popFront(R)(ref R r) if (std.v2.range.isForwardRange!R && isAssignable!R) { r = r.tail; }
Jan 21
On Sunday, January 21, 2024 12:24:20 PM MST Steven Schveighoffer via Digitalmars-d wrote:While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code. And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage.Yeah, I was orignally hoping that this sort of thing could just be solved with editions, but regardless of what we do with editions and Phobos, third party code won't necessarily be updated to match, and it won't even necessarily be obvious whether it has been. To an extent, that doesn't really matter, and we could theoretically punt on the issue based on the idea that any basic input ranges written to follow the Phobos v2 API would just behave incorrectly with functions written to follow the Phobos v3 API for forward ranges, so testing would catch it, but that's obviously not ideal. It _could_ be caught if we required all basic input ranges to be either class references or pointers to structs, while requiring that all forward ranges be structs or dynamic arrays (and we will have to do the latter), but that would arguably restrict the implementation of basic input ranges too much (e.g. reference counting wouldn't be possible). For better or worse, it's definitely an argument for switching to a more functional-style API like Paul mentioned and like has occasionally been discussed in the past (usually as a potentially solution to the tail-const and tail-immutable issue). That would actually allow us to solve other problems rather than simply being a minor API change just to make it so that we could detect the difference, though I'm not sure what all of the other consequences of such a change would be (particularly with regards to performance). - Jonathan M Davis
Jan 21
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like [...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors. Has anyone here used the class-based ranges?
Jan 22
On 23/01/2024 4:41 AM, Atila Neves wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:I did experiment with them when I first learned ranges. Nothing is set up for them. But they do two things extremely well: 1. Create confusion, they are not the definition of what a range is, nor are they used anywhere. 2. Of course since then I've also learned that they offer opportunity to prevent optimizations, so not a great solution to be tied to. I want signatures. I've told Adam Wilson that if you want me involved with ranges for PhobosV3 at the very least I want the compile time aspect. The current situation with ranges not being properly documented in API is just not good enough. Signatures do have a runtime aspect, they can represent ranges if you need a reference, we can also implement something like save via that (even if the implementation doesn't have it).I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like [...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors. Has anyone here used the class-based ranges?
Jan 22
On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like [...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.Meaning they can only be passed by reference?Has anyone here used the class-based ranges?Yes. They are useful when you have a range whose type varies at runtime. For example: auto myFunc(R)(R range, int type) { switch (type) { case 1: return range.map!(a => a*2); case 2: auto auxRange = helperFunc(range); return range.filter!(a => a % 2) .chain(auxRange); case 3: return range.chain(recurrence!(...)); } assert(0); } This code will not compile because there is no common return type. To fix it, wrap each return in out of the class-based range objects: auto myFunc(R)(R range, int type) { switch (type) { case 1: return range.map!(a => a*2) .inputRangeObject; case 2: auto auxRange = helperFunc(range); return range.filter!(a => a % 2) .chain(auxRange) .inputRangeObject; case 3: return range.chain(recurrence!(...)) .inputRangeObject; } assert(0); } T -- They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Jan 22
On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:Why can't this be done with value ranges?On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.[...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.Or moved.I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.Meaning they can only be passed by reference?
Jan 23
On Tue, Jan 23, 2024 at 09:19:24AM +0000, Atila Neves via Digitalmars-d wrote:On Monday, 22 January 2024 at 17:07:05 UTC, H. S. Teoh wrote:[...] Because the recursive call advances a value copy of the passed range, and when it returns, the range in the caller is still in the old position. T -- "Hi." "'Lo."On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:Why can't this be done with value ranges?On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.[...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.
Jan 23
On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable, then types which are inherently copyable (in particular, `T[]`) will be exclude from being basic input ranges. In order to use a `T[]` with an algorithm that expects a basic input range, you would have to wrap it in a `struct` that has copying disabled. It's also a big problem for composability. If half the algorithms require your range to be copyable, and the other half require it to be non-copyable, you have to keep track of [what color each algorithm is][1] whenever you write a range pipeline, or write a new algorithm that calls an existing one internally. [1]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
Jan 22
On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:I don't think we should that because algorithms that don't care if a range is copiable (i.e. forward) or not should be usable with any input range.I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable,then types which are inherently copyable (in particular, `T[]`) will be exclude from being basic input ranges. In order to use a `T[]` with an algorithm that expects a basic input range, you would have to wrap it in a `struct` that has copying disabled.That would indeed be a mistake, but the forward range concept/template constraint is a... err... "subtype" of input ranges, so it *should* work. My idea here is changing the concept of forward ranges from ones that can have `.save` called on them to ones that can be copied. If that works, of course. The case against the idea is that right now, range authors have to opt-in to "forwardness", whereas copies have to be disabled to opt-out.It's also a big problem for composability. If half the algorithms require your range to be copyable, and the other half require it to be non-copyable, you have to keep track of [what color each algorithm is][1] whenever you write a range pipeline, or write a new algorithm that calls an existing one internally.This is the case already (sort of) with functions that expected isForwardRange!R vs isInputRange!R, the difference being a relationship of subtyping that makes it that all forward ranges are also input ranges.
Jan 23
On Monday, 22 January 2024 at 17:18:56 UTC, Paul Backus wrote:On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:You are thinking about this the wrong way. Copyability becomes a trait of forward ranges, instead of input ranges. An input range would not be copyable, therefore code which takes input ranges would have to take them by reference, or the caller would have to ensure no other copies exist. But you can still take forward ranges this way. It is not hard to do -- you just remove the requirement for copying from `isInputRange`, and add it to `isForwardRange`.I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.This was my initial thought too, but it turns out it's not quite that simple. If we *require* basic input ranges to be non-copyable, then types which are inherently copyable (in particular, `T[]`) will be exclude from being basic input ranges.In order to use a `T[]` with an algorithm that expects a basic input range, you would have to wrap it in a `struct` that has copying disabled.A `T[]` can be used without copying it. So why can't you pass that into an algorithm that takes an input range?It's also a big problem for composability. If half the algorithms require your range to be copyable, and the other half require it to be non-copyable, you have to keep track of [what color each algorithm is][1] whenever you write a range pipeline, or write a new algorithm that calls an existing one internally.The assertion is that if the range is copyable, it's now a forward range and copying is allowed. It is *on you* as the range developer to ensure copying does not work if it's an input range. The algorithm that says `isInputRange` would have to avoid copying, but that also works for forward ranges as well. There would be no need for multiple colors. Now, in *practical* terms, algorithms that are input-range compatible likely would need more careful attention to avoid copies than they currently do. -Steve
Jan 23
On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven Schveighoffer wrote:You are thinking about this the wrong way. Copyability becomes a trait of forward ranges, instead of input ranges. An input range would not be copyable, therefore code which takes input ranges would have to take them by reference, or the caller would have to ensure no other copies exist. But you can still take forward ranges this way. It is not hard to do -- you just remove the requirement for copying from `isInputRange`, and add it to `isForwardRange`. [...] The assertion is that if the range is copyable, it's now a forward range and copying is allowed. It is *on you* as the range developer to ensure copying does not work if it's an input range.The only problem with this is that what you are asking the range developer to do is impossible. :) Consider what would happen if the user creates a pointer to a non-copyable input range. The pointer (a) is inherently copyable and (b) implements the range interface (via implicit dereferencing), but (c) the copies are not independent. (If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?) So, there has to be some other way of distinguishing forward ranges from input ranges, beyond just checking whether they can be copied. For example, by using `next()` for input ranges vs. `head()`/`tail()` for forward ranges. (As [you pointed out earlier][1], reusing the `empty`/`front`/`popFront` API would not be a great idea.) Once you have this other method of distinguishing input and forward ranges, requiring forward ranges to be copyable no longer implies that pure input ranges MUST be non-copyable. So we can allow "pointer-to-input-range" to be an input range itself, even though it supports copying, and nothing will break. [1]: https://forum.dlang.org/post/yppqryhpumcucqspkqsx forum.dlang.org
Jan 23
On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:(If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?)`isForwardRange` could require the range *type* to have `empty, front, popFront` defined. Not sure if that solves your `alias this`/`opDispatch` points. This would mean arrays would need the range methods built-in rather than using UFCS.
Jan 24
On Wednesday, 24 January 2024 at 12:26:43 UTC, Nick Treleaven wrote:On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:It could, but I don't think it's a great idea, because it would make ranges a special case. Along with all of the usual reasons special cases are bad, ranges are one of the first things new D programmers are going to encounter, and are pretty much the canonical example of introspection and structural typing in D. If we make ranges behave differently, users who learn from them are going to have a much harder time understanding every *other* use of introspection.(If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?)`isForwardRange` could require the range *type* to have `empty, front, popFront` defined. Not sure if that solves your `alias this`/`opDispatch` points. This would mean arrays would need the range methods built-in rather than using UFCS.
Jan 24
On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:On Tuesday, 23 January 2024 at 17:25:47 UTC, Steven Schveighoffer wrote:Wouldn't that be on the person who wrapped the type? I wouldn't expect `RefCounted` to care that it's wrapping a range for instance. We could explicitly forbid pointers and classes, because you can't obviously implement value semantics using those. Indeed, since copying is defined by default, it's more onorous to properly create an input range. But so what? Can I tell you how many times people copy an input range without realizing the consequences today? At least if it's *explicit* that input ranges should not be copyable, then you will be more aware of the issue as the compiler complains about it.You are thinking about this the wrong way. Copyability becomes a trait of forward ranges, instead of input ranges. An input range would not be copyable, therefore code which takes input ranges would have to take them by reference, or the caller would have to ensure no other copies exist. But you can still take forward ranges this way. It is not hard to do -- you just remove the requirement for copying from `isInputRange`, and add it to `isForwardRange`. [...] The assertion is that if the range is copyable, it's now a forward range and copying is allowed. It is *on you* as the range developer to ensure copying does not work if it's an input range.The only problem with this is that what you are asking the range developer to do is impossible. :) Consider what would happen if the user creates a pointer to a non-copyable input range. The pointer (a) is inherently copyable and (b) implements the range interface (via implicit dereferencing), but (c) the copies are not independent. (If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?)So, there has to be some other way of distinguishing forward ranges from input ranges, beyond just checking whether they can be copied. For example, by using `next()` for input ranges vs. `head()`/`tail()` for forward ranges. (As [you pointed out earlier][1], reusing the `empty`/`front`/`popFront` API would not be a great idea.)Yeah, I agree that if we change semantics, we have to change the naming, for the sake of all existing code. But the naming could be consistent between input/forward ranges as it is today. I'm not keen on the head/tail idea, as it seems way more clunky than I would like, and I'm not sold on the compiler making sure the `x = x.tail` operation is going to be efficient. I don't know how bad it would be to deal with, it's not something I have put a lot of consideration into, but I had thought that this mechanism of preventing copying for things that shouldn't be used from multiple copies sounds like the proper way to prevent problems by default. Sometimes it's good to start with what is simple and correct, even if it's not what you want, and see if you can massage it into what makes sense. -Steve
Jan 24
On Wednesday, 24 January 2024 at 16:18:11 UTC, Steven Schveighoffer wrote:On Tuesday, 23 January 2024 at 18:21:47 UTC, Paul Backus wrote:Yes, it would--which is exactly what I'd like to avoid. I don't want us to put ourselves in a position where we have to tell end users "you're holding it wrong" if they try to combine two generic library features like this. In general, we should not make assumptions in our library code unless we are capable of enforcing them. Since we are, unfortunately, not capable of enforcing the assumption that all copyable ranges have forward-range semantics, we should not allow ourselves make it. (Perhaps there's an opportunity for a language feature here? A `unique(T)` qualifier would help with this, for example.)(If your response to this is "ok, we'll just have isForwardRange explicitly reject pointers", keep in mind that this problem also applies to generic wrapper types that use `alias this` or `opDispatch`. Are we really going to tell our users that it's their fault if their code breaks because they put a range pointer inside one of these types?)Wouldn't that be on the person who wrapped the type? I wouldn't expect `RefCounted` to care that it's wrapping a range for instance.Indeed, since copying is defined by default, it's more onorous to properly create an input range. But so what? Can I tell you how many times people copy an input range without realizing the consequences today? At least if it's *explicit* that input ranges should not be copyable, then you will be more aware of the issue as the compiler complains about it.I agree that permitting input ranges to be copyable puts more of a burden on the authors of range algorithms. However, given that there are edge cases where the burden of preventing copies *must* fall on someone other than the range's author, I would rather that person be the author of the generic algorithm than the end user. My hope is that in practice, if Phobos itself contains some prominent examples of non-copyable ranges, these bugs will be found quickly during testing. That's obviously not as good as a reliable compile-time error, but it's not nothing.I don't know how bad it would be to deal with, it's not something I have put a lot of consideration into, but I had thought that this mechanism of preventing copying for things that shouldn't be used from multiple copies sounds like the proper way to prevent problems by default.For what it's worth, I agree that in practice, most input ranges should be non-copyable. [That `readf` bug][1] we discussed a little while ago would never have existed if `LockingTextReader` had been non-copyable from the start, for example. [1]: https://github.com/dlang/phobos/pull/8826
Jan 24
On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:In general, we should not make assumptions in our library code unless we are capable of enforcing them. Since we are, unfortunately, not capable of enforcing the assumption that all copyable ranges have forward-range semantics, we should not allow ourselves make it.Having thought about it some more, this is a bad argument, and I withdraw it. The entire range API is based on convention to begin with. Of course there's no enforcement mechanism!
Jan 24
On Wednesday, January 24, 2024 11:20:38 AM MST Paul Backus via Digitalmars-d wrote:On Wednesday, 24 January 2024 at 17:35:32 UTC, Paul Backus wrote:Yeah. We can enforce some stuff statically (e.g. we can require that a forward range be a dynamic array or a struct, since if it's a class or a pointer, we know for sure that it doesn't have the right copy semantics), but we can't actually enforce the correct copy semantics statically - just like we can't enforce the correct copy semantics with save right now. We can't even enforce that popFront removes an element from the range or that empty has anything to do with how many elements the range has. We can improve the situation with the API (e.g. getting rid of the requirement for save), and we can try to make it easier to use correctly and harder to use incorrectly, but unfortunately, there's no way to not rely on convention on some basis. - Jonathan M DavisIn general, we should not make assumptions in our library code unless we are capable of enforcing them. Since we are, unfortunately, not capable of enforcing the assumption that all copyable ranges have forward-range semantics, we should not allow ourselves make it.Having thought about it some more, this is a bad argument, and I withdraw it. The entire range API is based on convention to begin with. Of course there's no enforcement mechanism!
Jan 24
On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:The entire range API is based on convention to begin with. Of course there's no enforcement mechanism!Isn't `isInputRange`, `isForwardRange`, and the rest, kind of enforcement machanism?
Jan 24
On Wednesday, January 24, 2024 12:23:45 PM MST Alexandru Ermicioi via Digitalmars-d wrote:On Wednesday, 24 January 2024 at 18:20:38 UTC, Paul Backus wrote:Traits like that can test (and thus be used to enforce) _some_ of the semantics, but there are plenty of things that can't be tested statically (like what popFront actually does). They are able to test that certain code will compile with the range API (or that certain code won't compile) using the type that they're instantiated with, and they can test certain explicit stuff about the type (e.g. enforcing that it's a struct if we want to do that), but they can't actually test what the functions do. So, while we can enforce that a forward range is copyable, we can't enforce what its copy semantics are beyond disallowing stuff like classes or pointers to structs, since those are clearly reference types. But the struct itself could have a variety of copy semantics depending on its implementation, and there's no way to determine that statically. Ultimately, for a range to have the correct copy semantics, we have to rely on the programmer to implement them correctly, just like we have to rely on them implementing what front, popFront, and empty do correctly. - Jonathan M DavisThe entire range API is based on convention to begin with. Of course there's no enforcement mechanism!Isn't `isInputRange`, `isForwardRange`, and the rest, kind of enforcement machanism?
Jan 24
On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis wrote:Traits like that can test (and thus be used to enforce) _some_ of the semantics, but there are plenty of things that can't be tested statically (like what popFront actually does). They are able to test that certain code will compile with the range API (or that certain code won't compile) using the type that they're instantiated with, and they can test certain explicit stuff about the type (e.g. enforcing that it's a struct if we want to do that), but they can't actually test what the functions do. So, while we can enforce that a forward range is copyable, we can't enforce what its copy semantics are beyond disallowing stuff like classes or pointers to structs, since those are clearly reference types. But the struct itself could have a variety of copy semantics depending on its implementation, and there's no way to determine that statically. Ultimately, for a range to have the correct copy semantics, we have to rely on the programmer to implement them correctly, just like we have to rely on them implementing what front, popFront, and empty do correctly.That's what unit tests are for, while those traits act like pseudo interfaces. Maybe there could be a test suite provided in Phobos for user to run over his own range implementations, in his own unit tests. This would make easy to spot whether user's range does or does not follow proper logic.
Jan 24
On Wednesday, January 24, 2024 2:31:19 PM MST Alexandru Ermicioi via Digitalmars-d wrote:On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis wrote:Of course that's what unit tests are for, but that doesn't enforce anything. That's the programmers making sure that their code behaves the way that it should - that their code follows the conventions required by the range API. On the other hand, what Paul was talking about was actually statically enforcing the semantics via template constraints rather than relying on coventions, which is something that we can do with some portions of the semantics of the range API but can't do for a good chunk of it.Traits like that can test (and thus be used to enforce) _some_ of the semantics, but there are plenty of things that can't be tested statically (like what popFront actually does). They are able to test that certain code will compile with the range API (or that certain code won't compile) using the type that they're instantiated with, and they can test certain explicit stuff about the type (e.g. enforcing that it's a struct if we want to do that), but they can't actually test what the functions do. So, while we can enforce that a forward range is copyable, we can't enforce what its copy semantics are beyond disallowing stuff like classes or pointers to structs, since those are clearly reference types. But the struct itself could have a variety of copy semantics depending on its implementation, and there's no way to determine that statically. Ultimately, for a range to have the correct copy semantics, we have to rely on the programmer to implement them correctly, just like we have to rely on them implementing what front, popFront, and empty do correctly.That's what unit tests are for, while those traits act like pseudo interfaces.Maybe there could be a test suite provided in Phobos for user to run over his own range implementations, in his own unit tests. This would make easy to spot whether user's range does or does not follow proper logic.Well, we can provide test ranges to use to test a function with, but not much can be automated there, since the results would depend on what the function actually did. However, if there were set of standard range types to test with, then it would at least be easier to test a proper range of range types. Phobos itself doesn't always do a good job with that though. As for testing that a range behaves properly, that might be possible to automate, but since it's usually the case that a range comes from a function, that's not always straightforward. And while how the range should behave is largely defined by the range API, not all ranges are generic with regards to what they work with (e.g. what their element type is) or how they're constructed, so writing a generic set of tests could be challenging. Off the top of my head, I don't know how possible it would be, though I'm sure that it could be made to work with a subset of ranges. All in all, providing better testing tools is a good idea, but a lot of the question here is what we can statically enforce, since the more that we can statically enforce, the harder it will be to write range-based code that does the wrong thing, and the harder it will be to pass types to a range-based function which then won't behave properly even if the function is written properly. - Jonathan M Davis
Jan 24
On Wednesday, 24 January 2024 at 20:08:56 UTC, Jonathan M Davis wrote:Traits like that can test (and thus be used to enforce) _some_ of the semantics, but there are plenty of things that can't be tested statically (like what popFront actually does).Even if you could, what would you even test? Especially for `popFront`, what does it mean to “pop front”? A no-op could be a reasonable `popFront` implementation for a range that represents a value infinitely repeated, i.e. something like `cycle(1)`.
Feb 01
On Monday, January 22, 2024 8:41:35 AM MST Atila Neves via Digitalmars-d wrote:On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:Once you disable the copy/postblit constructor, you can't generally pass the range around or wrap it in another range - at least not without explicit calls to move. ref parameters and return values would solve some of that, but it wouldn't solve all of it (wrapping in particular, which ranges do heavily, won't work with ref). I would expect a range that can't be copied to be too annoying to even bother with. And in fact, they can't even work with foreach, because foreach copies the range that it iterates over (and has to in order to have the semantics match what hapens with dynamic arrays, though if we separate basic input ranges from forward ranges, then we can change what happens with basic input ranges). IMHO, if we want to go the route of trying to make them uncopyable, we pretty much might as well just get rid of the concept of basic input ranges and require that they use opApply (which isn't an entirely bad idea given how limited basic input ranges are in practice anyway). Basic input ranges are inherently either reference types or pseudo-reference types, and they can be made to work cleanly if we just require that they be reference types. Those are very different semantics from forward ranges, which are value types with regards to their iteration state (assuming that we require that copying them results in copies which can be independently iterated rather than having save). And that's why I'm arguing that we should split the two concepts rather than trying to treat forward ranges like an extension of basic input ranges - especially since basic input ranges are extremely hamstrung anyway, because you can't actually get an independent copy of them, and in practice, most stuff needs that. And if we did go with a solution for basic input ranges which made them non-copyable, then they couldn't be used with the same code that uses forward ranges anyway, so we might as well just give them a different API and stop treating them like the same thing when they really aren't.I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like [...]I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one. I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.Has anyone here used the class-based ranges?Yes. Symmetry uses them. In general, it's best not to use them for forward ranges, but when you're dealing with code where you can't determine the type at compile time (like with an interpreter), then you need some some kind of runtime polymorphism, and you really don't have much choice. That being said, we can support that by wrapping such classes in structs and ensuring that the structs have the correct copy semantics rather than allowing forward ranges that are classes. So, it should be quite possible to get rid of save without losing any functionality, and we'd get much cleaner copy semantics in the process, making it so that forward ranges in general behave more like dynamic arrays. For basic input ranges, it should be perfectly fine for them to be classes - desirable even - simply because you can't actually get independent copies of them, and using a struct with pseudo-reference copy semantics is just begging for bugs, because it means that mutating the copy puts the original in an invalid state, whereas if we can require that all basic input ranges be full-on reference types (which could include structs and pointers to structs so long as they have the correct semantics), then we can actually rely on sane, consistent copy semantics for basic input ranges. It's just that they would be different from the copy semantics of forward ranges - which would also be true if we were somehow able to make it work to make basic input ranges noncopyable instead, though I really don't think that that will work. - Jonathan M Davis
Jan 22
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:[...]I think one problem with just "next" as the api for an Input Range is that currently I don't know any safe way to return a reference to the item. If "next" returns a pointer we can't be sure the backing data structure (a growable array like std::vector for example) did not freed that memory yet but I guess this is the same for any range right now. Any way, it would be good to have an optimization for Nullable!T or a different type that when T is nullable (e.g. classes or pointers) it doesn't need a boolean flag but also loses the ability to store null inside it. That's one of the optimizations of "Option" in Rust, although in Rust references can't be null so it's ok to do this.
Jan 22
On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:wall-o-textCan this be summarized as ```d enum isRangeCopible(R)= is( (R r){ R rdup=r; r=rdup; } ); ```What I'd like to get out of this thread is feedback on how much sense this idea doesI think someone formal should collect all changes people want to make to ranges see my thread for .index and feature sets theres something about immutablity I would also suggest arrays should be more important then doubly linked lists
Jan 22