digitalmars.D - How is chunkBy supposed to behave on copy
- Dukc (32/32) Mar 18 2020 ```
- Paul Backus (12/18) Mar 18 2020 As I understand it, the result of using a range after copying it
- Steven Schveighoffer (15/49) Mar 18 2020 According to the range spec, save must be called if you wish to preserve...
- H. S. Teoh (26/35) Mar 18 2020 Because Andrei demanded it when I submitted the PR. The main reason is
- Steven Schveighoffer (17/31) Mar 18 2020 Ugh, I think this is way overkill in most cases, and depends heavily on
- H. S. Teoh (18/40) Mar 18 2020 [...]
- Steven Schveighoffer (9/17) Mar 18 2020 The problem is that you don't know what the bottleneck is, only the user...
- H. S. Teoh (9/23) Mar 18 2020 [...]
- H. S. Teoh (21/27) Mar 18 2020 The current range API design, which Andrei himself admitted was not the
- Dukc (3/7) Mar 19 2020 This means that even if the api does thing X right now, I should
- Steven Schveighoffer (4/11) Mar 19 2020 If you want to copy a forward range use save. Anything else is possible
- Dukc (4/7) Mar 19 2020 Ouch - my code has to be totally broken. And I don't think I'm
- Steven Schveighoffer (9/17) Mar 19 2020 You are not the only one. Almost all range functions are tested with and...
- Dukc (9/13) Mar 19 2020 Perhaps there is another way. Individual ranges can give
- H. S. Teoh (24/34) Mar 19 2020 The problem is, you *cannot* give any guarantees about copy behaviour,
- Dukc (5/16) Mar 19 2020 The documentation would say that the copy behaviour of the range
- H. S. Teoh (26/45) Mar 19 2020 A guarantee about copy behaviour potentially binds the Phobos
- H. S. Teoh (13/30) Mar 19 2020 Yeah, Andrei has said that in retrospect, .save was a design mistake, he
- Jonathan M Davis (14/32) Mar 20 2020 Ranges get copied all the time when passing them around. You're not goin...
- H. S. Teoh (14/33) Mar 20 2020 Yes, that's right. Actually, for by-value ranges the act of passing
- Ben Jones (4/20) Mar 20 2020 So range "copy" is really what a C++ person would call range
- H. S. Teoh (10/21) Mar 20 2020 [...]
- Jonathan M Davis (37/66) Mar 20 2020 You more or less have to view it that way, though no move is actually ta...
- H. S. Teoh (8/13) Mar 22 2020 [...]
- Paul Backus (6/18) Mar 22 2020 Technically, this is already legal--isInputRange accepts
- H. S. Teoh (9/27) Mar 23 2020 I don't think it's wise to do this with the existing codebase. A change
- Paul Backus (5/20) Mar 23 2020 Agreed. I was only talking about making pure input ranges
- Jonathan M Davis (9/19) Mar 23 2020 A range that can't be copied is useless. It won't even work with foreach...
- Steven Schveighoffer (10/28) Mar 23 2020 foreach(x; uncopyableRange.move)
- Pezbi Mahn (6/31) Mar 23 2020 That=E2=80=99s dope homie
``` void main() { import std; auto chunks = [14, 16, 18, 20, 22, 20, 18, 16] .chunkBy!((a, b) => a / 10 == b / 10); //[[14, 16, 18], [20, 22, 20], [18, 16]] chunks .map!(chunk => chunk.array) .array .writeln; //[] chunks .map!(chunk => chunk.array) .array .writeln; } ``` This is how chunkBy[1] currently behaves when copying. It essentially behaves like a reference range: it will only save it's state when `save` is explicitly called, not otherwise, even if the chunked source range is a forward range with value semantics. As most Phobos ranges preserve the value semantics of their source ranges, this is an odd exception, especially as the documentation says nothing about it. So the question is, how should it behave? I have looked at the implementation, it should be trivial to have it to use value semantics. On the oher hand, someone may rely on the present behaviour. [1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkBy
Mar 18 2020
On Wednesday, 18 March 2020 at 14:57:41 UTC, Dukc wrote:So the question is, how should it behave? I have looked at the implementation, it should be trivial to have it to use value semantics. On the oher hand, someone may rely on the present behaviour. [1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkByAs I understand it, the result of using a range after copying it is unspecified, and relying on any particular behavior in that case is a bug. So changing the behavior will only break code that is (in some sense) "already broken." Of course, given the above, the question of how it "should" behave is rendered moot. If you're copying a range, you must either use .save, or only use the copy and not the original from that point forward. In fact, if you want to catch as many errors as possible at compile time, the strictest approach would be to mark the range as non-copyable ( disable this(this)) and require the use of either .save or move to copy it.
Mar 18 2020
On 3/18/20 10:57 AM, Dukc wrote:``` void main() { import std; auto chunks = [14, 16, 18, 20, 22, 20, 18, 16] .chunkBy!((a, b) => a / 10 == b / 10); //[[14, 16, 18], [20, 22, 20], [18, 16]] chunks .map!(chunk => chunk.array) .array .writeln; //[] chunks .map!(chunk => chunk.array) .array .writeln; } ``` This is how chunkBy[1] currently behaves when copying. It essentially behaves like a reference range: it will only save it's state when `save` is explicitly called, not otherwise, even if the chunked source range is a forward range with value semantics. As most Phobos ranges preserve the value semantics of their source ranges, this is an odd exception, especially as the documentation says nothing about it. So the question is, how should it behave? I have looked at the implementation, it should be trivial to have it to use value semantics. On the oher hand, someone may rely on the present behaviour. [1] https://dlang.org/phobos/std_algorithm_iteration.html#.chunkByAccording to the range spec, save must be called if you wish to preserve state. In practice, this is seldom done, because most of the time, save just does `return this;`, so copying works just the same. The short answer is, use save. The long answer is, rangeBy uses reference counting internally to store the range data. I'm not sure why this is, as it seems possible to do without this mechanism. It seems there is some optimization surrounding pushing along the range without iterating it twice. But I don't know if that's a worthwhile optimization, especially if the allocation and reference counting are more expensive than the iteration. I would think you could get away with simply storing the range twice, and using .save to make sure you don't have problems. -Steve
Mar 18 2020
On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]The long answer is, rangeBy uses reference counting internally to store the range data. I'm not sure why this is,Because Andrei demanded it when I submitted the PR. The main reason is to address the issue of what happens when you iterate the outer range while retaining a copy of an inner range. In my original implementation, advancing the inner range also advances the outer range, and vice versa, but only when it's an input range. When it's a forward range, it takes advantage of .save to decouple iteration on an inner range from iteration on the outer range. But that meant that iterating over the entire thing required traversing the original range twice: once in each inner range, and once on the outer range. Andrei didn't like this, so we hashed out several solutions and eventually settled on the reference counting one.as it seems possible to do without this mechanism. It seems there is some optimization surrounding pushing along the range without iterating it twice. But I don't know if that's a worthwhile optimization, especially if the allocation and reference counting are more expensive than the iteration.That's up for debate, but yes, the whole point of the reference counting thing is to avoid traversing a forward range twice when iterating over subranges. It really depends on what the original range does, IMO. If it's generating values on-the-fly with an expensive calculation, you could be saving quite a bit of work. But for simple ranges, yeah, reference-counting inner ranges is kinda overkill, probably with a performance hit.I would think you could get away with simply storing the range twice, and using .save to make sure you don't have problems.[...] This is what the original implementation did, but Andrei did not like it. T -- Старый друг лучше новых двух.
Mar 18 2020
On 3/18/20 12:37 PM, H. S. Teoh wrote:On Wed, Mar 18, 2020 at 12:18:04PM -0400, Steven Schveighoffer via Digitalmars-d wrote:Ugh, I think this is way overkill in most cases, and depends heavily on where the performance hit is. Not only that, but you are only seeing a benefit if you iterate a chunk completely (I think). For instance, an array has nearly zero cost to iterate elements, but the predicate for checking the chunking is probably way more expensive. The algorithm would be nicer if you simply iterated the array until you found the boundary, and then returned a slice as the element. Only one iteration through everything necessary (you pre-calculate the "next" range). In other cases, iterating the elements is going to be expensive, so you don't want to do that twice if possible. I think a good solution might be to provide different versions of the function or an enum designating what mechanism to prefer (e.g. IterationPolicy.MinimizePopfront or IterationPolicy.MinimizePredicateEval). And of course, the .save behavior sucks, as always. -Steveas it seems possible to do without this mechanism. It seems there is some optimization surrounding pushing along the range without iterating it twice. But I don't know if that's a worthwhile optimization, especially if the allocation and reference counting are more expensive than the iteration.That's up for debate, but yes, the whole point of the reference counting thing is to avoid traversing a forward range twice when iterating over subranges. It really depends on what the original range does, IMO. If it's generating values on-the-fly with an expensive calculation, you could be saving quite a bit of work. But for simple ranges, yeah, reference-counting inner ranges is kinda overkill, probably with a performance hit.
Mar 18 2020
On Wed, Mar 18, 2020 at 01:06:02PM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 3/18/20 12:37 PM, H. S. Teoh wrote:[...][...]That's up for debate, but yes, the whole point of the reference counting thing is to avoid traversing a forward range twice when iterating over subranges.Ugh, I think this is way overkill in most cases, and depends heavily on where the performance hit is. Not only that, but you are only seeing a benefit if you iterate a chunk completely (I think).Yeah probably.For instance, an array has nearly zero cost to iterate elements, but the predicate for checking the chunking is probably way more expensive. The algorithm would be nicer if you simply iterated the array until you found the boundary, and then returned a slice as the element. Only one iteration through everything necessary (you pre-calculate the "next" range).We *could* just detect hasSlicing!R and switch to an alternative implementation that doesn't need to jump through all of those refcounting hoops.In other cases, iterating the elements is going to be expensive, so you don't want to do that twice if possible. I think a good solution might be to provide different versions of the function or an enum designating what mechanism to prefer (e.g. IterationPolicy.MinimizePopfront or IterationPolicy.MinimizePredicateEval).[...] It sounds like a good idea, but these days I'm wary of proliferating these implementation-detail-based parameters. They're a pain to write, a pain to use, and a pain to maintain, and once you have the option you're committed to supporting it even if one day you invent a better algorithm that completely changes the implementation. I much rather detect hasSlicing on the incoming range, and switching to a better implementation. T -- INTEL = Only half of "intelligence".
Mar 18 2020
On 3/18/20 2:30 PM, H. S. Teoh wrote:It sounds like a good idea, but these days I'm wary of proliferating these implementation-detail-based parameters. They're a pain to write, a pain to use, and a pain to maintain, and once you have the option you're committed to supporting it even if one day you invent a better algorithm that completely changes the implementation. I much rather detect hasSlicing on the incoming range, and switching to a better implementation.The problem is that you don't know what the bottleneck is, only the user does. auto r = arr.map!(elem => someHorribleCalculation(elem)); static assert(hasSlicing(typeof(r))); // 2x horrible calculations The thing I would do is provide the "implementation detail" mechanisms, but default to something reasonable. It could even change based on whether slicing is available (the default, that is). -Steve
Mar 18 2020
On Wed, Mar 18, 2020 at 02:55:35PM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 3/18/20 2:30 PM, H. S. Teoh wrote:[...][...] Fair enough. Still, I think we should totally detect hasSlicing, or at least built-in arrays, and default that to the slicing implementation instead. T -- Bomb technician: If I'm running, try to keep up.I much rather detect hasSlicing on the incoming range, and switching to a better implementation.The problem is that you don't know what the bottleneck is, only the user does. auto r = arr.map!(elem => someHorribleCalculation(elem)); static assert(hasSlicing(typeof(r))); // 2x horrible calculations The thing I would do is provide the "implementation detail" mechanisms, but default to something reasonable. It could even change based on whether slicing is available (the default, that is).
Mar 18 2020
On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote: [...]This is how chunkBy[1] currently behaves when copying. It essentially behaves like a reference range: it will only save it's state when `save` is explicitly called, not otherwise, even if the chunked source range is a forward range with value semantics.The current range API design, which Andrei himself admitted was not the best design in retrospect, does not specify behaviour on copying a range. IOW, it's the application-level equivalent of undefined behaviour. Generally, this is not a problem because usually you're working with your own range types which you already know the semantics of. But in generic code, assumptions of this sort are a no-no, precisely because of breakages of this sort when different ranges behave differently on copy. tl;dr: never copy a range directly, always use .save. Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.As most Phobos ranges preserve the value semantics of their source ranges,[...] This is a dangerous assumption. My personal advice is, if you expect a range to retain its contents after iteration, call .save explicitly. Don't assume anything not explicitly required by the range API. T -- Verbing weirds language. -- Calvin (& Hobbes)
Mar 18 2020
On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:This is a dangerous assumption. My personal advice is, if you expect a range to retain its contents after iteration, call .save explicitly. Don't assume anything not explicitly required by the range API.This means that even if the api does thing X right now, I should not assume it won't change unless it's documented, right?
Mar 19 2020
On 3/19/20 10:00 AM, Dukc wrote:On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later. -SteveThis is a dangerous assumption. My personal advice is, if you expect a range to retain its contents after iteration, call .save explicitly. Don't assume anything not explicitly required by the range API.This means that even if the api does thing X right now, I should not assume it won't change unless it's documented, right?
Mar 19 2020
On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later. -SteveOuch - my code has to be totally broken. And I don't think I'm the only one.
Mar 19 2020
On 3/19/20 10:28 AM, Dukc wrote:On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is). The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen. -SteveIf you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.Ouch - my code has to be totally broken. And I don't think I'm the only one.
Mar 19 2020
On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen. -StevePerhaps there is another way. Individual ranges can give guarantees about their own copy behaviour if they want. We could document the current copy behaviour of existing Phobos ranges and say that relying on it is thereafter fine. But then the question with `chunkBy` is, should it take the opportunity to start to behave like most other ranges in this regard?
Mar 19 2020
On Thu, Mar 19, 2020 at 03:35:28PM +0000, Dukc via Digitalmars-d wrote:On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:[...]The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.Perhaps there is another way. Individual ranges can give guarantees about their own copy behaviour if they want. We could document the current copy behaviour of existing Phobos ranges and say that relying on it is thereafter fine.The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore. In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position. This is (likely) part of the reason why the range API does not specify the behaviour of a range once it's iterated over without using .save: it depends on implementation details.But then the question with `chunkBy` is, should it take the opportunity to start to behave like most other ranges in this regard?I don't see why this should be a compelling reason to change chunkBy -- since copy behaviour is not specified by the range API (for IMO good reasons -- it's implementation-specific). But I do agree that the implementation of chunkBy could be improved for other reasons, like not using the expensive ref-counting implementation when the underlying range is, say, a native array that you could just slice over. T -- MAS = Mana Ada Sistem?
Mar 19 2020
On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore. In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position.The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.
Mar 19 2020
On Thu, Mar 19, 2020 at 05:15:51PM +0000, Dukc via Digitalmars-d wrote:On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:A guarantee about copy behaviour potentially binds the Phobos maintainers to the specifics of the current implementation of the algorithm. I'm not sure that's what we want, since it may limit future improvements. The thing about the range API is that it's like a contract between user code and Phobos; the way you use it should be according to the contract, anything not stated by the contract should not be relied upon even if currently it happens to work. That's the principle of encapsulation. Behaviours that arise from the specifics of how something is currently implemented are implementation details that shouldn't leak into the calling code, including assumptions about copy behaviour. Arguably, relying on a specific copy behaviour is an instance of leaky abstraction, since outside code shouldn't know nor depend upon Phobos internal implementation details. Part of the power of encapsulation is to leave certain non-essential things unspecified, so that implementors can have greater freedom in how they implement the API. Something that isn't directly related to the particular function (e.g., the primary function of byChunk) shouldn't be a part of the API, IMO, it should be left unspecified with the understanding that relying on the behaviour of something unspecified runs the risk of future breakage. The specific behaviour ought to be inside the "black box" of encapsulation, not relied upon by user code, much less specified in library docs. T -- "Maybe" is a strange word. When mom or dad says it it means "yes", but when my big brothers say it it means "no"! -- PJ jr.The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore. In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position.The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.
Mar 19 2020
On Thu, Mar 19, 2020 at 10:46:39AM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 3/19/20 10:28 AM, Dukc wrote:Yeah, Andrei has said that in retrospect, .save was a design mistake, he should have made the input range vs. forward range distinction keyed on copyability or ref/non-ref instead. But to be fair, back in the day when the range API was first designed, D didn't have the necessary language facilities to be able to handle a better solution, unlike now when we have disable, copy ctors, and a bunch of other language features that make such a solution more feasible.On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is).If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.Ouch - my code has to be totally broken. And I don't think I'm the only one.The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.[...] It will happen if std.v2 happens... ;-) T -- "How are you doing?" "Doing what?"
Mar 19 2020
On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d wrote:On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote: [...]Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used. So, if you pass a range to foreach, a function, or do anything else which would copy it, then don't use the original again, and if you want to use it again, then instead of copying it directly, call save to get an independent copy. Generic code should _never_ rely on the copying semantics of a range, and even in non-generic code, depending on the semantics of copying a range whose implementation you don't fully control is just begging for bugs. - Jonathan M DavisThis is how chunkBy[1] currently behaves when copying. It essentially behaves like a reference range: it will only save it's state when `save` is explicitly called, not otherwise, even if the chunked source range is a forward range with value semantics.The current range API design, which Andrei himself admitted was not the best design in retrospect, does not specify behaviour on copying a range. IOW, it's the application-level equivalent of undefined behaviour. Generally, this is not a problem because usually you're working with your own range types which you already know the semantics of. But in generic code, assumptions of this sort are a no-no, precisely because of breakages of this sort when different ranges behave differently on copy. tl;dr: never copy a range directly, always use .save. Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.
Mar 20 2020
On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d wrote:On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d wrote:[...]Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.tl;dr: never copy a range directly, always use .save. Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used.So, if you pass a range to foreach, a function, or do anything else which would copy it, then don't use the original again, and if you want to use it again, then instead of copying it directly, call save to get an independent copy. Generic code should _never_ rely on the copying semantics of a range, and even in non-generic code, depending on the semantics of copying a range whose implementation you don't fully control is just begging for bugs.[...] +1. T -- Knowledge is that area of ignorance that we arrange and classify. -- Ambrose Bierce
Mar 20 2020
On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d wrote:So range "copy" is really what a C++ person would call range "move" ? It might be a copy, or it might invalidate the original, depending on the type?[...][...][...]Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.[...][...] +1. T
Mar 20 2020
On Fri, Mar 20, 2020 at 10:15:18PM +0000, Ben Jones via Digitalmars-d wrote:On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:[...][...]Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.So range "copy" is really what a C++ person would call range "move" ? It might be a copy, or it might invalidate the original, depending on the type?The way it's currently implemented, yes, pretty much. Unless you're in control of the exact range type, you should always assume the worst and not rely on the state of the original range after passing it to a range function. If you need to retain the original state, use .save. T -- You are only young once, but you can stay immature indefinitely. -- azephrahel
Mar 20 2020
On Friday, March 20, 2020 4:15:18 PM MDT Ben Jones via Digitalmars-d wrote:On Friday, 20 March 2020 at 22:11:49 UTC, H. S. Teoh wrote:You more or less have to view it that way, though no move is actually taking place. The problem is that the semantics of what happens when a range is copied are completely implementation-dependent, making it so that you cannot depend on theem and thus basically have to consider the range to be in an invalid state once it's been copied even if it's not technically in an invalid state. If a range has by-value copying semantics, then when you copy it, you get the same as save. If it's a full-on reference type, then mutating the copy mutates the original. And worst of all, if you have a pseudo-reference type, then you can end up in a state where mutating the copy mutates only part of the original, effectively putting it in an invalid state. But even if you somehow never had to worry about pseudo-reference types, the mere fact that some ranges have by-value copying semantics whereas others are full-on reference types makes it so that you can't rely on what happens to a range once it's been copied. And if code is not being at minimum tested with both value-type ranges and reference-type ranges, the odds are _very_ high that it won't handle ranges that aren't value types correctly. Really, forward ranges should all have by-value copying (thus requiring that classes be wrapped in structs if they're going to be forward ranges), and save should be abolished, but that requires a major redesign and likely would only happen if we did some sort of Phobos v2 (as has occasionally been discussed). And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it. So, while it's clear what we should do with some aspects of the range API if we have the opportunity to redesign it, there are still issues that would have to be sorted out. Regardless, as things stand, you can't rely on the semantics of copying a range and basically have to consider that a range has become invalid once it's been copied. Unfortunately, it's not something that seems to be well understood and is often handled incorrectly in code. I've been pointing it out for years (including in my talk at dconf 2015), but we haven't done a good enough job in general messaging how the range API works, and this is one of the details that seems to be very easily missed. - Jonathan M DavisOn Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d wrote:So range "copy" is really what a C++ person would call range "move" ? It might be a copy, or it might invalidate the original, depending on the type?[...][...][...]Yes, that's right. Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them. The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before. For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.[...][...] +1. T
Mar 20 2020
On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps? T -- Those who've learned LaTeX swear by it. Those who are learning LaTeX swear at it. -- Pete Bleackley
Mar 22 2020
On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]Technically, this is already legal--isInputRange accepts non-copyable structs. Of course, if you accept non-copyable ranges as legitimate, someone has to go and fix all of std.algorithm and std.range to handle them correctly, which is a non-trivial amount of work.And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps? T
Mar 22 2020
On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via Digitalmars-d wrote:On Sunday, 22 March 2020 at 15:43:45 UTC, H. S. Teoh wrote:[...]On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?Technically, this is already legal--isInputRange accepts non-copyable structs. Of course, if you accept non-copyable ranges as legitimate, someone has to go and fix all of std.algorithm and std.range to handle them correctly, which is a non-trivial amount of work.I don't think it's wise to do this with the existing codebase. A change as drastic as removing .save will likely require rewriting not just large chunks of Phobos, but existing user code as well. This is probably best left to Phobos v2, if that ever happens. T -- An elephant: A mouse built to government specifications. -- Robert Heinlein
Mar 23 2020
On Monday, 23 March 2020 at 17:37:16 UTC, H. S. Teoh wrote:On Sun, Mar 22, 2020 at 04:23:27PM +0000, Paul Backus via Digitalmars-d wrote:Agreed. I was only talking about making pure input ranges non-copyable (i.e., requiring all code that copies a range to use either .save or move), which is not (strictly speaking) a breaking change to the existing range interface.Technically, this is already legal--isInputRange accepts non-copyable structs. Of course, if you accept non-copyable ranges as legitimate, someone has to go and fix all of std.algorithm and std.range to handle them correctly, which is a non-trivial amount of work.I don't think it's wise to do this with the existing codebase. A change as drastic as removing .save will likely require rewriting not just large chunks of Phobos, but existing user code as well. This is probably best left to Phobos v2, if that ever happens. T
Mar 23 2020
On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:On Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]A range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach. And of course, idiomatic range code passes ranges all over the place, resulting in a lot of copying. And to wrap a range in another range (which is required for most range-based algorithms), you have to copy it. IMHO, it would make far more sense to just use opApply than to try to make a range non-copyable. - Jonathan M DavisAnd exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
Mar 23 2020
On 3/23/20 2:15 PM, Jonathan M Davis wrote:On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:foreach(x; uncopyableRange.move) foreach(x; ucr.refCounted) foreach(x; returnsRvalue()) Not only that, but once you define "non-copyable" as a thing for ranges, then you can handle this without having to always tack-on ".move" or whatnot. It would be possible. It wouldn't be as "pretty". But most ranges are at least forward ranges, so it wouldn't be a terrible problem. -SteveOn Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]A range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach.And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to be allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?
Mar 23 2020
That=E2=80=99s dope homie On Mon, Mar 23, 2020 at 2:15 PM Jonathan M Davis via Digitalmars-d < digitalmars-d puremagic.com> wrote:On Sunday, March 22, 2020 9:43:45 AM MDT H. S. Teoh via Digitalmars-d wrote:eOn Fri, Mar 20, 2020 at 05:50:26PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]And exactly what should happen with basic input ranges is not clear, because while ideally, you'd just require that they have full-on reference semantics, that tends to mean that you're forcing them to b=eA range that can't be copied is useless. It won't even work with foreach, because anything you iterate over with foreach is copied when it's passed to foreach. And of course, idiomatic range code passes ranges all over the place, resulting in a lot of copying. And to wrap a range in another rang=allocated on the heap, which isn't really the sort of thing that we want to force if we can avoid it.[...] You could just have input ranges be a struct with the copy ctor disable'd, perhaps?(which is required for most range-based algorithms), you have to copy it. IMHO, it would make far more sense to just use opApply than to try to mak=ea range non-copyable. - Jonathan M Davis
Mar 23 2020