digitalmars.D - More range woes: std.array.save is invalid
- H. S. Teoh (48/48) Dec 19 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8061
- monarch_dodra (23/36) Dec 19 2012 I'd say that's the correct behavior: A range is just a way to
- monarch_dodra (4/12) Dec 19 2012 Most probably, algorithm is still not quite great about using
- Andrei Alexandrescu (4/18) Dec 19 2012 I think you're overthinking this. Save saves the position in the current...
- H. S. Teoh (10/33) Dec 19 2012 [...]
- Jonathan M Davis (4/9) Dec 20 2012 Then they have bugs in their implementation which need to be fixed. Ther...
- H. S. Teoh (9/18) Dec 20 2012 [...]
- monarch_dodra (7/32) Dec 20 2012 Why is that?
- H. S. Teoh (22/47) Dec 20 2012 Because the definition of .save only requires that the state of the
- monarch_dodra (16/58) Dec 20 2012 Yeah..., I think I get it now. Thanks for insisting.
- H. S. Teoh (10/57) Dec 20 2012 You're right, if joiner only iterates through .save'd copies of the
- Jonathan M Davis (11/14) Dec 20 2012 That just means that that anything involving ranges of ranges needs to b...
- Andrei Alexandrescu (4/17) Dec 20 2012 I think this whole issue is akin to the "transitory" discussion of a
- H. S. Teoh (20/39) Dec 20 2012 Yes, I just submitted a pull request for joiner (the no-separator
- Andrei Alexandrescu (3/7) Dec 20 2012 Absolutely, that would be awesome.
http://d.puremagic.com/issues/show_bug.cgi?id=8061 After being sidetracked by the stack-allocated buffer fiasco, I finally found the root problem in the above bug. The problem is std.array.save: property T[] save(T)(T[] a) safe pure nothrow { return a; } This implementation is only correct for certain values of T. It is wrong when T is a forward range, for example, because if you save a T[] and iterate over its elements, it will consume the T's in the array, so that the original range has elements that are now consumed. The correct implementation when T is a forward range is to return a copy of the array where the elements are obtained by calling T.save. The problem is, now .save is a potentially very expensive operation. At a deeper level, it can be argued that actually, std.array.save is not wrong, because you asked it to save the array, not the elements within the array. So in that sense, there's nothing wrong with the implementation. What is wrong here is that many algorithms in std.range and std.algorithm wrongly assume that just because the outer range (that is, T[]) is a forward range, *and* the inner ranges (that is, T) are also forward ranges, that therefore T[].save will save the current position of the *range of ranges*. Actually, the only thing that is guaranteed to be saved is the current position of the *containing range*, it does NOT save the position of the subranges at all (at least, not according to the current definition of .save, which is NOT transitive). This means that many wrapper ranges are broken, because they export a .save method that does not work correctly under certain circumstances. The correct way to wrap a range of ranges as a forward range, is if the wrapping range's .save method *copies* the outer range and populates it with .save'd subranges. Otherwise, it must be treated only as an input range. However, since there's no general way to copy a range, all of these wrapper ranges cannot be more than *input* ranges. That is, unless we redefine .save to be transitive. But then, it becomes just a variant of a general .dup function for ranges. And so we come back to a fundamental issue in D, that there's no generic way to duplicate a value. This is causing us a lot of grief in generic code (the whole transient .front issue is another instance of this problem, since transience wouldn't be a problem if there was a generic .dup method), and, as I see it, will continue to cause us more grief in the future, because there's no way you can assume that assigning anything to anything else will not magically become invalid when you subsequently perform a seemingly-unrelated operation. That makes writing generic code a veritable minefield. T -- Why can't you just be a nonconformist like everyone else? -- YHL
Dec 19 2012
On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:The problem is std.array.save: property T[] save(T)(T[] a) safe pure nothrow { return a; } This implementation is only correct for certain values of T. It is wrong when T is a forward range, for example, because if you save a T[] and iterate over its elements, it will consume the T's in the array, so that the original range has elements that are now consumed. TI'd say that's the correct behavior: A range is just a way to iterate over contents, and "save" makes no promise (and *should* make no promise) to make copies of the range's elements. If you modify one of the elements in your range, then you'd better hope that change is seen by all other ranges iterating over the same data. What I'm basically saying is: //---- auto a = [1, 2, 3] auto b = a.save; b[0] = 5; assert(a[0] == 5); //Correct behavior //---- The fact that you have ints or forwards ranges as elements is irrelevant. --------------------------- BTW: Just the same way, "dup" will not transitively duplicate the contents of an array, or of std.container.array. --------------------------- To get your described behavior, then you should wrap your array into your own range, and make the array's elements part of your iteration definition. This way you'll surprise no one.
Dec 19 2012
On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:What is wrong here is that many algorithms in std.range and std.algorithm wrongly assume that just because the outer range (that is, T[]) is a forward range, *and* the inner ranges (that is, T) are also forward ranges, that therefore T[].save will save the current position of the *range of ranges*.Most probably, algorithm is still not quite great about using save. This is especially true in regards to RoR, but that's a special case, and we are fixing those one by one.
Dec 19 2012
On 12/20/12 2:03 AM, H. S. Teoh wrote:http://d.puremagic.com/issues/show_bug.cgi?id=8061 After being sidetracked by the stack-allocated buffer fiasco, I finally found the root problem in the above bug. The problem is std.array.save: property T[] save(T)(T[] a) safe pure nothrow { return a; } This implementation is only correct for certain values of T. It is wrong when T is a forward range, for example, because if you save a T[] and iterate over its elements, it will consume the T's in the array, so that the original range has elements that are now consumed. The correct implementation when T is a forward range is to return a copy of the array where the elements are obtained by calling T.save.I think you're overthinking this. Save saves the position in the current range without any promise about its contents. Andrei
Dec 19 2012
On Thu, Dec 20, 2012 at 02:36:18AM -0500, Andrei Alexandrescu wrote:On 12/20/12 2:03 AM, H. S. Teoh wrote:[...] Yes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case. T -- Doubt is a self-fulfilling prophecy.http://d.puremagic.com/issues/show_bug.cgi?id=8061 After being sidetracked by the stack-allocated buffer fiasco, I finally found the root problem in the above bug. The problem is std.array.save: property T[] save(T)(T[] a) safe pure nothrow { return a; } This implementation is only correct for certain values of T. It is wrong when T is a forward range, for example, because if you save a T[] and iterate over its elements, it will consume the T's in the array, so that the original range has elements that are now consumed. The correct implementation when T is a forward range is to return a copy of the array where the elements are obtained by calling T.save.I think you're overthinking this. Save saves the position in the current range without any promise about its contents.
Dec 19 2012
On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:Yes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case.Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing. - Jonathan M Davis
Dec 20 2012
On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges. T -- Nobody is perfect. I am Nobody. -- pepoluan, GKC forumYes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case.Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.
Dec 20 2012
On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:Why is that? If the wrapper is *designed* as being a RoR, then the "elements" of the range are defined as being part of the iteration scheme. Once you have defined it that way, you can have the wrapper save the top Range, as well as each sub-range individually, and then the wrapper is Forward... No?On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges. TYes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case.Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.
Dec 20 2012
On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:[...]On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges.Yes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case.Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.Why is that?Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.If the wrapper is *designed* as being a RoR, then the "elements" of the range are defined as being part of the iteration scheme. Once you have defined it that way, you can have the wrapper save the top Range, as well as each sub-range individually, and then the wrapper is Forward... No?In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code. *If* .save is allowed to return a different type, then yes, you can return a wrapper of the outer range that lets you iterate through .save'd copies of the inner ranges. But AFAIK this isn't allowed by the current definition of .save. T -- Why waste time learning, when ignorance is instantaneous? -- Hobbes, from Calvin & Hobbes
Dec 20 2012
On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote: [...]Yeah..., I think I get it now. Thanks for insisting. While you _can_ save the outer range, if you try to replace the individual elements of the saved range with saved copies, at the end of the day, both your outer ranges will still reference the same ranges, saved or not. Ok. Now we can move forward. One of the solutions I see would be that the RoR wrapper (we're talking about Joiner, right?), never mutate the elements of the RoR directly, but first store a *saved* copy of the element into a _current. This way, you only need to save the *outer* range, but not *ALL* of the inner ranges individually. This is good because it keeps things in check performance wise. I commented on your fix to joiner: https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956 This may solve the problem actually.Why is that?Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.If the wrapper is *designed* as being a RoR, then the "elements" of the range are defined as being part of the iteration scheme. Once you have defined it that way, you can have the wrapper save the top Range, as well as each sub-range individually, and then the wrapper is Forward... No?In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code. *If* .save is allowed to return a different type, then yes, you can return a wrapper of the outer range that lets you iterate through .save'd copies of the inner ranges. But AFAIK this isn't allowed by the current definition of .save. T
Dec 20 2012
On Thu, Dec 20, 2012 at 05:30:39PM +0100, monarch_dodra wrote:On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:[...]On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote: [...]Why is that?Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.If the wrapper is *designed* as being a RoR, then the "elements" of the range are defined as being part of the iteration scheme. Once you have defined it that way, you can have the wrapper save the top Range, as well as each sub-range individually, and then the wrapper is Forward... No?In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code.Yeah..., I think I get it now. Thanks for insisting. While you _can_ save the outer range, if you try to replace the individual elements of the saved range with saved copies, at the end of the day, both your outer ranges will still reference the same ranges, saved or not. Ok. Now we can move forward. One of the solutions I see would be that the RoR wrapper (we're talking about Joiner, right?), never mutate the elements of the RoR directly, but first store a *saved* copy of the element into a _current. This way, you only need to save the *outer* range, but not *ALL* of the inner ranges individually. This is good because it keeps things in check performance wise. I commented on your fix to joiner: https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956 This may solve the problem actually.You're right, if joiner only iterates through .save'd copies of the subranges, then there will be no problem. Any external modifications is not our responsibility, because as long as you only iterate over the joined range (without mutation), it will not cause unexpected modifications of the original range. T -- One reason that few people are aware there are programs running the internet is that they never crash in any significant way: the free software underlying the internet is reliable to the point of invisibility. -- Glyn Moody, from the article "Giving it all away"
Dec 20 2012
On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used. So much of this would have been easier though if reference type ranges had been disallowed. Too late now though. We'll just have to fix such bugs as we find them and strengthen the unit tests so that they get caught like they should be. - Jonathan M Davis
Dec 20 2012
On 12/20/12 12:48 PM, Jonathan M Davis wrote:On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:I think this whole issue is akin to the "transitory" discussion of a while ago. AndreiBecause the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used. So much of this would have been easier though if reference type ranges had been disallowed. Too late now though. We'll just have to fix such bugs as we find them and strengthen the unit tests so that they get caught like they should be.
Dec 20 2012
On Thu, Dec 20, 2012 at 01:47:16PM -0500, Andrei Alexandrescu wrote:On 12/20/12 12:48 PM, Jonathan M Davis wrote:Yes, I just submitted a pull request for joiner (the no-separator variant), and am working on another pull for the with-separator variant of joiner. We will also need to review all algorithms/wrapper ranges that take a range-of-range parameter for correctness.On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used.[...] Which has still not be resolved. :-/ Can we at the very least agree that any Phobos algorithm that *can* be made to work with transient ranges (without incurring performance or other problems), should? I can work through the list I posted some time ago and submit pulls for them. It would be better than doing nothing at all, though this is IMO still not fully satisfactory. If nothing else, I think we should update the documentation to warn users of potential pitfalls with using transient ranges, and indicate clearly which algorithms have been vetted for safe operation with transient ranges. T -- Why are you blatanly misspelling "blatant"? -- Branden RobinsonSo much of this would have been easier though if reference type ranges had been disallowed. Too late now though. We'll just have to fix such bugs as we find them and strengthen the unit tests so that they get caught like they should be.I think this whole issue is akin to the "transitory" discussion of a while ago.
Dec 20 2012
On 12/20/12 2:07 PM, H. S. Teoh wrote:Can we at the very least agree that any Phobos algorithm that *can* be made to work with transient ranges (without incurring performance or other problems), should? I can work through the list I posted some time ago and submit pulls for them.Absolutely, that would be awesome. Andrei
Dec 20 2012