www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - More range woes: std.array.save is invalid

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8061

After being sidetracked by the stack-allocated buffer fiasco, I finally
found the root problem in the above bug.

The problem is std.array.save:

 property T[] save(T)(T[] a)  safe pure nothrow
{
    return a;
}

This implementation is only correct for certain values of T. It is wrong
when T is a forward range, for example, because if you save a T[] and
iterate over its elements, it will consume the T's in the array, so that
the original range has elements that are now consumed.

The correct implementation when T is a forward range is to return a copy
of the array where the elements are obtained by calling T.save.

The problem is, now .save is a potentially very expensive operation.

At a deeper level, it can be argued that actually, std.array.save is not
wrong, because you asked it to save the array, not the elements within
the array. So in that sense, there's nothing wrong with the
implementation.

What is wrong here is that many algorithms in std.range and
std.algorithm wrongly assume that just because the outer range (that is,
T[]) is a forward range, *and* the inner ranges (that is, T) are also
forward ranges, that therefore T[].save will save the current position
of the *range of ranges*.  Actually, the only thing that is guaranteed
to be saved is the current position of the *containing range*, it does
NOT save the position of the subranges at all (at least, not according
to the current definition of .save, which is NOT transitive). This means
that many wrapper ranges are broken, because they export a .save method
that does not work correctly under certain circumstances.

The correct way to wrap a range of ranges as a forward range, is if the
wrapping range's .save method *copies* the outer range and populates it
with .save'd subranges. Otherwise, it must be treated only as an input
range. However, since there's no general way to copy a range, all of
these wrapper ranges cannot be more than *input* ranges.

That is, unless we redefine .save to be transitive. But then, it becomes
just a variant of a general .dup function for ranges. And so we come
back to a fundamental issue in D, that there's no generic way to
duplicate a value. This is causing us a lot of grief in generic code
(the whole transient .front issue is another instance of this problem,
since transience wouldn't be a problem if there was a generic .dup
method), and, as I see it, will continue to cause us more grief in the
future, because there's no way you can assume that assigning anything to
anything else will not magically become invalid when you subsequently
perform a seemingly-unrelated operation. That makes writing generic code
a veritable minefield.


T

-- 
Why can't you just be a nonconformist like everyone else? -- YHL
Dec 19 2012
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:
 The problem is std.array.save:

  property T[] save(T)(T[] a)  safe pure nothrow
 {
     return a;
 }

 This implementation is only correct for certain values of T. It 
 is wrong
 when T is a forward range, for example, because if you save a 
 T[] and
 iterate over its elements, it will consume the T's in the 
 array, so that
 the original range has elements that are now consumed.

 T
I'd say that's the correct behavior: A range is just a way to iterate over contents, and "save" makes no promise (and *should* make no promise) to make copies of the range's elements. If you modify one of the elements in your range, then you'd better hope that change is seen by all other ranges iterating over the same data. What I'm basically saying is: //---- auto a = [1, 2, 3] auto b = a.save; b[0] = 5; assert(a[0] == 5); //Correct behavior //---- The fact that you have ints or forwards ranges as elements is irrelevant. --------------------------- BTW: Just the same way, "dup" will not transitively duplicate the contents of an array, or of std.container.array. --------------------------- To get your described behavior, then you should wrap your array into your own range, and make the array's elements part of your iteration definition. This way you'll surprise no one.
Dec 19 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:
 What is wrong here is that many algorithms in std.range and
 std.algorithm wrongly assume that just because the outer range 
 (that is,
 T[]) is a forward range, *and* the inner ranges (that is, T) 
 are also
 forward ranges, that therefore T[].save will save the current 
 position
 of the *range of ranges*.
Most probably, algorithm is still not quite great about using save. This is especially true in regards to RoR, but that's a special case, and we are fixing those one by one.
Dec 19 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/12 2:03 AM, H. S. Teoh wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=8061

 After being sidetracked by the stack-allocated buffer fiasco, I finally
 found the root problem in the above bug.

 The problem is std.array.save:

  property T[] save(T)(T[] a)  safe pure nothrow
 {
      return a;
 }

 This implementation is only correct for certain values of T. It is wrong
 when T is a forward range, for example, because if you save a T[] and
 iterate over its elements, it will consume the T's in the array, so that
 the original range has elements that are now consumed.

 The correct implementation when T is a forward range is to return a copy
 of the array where the elements are obtained by calling T.save.
I think you're overthinking this. Save saves the position in the current range without any promise about its contents. Andrei
Dec 19 2012
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 20, 2012 at 02:36:18AM -0500, Andrei Alexandrescu wrote:
 On 12/20/12 2:03 AM, H. S. Teoh wrote:
http://d.puremagic.com/issues/show_bug.cgi?id=8061

After being sidetracked by the stack-allocated buffer fiasco, I
finally found the root problem in the above bug.

The problem is std.array.save:

 property T[] save(T)(T[] a)  safe pure nothrow
{
     return a;
}

This implementation is only correct for certain values of T. It is
wrong when T is a forward range, for example, because if you save a
T[] and iterate over its elements, it will consume the T's in the
array, so that the original range has elements that are now consumed.

The correct implementation when T is a forward range is to return a
copy of the array where the elements are obtained by calling T.save.
I think you're overthinking this. Save saves the position in the current range without any promise about its contents.
[...] Yes, which means many current algorithms that take a range of ranges and returns a wrapper range are wrong, because they assume that the wrapper range can be a forward range when both the container and the subranges are forward ranges, but this is not a sufficient condition in the general case. T -- Doubt is a self-fulfilling prophecy.
Dec 19 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of ranges and
 returns a wrapper range are wrong, because they assume that the wrapper
 range can be a forward range when both the container and the subranges
 are forward ranges, but this is not a sufficient condition in the
 general case.
Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing. - Jonathan M Davis
Dec 20 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:
 On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of ranges
 and returns a wrapper range are wrong, because they assume that the
 wrapper range can be a forward range when both the container and the
 subranges are forward ranges, but this is not a sufficient condition
 in the general case.
Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.
[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges. T -- Nobody is perfect. I am Nobody. -- pepoluan, GKC forum
Dec 20 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:
 On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis 
 wrote:
 On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range 
 of ranges
 and returns a wrapper range are wrong, because they assume 
 that the
 wrapper range can be a forward range when both the container 
 and the
 subranges are forward ranges, but this is not a sufficient 
 condition
 in the general case.
Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.
[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges. T
Why is that? If the wrapper is *designed* as being a RoR, then the "elements" of the range are defined as being part of the iteration scheme. Once you have defined it that way, you can have the wrapper save the top Range, as well as each sub-range individually, and then the wrapper is Forward... No?
Dec 20 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
 On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:
On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:
On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of
 ranges and returns a wrapper range are wrong, because they assume
 that the wrapper range can be a forward range when both the
 container and the subranges are forward ranges, but this is not a
 sufficient condition in the general case.
Then they have bugs in their implementation which need to be fixed. There's nothing wrong with std.array.save. It's doing the right thing.
[...] OK, so the subject line is misleading. But the bigger issue is that wrappers of forward ranges of forward ranges can only be input ranges, in spite of the fact that, in theory, you can save each component of the given range of ranges.
[...]
 Why is that?
Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.
 If the wrapper is *designed* as being a RoR, then the "elements" of
 the range are defined as being part of the iteration scheme.  Once you
 have defined it that way, you can have the wrapper save the top Range,
 as well as each sub-range individually, and then the wrapper is
 Forward... No?
In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code. *If* .save is allowed to return a different type, then yes, you can return a wrapper of the outer range that lets you iterate through .save'd copies of the inner ranges. But AFAIK this isn't allowed by the current definition of .save. T -- Why waste time learning, when ignorance is instantaneous? -- Hobbes, from Calvin & Hobbes
Dec 20 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:
 On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
 [...]
 Why is that?
Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.
 If the wrapper is *designed* as being a RoR, then the 
 "elements" of
 the range are defined as being part of the iteration scheme.  
 Once you
 have defined it that way, you can have the wrapper save the 
 top Range,
 as well as each sub-range individually, and then the wrapper is
 Forward... No?
In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code. *If* .save is allowed to return a different type, then yes, you can return a wrapper of the outer range that lets you iterate through .save'd copies of the inner ranges. But AFAIK this isn't allowed by the current definition of .save. T
Yeah..., I think I get it now. Thanks for insisting. While you _can_ save the outer range, if you try to replace the individual elements of the saved range with saved copies, at the end of the day, both your outer ranges will still reference the same ranges, saved or not. Ok. Now we can move forward. One of the solutions I see would be that the RoR wrapper (we're talking about Joiner, right?), never mutate the elements of the RoR directly, but first store a *saved* copy of the element into a _current. This way, you only need to save the *outer* range, but not *ALL* of the inner ranges individually. This is good because it keeps things in check performance wise. I commented on your fix to joiner: https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956 This may solve the problem actually.
Dec 20 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 20, 2012 at 05:30:39PM +0100, monarch_dodra wrote:
 On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:
On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
[...]
Why is that?
Because the definition of .save only requires that the state of the outer range is saved. Nothing is guaranteed about the state of the inner ranges.
If the wrapper is *designed* as being a RoR, then the "elements" of
the range are defined as being part of the iteration scheme.  Once
you have defined it that way, you can have the wrapper save the top
Range, as well as each sub-range individually, and then the wrapper
is Forward... No?
In *theory*, yes, you can do this. But currently, it's not possible to do this generically, because .save has to return the same type as the outer range. The outer range's .save method does NOT call .save on the inner ranges (e.g., an array of forward ranges: std.array.save simply returns a shallow copy of the original array). So the inner ranges will be invalidated by iterating either the .save'd range, or the original range. You have to return an outer range in which all of its elements have been replaced with .save'd copies. But there is no method in the outer range that does this, so there is no way to do this in generic code.
[...]
 Yeah..., I think I get it now. Thanks for insisting.
 
 While you _can_ save the outer range, if you try to replace the
 individual elements of the saved range with saved copies, at the end
 of the day, both your outer ranges will still reference the same
 ranges, saved or not. Ok. Now we can move forward.
 
 One of the solutions I see would be that the RoR wrapper (we're
 talking about Joiner, right?), never mutate the elements of the RoR
 directly, but first store a *saved* copy of the element into a
 _current.
 
 This way, you only need to save the *outer* range, but not *ALL* of
 the inner ranges individually. This is good because it keeps things
 in check performance wise.
 
 I commented on your fix to joiner:
 https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956
 This may solve the problem actually.
You're right, if joiner only iterates through .save'd copies of the subranges, then there will be no problem. Any external modifications is not our responsibility, because as long as you only iterate over the joined range (without mutation), it will not cause unexpected modifications of the original range. T -- One reason that few people are aware there are programs running the internet is that they never crash in any significant way: the free software underlying the internet is reliable to the point of invisibility. -- Glyn Moody, from the article "Giving it all away"
Dec 20 2012
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
 Because the definition of .save only requires that the state of the
 outer range is saved. Nothing is guaranteed about the state of the inner
 ranges.
That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used. So much of this would have been easier though if reference type ranges had been disallowed. Too late now though. We'll just have to fix such bugs as we find them and strengthen the unit tests so that they get caught like they should be. - Jonathan M Davis
Dec 20 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/12 12:48 PM, Jonathan M Davis wrote:
 On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
 Because the definition of .save only requires that the state of the
 outer range is saved. Nothing is guaranteed about the state of the inner
 ranges.
That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used. So much of this would have been easier though if reference type ranges had been disallowed. Too late now though. We'll just have to fix such bugs as we find them and strengthen the unit tests so that they get caught like they should be.
I think this whole issue is akin to the "transitory" discussion of a while ago. Andrei
Dec 20 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Dec 20, 2012 at 01:47:16PM -0500, Andrei Alexandrescu wrote:
 On 12/20/12 12:48 PM, Jonathan M Davis wrote:
On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
Because the definition of .save only requires that the state of the
outer range is saved. Nothing is guaranteed about the state of the
inner ranges.
That just means that that anything involving ranges of ranges needs to be written with the understanding that saving the outer range doesn't save the inner ones. So, we may have bugs with regards to this (similar to how we have bugs related to not calling save when we should), but there's still no problem with save itself - only with how it's used.
Yes, I just submitted a pull request for joiner (the no-separator variant), and am working on another pull for the with-separator variant of joiner. We will also need to review all algorithms/wrapper ranges that take a range-of-range parameter for correctness.
So much of this would have been easier though if reference type
ranges had been disallowed. Too late now though. We'll just have to
fix such bugs as we find them and strengthen the unit tests so that
they get caught like they should be.
I think this whole issue is akin to the "transitory" discussion of a while ago.
[...] Which has still not be resolved. :-/ Can we at the very least agree that any Phobos algorithm that *can* be made to work with transient ranges (without incurring performance or other problems), should? I can work through the list I posted some time ago and submit pulls for them. It would be better than doing nothing at all, though this is IMO still not fully satisfactory. If nothing else, I think we should update the documentation to warn users of potential pitfalls with using transient ranges, and indicate clearly which algorithms have been vetted for safe operation with transient ranges. T -- Why are you blatanly misspelling "blatant"? -- Branden Robinson
Dec 20 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/20/12 2:07 PM, H. S. Teoh wrote:
 Can we at the very least agree that any Phobos algorithm that *can* be
 made to work with transient ranges (without incurring performance or
 other problems), should? I can work through the list I posted some time
 ago and submit pulls for them.
Absolutely, that would be awesome. Andrei
Dec 20 2012