digitalmars.D - More range woes: std.array.save is invalid

H. S. Teoh (48/48) Dec 19 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8061

monarch_dodra (23/36) Dec 19 2012 I'd say that's the correct behavior: A range is just a way to
monarch_dodra (4/12) Dec 19 2012 Most probably, algorithm is still not quite great about using
Andrei Alexandrescu (4/18) Dec 19 2012 I think you're overthinking this. Save saves the position in the current...

H. S. Teoh (10/33) Dec 19 2012 [...]
Jonathan M Davis (4/9) Dec 20 2012 Then they have bugs in their implementation which need to be fixed. Ther...
H. S. Teoh (9/18) Dec 20 2012 [...]

monarch_dodra (7/32) Dec 20 2012 Why is that?

H. S. Teoh (22/47) Dec 20 2012 Because the definition of .save only requires that the state of the

monarch_dodra (16/58) Dec 20 2012 Yeah..., I think I get it now. Thanks for insisting.

H. S. Teoh (10/57) Dec 20 2012 You're right, if joiner only iterates through .save'd copies of the

Jonathan M Davis (11/14) Dec 20 2012 That just means that that anything involving ranges of ranges needs to b...

Andrei Alexandrescu (4/17) Dec 20 2012 I think this whole issue is akin to the "transitory" discussion of a

H. S. Teoh (20/39) Dec 20 2012 Yes, I just submitted a pull request for joiner (the no-separator

Andrei Alexandrescu (3/7) Dec 20 2012 Absolutely, that would be awesome.

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

http://d.puremagic.com/issues/show_bug.cgi?id=8061

After being sidetracked by the stack-allocated buffer fiasco, I finally
found the root problem in the above bug.

The problem is std.array.save:

 property T[] save(T)(T[] a)  safe pure nothrow
{
    return a;
}

This implementation is only correct for certain values of T. It is wrong
when T is a forward range, for example, because if you save a T[] and
iterate over its elements, it will consume the T's in the array, so that
the original range has elements that are now consumed.

The correct implementation when T is a forward range is to return a copy
of the array where the elements are obtained by calling T.save.

The problem is, now .save is a potentially very expensive operation.

At a deeper level, it can be argued that actually, std.array.save is not
wrong, because you asked it to save the array, not the elements within
the array. So in that sense, there's nothing wrong with the
implementation.

What is wrong here is that many algorithms in std.range and
std.algorithm wrongly assume that just because the outer range (that is,
T[]) is a forward range, *and* the inner ranges (that is, T) are also
forward ranges, that therefore T[].save will save the current position
of the *range of ranges*.  Actually, the only thing that is guaranteed
to be saved is the current position of the *containing range*, it does
NOT save the position of the subranges at all (at least, not according
to the current definition of .save, which is NOT transitive). This means
that many wrapper ranges are broken, because they export a .save method
that does not work correctly under certain circumstances.

The correct way to wrap a range of ranges as a forward range, is if the
wrapping range's .save method *copies* the outer range and populates it
with .save'd subranges. Otherwise, it must be treated only as an input
range. However, since there's no general way to copy a range, all of
these wrapper ranges cannot be more than *input* ranges.

That is, unless we redefine .save to be transitive. But then, it becomes
just a variant of a general .dup function for ranges. And so we come
back to a fundamental issue in D, that there's no generic way to
duplicate a value. This is causing us a lot of grief in generic code
(the whole transient .front issue is another instance of this problem,
since transience wouldn't be a problem if there was a generic .dup
method), and, as I see it, will continue to cause us more grief in the
future, because there's no way you can assume that assigning anything to
anything else will not magically become invalid when you subsequently
perform a seemingly-unrelated operation. That makes writing generic code
a veritable minefield.


T

-- 
Why can't you just be a nonconformist like everyone else? -- YHL

Dec 19 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:
 The problem is std.array.save:

  property T[] save(T)(T[] a)  safe pure nothrow
 {
     return a;
 }

 This implementation is only correct for certain values of T. It 
 is wrong
 when T is a forward range, for example, because if you save a 
 T[] and
 iterate over its elements, it will consume the T's in the 
 array, so that
 the original range has elements that are now consumed.

 T

I'd say that's the correct behavior: A range is just a way to 
iterate over contents, and "save" makes no promise (and *should* 
make no promise) to make copies of the range's elements.

If you modify one of the elements in your range, then you'd 
better hope that change is seen by all other ranges iterating 
over the same data.

What I'm basically saying is:
//----
auto a = [1, 2, 3]
auto b = a.save;
b[0] = 5;
assert(a[0] == 5); //Correct behavior
//----

The fact that you have ints or forwards ranges as elements is 
irrelevant.

---------------------------
BTW: Just the same way, "dup" will not transitively duplicate the 
contents of an array, or of std.container.array.

---------------------------
To get your described behavior, then you should wrap your array 
into your own range, and make the array's elements part of your 
iteration definition. This way you'll surprise no one.

Dec 19 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 20 December 2012 at 07:05:08 UTC, H. S. Teoh wrote:
 What is wrong here is that many algorithms in std.range and
 std.algorithm wrongly assume that just because the outer range 
 (that is,
 T[]) is a forward range, *and* the inner ranges (that is, T) 
 are also
 forward ranges, that therefore T[].save will save the current 
 position
 of the *range of ranges*.

Most probably, algorithm is still not quite great about using 
save. This is especially true in regards to RoR, but that's a 
special case, and we are fixing those one by one.

Dec 19 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/20/12 2:03 AM, H. S. Teoh wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=8061

 After being sidetracked by the stack-allocated buffer fiasco, I finally
 found the root problem in the above bug.

 The problem is std.array.save:

  property T[] save(T)(T[] a)  safe pure nothrow
 {
      return a;
 }

 This implementation is only correct for certain values of T. It is wrong
 when T is a forward range, for example, because if you save a T[] and
 iterate over its elements, it will consume the T's in the array, so that
 the original range has elements that are now consumed.

 The correct implementation when T is a forward range is to return a copy
 of the array where the elements are obtained by calling T.save.

I think you're overthinking this. Save saves the position in the current 
range without any promise about its contents.

Andrei

Dec 19 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 20, 2012 at 02:36:18AM -0500, Andrei Alexandrescu wrote:
 On 12/20/12 2:03 AM, H. S. Teoh wrote:
http://d.puremagic.com/issues/show_bug.cgi?id=8061

After being sidetracked by the stack-allocated buffer fiasco, I
finally found the root problem in the above bug.

The problem is std.array.save:

 property T[] save(T)(T[] a)  safe pure nothrow
{
     return a;
}

This implementation is only correct for certain values of T. It is
wrong when T is a forward range, for example, because if you save a
T[] and iterate over its elements, it will consume the T's in the
array, so that the original range has elements that are now consumed.

The correct implementation when T is a forward range is to return a
copy of the array where the elements are obtained by calling T.save.

 
 I think you're overthinking this. Save saves the position in the
 current range without any promise about its contents.

[...]

Yes, which means many current algorithms that take a range of ranges and
returns a wrapper range are wrong, because they assume that the wrapper
range can be a forward range when both the container and the subranges
are forward ranges, but this is not a sufficient condition in the
general case.


T

-- 
Doubt is a self-fulfilling prophecy.

Dec 19 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of ranges and
 returns a wrapper range are wrong, because they assume that the wrapper
 range can be a forward range when both the container and the subranges
 are forward ranges, but this is not a sufficient condition in the
 general case.

Then they have bugs in their implementation which need to be fixed. There's 
nothing wrong with std.array.save. It's doing the right thing.

- Jonathan M Davis

Dec 20 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:
 On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of ranges
 and returns a wrapper range are wrong, because they assume that the
 wrapper range can be a forward range when both the container and the
 subranges are forward ranges, but this is not a sufficient condition
 in the general case.

 
 Then they have bugs in their implementation which need to be fixed.
 There's nothing wrong with std.array.save. It's doing the right thing.

[...]

OK, so the subject line is misleading.

But the bigger issue is that wrappers of forward ranges of forward
ranges can only be input ranges, in spite of the fact that, in theory,
you can save each component of the given range of ranges.


T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum

Dec 20 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:
 On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis 
 wrote:
 On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range 
 of ranges
 and returns a wrapper range are wrong, because they assume 
 that the
 wrapper range can be a forward range when both the container 
 and the
 subranges are forward ranges, but this is not a sufficient 
 condition
 in the general case.

 
 Then they have bugs in their implementation which need to be 
 fixed.
 There's nothing wrong with std.array.save. It's doing the 
 right thing.

 [...]

 OK, so the subject line is misleading.

 But the bigger issue is that wrappers of forward ranges of 
 forward
 ranges can only be input ranges, in spite of the fact that, in 
 theory,
 you can save each component of the given range of ranges.


 T

Why is that?

If the wrapper is *designed* as being a RoR, then the "elements"
of the range are defined as being part of the iteration scheme.
Once you have defined it that way, you can have the wrapper save
the top Range, as well as each sub-range individually, and then
the wrapper is Forward... No?

Dec 20 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
 On Thursday, 20 December 2012 at 15:25:27 UTC, H. S. Teoh wrote:
On Thu, Dec 20, 2012 at 12:08:24AM -0800, Jonathan M Davis wrote:
On Wednesday, December 19, 2012 23:44:12 H. S. Teoh wrote:
 Yes, which means many current algorithms that take a range of
 ranges and returns a wrapper range are wrong, because they assume
 that the wrapper range can be a forward range when both the
 container and the subranges are forward ranges, but this is not a
 sufficient condition in the general case.

Then they have bugs in their implementation which need to be fixed.
There's nothing wrong with std.array.save. It's doing the right
thing.

[...]

OK, so the subject line is misleading.

But the bigger issue is that wrappers of forward ranges of forward
ranges can only be input ranges, in spite of the fact that, in
theory, you can save each component of the given range of ranges.


[...]
 Why is that?

Because the definition of .save only requires that the state of the
outer range is saved. Nothing is guaranteed about the state of the inner
ranges.


 If the wrapper is *designed* as being a RoR, then the "elements" of
 the range are defined as being part of the iteration scheme.  Once you
 have defined it that way, you can have the wrapper save the top Range,
 as well as each sub-range individually, and then the wrapper is
 Forward... No?

In *theory*, yes, you can do this. But currently, it's not possible to
do this generically, because .save has to return the same type as the
outer range. The outer range's .save method does NOT call .save on the
inner ranges (e.g., an array of forward ranges: std.array.save simply
returns a shallow copy of the original array). So the inner ranges will
be invalidated by iterating either the .save'd range, or the original
range.

You have to return an outer range in which all of its elements have been
replaced with .save'd copies. But there is no method in the outer range
that does this, so there is no way to do this in generic code.

*If* .save is allowed to return a different type, then yes, you can
return a wrapper of the outer range that lets you iterate through
.save'd copies of the inner ranges. But AFAIK this isn't allowed by the
current definition of .save.


T

-- 
Why waste time learning, when ignorance is instantaneous? -- Hobbes, from
Calvin & Hobbes

Dec 20 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:
 On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
 [...]
 Why is that?

 Because the definition of .save only requires that the state of 
 the
 outer range is saved. Nothing is guaranteed about the state of 
 the inner
 ranges.


 If the wrapper is *designed* as being a RoR, then the 
 "elements" of
 the range are defined as being part of the iteration scheme.  
 Once you
 have defined it that way, you can have the wrapper save the 
 top Range,
 as well as each sub-range individually, and then the wrapper is
 Forward... No?

 In *theory*, yes, you can do this. But currently, it's not 
 possible to
 do this generically, because .save has to return the same type 
 as the
 outer range. The outer range's .save method does NOT call .save 
 on the
 inner ranges (e.g., an array of forward ranges: std.array.save 
 simply
 returns a shallow copy of the original array). So the inner 
 ranges will
 be invalidated by iterating either the .save'd range, or the 
 original
 range.

 You have to return an outer range in which all of its elements 
 have been
 replaced with .save'd copies. But there is no method in the 
 outer range
 that does this, so there is no way to do this in generic code.

 *If* .save is allowed to return a different type, then yes, you 
 can
 return a wrapper of the outer range that lets you iterate 
 through
 .save'd copies of the inner ranges. But AFAIK this isn't 
 allowed by the
 current definition of .save.


 T

Yeah..., I think I get it now. Thanks for insisting.

While you _can_ save the outer range, if you try to replace the 
individual elements of the saved range with saved copies, at the 
end of the day, both your outer ranges will still reference the 
same ranges, saved or not. Ok. Now we can move forward.

One of the solutions I see would be that the RoR wrapper (we're 
talking about Joiner, right?), never mutate the elements of the 
RoR directly, but first store a *saved* copy of the element into 
a _current.

This way, you only need to save the *outer* range, but not *ALL* 
of the inner ranges individually. This is good because it keeps 
things in check performance wise.

I commented on your fix to joiner:
https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956
This may solve the problem actually.

Dec 20 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 20, 2012 at 05:30:39PM +0100, monarch_dodra wrote:
 On Thursday, 20 December 2012 at 16:04:43 UTC, H. S. Teoh wrote:
On Thu, Dec 20, 2012 at 04:47:10PM +0100, monarch_dodra wrote:
[...]
Why is that?

Because the definition of .save only requires that the state of the
outer range is saved. Nothing is guaranteed about the state of the
inner ranges.


If the wrapper is *designed* as being a RoR, then the "elements" of
the range are defined as being part of the iteration scheme.  Once
you have defined it that way, you can have the wrapper save the top
Range, as well as each sub-range individually, and then the wrapper
is Forward... No?

In *theory*, yes, you can do this. But currently, it's not possible
to do this generically, because .save has to return the same type as
the outer range. The outer range's .save method does NOT call .save
on the inner ranges (e.g., an array of forward ranges: std.array.save
simply returns a shallow copy of the original array). So the inner
ranges will be invalidated by iterating either the .save'd range, or
the original range.

You have to return an outer range in which all of its elements have
been replaced with .save'd copies. But there is no method in the
outer range that does this, so there is no way to do this in generic
code.


[...]
 Yeah..., I think I get it now. Thanks for insisting.
 
 While you _can_ save the outer range, if you try to replace the
 individual elements of the saved range with saved copies, at the end
 of the day, both your outer ranges will still reference the same
 ranges, saved or not. Ok. Now we can move forward.
 
 One of the solutions I see would be that the RoR wrapper (we're
 talking about Joiner, right?), never mutate the elements of the RoR
 directly, but first store a *saved* copy of the element into a
 _current.
 
 This way, you only need to save the *outer* range, but not *ALL* of
 the inner ranges individually. This is good because it keeps things
 in check performance wise.
 
 I commented on your fix to joiner:
 https://github.com/D-Programming-Language/phobos/pull/987/files#r2478956
 This may solve the problem actually.

You're right, if joiner only iterates through .save'd copies of the
subranges, then there will be no problem. Any external modifications is
not our responsibility, because as long as you only iterate over the
joined range (without mutation), it will not cause unexpected
modifications of the original range.


T

-- 
One reason that few people are aware there are programs running the internet is
that they never crash in any significant way: the free software underlying the
internet is reliable to the point of invisibility. -- Glyn Moody, from the
article "Giving it all away"

Dec 20 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
 Because the definition of .save only requires that the state of the
 outer range is saved. Nothing is guaranteed about the state of the inner
 ranges.

That just means that that anything involving ranges of ranges needs to be 
written with the understanding that saving the outer range doesn't save the 
inner ones. So, we may have bugs with regards to this (similar to how we have 
bugs related to not calling save when we should), but there's still no problem 
with save itself - only with how it's used.

So much of this would have been easier though if reference type ranges had 
been disallowed. Too late now though. We'll just have to fix such bugs as we 
find them and strengthen the unit tests so that they get caught like they 
should be.

- Jonathan M Davis

Dec 20 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/20/12 12:48 PM, Jonathan M Davis wrote:
 On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
 Because the definition of .save only requires that the state of the
 outer range is saved. Nothing is guaranteed about the state of the inner
 ranges.

 That just means that that anything involving ranges of ranges needs to be
 written with the understanding that saving the outer range doesn't save the
 inner ones. So, we may have bugs with regards to this (similar to how we have
 bugs related to not calling save when we should), but there's still no problem
 with save itself - only with how it's used.

 So much of this would have been easier though if reference type ranges had
 been disallowed. Too late now though. We'll just have to fix such bugs as we
 find them and strengthen the unit tests so that they get caught like they
 should be.

I think this whole issue is akin to the "transitory" discussion of a 
while ago.

Andrei

Dec 20 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Dec 20, 2012 at 01:47:16PM -0500, Andrei Alexandrescu wrote:
 On 12/20/12 12:48 PM, Jonathan M Davis wrote:
On Thursday, December 20, 2012 08:03:04 H. S. Teoh wrote:
Because the definition of .save only requires that the state of the
outer range is saved. Nothing is guaranteed about the state of the
inner ranges.

That just means that that anything involving ranges of ranges needs
to be written with the understanding that saving the outer range
doesn't save the inner ones. So, we may have bugs with regards to
this (similar to how we have bugs related to not calling save when we
should), but there's still no problem with save itself - only with
how it's used.


Yes, I just submitted a pull request for joiner (the no-separator
variant), and am working on another pull for the with-separator variant
of joiner.

We will also need to review all algorithms/wrapper ranges that take a
range-of-range parameter for correctness.


So much of this would have been easier though if reference type
ranges had been disallowed. Too late now though. We'll just have to
fix such bugs as we find them and strengthen the unit tests so that
they get caught like they should be.

 
 I think this whole issue is akin to the "transitory" discussion of a
 while ago.

[...]

Which has still not be resolved. :-/

Can we at the very least agree that any Phobos algorithm that *can* be
made to work with transient ranges (without incurring performance or
other problems), should? I can work through the list I posted some time
ago and submit pulls for them.

It would be better than doing nothing at all, though this is IMO still
not fully satisfactory. If nothing else, I think we should update the
documentation to warn users of potential pitfalls with using transient
ranges, and indicate clearly which algorithms have been vetted for safe
operation with transient ranges.


T

-- 
Why are you blatanly misspelling "blatant"? -- Branden Robinson

Dec 20 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/20/12 2:07 PM, H. S. Teoh wrote:
 Can we at the very least agree that any Phobos algorithm that *can* be
 made to work with transient ranges (without incurring performance or
 other problems), should? I can work through the list I posted some time
 ago and submit pulls for them.

Absolutely, that would be awesome.

Andrei

Dec 20 2012

D Programming

C/C++ Programming

Other

digitalmars.D - More range woes: std.array.save is invalid