digitalmars.D.learn - Range returning an array

Joseph Rushton Wakeling (41/41) Apr 09 2013 Hello all,

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (6/8) Apr 09 2013 Your problem is the exact situation that stdio.ByLine has faced. There,

bearophile (7/12) Apr 09 2013 In similar situations I add a boolean template argument that

Steven Schveighoffer (33/34) Apr 09 2013 1. documentation. Make sure the user of the range knows that this data ...

Joseph Rushton Wakeling (10/15) Apr 09 2013 In this case it's quite unlikely anyone will ever want to use the code a...

Steven Schveighoffer (31/40) Apr 09 2013 Well here is another solution:

Joseph Rushton Wakeling (26/27) Apr 09 2013 I did consider something like that. The actual one I have come to after
Joseph Rushton Wakeling (6/7) Apr 09 2013 By the way: the reason that I rejected the temporary-variable choice was...

Steven Schveighoffer (5/15) Apr 09 2013 Calling front after empty is not good range policy, once empty, front is...

H. S. Teoh (9/24) Apr 09 2013 [...]

Steven Schveighoffer (12/33) Apr 09 2013 That is not necessary. A range does not just have to be a range, it jus...

H. S. Teoh (9/42) Apr 09 2013 [...]

Jesse Phillips (7/21) Apr 10 2013 I'm pretty sure he realizes this, his original code shows how he

Joseph Rushton Wakeling (4/10) Apr 11 2013 Agree. I do also see the risks in these assumptions with respect to fro...

Joseph Rushton Wakeling (11/13) Apr 11 2013 It's not necessarily required to actually call front with the range empt...

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Hello all,

I have a situation in a program I'm writing where it's convenient to define a
range whose return value is an array.  Something like:

struct MySimulation(T)
{
	T[] var;
	T diff, convergence;

	auto front()  property
	{
		return var;
	}

	bool empty()  property
	{
		return (diff < convergence);
	}

	void popFront()
	{
		// update values in var
		// and calculate diff
	}
}

Now, the problem with a design like this is that it's unsafe, in the sense that,
let's say I do something like,

	T[] output;
	foreach(opvar; mySim)
		output = opvar;

... then, at the end of this loop, output will not be the same as it was set to
with the last assignment, because popFront() will have been called one last time
prior to the "empty" condition being set to true.

Now, there are a number of ways I can think of to avoid this.  One is to return
var.dup rather than var from front(), but this is undesirable because it will
hit performance in a big way.  Another might be to calculate the updated value
of var in a temporary array, and update var itself only if the diff is greater
than the convergence criterion.

However, I'm curious enough about this problem that I thought I'd throw it open
to everyone else to see if anyone has a better idea than one of these.  The
challenge here is to ensure really good performance while not risking bad or
incorrect results leaking outside of the range.

Any thoughts? :-)

Thanks & best wishes,

    -- Joe

Apr 09 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 04/09/2013 01:03 PM, Joseph Rushton Wakeling wrote:

 to return var.dup rather than var from front(), but this is
 undesirable because it will hit performance in a big way.

Your problem is the exact situation that stdio.ByLine has faced. There, 
the choice has been to use an internal buffer just like your 'var' and 
leave the .dup call to the user for when it is needed.

That choice is faster but sometimes causes bugs.

Ali

Apr 09 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Ali Çehreli:

 Your problem is the exact situation that stdio.ByLine has 
 faced. There, the choice has been to use an internal buffer 
 just like your 'var' and leave the .dup call to the user for 
 when it is needed.

 That choice is faster but sometimes causes bugs.

In similar situations I add a boolean template argument that 
switches on or off the dupping of the given result. On default 
it's on. So it's safe on default and faster on request, and this 
goes well with the D zen.

Bye,
bearophile

Apr 09 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 09 Apr 2013 16:03:01 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:


 Any thoughts? :-)

1. documentation.  Make sure the user of the range knows that this data is  
going to be re-used.
2. Make front() return const if possible.  It is another signal that you  
aren't supposed to keep this data.
3. For your specific situation, add lastFront():

struct MySimulation(T)
{
	T[] var;
	T[] lastVar;
	T diff, convergence;

	auto front()  property
	{
		return var;
	}

	bool empty()  property
	{
		return (diff < convergence);
	}

	void popFront()
	{
		lastVar[] = var[];
		// update values in var
		// and calculate diff
	}
	
	auto lastFront()  property
	{
		return lastVar;
	}
}

-Steve

Apr 09 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 04/09/2013 11:02 PM, Steven Schveighoffer wrote:
 1. documentation.  Make sure the user of the range knows that this data is
going
 to be re-used.

In this case it's quite unlikely anyone will ever want to use the code apart
from me, but yes, good point. :-)

 2. Make front() return const if possible.  It is another signal that you aren't
 supposed to keep this data.
 3. For your specific situation, add lastFront():

It's an interesting thought.  I don't think it's ultimately the right way to go
-- yes, my application rests strongly on finding the last value, but the problem
is very simply that popFront kills the value _before_ finding out if the range
is now empty.

The only logical option that I can see is to tweak things so that doesn't
happen, which is possible but probably a little bit more finnicky than the
current implementation.

Apr 09 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 09 Apr 2013 18:09:07 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:

 On 04/09/2013 11:02 PM, Steven Schveighoffer wrote:

 3. For your specific situation, add lastFront():

 It's an interesting thought.  I don't think it's ultimately the right  
 way to go
 -- yes, my application rests strongly on finding the last value, but the  
 problem
 is very simply that popFront kills the value _before_ finding out if the  
 range
 is now empty.

Well here is another solution:

struct MySimulation(T)
{
	T[] var;
         T[] tmpvar;
	T diff, convergence;

	auto front()  property
	{
		return var;
	}

	bool empty()  property
	{
		return (diff < convergence);
	}

	void popFront()
	{
		tmpvar[] = var[];
		// update values in var
		// and calculate diff
		if(empty)
		{
			var[] = tmpvar[]; // revert
			var = null; // optional, detach from original array
		}
	}
}

You could also use alloca to avoid storing tmpvar as a struct member and  
also avoid allocation.

-Steve

Apr 09 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 04/10/2013 12:33 AM, Steven Schveighoffer wrote:
 Well here is another solution

I did consider something like that.  The actual one I have come to after
thinking about it is like this:

struct MySimulation(T)
{
	T[] var;
	T diff = T.max;
	T convergence;
	_bool empty = false;

	auto front()  property
	{
		return var;
	}

	bool empty()  property
	{
		return _empty;
	}

	void popFront()
	{
		if(diff < convergence)
			_empty = true;
		else
			// update var and calculate new diff
	}
}

How does this look to people?

Apr 09 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
 I did consider something like that.

By the way: the reason that I rejected the temporary-variable choice was that I
couldn't really see the difference cost-wise between doing that, versus
returning var.dup from front().  Especially as it's not necessarily guaranteed
that front will be called frequently (I might just popFront() until the range is
empty and then take the final front value).

Apr 09 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:

 On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
 I did consider something like that.

 By the way: the reason that I rejected the temporary-variable choice was  
 that I
 couldn't really see the difference cost-wise between doing that, versus
 returning var.dup from front().  Especially as it's not necessarily  
 guaranteed
 that front will be called frequently (I might just popFront() until the  
 range is
 empty and then take the final front value).

Calling front after empty is not good range policy, once empty, front is  
possibly invalid or points at invalid memory.

-Steve

Apr 09 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
 <joseph.wakeling webdrake.net> wrote:
 
On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.

By the way: the reason that I rejected the temporary-variable choice
was that I couldn't really see the difference cost-wise between doing
that, versus returning var.dup from front().  Especially as it's not
necessarily guaranteed that front will be called frequently (I might
just popFront() until the range is empty and then take the final
front value).

 
 Calling front after empty is not good range policy, once empty,
 front is possibly invalid or points at invalid memory.

[...]

I believe it was proposed that .front should assert if .empty returns
true. Personally I think that's a bit too extreme, but nevertheleses, I
agree that calling .front after .empty returns true is sloppy coding,
and can easily lead to bugs or runtime errors.


T

-- 
The two rules of success: 1. Don't tell everything you know. -- YHL

Apr 09 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 09 Apr 2013 19:31:59 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
 <joseph.wakeling webdrake.net> wrote:

On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.

By the way: the reason that I rejected the temporary-variable choice
was that I couldn't really see the difference cost-wise between doing
that, versus returning var.dup from front().  Especially as it's not
necessarily guaranteed that front will be called frequently (I might
just popFront() until the range is empty and then take the final
front value).

 Calling front after empty is not good range policy, once empty,
 front is possibly invalid or points at invalid memory.

 [...]

 I believe it was proposed that .front should assert if .empty returns
 true. Personally I think that's a bit too extreme, but nevertheleses, I
 agree that calling .front after .empty returns true is sloppy coding,
 and can easily lead to bugs or runtime errors.

That is not necessary.  A range does not just have to be a range, it just  
has to support a range interface.

It can be perfectly valid for front to not assert.  It can be also valid  
for it to assert.  It just means that functions that generically operate  
on ranges cannot count on front working after empty is true (and that  
WOULD be a bug).

For example, if the OP replaced his range with someone else's, he may run  
into that situation.  But as long as he is the author, it is not illegal  
for an object that is also a range to be empty and have front still work.

-Steve

Apr 09 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Apr 09, 2013 at 09:59:42PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 19:31:59 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
 
On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:

On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.

By the way: the reason that I rejected the temporary-variable
choice was that I couldn't really see the difference cost-wise
between doing that, versus returning var.dup from front().
Especially as it's not necessarily guaranteed that front will be
called frequently (I might just popFront() until the range is empty
and then take the final front value).

Calling front after empty is not good range policy, once empty,
front is possibly invalid or points at invalid memory.

[...]

I believe it was proposed that .front should assert if .empty returns
true. Personally I think that's a bit too extreme, but nevertheleses,
I agree that calling .front after .empty returns true is sloppy
coding, and can easily lead to bugs or runtime errors.

 
 That is not necessary.  A range does not just have to be a range, it
 just has to support a range interface.
 
 It can be perfectly valid for front to not assert.  It can be also
 valid for it to assert.  It just means that functions that generically
 operate on ranges cannot count on front working after empty is true
 (and that WOULD be a bug).

[...]

Yes, that was my point. I didn't mean to say that front *should* assert,
just that it *might* for some ranges. (In fact, I've written my own
ranges that do support calling .front even when .empty is true.) In
generic code, though, one should not assume one way or another.


T

-- 
Study gravitation, it's a field with a lot of potential.

Apr 09 2013

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Tuesday, 9 April 2013 at 23:18:26 UTC, Steven Schveighoffer 
wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling 
 <joseph.wakeling webdrake.net> wrote:
 By the way: the reason that I rejected the temporary-variable 
 choice was that I
 couldn't really see the difference cost-wise between doing 
 that, versus
 returning var.dup from front().  Especially as it's not 
 necessarily guaranteed
 that front will be called frequently (I might just popFront() 
 until the range is
 empty and then take the final front value).

 Calling front after empty is not good range policy, once empty, 
 front is possibly invalid or points at invalid memory.

 -Steve

I'm pretty sure he realizes this, his original code shows how he 
is doing this. He expects 'output' to hold the final front value, 
but instead it holds the empty "value."

Joseph, I think you will have to profile to decide which is the 
fastest for your use case.

Apr 10 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 04/10/2013 08:22 PM, Jesse Phillips wrote:
 I'm pretty sure he realizes this, his original code shows how he is doing this.
 He expects 'output' to hold the final front value, but instead it holds the
 empty "value."
 
 Joseph, I think you will have to profile to decide which is the fastest for
your
 use case.

Agree.  I do also see the risks in these assumptions with respect to front --
they can be safely worked around within my code but it's not a very friendly
construction to put in front of other people.

Apr 11 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 04/10/2013 01:18 AM, Steven Schveighoffer wrote:
 Calling front after empty is not good range policy, once empty, front is
 possibly invalid or points at invalid memory.

It's not necessarily required to actually call front with the range empty: one
could do something like,

	for(; !sim.empty; sim.popFront())
		v = sim.front;

... and the way that popFront() operates means that, while the output of front()
may be transient in general, the last value will remain.

Though I agree that, even in this case, it's is a potentially problematic
situation (there's something weird about the idea of a range that is transient
except for the last entry).  I will have to profile with .dup enabled and see
what changes.

Apr 11 2013

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Range returning an array