www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Trying to understand RandomSample struct in std.random

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hello all,

I'm trying to understand the internal operations of the RandomSample struct in 
std.random.

What gets output at the end is clearly an array containing a subset of the 
original input.  But I can't understand how this is constructed.

The constructor sets various initial values and then calls a function prime() 
[looks like an implementation of Knuth's Algorithm S] which returns when the 
current _index value is selected.  Otherwise, it moves along to the next index 
value.

What I don't understand is where the selected values are stored or how they
come 
to be available.

I also don't understand the corresponding randomSample functions where a 
specified random number generator is passed; how it can work to set ret.gen = 
gen _after_ the RandomSample struct has been initialized.

Can anyone help me to understand how this struct works?

Thanks & best wishes,

     -- Joe
Apr 12 2012
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 04/12/2012 11:30 AM, Joseph Rushton Wakeling wrote:

 What gets output at the end is clearly an array containing a subset of
 the original input.
That's misleading. RandomSample is a lazy range. The output seems to be elements of an array only as you pull data out of this range.
 The constructor sets various initial values and then calls a function
 prime() [looks like an implementation of Knuth's Algorithm S] which
 returns when the current _index value is selected. Otherwise, it moves
 along to the next index value.

 What I don't understand is where the selected values are stored or how
 they come to be available.
The actual values are produced by the range that is passed to RandomSample. RandomSample merely skips over some of those values.
 I also don't understand the corresponding randomSample functions where a
 specified random number generator is passed; how it can work to set
 ret.gen = gen _after_ the RandomSample struct has been initialized.
It is ok because the returned RandomSample range is not used yet. It will be used only after having been returned from the randomSample() convenience function.
 Can anyone help me to understand how this struct works?
I shamelessly provide a link to a chapter that I have written, which I think covers all of these topics: http://ddili.org/ders/d.en/ranges.html
 Thanks & best wishes,

 -- Joe
Ali
Apr 12 2012
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/04/12 20:39, Ali Çehreli wrote:
 That's misleading. RandomSample is a lazy range. The output seems to be
elements
 of an array only as you pull data out of this range.
Ahhh, it's lazy evaluation. That would explain why you can set ret.gen = gen in the randomSample functions and have it work. (... realized that before reading your explicit statement to that effect:-)
 I shamelessly provide a link to a chapter that I have written, which I think
 covers all of these topics:

 http://ddili.org/ders/d.en/ranges.html
No shame required, your textbook is very nice :-) I think I understand better now -- that the initial value is calculated by the call to prime() in the constructor, which leaves _input.front as being the first selected value; and then popFront() calls prime() again to set the next value, and so on. What I _still_ don't understand is the statement in the constructor: // we should skip some elements initially so we don't always // start with the first ... as it seems to me that doing so would bugger up the selection algorithm. The whole point of Algorithm S is that it considers one by one each of the possible values in sequence.
Apr 12 2012
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 04/12/2012 01:27 PM, Joseph Rushton Wakeling wrote:

 What I _still_ don't understand is the statement in the constructor:

 // we should skip some elements initially so we don't always
 // start with the first

 ... as it seems to me that doing so would bugger up the selection
 algorithm. The whole point of Algorithm S is that it considers one by
 one each of the possible values in sequence.
It is to keep front() trivial. Unless popFront() is called, front() should always return the same element. That's why the constructor goes to the first selection up front so that the first call to front() need not calculate anything. As a result it reduces the laziness of the whole thing though. Even if popFront() is never called, the original range is consumed. Ali
Apr 12 2012