digitalmars.D - A simple solution for randomness copy problem

Ilya Yaroshenko (65/75) Nov 29 2016 Hello,

Joseph Rushton Wakeling (25/34) Nov 30 2016 Oh, nice. This feels like a bit of a "facepalm" moment as in,

Ilya Yaroshenko (9/16) Nov 30 2016 Of course they can work with random devices. It looks strange to

Joseph Rushton Wakeling (7/12) Nov 30 2016 The terminology here (from the C++11 standard) is:

Dicebot (4/4) Nov 30 2016 One minor nitpick - please avoid calling postblit constructor a

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

Hello,

The problem is that a structure for a random algorithm or a 
random variable may holds its own precomputed random state, which 
is not correct to copy.

 From [1] by Joseph Rushton Wakeling:
 Essentially the sampling algorithm carries out its own little 
 internal pseudo-random process which falls back to using the 
 RNG when necessary.
 I've had a lot of conversations with different people around 
 this issue, and I've tried out a bunch of different ideas, 
 including functor RNGs which would be wrapped with a range API 
 accessing them via pointer (a suggestion Dicebot among others 
 made to me).  In every case, I've run into difficulty when 
 moving from RNGs and distributions to random algorithms like 
 RandomCover and RandomSample.

The solution is to add a `hot` flag that indicates that 
precomputed random values can be used and reset this flag in copy 
constructor. It works without performance issues for the Vitter's 
algorithm and Normal sampling (of course if you don't copy the 
struct for each call).

Both Vitter's random sample algorithm and normal distribution 
needs this flag or analog anyway (at least in my implementations).

Below are 2 reduced code samples. The first one is for Vitter's 
strides for random sample and the second one is for Normal 
distribution.
The actual code for random sample (`sample`) can be found at [2], 
and for `NormalVairable` at 3.

[1] 
http://forum.dlang.org/post/gpsilefdillwtleuwohl forum.dlang.org
[2] 
https://github.com/libmir/mir-random/blob/master/source/mir/random/algorithm.d
[3] 
https://github.com/libmir/mir-random/blob/master/source/mir/random/variable.d

----------- Vitter's Algorithm D ---------
struct VitterStrides
{
     private double vprime; // A future/pregenerated random value
     private bool hot; // A flag that indicates that vprime can be 
used

     this(this)
     {
         hot = false;
     }

     sizediff_t opCall(G)(ref G gen)
     {
         ...
     }
}

---------- Normal Distribution ----------
 RandomVariable struct NormalVariable(T)
     if (isFloatingPoint!T)
{
     private T y = 0;
     private bool hot;

     this(this)
     {
         hot = false;
     }

     ///
     T opCall(G)(ref G gen)
         if (isSaturatedRandomEngine!G)
     {
         T x = void;
         if (hot)
         {
             hot = false;
             x = y;
         }
         else
         {
             // compute independent x, y at once
         }
         ...
     }
}

Nov 29 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko 
wrote:
 The solution is to add a `hot` flag that indicates that 
 precomputed random values can be used and reset this flag in 
 copy constructor. It works without performance issues for the 
 Vitter's algorithm and Normal sampling (of course if you don't 
 copy the struct for each call).

Oh, nice.  This feels like a bit of a "facepalm" moment as in, 
"How did no one think of that before?" :-P

It could be a heavy performance hit for larger state variables 
(e.g. a Ziggurat), but on that note ...

 Both Vitter's random sample algorithm and normal distribution 
 needs this flag or analog anyway (at least in my 
 implementations).

I'm not sure that it's so important for distributions.  Unlike 
the general case of random algorithms (which likely need to be 
created multiple times in inner program loops), distributions are 
much more likely to be created at a relatively high level and 
then _used_ multiple times in an inner loop.

That means that in principle, distributions could be treated in 
the same way as RNGs, i.e. just ban copying by value and expect 
them to be passed around by ref or by pointer.

Of course, including the `hot` flag could offer best of both 
worlds: if you pass around by ref then you avoid needing to 
recreate the internal state, if you pass around by value then the 
internal state is regenerated for statistical safety.

I'll try this out in detail with the Phobos std.random stuff and 
see how it plays.  Thank you _very_ much for focusing on this 
problem.

         if (isSaturatedRandomEngine!G)

Question on your terminology here: while saturated makes sense, 
is it really your intention to restrict things to random 
_engines_ (i.e. pseudo-random algorithms)?  Surely it should also 
be possible for this functionality to work with random devices?

Nov 30 2016

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:

On Wednesday, 30 November 2016 at 11:29:25 UTC, Joseph Rushton 
Wakeling wrote:
 On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko
         if (isSaturatedRandomEngine!G)

 Question on your terminology here: while saturated makes sense, 
 is it really your intention to restrict things to random 
 _engines_ (i.e. pseudo-random algorithms)?  Surely it should 
 also be possible for this functionality to work with random 
 devices?

Of course they can work with random devices. It looks strange to 
me to have explicit  API difference between engines and devices. 
A random devices can be marked as random engines or we can add a 
simple generic adaptor (Device->Engine) for them. --Ilya

I will be happy to se your PR for random devices to mir-random.

Thanks,
Ilya

Nov 30 2016

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On Wednesday, 30 November 2016 at 12:28:28 UTC, Ilya Yaroshenko 
wrote:
 Of course they can work with random devices. It looks strange 
 to me to have explicit  API difference between engines and 
 devices. A random devices can be marked as random engines or we 
 can add a simple generic adaptor (Device->Engine) for them.

The terminology here (from the C++11 <random> standard) is:

   * generator == source of uniformly-distributed random bits

   * engine == pseudo-random generator (i.e. seedable sequences)

   * device == 'truly random' generator

 I will be happy to se your PR for random devices to mir-random.

I'll see what I can do.

Nov 30 2016

Dicebot <public dicebot.lv> writes:

One minor nitpick - please avoid calling postblit constructor a 
"copy constructor", it tends to mislead developers with C++ 
origins into expecting copy constructor they are used to - and 
disappointment when it proves to not be the case.

Nov 30 2016

D Programming

C/C++ Programming

Other

digitalmars.D - A simple solution for randomness copy problem