## digitalmars.D - A simple solution for randomness copy problem

Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
```Hello,

The problem is that a structure for a random algorithm or a
random variable may holds its own precomputed random state, which
is not correct to copy.

From [1] by Joseph Rushton Wakeling:
Essentially the sampling algorithm carries out its own little
internal pseudo-random process which falls back to using the
RNG when necessary.
I've had a lot of conversations with different people around
this issue, and I've tried out a bunch of different ideas,
including functor RNGs which would be wrapped with a range API
accessing them via pointer (a suggestion Dicebot among others
made to me).  In every case, I've run into difficulty when
moving from RNGs and distributions to random algorithms like
RandomCover and RandomSample.

The solution is to add a `hot` flag that indicates that
precomputed random values can be used and reset this flag in copy
constructor. It works without performance issues for the Vitter's
algorithm and Normal sampling (of course if you don't copy the
struct for each call).

Both Vitter's random sample algorithm and normal distribution
needs this flag or analog anyway (at least in my implementations).

Below are 2 reduced code samples. The first one is for Vitter's
strides for random sample and the second one is for Normal
distribution.
The actual code for random sample (`sample`) can be found at [2],
and for `NormalVairable` at 3.

[1]
http://forum.dlang.org/post/gpsilefdillwtleuwohl forum.dlang.org
[2]
https://github.com/libmir/mir-random/blob/master/source/mir/random/algorithm.d
[3]
https://github.com/libmir/mir-random/blob/master/source/mir/random/variable.d

----------- Vitter's Algorithm D ---------
struct VitterStrides
{
private double vprime; // A future/pregenerated random value
private bool hot; // A flag that indicates that vprime can be
used

this(this)
{
hot = false;
}

sizediff_t opCall(G)(ref G gen)
{
...
}
}

---------- Normal Distribution ----------
RandomVariable struct NormalVariable(T)
if (isFloatingPoint!T)
{
private T y = 0;
private bool hot;

this(this)
{
hot = false;
}

///
T opCall(G)(ref G gen)
if (isSaturatedRandomEngine!G)
{
T x = void;
if (hot)
{
hot = false;
x = y;
}
else
{
// compute independent x, y at once
}
...
}
}
```
Nov 29 2016
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
```On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko
wrote:
The solution is to add a `hot` flag that indicates that
precomputed random values can be used and reset this flag in
copy constructor. It works without performance issues for the
Vitter's algorithm and Normal sampling (of course if you don't
copy the struct for each call).

Oh, nice.  This feels like a bit of a "facepalm" moment as in,
"How did no one think of that before?" :-P

It could be a heavy performance hit for larger state variables
(e.g. a Ziggurat), but on that note ...

Both Vitter's random sample algorithm and normal distribution
needs this flag or analog anyway (at least in my
implementations).

I'm not sure that it's so important for distributions.  Unlike
the general case of random algorithms (which likely need to be
created multiple times in inner program loops), distributions are
much more likely to be created at a relatively high level and
then _used_ multiple times in an inner loop.

That means that in principle, distributions could be treated in
the same way as RNGs, i.e. just ban copying by value and expect
them to be passed around by ref or by pointer.

Of course, including the `hot` flag could offer best of both
worlds: if you pass around by ref then you avoid needing to
recreate the internal state, if you pass around by value then the
internal state is regenerated for statistical safety.

I'll try this out in detail with the Phobos std.random stuff and
see how it plays.  Thank you _very_ much for focusing on this
problem.

if (isSaturatedRandomEngine!G)

Question on your terminology here: while saturated makes sense,
is it really your intention to restrict things to random
_engines_ (i.e. pseudo-random algorithms)?  Surely it should also
be possible for this functionality to work with random devices?
```
Nov 30 2016
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
```On Wednesday, 30 November 2016 at 11:29:25 UTC, Joseph Rushton
Wakeling wrote:
On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko
if (isSaturatedRandomEngine!G)

Question on your terminology here: while saturated makes sense,
is it really your intention to restrict things to random
_engines_ (i.e. pseudo-random algorithms)?  Surely it should
also be possible for this functionality to work with random
devices?

Of course they can work with random devices. It looks strange to
me to have explicit  API difference between engines and devices.
A random devices can be marked as random engines or we can add a
simple generic adaptor (Device->Engine) for them. --Ilya

I will be happy to se your PR for random devices to mir-random.

Thanks,
Ilya
```
Nov 30 2016
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
```On Wednesday, 30 November 2016 at 12:28:28 UTC, Ilya Yaroshenko
wrote:
Of course they can work with random devices. It looks strange
to me to have explicit  API difference between engines and
devices. A random devices can be marked as random engines or we
can add a simple generic adaptor (Device->Engine) for them.

The terminology here (from the C++11 <random> standard) is:

* generator == source of uniformly-distributed random bits

* engine == pseudo-random generator (i.e. seedable sequences)

* device == 'truly random' generator

I will be happy to se your PR for random devices to mir-random.

I'll see what I can do.
```
Nov 30 2016
Dicebot <public dicebot.lv> writes:
```One minor nitpick - please avoid calling postblit constructor a
"copy constructor", it tends to mislead developers with C++
origins into expecting copy constructor they are used to - and
disappointment when it proves to not be the case.
```
Nov 30 2016