digitalmars.D - std.random2

Joseph Rushton Wakeling (54/54) Jan 08 2013 Hello all,

ixid (16/88) Jan 08 2013 I imagine there has been some detailed discussion of the

Joseph Rushton Wakeling (6/8) Jan 08 2013 I appreciate your concern on this point, but I don't think it's the righ...
H. S. Teoh (17/30) Jan 08 2013 For one thing, I'd say definitely make RNGs have reference semantics.

monarch_dodra (31/68) Jan 08 2013 I'd argue PRNG's should even be forward ranges. For one, the

Joseph Rushton Wakeling (11/21) Jan 08 2013 That's a good point; I'd not really thought about the implications of .s...

Dmitry Olshansky (16/41) Jan 08 2013 I'd push a well working (from my POV) design of polymorphic adapters.

Fawzi Mohamed (4/46) Jan 09 2013 I think that is the correct approach, and was what I wanted to write

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Hello all,

Following discussion on the pull request adding normal random number generation 
to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], 
some issues were raised which are best discussed with the whole community.

The heart of this is the design of pseudo-random number generators (PRNGs) in 
Phobos.  Currently these are implemented as value types, which has a number of 
unpleasant effects:

    -- They're expensive to pass around (some PRNGs have a size of a MB or more)

    -- Passing by value is statistically unsafe, as it can result in identical
       random sequences being generated in different parts of the code.  This
       already affects at least one part of std.random itself: see,
       http://d.puremagic.com/issues/show_bug.cgi?id=8247
       http://forum.dlang.org/thread/oiczxzkzxketxitncghl forum.dlang.org

    -- Simply passing by reference isn't an adequate solution, as there will be
       cases (as with RandomSample, detailed in bug 8247) where you have to
       store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
       only adequate solution is that PRNGs be (safe) reference types.

monarch_dodra did some work on this which was set aside in the short term 
because it would (unavoidably) be a breaking change [ 
https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this,
the 
proposed solution is to create a std.random2.

However, these issues seem to me to have broader implications for the design of 
random-number functionality in Phobos, so if we're going to do a std.random2, 
it's worth mapping them out so as to get them right.

The most obvious (to me) is that these issues which apply to PRNGs apply
equally 
well to random number distributions.  For example the Ziggurat algorithm 
requires storing several hundred constants, so passing by value is expensive; 
and several different algorithms generate and store multiple random variates at 
a time, so copying/passing by value will result in unintended correlations in 
sequences of variates.

This has further implications if for example we want to create a 
VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a 
random distribution with a PRNG -- this is unlikely to work unless both the 
random distribution and the PRNG are reference types.

Finally, there are more general issues about how new functionality should be 
implemented.  C++11 is given as a model in the std.random documentation, but 
this is clearly a guide rather than something to copy blindly -- existing 
functions and structs already deviate from it in ways that reflect D's 
preference for ranges and its superior generics.  We need a clear picture of
how 
to do this for functionality that has not yet been implemented.

For example: in the case of random distributions, the current example of 
uniform() offers only a function interface and therefore little guidance about 
how to create struct implementations for more complex algorithms which require 
persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow 
C++11/Boost.Random in returning variates via opCall, or should they be coupled 
with a PRNG at construction time (as with RandomSample) and implement a range
of 
variates?

My inclination here is to take some time to map out the different 
interface/design options and to present the choices to the community for review 
as a precursor to creating a std.random2.  It seems the only really sensible 
choice to ensure that we get a good future-proof design.

What does everyone think?

Best wishes,

     -- Joe

Jan 08 2013

"ixid" <nuaccount gmail.com> writes:

On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton 
Wakeling wrote:
 Hello all,

 Following discussion on the pull request adding normal random 
 number generation to std.random [ 
 https://github.com/D-Programming-Language/phobos/pull/1029 ], 
 some issues were raised which are best discussed with the whole 
 community.

 The heart of this is the design of pseudo-random number 
 generators (PRNGs) in Phobos.  Currently these are implemented 
 as value types, which has a number of unpleasant effects:

    -- They're expensive to pass around (some PRNGs have a size 
 of a MB or more)

    -- Passing by value is statistically unsafe, as it can 
 result in identical
       random sequences being generated in different parts of 
 the code.  This
       already affects at least one part of std.random itself: 
 see,
       http://d.puremagic.com/issues/show_bug.cgi?id=8247
       
 http://forum.dlang.org/thread/oiczxzkzxketxitncghl forum.dlang.org

    -- Simply passing by reference isn't an adequate solution, 
 as there will be
       cases (as with RandomSample, detailed in bug 8247) where 
 you have to
       store the RNG.  Storing e.g. a pointer or reference would 
 be unsafe; the
       only adequate solution is that PRNGs be (safe) reference 
 types.

 monarch_dodra did some work on this which was set aside in the 
 short term because it would (unavoidably) be a breaking change 
 [ https://github.com/D-Programming-Language/phobos/pull/893 ].  
 To avoid this, the proposed solution is to create a std.random2.

 However, these issues seem to me to have broader implications 
 for the design of random-number functionality in Phobos, so if 
 we're going to do a std.random2, it's worth mapping them out so 
 as to get them right.

 The most obvious (to me) is that these issues which apply to 
 PRNGs apply equally well to random number distributions.  For 
 example the Ziggurat algorithm requires storing several hundred 
 constants, so passing by value is expensive; and several 
 different algorithms generate and store multiple random 
 variates at a time, so copying/passing by value will result in 
 unintended correlations in sequences of variates.

 This has further implications if for example we want to create 
 a VariateGenerator (or perhaps in D-ish terms, a VariateRange) 
 which couples a random distribution with a PRNG -- this is 
 unlikely to work unless both the random distribution and the 
 PRNG are reference types.

 Finally, there are more general issues about how new 
 functionality should be implemented.  C++11 is given as a model 
 in the std.random documentation, but this is clearly a guide 
 rather than something to copy blindly -- existing functions and 
 structs already deviate from it in ways that reflect D's 
 preference for ranges and its superior generics.  We need a 
 clear picture of how to do this for functionality that has not 
 yet been implemented.

 For example: in the case of random distributions, the current 
 example of uniform() offers only a function interface and 
 therefore little guidance about how to create struct 
 implementations for more complex algorithms which require 
 persistent storage (e.g. Ziggurat or Box-Muller).  Should they 
 follow C++11/Boost.Random in returning variates via opCall, or 
 should they be coupled with a PRNG at construction time (as 
 with RandomSample) and implement a range of variates?

 My inclination here is to take some time to map out the 
 different interface/design options and to present the choices 
 to the community for review as a precursor to creating a 
 std.random2.  It seems the only really sensible choice to 
 ensure that we get a good future-proof design.

 What does everyone think?

 Best wishes,

     -- Joe

I imagine there has been some detailed discussion of the 
std.nameX idea of libraries so forgive me if this has been 
discussed. Using this as an approach to essentially replacing 
libraries instead of the depreciation route would seem to 
potentially lead to a situation where you have std.name1, 2, 3, 4 
ad infinitum which may not have exactly the same feature set or 
use cases and would lead to people hunting around multiple 
libraries which on the face of it are for the same thing and 
using them in projects. Would it not be better to bite the bullet 
and depreciate? random2 simply sounds like random done properly, 
with random and random2 people would often use random, and so 
still suffer the consequences of its issues, and possibly 
overlook random2. You're depreciating because it's broken, not 
because people want to mess around with trivialities.

Jan 08 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX idea of
 libraries so forgive me if this has been discussed.

I appreciate your concern on this point, but I don't think it's the right thing 
to focus on in this specific discussion.

What I really want address is: how do we get the design of std.random _right_?

How we go about incorporating that new design into Phobos with minimal hassle 
for users is a different issue and one we can face when the time comes.

Jan 08 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
I imagine there has been some detailed discussion of the std.nameX
idea of libraries so forgive me if this has been discussed.

 
 I appreciate your concern on this point, but I don't think it's the
 right thing to focus on in this specific discussion.
 
 What I really want address is: how do we get the design of std.random
 _right_?
 
 How we go about incorporating that new design into Phobos with
 minimal hassle for users is a different issue and one we can face
 when the time comes.

For one thing, I'd say definitely make RNGs have reference semantics.
Passing by value just doesn't make sense; for large generators it
consumes too much stack space and risks stack overflow, and in any case
copying RNGs unintentionally causes duplicated sequences, which is very
bad.

For those cases where you *want* to copy an RNG, it should be made into
a forward range and then you use .save explicitly.

I wonder if it even makes sense to force RNGs to inherit from a common
base class, so that reference semantics are enforced even for
user-defined types. But this may be a bit too heavy-handed (and it will
alienate the no-GC crowd).


T

-- 
The peace of mind---from knowing that viruses which exploit Microsoft
system vulnerabilities cannot touch Linux---is priceless. -- Frustrated
system administrator.

Jan 08 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 8 January 2013 at 20:40:00 UTC, H. S. Teoh wrote:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton 
 Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
I imagine there has been some detailed discussion of the 
std.nameX
idea of libraries so forgive me if this has been discussed.

 
 I appreciate your concern on this point, but I don't think 
 it's the
 right thing to focus on in this specific discussion.
 
 What I really want address is: how do we get the design of 
 std.random
 _right_?
 
 How we go about incorporating that new design into Phobos with
 minimal hassle for users is a different issue and one we can 
 face
 when the time comes.

 For one thing, I'd say definitely make RNGs have reference 
 semantics.
 Passing by value just doesn't make sense; for large generators 
 it
 consumes too much stack space and risks stack overflow, and in 
 any case
 copying RNGs unintentionally causes duplicated sequences, which 
 is very
 bad.

 For those cases where you *want* to copy an RNG, it should be 
 made into
 a forward range and then you use .save explicitly.

I'd argue PRNG's should even be forward ranges. For one, the 
actual saving would be duplicating the PRNG payload itself (very 
costly). Further more, it would mean the danger of duplicate 
sequence still exist: a lot of algorithms, such a "fill", start 
by saving their input, and then iterating on the copy...

I'm all for them having "dup" though.

Worst case scenario, I had written an adaptor that turns a 
InputRangePRNG into a ForwardRangePRNG, but this is at the 
*explicit* request of the *user*, and not by any underlying 
algorithm.

And even then, I'd *still* advise against using this adaptor. If 
you *must* store and iterate on random values several time, just 
copy them in an array doing prng.take(N).array(). Clean, 
efficient, safe.

 jmdavis was against making them input ranges, saying input 
ranges just aren't useful compared to a forward range. I think 
that:
1. A PRNG naturally models input, and not forward.
2. Algorithms that operate on actual random sequenes shouldn't 
rely on the PRNG to be saveable anyways: A *true* RNG simply 
cannot provide save.

 I wonder if it even makes sense to force RNGs to inherit from a 
 common
 base class, so that reference semantics are enforced even for
 user-defined types. But this may be a bit too heavy-handed (and 
 it will
 alienate the no-GC crowd).

Force: No. However, we *can* provide tools to help with the 
operation though.

Since the recommended implementation of a PRNG is "pointer to 
payload", I had written an adapter template which took a "Stack 
payload implementation" and turned it into a reference semantic 
type. This meant I pretty much transformed all our PRNGs into 
reference ranges, without even touching any of their 
implementations.

If a user doesn't want it, fine.

Jan 08 2013

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 01/08/2013 10:05 PM, monarch_dodra wrote:
 I'd argue PRNG's should even be forward ranges. For one, the actual saving
would
 be duplicating the PRNG payload itself (very costly). Further more, it would
 mean the danger of duplicate sequence still exist: a lot of algorithms, such a
 "fill", start by saving their input, and then iterating on the copy...

That's a good point; I'd not really thought about the implications of .save in 
that respect.

 I'm all for them having "dup" though.

Got to be able to have the option to save the random state to stop the player 
reloading to get a different random outcome ... :-)

  jmdavis was against making them input ranges, saying input ranges just aren't
 useful compared to a forward range. I think that:
 1. A PRNG naturally models input, and not forward.
 2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG
 to be saveable anyways: A *true* RNG simply cannot provide save.

I would say that it's not useful if a PRNG makes it easy for you to shoot 
yourself in the foot with flawed statistics.

Now, that said, the problem with something like a take(N) approach is that some 
algorithms might not take a predictable number of random variates. 
RandomSample 
is a good example of that, although fortunately in this case the ForwardRange 
characteristics are not required.

Jan 08 2013

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

09-Jan-2013 00:38, H. S. Teoh пишет:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX
 idea of libraries so forgive me if this has been discussed.

 I appreciate your concern on this point, but I don't think it's the
 right thing to focus on in this specific discussion.

 What I really want address is: how do we get the design of std.random
 _right_?

 How we go about incorporating that new design into Phobos with
 minimal hassle for users is a different issue and one we can face
 when the time comes.

 For one thing, I'd say definitely make RNGs have reference semantics.
 Passing by value just doesn't make sense; for large generators it
 consumes too much stack space and risks stack overflow, and in any case
 copying RNGs unintentionally causes duplicated sequences, which is very
 bad.

 For those cases where you *want* to copy an RNG, it should be made into
 a forward range and then you use .save explicitly.

 I wonder if it even makes sense to force RNGs to inherit from a common
 base class, so that reference semantics are enforced even for
 user-defined types. But this may be a bit too heavy-handed (and it will
 alienate the no-GC crowd).

I'd push a well working (from my POV) design of polymorphic adapters. 
The idea is that wrapping is doable but buing the speed back is unsolved 
problem. BTW there is one for input ranges - InputRangeObject.

Then each generator/distribution defines specific structs and there is 
one templated polymorphic wrapper that has a common base class.

IMO this gives the best of both worlds (this is how std.digest was 
designed) at no implementation cost.

The no-GC, performance and "just give me this one little PRGN" needs are 
served with specific structs.

Then polymorphic behavior with hot-swappable PRNG, "I'm coming from 
Java/Ruby/..." etc. needs are served with a wrapper + base class or 
interface. It may even give a per struct aliases of the respective 
wrapper to be more user-friendly.


-- 
Dmitry Olshansky

Jan 08 2013

Fawzi Mohamed <fawzi gmx.ch> writes:

On 01/08/13 22:23, Dmitry Olshansky wrote:
 09-Jan-2013 00:38, H. S. Teoh пишет:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX
 idea of libraries so forgive me if this has been discussed.

 I appreciate your concern on this point, but I don't think it's the
 right thing to focus on in this specific discussion.

 What I really want address is: how do we get the design of std.random
 _right_?

 How we go about incorporating that new design into Phobos with
 minimal hassle for users is a different issue and one we can face
 when the time comes.

 For one thing, I'd say definitely make RNGs have reference semantics.
 Passing by value just doesn't make sense; for large generators it
 consumes too much stack space and risks stack overflow, and in any case
 copying RNGs unintentionally causes duplicated sequences, which is very
 bad.

 For those cases where you *want* to copy an RNG, it should be made into
 a forward range and then you use .save explicitly.

 I wonder if it even makes sense to force RNGs to inherit from a common
 base class, so that reference semantics are enforced even for
 user-defined types. But this may be a bit too heavy-handed (and it will
 alienate the no-GC crowd).

 I'd push a well working (from my POV) design of polymorphic adapters. 
 The idea is that wrapping is doable but buing the speed back is 
 unsolved problem. BTW there is one for input ranges - InputRangeObject.

 Then each generator/distribution defines specific structs and there is 
 one templated polymorphic wrapper that has a common base class.

 IMO this gives the best of both worlds (this is how std.digest was 
 designed) at no implementation cost.

 The no-GC, performance and "just give me this one little PRGN" needs 
 are served with specific structs.

 Then polymorphic behavior with hot-swappable PRNG, "I'm coming from 
 Java/Ruby/..." etc. needs are served with a wrapper + base class or 
 interface. It may even give a per struct aliases of the respective 
 wrapper to be more user-friendly.

I think that is the correct approach, and was what I wanted to write 
yestarday, but I had no internet... a good idea is bound to crop up 
again :).

Jan 09 2013

D Programming

C/C++ Programming

Other

digitalmars.D - std.random2