www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.random2

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hello all,

Following discussion on the pull request adding normal random number generation 
to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], 
some issues were raised which are best discussed with the whole community.

The heart of this is the design of pseudo-random number generators (PRNGs) in 
Phobos.  Currently these are implemented as value types, which has a number of 
unpleasant effects:

    -- They're expensive to pass around (some PRNGs have a size of a MB or more)

    -- Passing by value is statistically unsafe, as it can result in identical
       random sequences being generated in different parts of the code.  This
       already affects at least one part of std.random itself: see,
       http://d.puremagic.com/issues/show_bug.cgi?id=8247
       http://forum.dlang.org/thread/oiczxzkzxketxitncghl forum.dlang.org

    -- Simply passing by reference isn't an adequate solution, as there will be
       cases (as with RandomSample, detailed in bug 8247) where you have to
       store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
       only adequate solution is that PRNGs be (safe) reference types.

monarch_dodra did some work on this which was set aside in the short term 
because it would (unavoidably) be a breaking change [ 
https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this,
the 
proposed solution is to create a std.random2.

However, these issues seem to me to have broader implications for the design of 
random-number functionality in Phobos, so if we're going to do a std.random2, 
it's worth mapping them out so as to get them right.

The most obvious (to me) is that these issues which apply to PRNGs apply
equally 
well to random number distributions.  For example the Ziggurat algorithm 
requires storing several hundred constants, so passing by value is expensive; 
and several different algorithms generate and store multiple random variates at 
a time, so copying/passing by value will result in unintended correlations in 
sequences of variates.

This has further implications if for example we want to create a 
VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a 
random distribution with a PRNG -- this is unlikely to work unless both the 
random distribution and the PRNG are reference types.

Finally, there are more general issues about how new functionality should be 
implemented.  C++11 is given as a model in the std.random documentation, but 
this is clearly a guide rather than something to copy blindly -- existing 
functions and structs already deviate from it in ways that reflect D's 
preference for ranges and its superior generics.  We need a clear picture of
how 
to do this for functionality that has not yet been implemented.

For example: in the case of random distributions, the current example of 
uniform() offers only a function interface and therefore little guidance about 
how to create struct implementations for more complex algorithms which require 
persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow 
C++11/Boost.Random in returning variates via opCall, or should they be coupled 
with a PRNG at construction time (as with RandomSample) and implement a range
of 
variates?

My inclination here is to take some time to map out the different 
interface/design options and to present the choices to the community for review 
as a precursor to creating a std.random2.  It seems the only really sensible 
choice to ensure that we get a good future-proof design.

What does everyone think?

Best wishes,

     -- Joe
Jan 08 2013
parent reply "ixid" <nuaccount gmail.com> writes:
On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton 
Wakeling wrote:
 Hello all,

 Following discussion on the pull request adding normal random 
 number generation to std.random [ 
 https://github.com/D-Programming-Language/phobos/pull/1029 ], 
 some issues were raised which are best discussed with the whole 
 community.

 The heart of this is the design of pseudo-random number 
 generators (PRNGs) in Phobos.  Currently these are implemented 
 as value types, which has a number of unpleasant effects:

    -- They're expensive to pass around (some PRNGs have a size 
 of a MB or more)

    -- Passing by value is statistically unsafe, as it can 
 result in identical
       random sequences being generated in different parts of 
 the code.  This
       already affects at least one part of std.random itself: 
 see,
       http://d.puremagic.com/issues/show_bug.cgi?id=8247
       
 http://forum.dlang.org/thread/oiczxzkzxketxitncghl forum.dlang.org

    -- Simply passing by reference isn't an adequate solution, 
 as there will be
       cases (as with RandomSample, detailed in bug 8247) where 
 you have to
       store the RNG.  Storing e.g. a pointer or reference would 
 be unsafe; the
       only adequate solution is that PRNGs be (safe) reference 
 types.

 monarch_dodra did some work on this which was set aside in the 
 short term because it would (unavoidably) be a breaking change 
 [ https://github.com/D-Programming-Language/phobos/pull/893 ].  
 To avoid this, the proposed solution is to create a std.random2.

 However, these issues seem to me to have broader implications 
 for the design of random-number functionality in Phobos, so if 
 we're going to do a std.random2, it's worth mapping them out so 
 as to get them right.

 The most obvious (to me) is that these issues which apply to 
 PRNGs apply equally well to random number distributions.  For 
 example the Ziggurat algorithm requires storing several hundred 
 constants, so passing by value is expensive; and several 
 different algorithms generate and store multiple random 
 variates at a time, so copying/passing by value will result in 
 unintended correlations in sequences of variates.

 This has further implications if for example we want to create 
 a VariateGenerator (or perhaps in D-ish terms, a VariateRange) 
 which couples a random distribution with a PRNG -- this is 
 unlikely to work unless both the random distribution and the 
 PRNG are reference types.

 Finally, there are more general issues about how new 
 functionality should be implemented.  C++11 is given as a model 
 in the std.random documentation, but this is clearly a guide 
 rather than something to copy blindly -- existing functions and 
 structs already deviate from it in ways that reflect D's 
 preference for ranges and its superior generics.  We need a 
 clear picture of how to do this for functionality that has not 
 yet been implemented.

 For example: in the case of random distributions, the current 
 example of uniform() offers only a function interface and 
 therefore little guidance about how to create struct 
 implementations for more complex algorithms which require 
 persistent storage (e.g. Ziggurat or Box-Muller).  Should they 
 follow C++11/Boost.Random in returning variates via opCall, or 
 should they be coupled with a PRNG at construction time (as 
 with RandomSample) and implement a range of variates?

 My inclination here is to take some time to map out the 
 different interface/design options and to present the choices 
 to the community for review as a precursor to creating a 
 std.random2.  It seems the only really sensible choice to 
 ensure that we get a good future-proof design.

 What does everyone think?

 Best wishes,

     -- Joe
I imagine there has been some detailed discussion of the std.nameX idea of libraries so forgive me if this has been discussed. Using this as an approach to essentially replacing libraries instead of the depreciation route would seem to potentially lead to a situation where you have std.name1, 2, 3, 4 ad infinitum which may not have exactly the same feature set or use cases and would lead to people hunting around multiple libraries which on the face of it are for the same thing and using them in projects. Would it not be better to bite the bullet and depreciate? random2 simply sounds like random done properly, with random and random2 people would often use random, and so still suffer the consequences of its issues, and possibly overlook random2. You're depreciating because it's broken, not because people want to mess around with trivialities.
Jan 08 2013
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX idea of
 libraries so forgive me if this has been discussed.
I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion. What I really want address is: how do we get the design of std.random _right_? How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.
Jan 08 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
I imagine there has been some detailed discussion of the std.nameX
idea of libraries so forgive me if this has been discussed.
I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion. What I really want address is: how do we get the design of std.random _right_? How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.
For one thing, I'd say definitely make RNGs have reference semantics. Passing by value just doesn't make sense; for large generators it consumes too much stack space and risks stack overflow, and in any case copying RNGs unintentionally causes duplicated sequences, which is very bad. For those cases where you *want* to copy an RNG, it should be made into a forward range and then you use .save explicitly. I wonder if it even makes sense to force RNGs to inherit from a common base class, so that reference semantics are enforced even for user-defined types. But this may be a bit too heavy-handed (and it will alienate the no-GC crowd). T -- The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.
Jan 08 2013
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 8 January 2013 at 20:40:00 UTC, H. S. Teoh wrote:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton 
 Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
I imagine there has been some detailed discussion of the 
std.nameX
idea of libraries so forgive me if this has been discussed.
I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion. What I really want address is: how do we get the design of std.random _right_? How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.
For one thing, I'd say definitely make RNGs have reference semantics. Passing by value just doesn't make sense; for large generators it consumes too much stack space and risks stack overflow, and in any case copying RNGs unintentionally causes duplicated sequences, which is very bad. For those cases where you *want* to copy an RNG, it should be made into a forward range and then you use .save explicitly.
I'd argue PRNG's should even be forward ranges. For one, the actual saving would be duplicating the PRNG payload itself (very costly). Further more, it would mean the danger of duplicate sequence still exist: a lot of algorithms, such a "fill", start by saving their input, and then iterating on the copy... I'm all for them having "dup" though. Worst case scenario, I had written an adaptor that turns a InputRangePRNG into a ForwardRangePRNG, but this is at the *explicit* request of the *user*, and not by any underlying algorithm. And even then, I'd *still* advise against using this adaptor. If you *must* store and iterate on random values several time, just copy them in an array doing prng.take(N).array(). Clean, efficient, safe. jmdavis was against making them input ranges, saying input ranges just aren't useful compared to a forward range. I think that: 1. A PRNG naturally models input, and not forward. 2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG to be saveable anyways: A *true* RNG simply cannot provide save.
 I wonder if it even makes sense to force RNGs to inherit from a 
 common
 base class, so that reference semantics are enforced even for
 user-defined types. But this may be a bit too heavy-handed (and 
 it will
 alienate the no-GC crowd).
Force: No. However, we *can* provide tools to help with the operation though. Since the recommended implementation of a PRNG is "pointer to payload", I had written an adapter template which took a "Stack payload implementation" and turned it into a reference semantic type. This meant I pretty much transformed all our PRNGs into reference ranges, without even touching any of their implementations. If a user doesn't want it, fine.
Jan 08 2013
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 01/08/2013 10:05 PM, monarch_dodra wrote:
 I'd argue PRNG's should even be forward ranges. For one, the actual saving
would
 be duplicating the PRNG payload itself (very costly). Further more, it would
 mean the danger of duplicate sequence still exist: a lot of algorithms, such a
 "fill", start by saving their input, and then iterating on the copy...
That's a good point; I'd not really thought about the implications of .save in that respect.
 I'm all for them having "dup" though.
Got to be able to have the option to save the random state to stop the player reloading to get a different random outcome ... :-)
  jmdavis was against making them input ranges, saying input ranges just aren't
 useful compared to a forward range. I think that:
 1. A PRNG naturally models input, and not forward.
 2. Algorithms that operate on actual random sequenes shouldn't rely on the PRNG
 to be saveable anyways: A *true* RNG simply cannot provide save.
I would say that it's not useful if a PRNG makes it easy for you to shoot yourself in the foot with flawed statistics. Now, that said, the problem with something like a take(N) approach is that some algorithms might not take a predictable number of random variates. RandomSample is a good example of that, although fortunately in this case the ForwardRange characteristics are not required.
Jan 08 2013
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Jan-2013 00:38, H. S. Teoh пишет:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX
 idea of libraries so forgive me if this has been discussed.
I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion. What I really want address is: how do we get the design of std.random _right_? How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.
For one thing, I'd say definitely make RNGs have reference semantics. Passing by value just doesn't make sense; for large generators it consumes too much stack space and risks stack overflow, and in any case copying RNGs unintentionally causes duplicated sequences, which is very bad. For those cases where you *want* to copy an RNG, it should be made into a forward range and then you use .save explicitly. I wonder if it even makes sense to force RNGs to inherit from a common base class, so that reference semantics are enforced even for user-defined types. But this may be a bit too heavy-handed (and it will alienate the no-GC crowd).
I'd push a well working (from my POV) design of polymorphic adapters. The idea is that wrapping is doable but buing the speed back is unsolved problem. BTW there is one for input ranges - InputRangeObject. Then each generator/distribution defines specific structs and there is one templated polymorphic wrapper that has a common base class. IMO this gives the best of both worlds (this is how std.digest was designed) at no implementation cost. The no-GC, performance and "just give me this one little PRGN" needs are served with specific structs. Then polymorphic behavior with hot-swappable PRNG, "I'm coming from Java/Ruby/..." etc. needs are served with a wrapper + base class or interface. It may even give a per struct aliases of the respective wrapper to be more user-friendly. -- Dmitry Olshansky
Jan 08 2013
parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 01/08/13 22:23, Dmitry Olshansky wrote:
 09-Jan-2013 00:38, H. S. Teoh пишет:
 On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:
 On 01/08/2013 07:38 PM, ixid wrote:
 I imagine there has been some detailed discussion of the std.nameX
 idea of libraries so forgive me if this has been discussed.
I appreciate your concern on this point, but I don't think it's the right thing to focus on in this specific discussion. What I really want address is: how do we get the design of std.random _right_? How we go about incorporating that new design into Phobos with minimal hassle for users is a different issue and one we can face when the time comes.
For one thing, I'd say definitely make RNGs have reference semantics. Passing by value just doesn't make sense; for large generators it consumes too much stack space and risks stack overflow, and in any case copying RNGs unintentionally causes duplicated sequences, which is very bad. For those cases where you *want* to copy an RNG, it should be made into a forward range and then you use .save explicitly. I wonder if it even makes sense to force RNGs to inherit from a common base class, so that reference semantics are enforced even for user-defined types. But this may be a bit too heavy-handed (and it will alienate the no-GC crowd).
I'd push a well working (from my POV) design of polymorphic adapters. The idea is that wrapping is doable but buing the speed back is unsolved problem. BTW there is one for input ranges - InputRangeObject. Then each generator/distribution defines specific structs and there is one templated polymorphic wrapper that has a common base class. IMO this gives the best of both worlds (this is how std.digest was designed) at no implementation cost. The no-GC, performance and "just give me this one little PRGN" needs are served with specific structs. Then polymorphic behavior with hot-swappable PRNG, "I'm coming from Java/Ruby/..." etc. needs are served with a wrapper + base class or interface. It may even give a per struct aliases of the respective wrapper to be more user-friendly.
I think that is the correct approach, and was what I wanted to write yestarday, but I had no internet... a good idea is bound to crop up again :).

Jan 09 2013