digitalmars.D - Normal/Gaussian random number generation for D
- Joseph Rushton Wakeling (62/62) Oct 22 2012 Hello all,
- bearophile (12/19) Oct 23 2012 Joseph Rushton Wakeling:
- Joseph Rushton Wakeling (11/17) Oct 23 2012 Most of the existing std.random is based around the planned C++11 random...
- jerro (13/18) Oct 23 2012 I have an implementation of the Ziggurat algorithm at
- Joseph Rushton Wakeling (11/23) Oct 23 2012 The caveat here is that the Ziggurat algorithm is both fast and AFAIK op...
- Joseph Rushton Wakeling (44/48) Oct 24 2012 OK, so I had a reasonable (not totally in-depth!) look through. Looks g...
- jerro (22/27) Oct 24 2012 The first change I made to the algorithm is shifting layer
- jerro (25/77) Oct 24 2012 I already have a NormalDist struct at
- Joseph Rushton Wakeling (30/49) Oct 25 2012 I'm not expert on this, but there are various tests described in this pa...
- jerro (37/70) Oct 26 2012 I don't see any downside to this.
- Joseph Rushton Wakeling (41/67) Nov 06 2012 Sorry for delayed response, last week was an unfortunate mix of getting ...
- jerro (67/136) Nov 06 2012 No, I think that if the engine defines initialize() function
- Joseph Rushton Wakeling (5/16) Nov 06 2012 I guess I was concerned about the possibility of a sequence of normal ra...
- Joseph Rushton Wakeling (2/3) Nov 06 2012 ... to initialize in the constructor, that is.
- Joseph Rushton Wakeling (20/31) Nov 11 2012 What's wrong with such initialization being in the constructor of the re...
- jerro (7/28) Nov 11 2012 I agree that those helper functions are not very important, but
- Joseph Rushton Wakeling (6/11) Nov 11 2012 Aaack, I'd completely forgotten that. OK, so we can go with the initial...
- Joseph Rushton Wakeling (6/6) Nov 20 2012 Just to update everyone, Jerro and I have been knocking around a few dis...
- jerro (5/11) Nov 20 2012 I agree, it's a good idea to ask the community for feedback. I've
- Joseph Rushton Wakeling (21/23) Nov 20 2012 I replied in the GitHub thread before reading this email! :-P
- jerro (5/12) Nov 20 2012 By functionality, do you mean just the API or also the
- Joseph Rushton Wakeling (4/7) Nov 21 2012 A bit of both, but mainly the API. I'm interested in how obvious it is ...
Hello all, A request for feedback on the design of this code before I put it through some serious testing and submit a formal pull request to Phobos' std.random. The goal here is to provide an interface that is close to the Boost/C++11 design while allowing for some extra interesting possibilities. In particular the goal is to allow the possibility of different underlying generation algorithms or "engines" that can serve different requirements regarding speed, precision, memory consumption etc. See: http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf ... for further details. The present code contains both a function and struct interface for normal random number generation; the latter stores the parameters of the distribution (mean and standard deviation) and an instance of the underlying engine, but not the random number generator. (An alternative design choice for the struct would have been to design it as a range with front/popFront, but for this to work I'd have had to also have it store the RNG, and this would run into the same problems as RandomSample has due to RNGs not being reference types. Boost/C++11 also do not store the RNG internally but implement a variate_generator class which allows the coupling of a random-number distribution and an underlying generator; this is the approach I'd take for defining say a RandomVariate range.) I've so far implemented one underlying engine, the Box-Muller method currently used in Boost. This is not the best engine possible -- an implementation of the Ziggurat algorithm is desirable -- but it represents what is typically used in other libraries. The essence of Box-Muller is that it generates (uniformly-distributed) numbers a pair at a time and then uses those to generate a corresponding pair of normally-distributed numbers. The method is described here: https://en.wikipedia.org/wiki/Box-Muller_transform I've stuck very closely to the implementation in Boost, to the point of reproducing a "trick" Boost uses to get round the fact that its uniform random number generator only supports the interval [a, b) whereas the canonical description of Box-Muller requires the random numbers to be distributed in the interval (0, 1]. Obviously in D this is avoidable, but I felt there was some advantage in reproducing identical results to Boost, at least within the limits of floating-point numerical rounding (and maybe also because D's PI variable goes to a greater number of decimal places). From a licensing point of view this close reproduction isn't an issue, as Boost and Phobos both use the same licence, but there'll be a credit to the original Boost authors in any pull request. However, if there's a desire to differentiate the code more strongly, that can be arranged :-) I've attached a small C++ code example to demonstrate the common output for the D version and that of Boost. Anyway, what I'd really appreciate feedback on is the interface design for these functions and structs. A few concrete questions: (1) Does the order of template arguments seem optimal? My inclination is perhaps to tweak it so that the Uniform RNG type goes first, the underlying engine second, and the input/output type T goes last, as these are most likely the order in which people will want to change things. (2) Is there a desire to have a separate input and output type? Alternatively, should I perhaps treat all input and internal processing as using the real type and just use T for the output? (3) Given the default template arguments provided, is there any way to ensure that an instance of the Normal struct can be implemented like this: auto normal01 = Normal(0, 1); rather than having to do this: auto normal01 = Normal!()(0, 1); Yes, I could create a function that returns an instance, but I'm not sure what to call it given that I want the normal() function to serve a similar purpose as uniform() ... :-) Thanks in advance for any and all feedback. Best wishes, -- Joe
Oct 22 2012
Joseph Rushton Wakeling: Thank you for the work on this. A normals generator is quite needed in Phobos.I've so far implemented one underlying engine, the Box-Muller method currently used in Boost. This is not the best engine possible -- an implementation of the Ziggurat algorithm is desirable -- but it represents what is typically used in other libraries.Using the Boost API is useful. And using D-specific features/syntax is equally good. Using the Ziggurat algorithm is desired, otherwise people will have to implement it outside Phobos, because it's better if you have to generate many normal distributed values (and sometimes I need many of them). is it possible to have both algorithms? Maybe with a template argument? Bye, bearophile
Oct 23 2012
On 10/23/2012 03:50 PM, bearophile wrote:Using the Boost API is useful. And using D-specific features/syntax is equally good.Most of the existing std.random is based around the planned C++11 random API, which in turn derives from boost. There are several ways in which D seems able to do things more elegantly, however ... :-)Using the Ziggurat algorithm is desired, otherwise people will have to implement it outside Phobos, because it's better if you have to generate many normal distributed values (and sometimes I need many of them). is it possible to have both algorithms? Maybe with a template argument?Yes, that's exactly the plan. You see how there is an "engine" template argument (NormalRandomNumberEngine) which is currently by default Box-Muller, but could be Ziggurat or others (Marsaglia etc.) Other engines will be added later, Box-Muller is there as the first and easiest to implement. The idea of a NormalRandomNumberEngine is that it is a struct which returns via opCall a number drawn from the distribution N(0, 1). This is trivial to transform into N(mean, sigma).
Oct 23 2012
Using the Ziggurat algorithm is desired, otherwise people will have to implement it outside Phobos, because it's better if you have to generate many normal distributed values (and sometimes I need many of them). is it possible to have both algorithms? Maybe with a template argument?I have an implementation of the Ziggurat algorithm at https://github.com/jerro/phobos/blob/master/std/random.d#L2035. It modified the Ziggurat algorithm a bit, so that it doesn't need as many layers to work well, which reduces the memory consumption and makes initialization faster. The cauchy, normal and exponential functions currently use global tables, but the zigguratAlgorithm function can also be used to implement a struct or class based API. Joseph mentions having different engines for generating normally distributed random numbers, so maybe I could write an engine for his API based on what I already have. If we had multiple engines, I don't think Ziggurat algorithm should be the default, though, because it requires relatively expensive initialization.
Oct 23 2012
On 10/23/2012 04:36 PM, jerro wrote:I have an implementation of the Ziggurat algorithm at https://github.com/jerro/phobos/blob/master/std/random.d#L2035. It modified the Ziggurat algorithm a bit, so that it doesn't need as many layers to work well, which reduces the memory consumption and makes initialization faster. The cauchy, normal and exponential functions currently use global tables, but the zigguratAlgorithm function can also be used to implement a struct or class based API.Oh, nice! I will take a look at this in the next days.Joseph mentions having different engines for generating normally distributed random numbers, so maybe I could write an engine for his API based on what I already have. If we had multiple engines, I don't think Ziggurat algorithm should be the default, though, because it requires relatively expensive initialization.The caveat here is that the Ziggurat algorithm is both fast and AFAIK optimal from the statistical point of view -- yes, it's costly space-wise, but it's also safe. The Polar method might be a nice alternative that's faster than Box-Muller, that has good statistical properties, and that requires much less space. (Box-Muller actually isn't really that good statistically.) The choice probably needs to depend on the typical number of normal variates we think people are going to want to be generating. The way my code is set up you should only need to allocate one engine per thread, so it seems like a once-off hit that's trivial in the context of any moderately sized simulation.
Oct 23 2012
On 10/23/2012 04:36 PM, jerro wrote:I have an implementation of the Ziggurat algorithm at https://github.com/jerro/phobos/blob/master/std/random.d#L2035.OK, so I had a reasonable (not totally in-depth!) look through. Looks good! And of course reminds of the other benefit of Ziggurat, that it can be used for multiple different random number distributions. It looks like it should be readily possible to integrate the core Ziggurat functionality and to convert the normal() function in your code to be a NormalRandomNumberEngine struct -- so assuming you like my own architecture proposal, let's do this (I'm happy to hear suggestions for alternative designs, as I don't feel particularly confident in my API-designing abilities). For the other distributions, my feeling is that in some cases there's a value in also having this "engine" approach, e.g. for exponentially-distributed numbers one could use Ziggurat or one could use the approach T u = uniform!("[)", T, T, UniformRandomNumberGenerator)(0.0, 1.0, urng); return -log(1 - u)/lambda; ... which is not as fast but has a much lower memory footprint.It modified the Ziggurat algorithm a bit, so that it doesn't need as many layers to work well, which reduces the memory consumption and makes initialization faster.Can you expand on this, and maybe provide a reference? I don't doubt your code's effectiveness but I think where RNGs are concerned we really need to be able to justify our algorithmic choices. There's too much literature out there showing how commonly-used algorithms actually carry statistical flaws. Bigger picture on my approach to non-uniform random number distributions. The goal is to have the following: * Where useful, it should be possible to define and use multiple different internal "engines" for generating random numbers from the given distribution * For each distribution, there should be a function interface and a struct interface. * The struct implementation should store the distribution parameters and an instance of the internal engine (if any). * The function implementation should have 2 versions: one which allows the user to pass an engine of choice as input, one which contains a static instance of the specified engine (hence, thread-safe, distinguished according to both engine type and underlying uniform RNG type). * ... unless there's no call for distinct underlying engines, in which case the function version just takes parameters and uniform RNG :-) * The struct version should be useful to couple with an RNG instance to create an arbitrary random-number range à la Boost's variate_generator class. So, a nice way to expand on our respective approaches might be to incorporate your Ziggurat, adapt it to my Normal engine API, and for me to write a similar setup for exponentially-distributed random numbers that uses the simple approach above and can also use your Ziggurat implementation. What do you think? Best wishes, -- Joe
Oct 24 2012
Can you expand on this, and maybe provide a reference? I don't doubt your code's effectiveness but I think where RNGs are concerned we really need to be able to justify our algorithmic choices. There's too much literature out there showing how commonly-used algorithms actually carry statistical flaws.The first change I made to the algorithm is shifting layer boundaries. Basic Ziggurat algorithm uses n layers with area A. Because computing samples from the bottom layer and the top layer is typically more expensive than computing samples from other layers, I shifted the layer boundaries so that the top and the bottom layers have areas A / 2 and other n - 1 layers have area A. That way we only need to draw half as many samples from the top and the bottom layer. To better describe the second change, I drew a picture and uploaded it to http://i.imgur.com/bDRpP.png . The basic Ziggurat algorithm first chooses the layer randomly and then chooses a random number x between 0 and xavg. If x is below xmin, it returns it. The algorithm I use does this too. When x is above xmin, the basic Ziggurat algorithm chooses a random point in the outer layer area (see the picture), checks if y < f(y), returns its x coordinate if it is, and chooses a new point otherwise. My algorithm uses precomputed low and high x offsets for each layer. It chooses a point below the high diagonal, and checks if it is below the low diagonal. If it is (and it is in most cases), it returns its x coordinate without computing f(x). If only checks if y < f(x) when y is above the low diagonal. That way it doesn't need to make as many calls to f(x).
Oct 24 2012
It looks like it should be readily possible to integrate the core Ziggurat functionality and to convert the normal() function in your code to be a NormalRandomNumberEngine structI already have a NormalDist struct at https://github.com/jerro/phobos/blob/new-api/std/random.d#L2363 - converting that should be trivial.For the other distributions, my feeling is that in some cases there's a value in also having this "engine" approach, e.g. for exponentially-distributed numbers one could use Ziggurat or one could use the approach T u = uniform!("[)", T, T, UniformRandomNumberGenerator)(0.0, 1.0, urng); return -log(1 - u)/lambda; ... which is not as fast but has a much lower memory footprint.I agree that the it's a good thing to design the API so that we can use different enginesCan you expand on this, and maybe provide a reference? I don't doubt your code's effectiveness but I think where RNGs are concerned we really need to be able to justify our algorithmic choices. There's too much literature out there showing how commonly-used algorithms actually carry statistical flaws.I have only tested a distribution of the output samples, which seems to be correct. I agree that we should try to avoid statistical flaws as much as possible. Do you know of any good tests for nonuniform random number generators?Bigger picture on my approach to non-uniform random number distributions. The goal is to have the following: * Where useful, it should be possible to define and use multiple different internal "engines" for generating random numbers from the given distribution * For each distribution, there should be a function interface and a struct interface. * The struct implementation should store the distribution parameters and an instance of the internal engine (if any). * The function implementation should have 2 versions: one which allows the user to pass an engine of choice as input, one which contains a static instance of the specified engine (hence, thread-safe, distinguished according to both engine type and underlying uniform RNG type).* ... unless there's no call for distinct underlying engines, in which case the function version just takes parameters and uniform RNG :-) * The struct version should be useful to couple with an RNG instance to create an arbitrary random-number range à la Boost's variate_generator class.Seems OK to me. Maybe the function that uses a static engine should have one version that takes Rng as a parameter and one that uses rndGen, similar to how uniform() works now? How would the static engines be initialized? They could be initialized on first use, but this would slow sample generation a bit. Another options is to initialize them in the static constructor. I think it would be best to make the engine a static field of a struct template, and initialize it in the structs static constructor (similar to what my ZigguratTable struct does). That way you don't need to check if the engine is initialized every time you generate a sample and the engine isn't initialized unless the function template that uses it is instantiated.So, a nice way to expand on our respective approaches might be to incorporate your Ziggurat, adapt it to my Normal engine API, and for me to write a similar setup for exponentially-distributed random numbers that uses the simple approach above and can also use your Ziggurat implementation. What do you think?I think we should do that. Do you already have some code somewhere where I can see it? I need to know what API the NormalRandomNumberEngine struct should have.
Oct 24 2012
On 10/24/2012 11:23 PM, jerro wrote:I already have a NormalDist struct at https://github.com/jerro/phobos/blob/new-api/std/random.d#L2363 - converting that should be trivial.I'll have a go at this some time in the next days. :-)I have only tested a distribution of the output samples, which seems to be correct. I agree that we should try to avoid statistical flaws as much as possible. Do you know of any good tests for nonuniform random number generators?I'm not expert on this, but there are various tests described in this paper which compares different normal random number generation techniques: http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf ... and the topic is also discussed in this paper: http://www.doornik.com/research/ziggurat.pdf I don't have sufficient knowhow to really feel confident about how to test these things though, and part of me feels it might be worth at some point approaching some stats experts and asking them to help define a test suite for D's random number functionality.Seems OK to me. Maybe the function that uses a static engine should have one version that takes Rng as a parameter and one that uses rndGen, similar to how uniform() works now?Well, as I defined the function, the UniformRNG input has a default value of rndGen, so if you call it just as normal(mean, sigma); it should work as intended. But if there's some factor which means it would be better to define a separate function which doesn't receive the RNG as input, I can do that.How would the static engines be initialized? They could be initialized on first use, but this would slow sample generation a bit. Another options is to initialize them in the static constructor. I think it would be best to make the engine a static field of a struct template, and initialize it in the structs static constructor (similar to what my ZigguratTable struct does). That way you don't need to check if the engine is initialized every time you generate a sample and the engine isn't initialized unless the function template that uses it is instantiated.I think it probably depends on the details of the technique. E.g. with Box-Muller you cache 2 random values, and you need to re-generate them for every pair of normal random variates the user requests, so there's no problem with having it initialized in opCall on the first call to the engine. In general I favour that approach of initializing in opCall because it means that you don't have to ever pass anything to the constructor and also if I have a simulation which uses a lot of random numbers, its output won't be changed simply by creating an instance of a normal random number generator -- only when I start actually _using_ that generator in place of some other source of randomness.I think we should do that. Do you already have some code somewhere where I can see it? I need to know what API the NormalRandomNumberEngine struct should have.Did you get the code I attached to my original post on this topic? Is there something extra that you need? I'm going to turn that into patches for Phobos which I'll put up on GitHub in the next days, so we can pull and push and test/write together as needed. Best wishes, -- Joe
Oct 25 2012
I'm not expert on this, but there are various tests described in this paper which compares different normal random number generation techniques: http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdfI'll implement the tests described in this paper.... and the topic is also discussed in this paper: http://www.doornik.com/research/ziggurat.pdfI don't have sufficient knowhow to really feel confident about how to test these things though, and part of me feels it might be worth at some point approaching some stats experts and asking them to help define a test suite for D's random number functionality.Maybe there are testsuites already written for this.Well, as I defined the function, the UniformRNG input has a default value of rndGen, so if you call it just as normal(mean, sigma); it should work as intended. But if there's some factor which means it would be better to define a separate function which doesn't receive the RNG as input, I can do that.I don't see any downside to this.I think it probably depends on the details of the technique. E.g. with Box-Muller you cache 2 random values, and you need to re-generate them for every pair of normal random variates the user requests, so there's no problem with having it initialized in opCall on the first call to the engine. In general I favour that approach of initializing in opCall because it means that you don't have to ever pass anything to the constructor and also if I have a simulation which uses a lot of random numbers, its output won't be changed simply by creating an instance of a normal random number generator -- only when I start actually _using_ that generator in place of some other source of randomness.I have only been thinking about the Ziggurat algorithm, but you are right, it does depend on the details of the technique. For Box-Muller (and other engines that cache samples) it only makes sense to compute the first samples in the opCall. But for the Ziggurat algorithm, tables that must be computed before you can start sampling aren't changed during sampling and computing the tables doesn't require any additional arguments. So it makes the most sense for those tables to be initialized in the struct's constructor in the struct based API. In my previous post I was talking about initializing static instances of the engine used in the normal() function. The advantage of initializing in a static constructor is that you don't need an additional check every time the normal() function is called. But because we will also have a struct based API, that will not require such checks (at least not for all engines), this isn't really that important. So we can also initialize the global engine instance in a call to normal(), if this simplifies things.Did you get the code I attached to my original post on this topic? Is there something extra that you need?I didn't notice the attachments, sorry (but I see them now). It seems it will be easy enough to write NormalZigguratEngine, there's just one thing that I think should be added to the Normal struct. Engines should have a way to do their initialization in Normal's constructor (where possible), so I think something like this: static if(is(typeof(_engine.initialize()))) _engine.initialize(); should be added to Normal's constructor. Or maybe _engine = NormalRandomNumberEngine(); so that a static opCall can be used.I'm going to turn that into patches for Phobos which I'll put up on GitHub in the next days, so we can pull and push and test/write together as needed.Maybe we should also have a separate file in a separate branch for tests. There will probably be a lot of code needed to test this well and the tests could take a long time to run, so I don't think they should go into unit test blocks (we can put some simpler tests in unit test blocks later). It may also be useful to use external libraries such as dstats for tests.
Oct 26 2012
Sorry for delayed response, last week was an unfortunate mix of getting sick and then having a bunch of work-related stuff ... On 10/26/2012 11:03 PM, jerro wrote:Which "this" do you mean? My current approach, or the adding of an extra separate function? :-)Well, as I defined the function, the UniformRNG input has a default value of rndGen, so if you call it just as normal(mean, sigma); it should work as intended. But if there's some factor which means it would be better to define a separate function which doesn't receive the RNG as input, I can do that.I don't see any downside to this.I have only been thinking about the Ziggurat algorithm, but you are right, it does depend on the details of the technique. For Box-Muller (and other engines that cache samples) it only makes sense to compute the first samples in the opCall. But for the Ziggurat algorithm, tables that must be computed before you can start sampling aren't changed during sampling and computing the tables doesn't require any additional arguments. So it makes the most sense for those tables to be initialized in the struct's constructor in the struct based API.So we should assume by default then that the struct's constructor should take an RNG as input, to enable it to calculate these first values if it needs to?In my previous post I was talking about initializing static instances of the engine used in the normal() function. The advantage of initializing in a static constructor is that you don't need an additional check every time the normal() function is called. But because we will also have a struct based API, that will not require such checks (at least not for all engines), this isn't really that important. So we can also initialize the global engine instance in a call to normal(), if this simplifies things.I guess my feeling here is that the values generated by an RNG should depend on when it is called, and not at all on when it is instantiated. i.e. if I do something like auto nrng = Normal!()(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); I should get the same output as if I do, writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); auto nrng = Normal!()(0, 1); writeln( nrng() ); writeln( nrng() ); You can also think that if I change from e.g. auto nrng = Normal!(real, Engine1)(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); to auto nrng = Normal!(real, Engine2)(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); ... then I would expect to see different results from the normal RNG but identical results from uniform(). If the constructor of the normal engine calls the RNG, the uniform() results will change, no?Yes, a random.d test suite probably should be another project. Regardless of tests, let's focus for now on getting the API right for this case of non-uniform-random-number-generator-with-internal-engine, with normal and exponential as the initial cases. We can always label some engines as "use at own risk" in the short term!I'm going to turn that into patches for Phobos which I'll put up on GitHub in the next days, so we can pull and push and test/write together as needed.Maybe we should also have a separate file in a separate branch for tests. There will probably be a lot of code needed to test this well and the tests could take a long time to run, so I don't think they should go into unit test blocks (we can put some simpler tests in unit test blocks later). It may also be useful to use external libraries such as dstats for tests.
Nov 06 2012
Your current approach.I don't see any downside to this.Which "this" do you mean? My current approach, or the adding of an extra separate function? :-)I have only been thinking about the Ziggurat algorithm, but you are right, it does depend on the details of the technique. For Box-Muller (and other engines that cache samples) it only makes sense to compute the first samples in the opCall. But for the Ziggurat algorithm, tables that must be computed before you can start sampling aren't changed during sampling and computing the tables doesn't require any additional arguments. So it makes the most sense for those tables to be initialized in the struct's constructor in the struct based API.So we should assume by default then that the struct's constructor should take an RNG as input, to enable it to calculate these first values if it needs to?No, I think that if the engine defines initialize() function (with no parameters), it should be called in the constructor of Normal. I don't think the constructor of Normal should take an RNG as input. I think what you currently do in normal.d is fine, I would just add something like this: static if(is(typeof(_engine.initialize()))) _engine.initialize(); to Normal's constructor. Then ZigguratEngine can define initialize() and do its initialization there (it doesn't need a uniform RNG for initialization) and NormalBoxMullerEngine can remain unchanged.In my previous post I was talking about initializing static instances of the engine used in the normal() function. The advantage of initializing in a static constructor is that you don't need an additional check every time the normal() function is called. But because we will also have a struct based API, that will not require such checks (at least not for all engines), this isn't really that important. So we can also initialize the global engine instance in a call to normal(), if this simplifies things.I guess my feeling here is that the values generated by an RNG should depend on when it is called, and not at all on when it is instantiated. i.e. if I do something like auto nrng = Normal!()(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); I should get the same output as if I do, writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); auto nrng = Normal!()(0, 1); writeln( nrng() ); writeln( nrng() ); You can also think that if I change from e.g. auto nrng = Normal!(real, Engine1)(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); to auto nrng = Normal!(real, Engine2)(0, 1); writeln( uniform(0.0, 1.0) ); writeln( uniform(0.0, 1.0) ); writeln( nrng() ); writeln( nrng() ); ... then I would expect to see different results from the normal RNG but identical results from uniform(). If the constructor of the normal engine calls the RNG, the uniform() results will change, no?I was only talking about the part of initialization that doesn't use a RNG. I agree that everything that uses a RNG should be done in opCall (or inside a normal() function in the function interface). For Box-Muller, I think the approach you currently use in NormalBoxMullerEngine is the most reasonable one. But a Ziggurat engines needs to compute some tables before it can start generating samples. It doesn't need a RNG to do that and the tables do not change after initialization. I think it's obvious that that initialization that doesn't need a RNG should be done in Normal's constructor for the struct interface. What is not so obvious is where the initialization of static data that doesn't require a RNG should be done for function interface. That initialization can be done in normal() function, or it can be done in a static constructor. If you do it in normal(), you need to do an extra check on each call to normal(). This isn't really a problem as long as the struct's opCall and the version of normal() that takes the engine as a parameter don't do such redundant checks. Then the users that care about the difference in performance can just use one of these interfaces.Yes, a random.d test suite probably should be another project. Regardless of tests, let's focus for now on getting the API right for this case of non-uniform-random-number-generator-with-internal-engine, with normal and exponential as the initial cases.In general I like the API in file normal.d attached to your original post. I think the engines should have an option to do some initialization in Normal's constructor, though. We could achieve that by calling _engine.initialize in Normal's constructors, if such method exists. This method would also need to be called on the static instance of normal engine used in the normal() function. We could add something like this to the first version of normal: static if(is(typeof(engine.initialize()))) { static bool isInitialized; if(!isInitialized) engine.initialize(); } Another option would be to do this: struct GlobalEngine(Engine) { Engine instance; static this() { instance.initialize(); } } And then inside the version of normal that doesn't take an engine as a parameter: alias GlobalEngine!NormalRandomNumberEngine E; return normal(mean, sigma, E.instance, urng); The users would need to construct their own instance of engine to use the function that takes engine as a parameter. So it would make sense to add helper functions for creating engine instances. There's one change that I think would make the API more convenient. Normal struct and the engine don't store an instance of a RNG , so they don't need to take it as a template parameter. We could make opCall methods templates instead. That way the users would never need to explicitly specify the type of the RNG.
Nov 06 2012
On 11/06/2012 07:14 PM, jerro wrote:I was only talking about the part of initialization that doesn't use a RNG. I agree that everything that uses a RNG should be done in opCall (or inside a normal() function in the function interface). For Box-Muller, I think the approach you currently use in NormalBoxMullerEngine is the most reasonable one. But a Ziggurat engines needs to compute some tables before it can start generating samples. It doesn't need a RNG to do that and the tables do not change after initialization.Ahh, OK, clear. Then obviously your suggested approach is correct.There's one change that I think would make the API more convenient. Normal struct and the engine don't store an instance of a RNG , so they don't need to take it as a template parameter. We could make opCall methods templates instead. That way the users would never need to explicitly specify the type of the RNG.I guess I was concerned about the possibility of a sequence of normal random numbers generated by a mix of different RNGs, but to be honest, if someone really, really wants to do that, I'm not sure I should stop them. :-)
Nov 06 2012
On 11/06/2012 07:39 PM, Joseph Rushton Wakeling wrote:Ahh, OK, clear. Then obviously your suggested approach is correct.... to initialize in the constructor, that is.
Nov 06 2012
On 11/06/2012 07:14 PM, jerro wrote:In general I like the API in file normal.d attached to your original post. I think the engines should have an option to do some initialization in Normal's constructor, though. We could achieve that by calling _engine.initialize in Normal's constructors, if such method exists.What's wrong with such initialization being in the constructor of the relevant NormalEngine? I think that was your original idea, and I derailed it because of my misunderstanding of what you wanted to initialize.The users would need to construct their own instance of engine to use the function that takes engine as a parameter. So it would make sense to add helper functions for creating engine instances.In general the idea is that the engine should be something hidden; where you need to use it, you just need to pass the name as a template parameter; it should be rare that you really need to manually instantiate your own engine. But we can add such a helper function if you like. What I'm frustrated about is that as-is it's not possible to just have auto nrng = Normal(mean, sigma); ... but you have instead to write, auto nrng = Normal!()(mean, sigma); despite the fact that Normal has default template parameters given. So maybe a helper function normalRNG which returns an instance of Normal would also be helpful.There's one change that I think would make the API more convenient. Normal struct and the engine don't store an instance of a RNG , so they don't need to take it as a template parameter. We could make opCall methods templates instead. That way the users would never need to explicitly specify the type of the RNG.Good call. I've uploaded a tweaked version based on your comments here: https://github.com/WebDrake/phobos/tree/normal ... so feel free to pull, further revise and add in your Ziggurat implementation :-) I haven't yet written any of the helper functions you suggest, but will do so shortly based on your response to my above remarks.
Nov 11 2012
What's wrong with such initialization being in the constructor of the relevant NormalEngine? I think that was your original idea, and I derailed it because of my misunderstanding of what you wanted to initialize.The problem is that structs can't have constructors with no parameters.In general the idea is that the engine should be something hidden; where you need to use it, you just need to pass the name as a template parameter; it should be rare that you really need to manually instantiate your own engine. But we can add such a helper function if you like.I agree that those helper functions are not very important, but on the other hand I don't think adding them costs us anything. But maybe we should take care of the other stuff first.What I'm frustrated about is that as-is it's not possible to just have auto nrng = Normal(mean, sigma); ... but you have instead to write, auto nrng = Normal!()(mean, sigma); despite the fact that Normal has default template parameters given. So maybe a helper function normalRNG which returns an instance of Normal would also be helpful.Yes, I think it would be.I've uploaded a tweaked version based on your comments here: https://github.com/WebDrake/phobos/tree/normal ... so feel free to pull, further revise and add in your Ziggurat implementation :-)I'll add the Zigggurat implementation, probably tomorrow.
Nov 11 2012
On 11/11/2012 05:30 PM, jerro wrote:The problem is that structs can't have constructors with no parameters.Aaack, I'd completely forgotten that. OK, so we can go with the initialize function as you suggested. I suggest you add it in together with Ziggurat.I agree that those helper functions are not very important, but on the other hand I don't think adding them costs us anything. But maybe we should take care of the other stuff first.Well, they should be trivial enough to write, let's get Ziggurat in and working first and then add them.I'll add the Zigggurat implementation, probably tomorrow.Excellent! Will look forward to seeing that.
Nov 11 2012
Just to update everyone, Jerro and I have been knocking around a few discussions and code tweaks on GitHub: https://github.com/WebDrake/phobos/pull/1#issuecomment-10319570 If anyone wants to comment or give feedback, please feel free. It's surely useful that we get as many concerns covered and addressed before submitting an official Phobos pull request!
Nov 20 2012
On Tuesday, 20 November 2012 at 15:04:53 UTC, Joseph Rushton Wakeling wrote:Just to update everyone, Jerro and I have been knocking around a few discussions and code tweaks on GitHub: https://github.com/WebDrake/phobos/pull/1#issuecomment-10319570 If anyone wants to comment or give feedback, please feel free. It's surely useful that we get as many concerns covered and addressed before submitting an official Phobos pull request!I agree, it's a good idea to ask the community for feedback. I've replied to you in the github thread, but maybe it would be better to continue the discussion here?
Nov 20 2012
On 11/20/2012 07:00 PM, jerro wrote:I agree, it's a good idea to ask the community for feedback. I've replied to you in the github thread, but maybe it would be better to continue the discussion here?I replied in the GitHub thread before reading this email! :-P My feeling is that the core normal RNG code is probably fairly well defined, and won't need too many more tweaks. What I see lacking right now are unittests (so far the ones I've written have just been to test that the type inference rules work correctly) and possibly also some additional necessary type constraints (e.g. it would be good to have something like isNormalRandomNumberEngine, and I don't know if you want to add some extra type checks to some of your Ziggurat-related functions). I'd particularly appreciate it if community members could give the type checking a once-over to see if it's adequate/appropriate, whether there are any important checks missing and whether in some cases it's actually too strict (e.g. could we reasonably replace the isNumeric checks with isScalar?). As I wrote on GitHub, I also think it's worth considering if some of the helper functions/structs introduced in these patches could better be moved out to other parts of Phobos for general-purpose use, e.g. your isPowerOfTwo. Lastly, we need to make sure that this new functionality is well documented. Perhaps a good start here could be for people to take a look at the code and describe what isn't clear or easy to understand just from looking ... ? Best wishes, -- Joe
Nov 20 2012
isNormalRandomNumberEngine, and I don't know if you want to add some extra type checks to some of your Ziggurat-related functions).Yes, I need to add some checks. Currently, there aren't any :DLastly, we need to make sure that this new functionality is well documented. Perhaps a good start here could be for people to take a look at the code and describe what isn't clear or easy to understand just from looking ... ?By functionality, do you mean just the API or also the implementation? I assume my Ziggurat algorithm implementation isn't easy to understand without some kind of description. I'll need to write a ton of comments in there.
Nov 20 2012
On 11/21/2012 01:44 AM, jerro wrote:By functionality, do you mean just the API or also the implementation? I assume my Ziggurat algorithm implementation isn't easy to understand without some kind of description. I'll need to write a ton of comments in there.A bit of both, but mainly the API. I'm interested in how obvious it is how to _use_ it rather than how obvious the internals are (which is rare for any case, let alone something like Ziggurat).
Nov 21 2012