digitalmars.D - Faster uniform() in [0.0, 1.0)
- bearophile (5/5) Nov 19 2010 Some kind of little D programs I write need a lot of random values, and ...
- tn (53/60) Nov 22 2010 I did some testing with different combinations of types and boundary typ...
- Fawzi Mohamed (10/80) Nov 22 2010 I suspect that the default random generator I have implemented (in
- tn (2/30) Nov 22 2010 After further investigation it seems that the slowdown happens because o...
- Don (4/36) Nov 22 2010 No, it just shouldn't convert (] into []. It should do [], and then
- bearophile (4/6) Nov 22 2010 If bits in double have the same probability then I think there is a much...
- Don (13/19) Nov 23 2010 Yes, but randomly generated bits doesn't give a uniform distribution.
- tn (5/11) Nov 23 2010 For uniform distribution different bit combinations should have differen...
- Fawzi Mohamed (12/34) Nov 23 2010 that is the reason I used a better generation algorithm in blip (and
- tn (4/47) Nov 23 2010 Just using 64 bit integers as source would be enough for almost(?) all c...
- Fawzi Mohamed (16/70) Nov 23 2010 Yes, I was thinking of porting my code to D2, but if someone else
Some kinds of little D programs I write need a lot of random values, and tests have shown me that std.random.uniform is slow. So I have suggested adding a faster special case to generate a random double in [0.0, 1.0), see: http://d.puremagic.com/issues/show_bug.cgi?id=5240

Bye,
bearophile
Nov 19 2010
bearophile Wrote:
> Some kind of little D programs I write need a lot of random values, and tests have shown me that std.random.uniform is slow. So I have suggested to add a faster special case to generate a random double in [0.0, 1.0), see: http://d.puremagic.com/issues/show_bug.cgi?id=5240

I did some testing with different combinations of types and boundary types. The problem I noticed is a bit different from the one bearophile mentioned. Here is my test code:

--------------------
import std.conv;
import std.date;
import std.random;
import std.stdio;

void test(T, string boundaries)()
{
    void fun() { uniform!(boundaries, T, T)(cast(T)0, cast(T)1000); }
    writefln("%-8s %s %6d", to!string(typeid(T)), boundaries,
             benchmark!fun(10_000_000)[0]);
}

void testBoundaries(T)()
{
    test!(T, "[]")();
    test!(T, "[)")();
    test!(T, "(]")();
    test!(T, "()")();
    writeln();
}

void main()
{
    testBoundaries!(int)();
    testBoundaries!(long)();
    testBoundaries!(float)();
    testBoundaries!(double)();
    testBoundaries!(real)();
}
--------------------

And here are the results for 10 million calls of uniform (columns are: type, boundaries, elapsed time):

--------------------
int      []    271
int      [)    271
int      (]    283
int      ()    285

long     []    372
long     [)    399
long     (]    401
long     ()    397

float    []    286
float    [)    374
float    (]   5252
float    ()   5691

double   []    348
double   [)    573
double   (]   5319
double   ()   5875

real     []    434
real     [)    702
real     (]   2832
real     ()   3056
--------------------

In my opinion, floating point uniforms with (] or () as boundary types are unacceptably slow. I had to use 1 - uniform!"[)"(0.0, 1.0) instead of uniform!"(]"(0.0, 1.0) because of this issue. I would also expect the versions using float and double to be faster than the version using real.

-- tn
Nov 22 2010
On 22-nov-10, at 16:11, tn wrote:
> [benchmark code and results]
> In my opinion floating point uniforms with (] or () as boundary types are unacceptably slow. I had to use 1 - uniform!"[)"(0.0, 1.0) instead of uniform!"(]"(0.0, 1.0) because of this issue. I would also expect versions using float and double to be faster than the version using real.

I suspect that the default random generator I have implemented (in blip & tango) is faster than the phobos one. I did not try to support all possibilities: with floats just () (and with high probability ()), but boundary values are possible due to rounding when using a non 0-1 range. However, I took a lot of care to initialize *all* bits uniformly.

The problem you describe looks like a bug though; if done correctly, one should just add an if or two to the [] implementation to get () with very high probability.

Fawzi
Nov 22 2010
tn Wrote:
> In my opinion floating point uniforms with (] or () as boundary types are unacceptably slow. I had to use 1 - uniform!"[)"(0.0, 1.0) instead of uniform!"(]"(0.0, 1.0) because of this issue. I would also expect versions using float and double to be faster than the version using real.

After further investigation, it seems that the slowdown happens because of subnormal numbers in the calculations. If there is an open boundary at zero, then the call to nextafter in uniform returns a subnormal number. Perhaps the next normal number could be used instead?
Nov 22 2010
tn wrote:
> After further investigation it seems that the slowdown happens because of subnormal numbers in calculations. If there is an open boundary at zero then the call of nextafter in uniform returns a subnormal number. Perhaps next normal number could be used instead?

No, it just shouldn't convert (] into []. It should do [], and then check for an end point. Since the probability of actually generating a zero is 1e-4000, it shouldn't affect the speed at all <g>.
Nov 22 2010
Don:
> Since the probability of actually generating a zero is 1e-4000, it shouldn't affect the speed at all <g>.

If the bits in a double all have the same probability, then I think there is a much higher probability of hitting a zero, about 1 in 2^^63, and I'm not counting NaNs (but it's low enough not to change the substance of what you have said).

Bye,
bearophile
Nov 22 2010
bearophile wrote:
> If bits in double have the same probability then I think there is a much higher probability to hit a zero, about 1 in 2^^63, and I'm not counting NaNs (but it's low enough to not change the substance of what you have said).

Yes, but randomly generated bits don't give a uniform distribution. With a uniform distribution, there should be as much chance of getting [1-real.epsilon .. 1] as [0 .. real.epsilon]. But there are only two representable numbers in the first range, and approx 2^^70 in the second. Further, there are 2^^63 numbers in the range [0 .. real.min] which are all equally likely.

So, if you want a straightforward uniform distribution, you're better off using [1..2) or [0.5..1) than [0..1), because every possible result is equally likely.
Nov 23 2010
Don:
> Since the probability of actually generating a zero is 1e-4000, it shouldn't affect the speed at all <g>.

For a uniform distribution, different bit combinations should have different probabilities, because floating point numbers have more representable values close to zero. So for doubles the probability should be about 1e-300 and for reals about 1e-4900. But because uniform by default seems to use a 32-bit integer random number generator, the probability is actually 2^^-32. And that is indeed verified: I generated 10 * 2^^32 samples of uniform!"[]"(0.0, 1.0) and got 16 zeros, which is close enough to the expected 10.

Of course, 2^^-32 is still small enough to have no performance penalty in practice.

-- tn
Nov 23 2010
On 23-nov-10, at 10:20, tn wrote:
> But because uniform by default seems to use a 32 bit integer random number generator, the probability is actually 2^^-32. And that is actually verified: I generated 10 * 2^^32 samples of uniform!"[]"(0.0, 1.0) and got 16 zeros which is close enough to expected 10.

That is the reason I used a better generation algorithm in blip (and tango) that guarantees the correct distribution, at the cost of being slightly more costly; but then the basic generator is cheaper, and if one needs maximum speed one can even use a cheaper source (from the CMWC family) that still seems to pass all statistical tests.

The way I generate uniform numbers was shown to be better (and detectably so) in the case of floats when looking at the tails of normal and other distributions generated from uniform numbers. This is very relevant in some cases (for example, if you are interested in the probability of catastrophic events).

Fawzi
Nov 23 2010
Fawzi Mohamed Wrote:
> that is the reason I used a better generation algorithm in blip (and tango) that guarantees the correct distribution, at the cost of being slightly more costly, but then the basic generator is cheaper, and if one needs maximum speed one can even use a cheaper source (from the CMWC family) that still seems to pass all statistical tests.

A similar method would probably be nice in phobos too, if the speed is almost the same.

> The way I use to generate uniform numbers was shown to be better (and detectably so) in the case of floats, when looking at the tails of normal and other distributions generated from uniform numbers. This is very relevant in some cases (for example is you are interested in the probability of catastrophic events).

Just using 64-bit integers as the source would be enough for almost(?) all cases. At the current speed, it would take thousands of years for one modern computer to generate so many random numbers that a better resolution was justifiable. (And if one wants to measure the probability of rare enough events, one should use more advanced methods, like importance sampling.)

-- tn
Nov 23 2010
On 23-nov-10, at 13:12, tn wrote:
> Similar method would probably be nice also in phobos if the speed is almost the same.

Yes, I was thinking of porting my code to D2, but if someone else wants to do it... Please note that for double the speed will *not* be the same, because it always tries to guarantee that all bits of the mantissa are random, and with 52 or 63 bits this cannot be done with a single 32-bit random number.

> Just using 64 bit integers as source would be enough for almost(?) all cases. At the current speed it would take thousands of years for one modern computer to generate so much random numbers that better resolution was justifiable.

I thought about directly having 64 bits as source, but the generators I know were written to generate 32 bits at a time. Probably one could modify CMWC to work natively with 64 bits, but it should be done carefully. So I simply decided to stick to 32 bits and generate two of them when needed.

Note that my default sources are faster than the Twister (the one that is used in phobos); I especially like CMWC (but the default combines it with Kiss for extra safety).

Fawzi
Nov 23 2010