digitalmars.D - Mir Random [WIP]
- Ilya Yaroshenko (21/21) Nov 21 2016 https://github.com/libmir/mir-random
- Andrei Alexandrescu (4/5) Nov 22 2016 This seems like a gratuitous departure from common D practice. Random
- Joseph Rushton Wakeling (21/27) Nov 22 2016 Yes, I think this is avoiding the existing problems with RNGs and
- John Colvin (12/18) Nov 22 2016 I'm pretty sure everyone *wants* it to be a range, but it turns
- Andrei Alexandrescu (11/27) Nov 22 2016 The proposed design has disabled copy ctor and uses opCall() for a new
- Joseph Rushton Wakeling (12/16) Nov 23 2016 I'll see if I can write that up in depth some time soon. TBH
- Andrei Alexandrescu (4/17) Nov 23 2016 Well we need to get to the bottom of this if we're to make progress.
- Joseph Rushton Wakeling (10/13) Nov 23 2016 Would you be up for a direct discussion of this some time --
- Kagamin (4/6) Nov 23 2016 Didn't we have a generic Ref!T wrapper in std.typecons that would
- Martin Nowak (12/21) Nov 25 2016 Yes the problems are inadvertent copies and a disabled this(this)
- Martin Nowak (7/11) Nov 25 2016 Could have an optional .clone if copying is supported.
- Joseph Rushton Wakeling (5/6) Nov 26 2016 I would personally really welcome that, but subject to the
- Andrei Alexandrescu (5/9) Nov 26 2016 Input ranges with disabled this(this) would be a major breaking change
- Jonathan M Davis via Digitalmars-d (24/34) Nov 27 2016 For a while now, I've been tempted to argue that we should require that
- Joseph Rushton Wakeling (17/29) Nov 26 2016 I remember you made that suggestion of disabled this(this) to me
- Ilya Yaroshenko (36/42) Nov 22 2016 It is safe low level architecture without performance and API
- Ilya Yaroshenko (3/26) Nov 22 2016 EDIT: std.range -> std.random
- Andrei Alexandrescu (2/6) Nov 23 2016 Oh, that narrows it down. Thx. -- Andrei
- Joseph Rushton Wakeling (9/15) Nov 23 2016 Alternative solution: functionality that expects the full
- Ilya Yaroshenko (3/20) Nov 23 2016 Good point, will add this. --Ilya
- Andrei Alexandrescu (2/8) Nov 23 2016 Good idea - static assert is your friend. -- Andrei
- Joseph Rushton Wakeling (25/34) Nov 23 2016 Note that if you want to do this, it's convenient to still
- Ilya Yaroshenko (14/48) Nov 23 2016 For example, Mir Random basic utilities (more low level then
- Joseph Rushton Wakeling (21/33) Nov 23 2016 Yes, but as described, `opCall` can easily be created as a
- Ilya Yaroshenko (5/40) Nov 23 2016 Inlining will solve this problem with data duplication (if any)
- Ilya Yaroshenko (4/19) Nov 23 2016 Added RandomRangeAdaptor for URBGs:
- Joseph Rushton Wakeling (7/9) Nov 23 2016 This has exactly the problem I identified above, though: you're
- Ilya Yaroshenko (7/16) Nov 23 2016 1. Current default RNG (Mt19937) has not state for the latest
- Joseph Rushton Wakeling (8/13) Nov 23 2016 That's a detail of your implementation, though, and it's not true
- Andrei Alexandrescu (7/15) Nov 23 2016 Well I do see another small problem with
- Joseph Rushton Wakeling (9/11) Nov 23 2016 Yea, this touches on one of my more general concerns related to
- Jonathan M Davis via Digitalmars-d (17/32) Nov 23 2016 It's quite feasible if you don't calculate front until it's called or
- Joseph Rushton Wakeling (24/41) Nov 25 2016 Yes, indeed. I actually came up with a general purpose solution
- Jonathan M Davis via Digitalmars-d (63/106) Nov 25 2016 It's much messier from an implementation standpoint. If the constructor
- Andrei Alexandrescu (15/16) Nov 23 2016 So it either "takes 1 minute" to change the interface from opCall to
- Jonathan M Davis via Digitalmars-d (16/31) Nov 23 2016 I've frequently found that in cases where I'm not operating on a range o...
- Somebody (2/9) Nov 23 2016 Would, assuming the OpCall() interface, generate!(() => rng())
- Kagamin (6/17) Nov 23 2016 xorshift128+ returns a temporary value computed during state
- Andrei Alexandrescu (10/14) Nov 23 2016 That seems to be a minor matter. Of course at best some measurements
- Kagamin (6/13) Nov 23 2016 Consider this okayish looking code:
- Ryan (11/16) Nov 23 2016 Also consider the case of using 1 generator in your program, but
- Jonathan M Davis via Digitalmars-d (19/33) Nov 23 2016 In the cases where I don't really need a range of random numbers, I've
- Kagamin (4/13) Nov 24 2016 It's the same behavior and suffers from the same problem of reuse
- Jonathan M Davis via Digitalmars-d (23/38) Nov 24 2016 How so? Because someone might call range.front again without bothering t...
- Kagamin (3/5) Nov 24 2016 That's what everything in std.algorithm does.
- Jonathan M Davis via Digitalmars-d (11/16) Nov 24 2016 Then call popFront or drop before passing it along if you're paranoid ab...
- Kagamin (5/7) Nov 25 2016 There's little need for it, as it was pointed out earlier that
- Ilya Yaroshenko (13/24) Nov 24 2016 The fix is opCall syntax. Reference types are classes, which
- John Colvin (9/25) Nov 24 2016 If druntime was initialised by default using
- Ilya Yaroshenko (4/11) Nov 24 2016 No, it should work without DRuntime. Nicholas wrote that it will
- Kagamin (5/9) Nov 23 2016 Yes, it's just Joseph worries about microoptimizations.
- Kagamin (4/7) Nov 23 2016 Though we saw that the compiler can't optimize some simple
- Andrei Alexandrescu (4/10) Nov 23 2016 [Details needed]
- Kagamin (3/5) Nov 24 2016 That's three instructions instead of one shr eax,1
- Timon Gehr (2/7) Nov 24 2016 That would be wrong code. Cast to int after dividing.
- John Colvin (75/82) Nov 24 2016 That's because of the cast(int), dividing by two is optimised
- Kagamin (2/4) Nov 24 2016 What about Andrei's example?
- John Colvin (3/8) Nov 24 2016 I was talking about andrei's example. He has a cast(int) in it
- Kagamin (4/6) Nov 24 2016 And you think that compilation of cast(int)a.length/2 to 3
- John Colvin (4/11) Nov 24 2016 Because it's correct. If a.length is larger than int.max then
- Kagamin (4/7) Nov 24 2016 It can't be possibly correct when a.length is larger than int.max
- Timon Gehr (3/9) Nov 24 2016 The code does not specify that this information should be recovered. Why...
- Andrei Alexandrescu (56/99) Nov 23 2016 I don't understand this. Can you please be more specific? I don't see a
- Kagamin (16/19) Nov 23 2016 struct A
- Joseph Rushton Wakeling (3/17) Nov 23 2016 auto m = (&r).map!(a=>1);
- Kagamin (3/5) Nov 23 2016 Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
- Ilya Yaroshenko (3/9) Nov 23 2016 What the principal difference with randomRangeAdaptor? Anyway,
- Ryan (3/5) Nov 23 2016 Tested, works with 2.071.1
- Ilya Yaroshenko (2/9) Nov 23 2016 r.randomRangeAdaptor.map!(a=>1);
- Ilya Yaroshenko (69/187) Nov 23 2016 A range to use it with std.algorithm and std.range must be
- Andrei Alexandrescu (5/11) Nov 23 2016 Is this a design matter, or an implementation matter? Could you
- Ilya Yaroshenko (52/69) Nov 23 2016 Floating point generation are implementation matter.
- Ilya Yaroshenko (3/5) Nov 23 2016 EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)
- Andrei Alexandrescu (9/12) Nov 23 2016 OK, so (lack of) copyability is a good argument. I'm ready to live with
- Ilya Yaroshenko (7/10) Nov 23 2016 Current adaptor should be used in a function scope. (Would be
- Andrei Alexandrescu (2/10) Nov 23 2016 Yah, problem here is we can't make that @safe. -- Andrei
- Ilya Yaroshenko (3/20) Nov 23 2016 Can we improve D to make it safe?
- Andrei Alexandrescu (4/20) Nov 23 2016 It's difficult in the general case for objective reasons (modular
- Kagamin (4/9) Nov 24 2016 You want to instantiate and seed Mt every time you call a
- Ilya Yaroshenko (3/12) Nov 24 2016 A function can use a global RNG defined by a user or accepts RNG
- Ilya Yaroshenko (5/20) Nov 24 2016 Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable
- Timon Gehr (16/21) Nov 24 2016 I would posit that it is not actually natural to model random numbers as...
- Joseph Rushton Wakeling (27/33) Nov 22 2016 As we discussed in relation to Seb's project, I think this is a
- Andrei Alexandrescu (2/16) Nov 22 2016 Yah, I think that would be nice. -- Andrei
- Joseph Rushton Wakeling (16/17) Nov 22 2016 This means that seemingly identical code will produce different
- Andrei Alexandrescu (2/4) Nov 22 2016 Interesting. Could you please add a couple of links about that? -- Andre...
- ketmar (4/10) Nov 22 2016 http://www.pcg-random.org/other-rngs.html
- Kagamin (3/5) Nov 23 2016 http://xoroshiro.di.unimi.it/
- Andrea Fontana (3/8) Nov 24 2016 Very short public domain implementation:
- Joseph Rushton Wakeling (4/13) Nov 24 2016 Implementation in D, written during DConf 2016 ;-)
- Ilya Yaroshenko (10/29) Nov 22 2016 Mir Random is going to be a library with saturated uniform FP
- Joseph Rushton Wakeling (17/25) Nov 23 2016 All of which is fine in its own terms, but why prioritize the
- Ilya Yaroshenko (12/31) Nov 23 2016 We have a Random alias. I think it is OK if it generates
- Joseph Rushton Wakeling (6/8) Nov 23 2016 I think we're at an impasse in terms of priorities, because
- Andrea Fontana (6/8) Nov 23 2016 I wonder why Mersenne Twister is *always* choosen over other
- Joseph Rushton Wakeling (10/12) Nov 23 2016 The weight of history, I suspect. Mersenne Twister was the major
- Ilya Yaroshenko (3/11) Nov 23 2016 A PR for CMWC is highly welcome!
https://github.com/libmir/mir-random

Dlang Random Number Generators

- `opCall` API instead of range interface is used (similar to C++)
- No default and copy constructors are allowed for generators.
- 64-bit Mt19937 initialization is fixed
- 64-bit Mt19937 is the default for 64-bit targets
- `unpredictableSeed` has no state, returns `ulong`
- Does not depend on DRuntime (Better C concept)
- [WIP] additional Xorshift generators

Please contact me if you are interested in contributing!
https://gitter.im/libmir/public
Nov 21 2016
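A minimal sketch of the announced opCall-style design in use -- a function-call interface, explicit seeding, copying disabled. This is an illustrative toy xorshift64* engine, not mir-random's actual implementation:

    struct ToyXorshift64Star
    {
        private ulong state;

        @disable this();     // no default construction: must be seeded explicitly
        @disable this(this); // no inadvertent copies of the generator state

        this(ulong seed) { state = seed ? seed : 1; } // state must be non-zero

        // Advance the state and return the next 64-bit variate.
        ulong opCall()
        {
            state ^= state >> 12;
            state ^= state << 25;
            state ^= state >> 27;
            return state * 0x2545F4914F6CDD1DUL;
        }
    }

    void main()
    {
        import std.stdio : writeln;
        auto gen = ToyXorshift64Star(42);
        writeln(gen()); // called like a function, as with a C++ URBG
        writeln(gen());
    }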
On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
> - `opCall` API instead of range interface is used (similar to C++)

This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Nov 22 2016
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>> - `opCall` API instead of range interface is used (similar to C++)
>
> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei

Yes, I think this is avoiding the existing problems with RNGs and ranges rather than solving them. I don't blame anyone for _wanting_ to avoid them; they are nasty, subtle issues that seem to keep getting more complex the more one looks at them (for example, after my DConf talk last year, I realized that there were a whole set of other potential complications related to how ranges typically treat laziness). But I think they can be solved, and should be.

OTOH, there's no reason per se why there should not be an `opCall` for random number generators along the lines of,

    UIntType opCall()
    {
        this.popFront();
        return this.front;
    }

... just to provide options to the user. (BTW, note the order there, which touches on the issues related to what lazy evaluation means not just for RNGs but for any non-deterministic IO.)
Nov 22 2016
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>> - `opCall` API instead of range interface is used (similar to C++)
>
> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei

I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lots of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities.

Perhaps you have an insight that has been missed that can make a good rng range without causing less than optimal performance or statistically unsafe default behaviour?

IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view.
Nov 22 2016
On 11/22/16 7:30 PM, John Colvin wrote:
> On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
>> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>>> - `opCall` API instead of range interface is used (similar to C++)
>>
>> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
>
> I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lots of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities.

The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.

> Perhaps you have an insight that has been missed that can make a good rng range without causing less than optimal performance or statistically unsafe default behaviour?

We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.

> IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view.

I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue.

Andrei
Nov 22 2016
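A minimal sketch of the "reference RNG" idea above: an infinite input range that holds a pointer to an existing generator rather than a copy of it. The names RngRef/rngRef are illustrative only, and the raw pointer is exactly what makes @safe-ty awkward later in this thread:

    import std.algorithm : map;
    import std.random : Mt19937, unpredictableSeed;
    import std.range : take;
    import std.stdio : writeln;

    // Infinite input range that shares (not copies) the wrapped generator.
    struct RngRef(G)
    {
        private G* gen;

        enum bool empty = false;
        @property auto front() { return gen.front; }
        void popFront() { gen.popFront(); }
    }

    auto rngRef(G)(ref G gen) { return RngRef!G(&gen); }

    void main()
    {
        auto source = Mt19937(unpredictableSeed);
        // Both pipelines below advance the same underlying state.
        writeln(rngRef(source).take(3));
        writeln(rngRef(source).map!(a => a % 6 + 1).take(3));
    }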
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue.I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Nov 23 2016
On 11/23/2016 05:47 AM, Joseph Rushton Wakeling wrote:On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- AndreiI'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue.I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Nov 23 2016
On Wednesday, 23 November 2016 at 13:44:00 UTC, Andrei Alexandrescu wrote:Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- AndreiWould you be up for a direct discussion of this some time -- maybe next week once I've had time to re-review the fine details of my concerns? I can provide a summary of issues as a starting point for that discussion, if it would be useful. I suggest this because (i) I'm interested in your insight and (ii) even if we end up disagreeing on concerns and solutions, it would be good to make sure that we're on the same page in terms of understanding each other's point of view.
Nov 23 2016
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:
> We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.

Didn't we have a generic Ref!T wrapper in std.typecons that would work this way? Can't find it now.
Nov 23 2016
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:On 11/22/16 7:30 PM, John Colvin wrote:Yes the problems are inadvertent copies and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges as broken. Maybe non-copyability needs to become a requirement for InputRanges.On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
Nov 25 2016
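For reference, a small usage sketch of the existing Phobos pieces Martin mentions, relying only on std.range.refRange's documented pointer-wrapping behaviour:

    import std.random : Random, uniform, unpredictableSeed;
    import std.range : refRange;
    import std.stdio : writefln;

    void main()
    {
        auto gen = Random(unpredictableSeed);

        // refRange wraps a pointer to gen, so consumers advance the
        // original state instead of a private copy.
        auto shared1 = refRange(&gen);
        foreach (i; 0 .. 3)
        {
            writefln("draw %s: %s", i, shared1.front);
            shared1.popFront();
        }

        // gen itself has advanced; further draws continue from there.
        writefln("uniform draw: %s", uniform(0, 100, gen));
    }

As Joseph notes further down, a plain RefRange does not forward non-range members such as the isUniformRandom enum, so isUniformRNG checks fail on it.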
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:Maybe non-copyability needs to become a requirement for InputRanges.Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.Same scheme works for files with only plain int fds streams, non-copyable/unique ownership, and move, refRange, or refCounted for passing and sharing. Should we split off this discussion to a dlang-study thread?
Nov 25 2016
On Saturday, 26 November 2016 at 06:55:24 UTC, Martin Nowak wrote:
> Should we split off this discussion to a dlang-study thread?

I would personally really welcome that, but subject to the understanding that people agree to look seriously at random algorithms (like RandomSample) and not focus the discussion solely on RNGs.
Nov 26 2016
On 11/26/2016 01:55 AM, Martin Nowak wrote:On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- AndreiMaybe non-copyability needs to become a requirement for InputRanges.Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Nov 26 2016
On Saturday, November 26, 2016 12:24:47 Andrei Alexandrescu via Digitalmars- d wrote:On 11/26/2016 01:55 AM, Martin Nowak wrote:For a while now, I've been tempted to argue that we should require that well-behaved input ranges be reference types (or at least have reference semantics). It just seems like too much misbehaves if they're not, and arguably, the whole reason that they're input ranges and not forward ranges (aside from the programmer just not bothering to implement save) is that they effectively have reference semantics whether they're actually coded that way or not. We obviously can't test for that behavior at compile time, but it could easily be on the list of things that are documented as a requirement for well-behaved ranges. I'm also tempted to argue that well-behaved forward ranges and greater should have value semantics (in the sense that dynamic arrays do), but that's a much bigger change (it effectively makes save unnecessary), and proper use of save works around that problem (though it's very common that save isn't used as often as it should be). It would clean up a lot of the mistakes that occur with forward ranges though. However, regardless, a non-copyable range would not play nicely at all with how range-based code works right now. You'd have to use ref all over the place, which range-based code simply doesn't do and would not play well with function chaining, since you'd need lvalues, and since you couldn't copy the range to the stack to make it an lvalue, that seems like it would go nowhere fast. - Jonathan M DavisOn Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- AndreiMaybe non-copyability needs to become a requirement for InputRanges.Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Nov 27 2016
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
> Yes the problems are inadvertent copies and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges as broken. Maybe non-copyability needs to become a requirement for InputRanges.

I remember you made that suggestion of disabled this(this) to me a while back, and I did use it for this small collection of RNGs: https://github.com/WebDrake/dxorshift

>> We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.
>
> We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.

RefRange still has the issue that it only forwards the range primitives and not other properties such as the `isUniformRandom` enum. But those are probably fixable.

However, these approaches basically cover the case of something where an instance is initialized high up in the program and passed around by ref throughout its lifetime. That doesn't address the question of how random _algorithms_ like `RandomSample` could be updated for safety (since they might often need to be initialized multiple times in the inner loops of a program). See:

https://forum.dlang.org/post/gnlvyogolkmocujtnxjj@forum.dlang.org
https://forum.dlang.org/post/gpsilefdillwtleuwohl@forum.dlang.org

for some discussion of the problem.
Nov 26 2016
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
> On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
>> - `opCall` API instead of range interface is used (similar to C++)
>
> This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei

It is a safe low-level architecture without performance and API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. In addition, users would not use the Engine API in 99% of cases: they will just want to call `rand` or `uniform`, or another distribution.

I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels may also be added.

Current Phobos evolution is generic degradation: more generic and "universal" code hides more uncovered bugs in the code. The std.range is a good example of degradation; it has a lot of API and implementation bugs. For example, the generator's min/max bounds are not used when a uniform integer number is generated: `uniform!ulong` / `uniform!ulong(0, 100)`. 8/16/32/64 bits should be generated uniformly, and it is the RNG's problem how to do it. I will not file this bug, as well as another dozen std.random bugs, because the module should be rewritten anyway and I am working on it.

std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason for a 64-bit Xorshift: it is 32-bit by design. Furthermore, a 64-bit expansion of 32-bit algorithms must be proved theoretically before we allow it for end users. 64-bit analogs exist, but they have different implementations.

Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases. Phobos really follows a stupid idealistic idea: more generic is better, more API is better, more universal algorithms are better. The problem is that Phobos/DRuntime is a soup where everything (because of its "universality") interacts with everything.
Nov 22 2016
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:EDIT: std.range -> std.randomOn 11/22/16 1:31 AM, Ilya Yaroshenko wrote:It is safe low level architecture without performance and API issues. It prevents users to do stupid things implicitly (like copying RNGs). A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG). In additional, when you need to write algorithms or distributions opCall is much more convenient than range API. In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added. Current Phobos evolution is generic degradation: more generic and "universal" code hide more uncovered bugs in the code. The std.range is good example of degradation, it has a lot of API and implementation bugs.- `opCall` API instead of range interface is used (similar to C++)This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Nov 22 2016
On 11/23/2016 01:16 AM, Ilya Yaroshenko wrote:
>> The std.range is a good example of degradation; it has a lot of API and implementation bugs.
>
> EDIT: std.range -> std.random

Oh, that narrows it down. Thx. -- Andrei
Nov 23 2016
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. 8/16/32/64 bits uniformly. It is RNG problem how to do it.Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
Nov 23 2016
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
>> are not used when a uniform integer number is generated: `uniform!ulong` / `uniform!ulong(0, 100)`. 8/16/32/64 bits should be generated uniformly, and it is the RNG's problem how to do it.
>
> Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.

Good point, will add this. --Ilya
Nov 23 2016
On 11/23/2016 05:27 AM, Joseph Rushton Wakeling wrote:
> Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.

Good idea - static assert is your friend. -- Andrei
Nov 23 2016
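A sketch of the check being agreed on here (hypothetical helper name; isUniformRNG is the existing std.random trait, and Phobos' Mt19937 declares min == 0 and max == uint.max, so the assert is expected to pass for it):

    import std.random : isUniformRNG, Mt19937;

    // Code that assumes every bit of the returned word is random can verify,
    // at compile time, that the generator covers the full UIntType domain
    // rather than some narrower [min, max] interval.
    void fillWithRandomBits(Rng, T)(ref Rng gen, T[] buf)
        if (isUniformRNG!Rng)
    {
        alias UIntType = typeof(Rng.min);
        static assert(Rng.min == UIntType.min && Rng.max == UIntType.max,
            Rng.stringof ~ " does not produce saturated full-word output");

        foreach (ref e; buf)
        {
            gen.popFront();
            e = cast(T) gen.front; // simplistic: assumes T fits in UIntType
        }
    }

    void main()
    {
        auto gen = Mt19937(42);
        uint[4] buf;
        fillWithRandomBits(gen, buf[]);
    }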
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
> It is a safe low-level architecture without performance and API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG).

Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this:

    struct MyRNG
    {
        void popFront() { /* update internal state */ }

        UIntType front() @property { return /* whatever part of internal state */; }

        UIntType opCall() { this.popFront(); return this.front; }
    }

(The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)

> In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API.

Can you give some examples of what you mean here?

> In addition, users would not use the Engine API in 99% of cases: they will just want to call `rand` or `uniform`, or another distribution.

I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
Nov 23 2016
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko wrote:
>> It is a safe low-level architecture without performance and API issues. It prevents users from doing stupid things implicitly (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG).
>
> Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this:
>
>     struct MyRNG
>     {
>         void popFront() { /* update internal state */ }
>         UIntType front() @property { return /* whatever part of internal state */; }
>         UIntType opCall() { this.popFront(); return this.front; }
>     }
>
> (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
>
> Can you give some examples of what you mean here?

For example, Mir Random basic utilities (more low level than distributions):
https://github.com/libmir/mir-random/blob/master/source/random/package.d
Also you can explore the std.random code. opCall would almost always be more convenient to use.

>> In addition, users would not use the Engine API in 99% of cases: they will just want to call `rand` or `uniform`, or another distribution.
>
> I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.

We don't restrict. It takes 1 minute to write a Range wrapper. This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm which looks better with a Range API for generators. RandomSample can be created without the Range API, and it would look more convenient than it is now. I think that the Range API is bad and useless overengineering for RNGs.
Nov 23 2016
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote:For example, Mir Random basic utilities (more low level then distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore std.random code. opCall would almost always more convenient to use.Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.We don't restrict. It is 1 minute to write an Range wrapper.If all you have is `opCall` then the range wrapper is less efficient than it need be.This wrapper can be added to the library, if we found a real world use case. Again, I have not seen a real world algorithm, which looks better with Range API for generators. RandomSample can be created without Range API, and it would look more convenient then it is now. I think that Range API is bad and useless overengineering for RNGs.Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Nov 23 2016
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote:On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko wrote:Inlining will solve this problem with data duplication (if any) for most real world cases.For example, Mir Random basic utilities (more low level then distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore std.random code. opCall would almost always more convenient to use.Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.We don't restrict. It is 1 minute to write an Range wrapper.If all you have is `opCall` then the range wrapper is less efficient than it need be.Yes, please file a GitHub issue.This wrapper can be added to the library, if we found a real world use case. Again, I have not seen a real world algorithm, which looks better with Range API for generators. RandomSample can be created without Range API, and it would look more convenient then it is now. I think that Range API is bad and useless overengineering for RNGs.Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Nov 23 2016
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton Wakeling wrote:Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
Nov 23 2016
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.dThis has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
Nov 23 2016
On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton Wakeling wrote:
> On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:
>> Added RandomRangeAdaptor for URBGs:
>> https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
>
> This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.

1. The current default RNG (Mt19937) has no state for the latest value.
2. The structure is allocated on the stack, and compilers can recognise loop patterns and eliminate additional memory movements in many cases.
Nov 23 2016
On Wednesday, 23 November 2016 at 13:35:33 UTC, Ilya Yaroshenko wrote:1. Current default RNG (Mt19937) has not state for the latest value.That's a detail of your implementation, though, and it's not true for lots of other pseudo-RNGs.2. The structure is allocated on stack and compilers can recognise loop patterns and eliminate addition memory movements for many cases.Fair enough. I'm still not sure, though, why you would object to providing public methods for pseudo-RNGs to (i) update the internal state and (ii) access the most recently generated variate, which the `opCall` would then wrap.
Nov 23 2016
On 11/23/2016 08:26 AM, Joseph Rushton Wakeling wrote:On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- AndreiAdded RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.dThis has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
Nov 23 2016
On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei Alexandrescu wrote:
> Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei

Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
Nov 23 2016
On Wednesday, November 23, 2016 14:02:36 Joseph Rushton Wakeling via Digitalmars-d wrote:On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei Alexandrescu wrote:It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up. In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it. And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally. - Jonathan M DavisWell I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm .d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- AndreiYea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
Nov 23 2016
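A minimal sketch of the trade-off described above: a range whose element is not produced until front (or popFront) is first used, at the cost of a "has it been computed yet" check on every call. All names are illustrative:

    import std.random : uniform;
    import std.range : take;
    import std.stdio : writeln;

    // Infinite range over successive calls to fun(), computed lazily.
    struct LazyGenerate(alias fun)
    {
        private typeof(fun()) cache;
        private bool primed;

        enum bool empty = false;

        @property auto front()
        {
            if (!primed) { cache = fun(); primed = true; } // the extra check
            return cache;
        }

        void popFront()
        {
            if (!primed) cast(void) fun(); // element was never observed; burn it
            primed = false;                // the next front() computes a fresh value
        }
    }

    void main()
    {
        auto r = LazyGenerate!(() => uniform(0, 100))();
        // Nothing has been generated yet; values appear only as they are read.
        writeln(r.take(5));
    }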
On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis wrote:
> It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up.

Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters):

https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d@puremagic.com

> In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it.

In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample. The downside is that `.front` becomes non-const. Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value.

> And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally.

The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Nov 25 2016
On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via Digitalmars-d wrote:On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis wrote:It's much messier from an implementation standpoint. If the constructor checks for empty and caches front if the range it was given was non-empty, then every call to front can just return front, and every call to popFront can just pop the element off and do whatever it does to turn the front from the wrapped range into the front of this range and then cache it. If the constructor doesn't do that work, then every call to front and every call to popFront have to check whether they're the first time that either has been called and _then_ they end up doing all of the same work anyway. So, there's more code, and it's less efficient. And if every range in the chain does that, then you're getting a lot of extra checks just to determine whether it's the first time that front or popFront is being called. Now, if there's never any caching of the front value, and front always calculates it, then that might actually be more efficient if front is only ever called once per element, because you don't get the cost of copying the element to cache it, but if front gets called multiple times (which happens fairly often, I think), then the cost is definitely greater. You're never going to be able to rely on a fully lazy front anyway unless we specified it as part of the range API that all ranges work that way, and as it stands, very few ranges do.It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up.Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d pu remagic.comIn general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it.In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample.The downside is that `.front` becomes non-const.Well, realistically, as it stands, ranges are utterly useless with const anyway. We'd have to have a standard mechanism for getting a tail-const copy of a range from a const or immutable range, and we don't.Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value.That would be terrible with chaining ranges. It also would be _very_ error-prone. It's also heading into two-phase initialization territory, which is almost always a bad idea.Honestly, I don't think that it really works to have a range where the value of its elements depend on when you call front or popFront. 
The API lets you do it, but you're going to run into plenty of problems if you do unless you don't care about the frequency at which the values are generated. Part of the problem here is that ranges and the algorithms that use them are really designed with arrays in mind. A non-random-access range reduces those capabilities, and a basic input range doesn't actually require that the elements be reproducible aside from multiple calls to front giving the same result, but it's still the case that it's pretty much assumed that a range is a fixed set of values, and pretty much nothing is concerned with how fast front or popFront is called aside for efficiency concerns. So, if someone has a range that calculates a value when front or popFront is called, and that value depends on when the function is called, I think that they're just going to have to deal with the consequences of that. It's already pretty problematic when you have a range that's a basic input range rather than a forward range. Another thing to consider here is that those sorts of concerns really only apply to basic input ranges. As soon as you have a forward range, the values of the elements need to be completely predictable - or at least completely reproducible - meaning that aside from caching a potentially infinite number of elements to guarantee that save works, nothing that's doing something like grabbing changing values from a sensor is going to be anything but a basic input range. So, maybe we should do something special with regards to basic input ranges and how they're treated in order to better deal with cases like that, but if we went that route, then we pretty much couldn't treat them like forward ranges without save anymore. In many cases, we'd have to have separate implementations that avoided things like caching, and that would become problematic fast. I think that there always going to be cases where certain things have some funky corner cases with ranges - especially the further that they get from being dynamic arrays. We probably need to do a better job of formalizing some of this stuff to better avoid some of those corner cases, but I think that we're always going to be able to find something that we can define as a range that works in many cases but starts doing weird things if we use it on a large enough range of algorithms. - Jonathan M DavisAnd I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally.The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Nov 25 2016
On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
> I think that the Range API is bad and useless overengineering for RNGs.

So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor.

I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider:

* By virtue of working with D, everybody knows what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms.

* D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that.

Andrei
Nov 23 2016
On Wednesday, November 23, 2016 08:54:30 Andrei Alexandrescu via Digitalmars-d wrote:On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:I've frequently found that in cases where I'm not operating on a range of values, I really just want a way to get a random number, and having to call both popFront and front to get it is a bit annoying (it's actually one of the few places that I've ever used the comma operator). That being said, I think that the correct solution to that is to add a function to std.random that takes a range, calls popFront on it, and then returns front so that it can work with any of the random number generators and avoids problems with whether or not popFront was called prior to calling front. I have to agree that creating a different API that uses opCall or whatever instead of a the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M DavisI think that Range API is bad and useless overengineering for RNGs.So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody know what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that.
Nov 23 2016
I have to agree that creating a different API that uses opCall or whatever instead of a the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M DavisWould, assuming the OpCall() interface, generate!(() => rng()) make an input range out of it easily? Or am I missing something?
Nov 23 2016
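Assuming the opCall interface, std.range.generate over a callable does give an input range; a sketch, with a small lambda over a Phobos Mt19937 standing in for the opCall generator:

    import std.random : Mt19937, unpredictableSeed;
    import std.range : generate, take;
    import std.stdio : writeln;

    void main()
    {
        auto gen = Mt19937(unpredictableSeed);

        // Stand-in for rng(): advance the state, then read the value.
        auto rng = () { gen.popFront(); return gen.front; };

        auto r = generate(rng).take(5); // infinite input range over rng()
        writeln(r);
    }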
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton Wakeling wrote:
>     struct MyRNG
>     {
>         void popFront() { /* update internal state */ }
>         UIntType front() @property { return /* whatever part of internal state */; }
>         UIntType opCall() { this.popFront(); return this.front; }
>     }

xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate front from popFront. Also this API is prone to reuse some values and discard others, call popFront after front.
Nov 23 2016
On 11/23/2016 09:12 AM, Kagamin wrote:
> xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate front from popFront.

That seems to be a minor matter. Of course at best some measurements would be in order.

> Also this API is prone to reuse some values and discard others, call popFront after front.

This claim would apply to all ranges. There seems to be evidence it is unfounded.

The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous.

Andrei
Nov 23 2016
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:
> This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous.

Consider this okayish looking code:

    consume(rng());
    consume(rng.take(2)); // reuses previous value
    consume(rng());       // discards unused value
Nov 23 2016
On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote:
> On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei
>
> Consider this okayish looking code:
>
>     consume(rng());
>     consume(rng.take(2)); // reuses previous value
>     consume(rng());       // discards unused value

Also consider the case of using 1 generator in your program, but calling it from different places. In one place, you use opCall with the popFront -> front order of calls, and in the other you use the range interface directly with the order reversed. You would re-use values there too. By offering both interfaces together it makes it pretty easy to create silly bugs like this, especially in a large project.

Might be useful to have a basic RNG interface and then wrap it with a template that provides either, but not both, of the other interfaces at a time.
Nov 23 2016
On Wednesday, November 23, 2016 15:29:14 Kagamin via Digitalmars-d wrote:
> On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:
>> This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous.
>
> Consider this okayish looking code:
>
>     consume(rng());
>     consume(rng.take(2)); // reuses previous value
>     consume(rng());       // discards unused value

In the cases where I don't really need a range of random numbers, I've typically ended up doing stuff like

    auto nextNum = rndGen().popFront(), rndGen().front;

though I think that using the comma operator like that is deprecated now. Adding a helper function such as

    auto getNext(R)(ref R range)
        if (isInputRange!R)
    {
        range.popFront();
        return range.front;
    }

would solve that problem. And there are plenty of cases where having a range of random numbers is extremely useful. After having dealt with std.random, it would really suck to have to deal with a random number generator that was not range-based. std.random has problems such as not using reference types for its ranges, but using the range API in the way it does is extremely useful.

- Jonathan M Davis
Nov 23 2016
On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis wrote:though I think that using the comma operator like that is deprecated now. Adding a helper function such as auto getNext(R)(ref R range) if(isInputRange!R) { range.popFront(); return range.front; } would solve that problem.It's the same behavior and suffers from the same problem of reuse of RNG output.
Nov 24 2016
On Thursday, November 24, 2016 09:05:34 Kagamin via Digitalmars-d wrote:On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis wrote:How so? Because someone might call range.front again without bothering to call popFront? If you're paranoid about that, then it can call popFront again before returning (though that would be wasteful). My take on it is that if you just call popFront before using the random number generator range, then you don't have to worry about what any other code that used it did. Regardless, the range API is _way_ more useful in general than something like rand(). And if anyone is trying to avoid ranges with random numbers, I think that they're just making their life harder. Occasionally, it's useful to get just one random number, in which case, having to deal with the range API over rand() is kind of annoying, but in general, it's not a problem, and the wrapper function that I suggested basically gives you rand() from a range of random numbers. Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davisthough I think that using the comma operator like that is deprecated now. Adding a helper function such as auto getNext(R)(ref R range) if(isInputRange!R) { range.popFront(); return range.front; } would solve that problem.It's the same behavior and suffers from the same problem of reuse of RNG output.
Nov 24 2016
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote:How so? Because someone might call range.front again without bothering to call popFront?That's what everything in std.algorithm does.
Nov 24 2016
On Thursday, November 24, 2016 13:54:50 Kagamin via Digitalmars-d wrote:On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote:Then call popFront or drop before passing it along if you're paranoid about it. If it's a serious concern that this be fixed in general, then the obvious fix to that would be to make it so that rndGen() just called popFront before returning it. Heck, if rndGen() were guaranteed to call popFront when it was called rather than simply returning the range, then you could just do rndGen().front whenever you wanted the equivalent of rand(), making it trivial to use rndGen() both for cases where you wanted a single number and for cases where you wanted a range of them - and without worrying about front being stale. - Jonathan M DavisHow so? Because someone might call range.front again without bothering to call popFront?That's what everything in std.algorithm does.
Nov 24 2016
On Thursday, 24 November 2016 at 14:22:05 UTC, Jonathan M Davis wrote:Then call popFront or drop before passing it along if you're paranoid about it.There's little need for it, as it was pointed out earlier that the generate algorithm does the right thing and needs only opCall, also a nice example of orthogonality and composability.
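Concretely, std.range.generate adapts any nullary callable into an infinite range, so an opCall-style engine composes with range algorithms without itself being a range. Engine here is a stand-in for whatever opCall-based generator is used:

import std.array : array;
import std.range : generate, take;

void example()
{
    auto gen = Engine(42);               // hypothetical opCall-style engine
    auto rnd = generate(() => gen());    // adapt opCall to an infinite range
    auto xs = rnd.take(3).array;         // three fresh values, none reused
}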
Nov 25 2016
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote:Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M DavisThe fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC). Refcounting is too slow to implement for users. Note that Engines are only backend RNGs. There are also nonuniform distributions (WIP). See the example of user code: https://forum.dlang.org/post/nyvtoejvmsaolzaqyche forum.dlang.org . It is very difficult to implement both user random variables and Engines using the Range API. Note that the code should work without DRuntime and should be simple (the example is user code). Ilya
Nov 24 2016
On Thursday, 24 November 2016 at 16:09:23 UTC, Ilya Yaroshenko wrote:On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis wrote:If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos.Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M DavisThe fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC).
Nov 24 2016
On Thursday, 24 November 2016 at 17:04:43 UTC, John Colvin wrote:If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos.No, it should work without DRuntime. Nicholas wrote that it will work on GPU using dcompute :-) BetterC is the goal for the future Mir-Phobos, hehe
Nov 24 2016
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:That seems to be a minor matter.Yes, it's just that Joseph worries about microoptimizations.The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous.Ranges are designed for reuse of front, which is not desirable for RNGs.
Nov 23 2016
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:That seems to be a minor matter. Of course at best some measurements would be in order.Though we saw that the compiler can't optimize some simple operations like division by 2.Yes, it's just Joseph worries about microoptimizations.
Nov 23 2016
On 11/23/16 11:10 AM, Kagamin wrote:On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:[Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening. AndreiThat seems to be a minor matter. Of course at best some measurements would be in order.Though we saw that the compiler can't optimize some simple operations like division by 2.Yes, it's just Joseph worries about microoptimizations.
Nov 23 2016
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:[Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening.That's three instructions instead of one shr eax,1
Nov 24 2016
On 24.11.2016 09:55, Kagamin wrote:On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:That would be wrong code. Cast to int after dividing.[Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening.That's three instructions instead of one shr eax,1
Nov 24 2016
On Thursday, 24 November 2016 at 08:55:18 UTC, Kagamin wrote:On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:[Details needed] I just took a look https://godbolt.org/g/EMy6X4, it's happening.That's three instructions instead of one shr eax,1That's because of the cast(int), dividing by two is optimised just fine. You can even do this:

int foo(int a) { return a / 2; }
uint foo(uint a) { return a / 2; }
int bar(int a) { if (a > 0) return a / 2; else assert(0); }

and get

int example.foo(int):   // slow, to deal with sign
    mov eax, edi
    shr eax, 31
    add eax, edi
    sar eax
    ret
uint example.foo(uint): // fast because no sign
    mov eax, edi
    shr eax
    ret
int example.bar(int):   // single shift because sign always 0
    test edi, edi
    jle .L8
    mov eax, edi
    sar eax
    ret
.L8:
    ud2

This should help explain why the extra/different instructions are necessary for signed:

int foo0(int a) { asm { naked; mov EAX, EDI; shr EAX, 31; add EAX, EDI; sar EAX, 1; ret; } }
int foo1(int a) { asm { naked; mov EAX, EDI; sar EAX, 1; ret; } }
int foo2(int a) { asm { naked; mov EAX, EDI; shr EAX, 1; ret; } }

void main()
{
    import std.stdio;
    foreach(i; [int.min, -1, 0, 1, int.max])
        writeln(foo0(i), ' ', foo1(i), ' ', foo2(i));
}

output:

-1073741824 -1073741824 1073741824
0 -1 2147483647
0 0 0
0 0 0
1073741823 1073741823 1073741823
Nov 24 2016
On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote:That's because of the cast(int), dividing by two is optimised just fine.What about Andrei's example?
Nov 24 2016
On Thursday, 24 November 2016 at 10:14:27 UTC, Kagamin wrote:On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote:I was talking about andrei's example. He has a cast(int) in it before the division.That's because of the cast(int), dividing by two is optimised just fine.What about Andrei's example?
Nov 24 2016
On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote:I was talking about andrei's example. He has a cast(int) in it before the division.And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better than one shift?
Nov 24 2016
On Thursday, 24 November 2016 at 10:41:44 UTC, Kagamin wrote:On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote:Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2.I was talking about andrei's example. He has a cast(int) in it before the division.And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better that one shift?
Nov 24 2016
On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2.It can't possibly be correct when a.length is larger than int.max because it doesn't recover information lost in a narrowing conversion.
Nov 24 2016
On 24.11.2016 14:50, Kagamin wrote:On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:The code does not specify that this information should be recovered. Why is this the compiler's problem?Because it's correct. If a.length is larger than int.max then cast(int)a.length will half the time be negative and therefore a simple rightshift would not be equivalent to division by 2.It can't be possibly correct when a.length is larger than int.max because it doesn't recover information lost in a narrowing conversion.
Nov 24 2016
On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:It is safe low level architecture without performance and API issues.- `opCall` API instead of range interface is used (similar to C++)This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- AndreiIt prevents users to do stupid things implicitly (like copying RNGs).An input range can be made noncopyable.A hight level range interface can be added in the future (it will hold a _pointer_ to an RNG).Is there a reason to not have that now? Again, I'm literally talking about offering front/popFront in lieu of opCall(). The only implementation difference is you keep the currently-generated number as a member instead of returning it from opCall. I doubt one could measure a performance difference. If you implement it as a noncopyable input range, you get a large support network working for you. With opCall, we have virtually no such support - you need to do everything once again.In additional, when you need to write algorithms or distributions opCall is much more convenient than range API.Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added.Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?Current Phobos evolution is generic degradation: more generic and "universal" code hide more uncovered bugs in the code. The std.range is good example of degradation, it has a lot of API and implementation bugs.Do you have examples of issues outside random number generators?used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. uniformly. It is RNG problem how to do it.Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. 
(I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)I will not fill this bug as well another dozen std.random bugs because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 expansion of 32 bit algorithms must be proved theoretically before we allow it for end users. 64 bit analogs are exists, but they have another implementations.One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases.This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?Phobos really follows stupid idealistic idea: more generic is better, more API is better, more universal algorithms is better. The problems is that Phobos/DRuntime is soup where all (because its "universality") interacts with everything.I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about? Thanks, Andrei
Nov 23 2016
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei Alexandrescu wrote:An input range can be made noncopyable. If you implement it as a noncopyable input range, you get a large support network working for you.struct A { bool empty; int front; void popFront(){} @disable this(this); } import std.algorithm; void f() { A r; auto m=r.map!(a=>1); } Does this compile for you?
Nov 23 2016
On Wednesday, 23 November 2016 at 15:25:29 UTC, Kagamin wrote:struct A { bool empty; int front; void popFront(){} disable this(this); } import std.algorithm; void f() { A r; auto m=r.map!(a=>1); } Does this compile for you?auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)
Nov 23 2016
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote:auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
Nov 23 2016
On Wednesday, 23 November 2016 at 15:59:39 UTC, Kagamin wrote:On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote:What is the principal difference with randomRangeAdaptor? Anyway, a noncopyable input range cannot be used with Phobos. --Ilya
Nov 23 2016
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote:auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)Tested, works with 2.071.1
Nov 23 2016
On Wednesday, 23 November 2016 at 16:01:14 UTC, Ryan wrote:On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton Wakeling wrote:r.randomRangeAdaptor.map!(a=>1);auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)Tested, works with 2.071.1
Nov 23 2016
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei Alexandrescu wrote:On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:A range must be copyable to be used with std.algorithm and std.range (it is passed by value).On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:It is safe low level architecture without performance and API issues.- `opCall` API instead of range interface is used (similar to C++)This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- AndreiDitto. A noncopyable input range is useless.It prevents users to do stupid things implicitly (like copying RNGs).An input range can be made noncopyable.Done. See `RandomRangeAdaptor`: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.dA hight level range interface can be added in the future (it will hold a _pointer_ to an RNG).Is there a reason to not have that now?The main reason is implementation simplicity. Engines should be simple to create, simple to maintain, and simple to use. opCall is simpler than the range interface because 1. One declaration instead of 4 (3 range functions plus the latest generated value (optional)) 2. An input range is useless if the range is not copyable. 3. `randomRangeAdaptor` is implemented for Engines and will be done for Distributions too. So the range API is supported better than in std.random (because Engines are copied).In additional, when you need to write algorithms or distributions opCall is much more convenient than range API.Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.Simplicity is the main motivation.In additions, users would not use Engine API in 99% cases: they will just want to call `rand` or `uniform`, or other distribution. I am sure that almost any library should have low level API that is fits to its implementation first. Addition API levels also may be added.Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?The current Mir solution looks like a pair: isURBG and isSURBG. `S` prefix means `T.max == ReturnType!T.max` where T is an Engine. So, functions use isSURBG now. The min property is not required: we can just subtract the actual min from a returned value. An adaptor can be added to convert URBG to Saturated URBG.are not used when an uniform integer number is generated : `uniform!ulong` / `uniform!ulong(0, 100)`. 8/16/32/64 bits uniformly. It is RNG problem how to do it.Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. 
In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future. A set of new modern Engines would be added (Nicholas Wilson, and may be Joseph). Also Seb and I will add a set of distributions.I will not fill this bug as well another dozen std.random bugs because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason in 64 bit Xorshift. It is 32 bit by design. Furthermore, 64 expansion of 32 bit algorithms must be proved theoretically before we allow it for end users. 64 bit analogs are exists, but they have another implementations.One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.Sorry, my English is ... . It is not clear to me what subset of generic code is nothrow (reduce, for example). The same true for BetterC concept: it is hard to predict when an algorithms requires DRuntime to be linked / initialised. It is not clear what modules are imported by an module. "small utilities without understanding use cases" - Numeric code in std.algorithm: minElement, sum. They should not be in std.algorithm. A user can use `reduce`. Or, if speed is required we need to move to numeric solution suitable for vectorization. And std.algorithm seems to be wrong module for vectorised numeric code.Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases.This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?For example std.allocator. It is awesome! But I can not use it in GLAS, because I don't understand if it will work without linking with DRuntime. 
So, I copy-pasted and modified your code for AlignedMallocator: https://github.com/libmir/mir-glas/blob/master/source/glas/internal/memory.d ranges, algorithms seem good to me except it is not clear when code is nothrow /BetterC. std.math is a problem: we are adding new API without solving existing API problems and C compatibility. std.complex prevents math optimisations (this can not be solved without a compiler hacks), GLAS migrated to native (old) complex numbers. I like generics when they make D usage simpler. If one will add a random number generation for Phobos sorting algorithm it will make it useless for BetterC (because it will require to link RNG). Such issues are not reviewed during Phobos review process. Linking Phobos / DRuntime is not an option because it has not backward binary compatibility, so packages can not be distributed as precompiled libraries. std.traits, std.meta, std.range.primitives, std.ndslice, and part of std.math is only modules I am using in Mir libraries. It is very important to me to have BetterC guarainties between different Phobos versions. Super generic code when different modules imports each other is hard to review. Best regards, IlyaPhobos really follows stupid idealistic idea: more generic is better, more API is better, more universal algorithms is better. The problems is that Phobos/DRuntime is soup where all (because its "universality") interacts with everything.I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?
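To make the isURBG/isSURBG pair described earlier in this post concrete, here is a rough sketch of what such compile-time checks could look like, going only by the description given above ("`S` means `T.max == ReturnType!T.max`"); the actual definitions in mir-random may well differ, and max is assumed to be a compile-time constant (enum).

import std.traits : isUnsigned;

// Uniform random bit generator: callable with no arguments, returns an
// unsigned integer, and advertises the largest value it can produce.
template isURBG(T)
{
    static if (is(typeof(T.init()) U) && is(typeof(T.max)))
        enum isURBG = isUnsigned!U && is(typeof(T.max) : U);
    else
        enum isURBG = false;
}

// "Saturated" URBG: every bit of the return type is uniformly random,
// i.e. the engine's max equals the max of the integer type it returns.
template isSURBG(T)
{
    static if (isURBG!T)
        enum isSURBG = T.max == typeof(T.init()).max;
    else
        enum isSURBG = false;
}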
Nov 23 2016
On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future.Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Nov 23 2016
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei Alexandrescu wrote:On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future.Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- AndreiFloating point generation is an implementation matter. Distributions and algorithms are a design matter. Assume you are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random:

-------------------------
struct Mod(D)
    if (isRandomVariable!D && isFloatingPoint!(ReturnType!D))
{
    D irv;
    alias irv this;

    ReturnType!D opCall(G)(ref G gen)
        if (isSURBG!G)
    {
        ReturnType!D x1 = void;
        do x1 = irv(gen); while (x1 == 0);
        ReturnType!D x2 = irv(gen);
        ///////////////// X1 and X2 are independent! /////////////////
        return 1 / x1 + x2;
    }
}

unittest
{
    auto gen = Xorshift(1);
    auto rv = Mod!GammaRandomVariable(...);
    auto x = rv(gen); /// generator is never copied by value.
}
-------------------------

How can it be done with the Range interface instead of opCall? Please note that users would use Distributions, not Engines. So, the Range API requirements are: 1. An Engine must not have implicit copy semantics: it is not correct for RNGs and has performance issues. Engines also should not be copied explicitly in this example. 2. Do not store pointers to Engines in a RandomVariable, because RandomVariables can easily be passed outside of a function: they are just small tuples of params. 3. Do not use classes, because of BetterC and performance issues. 4. Engines must have the Range interface. 5. RandomVariables (both Mod and the underlying one) must have the Range interface. I don't think a Range API for random variables can be done easily and without performance or security issues. Thanks, Ilya
Nov 23 2016
On Wednesday, 23 November 2016 at 18:13:52 UTC, Ilya Yaroshenko wrote:Assume your are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random:EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)
Nov 23 2016
On 11/23/2016 01:13 PM, Ilya Yaroshenko wrote:1. Engine must not have implicit copy semantic: it is not correct for RNGs and has performance issues. They also should not be copied explicitly in this example.OK, so (lack of) copyability is a good argument. I'm ready to live with random generators that are noncopyable values and use opCall; then use a range adaptor on top of a pointer to that. It seems providing the range primitives for a noncopyable object is not useful and is liable to create more confusion and frustration than a distinct API. A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically. Andrei
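A rough sketch of that tip using std.typecons.RefCounted; the names are illustrative, the element type is assumed to be ulong, and it assumes RefCounted accepts a noncopyable payload (otherwise a hand-rolled counter works the same way).

import std.typecons : RefCounted;

// Range adaptor that owns its engine; copies of the adaptor share one
// reference-counted engine, so the engine state is never duplicated.
struct RandomRange(G)
{
    private RefCounted!G _gen;
    private ulong _front;              // assuming the engine yields ulong

    this(Args...)(auto ref Args args)
        if (Args.length > 0)
    {
        _gen = RefCounted!G(args);     // construct and own the engine internally
        popFront();                    // prime the first value
    }

    enum bool empty = false;           // infinite range
    ulong front() const @property { return _front; }
    void popFront() { _front = _gen.refCountedPayload.opCall(); }
}

Because the engine lives inside the reference-counted payload rather than behind a raw pointer, there is no dangling reference to worry about, which sidesteps the safety concern raised just below.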
Nov 23 2016
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:Yah, problem here is we can't make that safe. -- AndreiA tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote:On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:Can we improve D to make it safe?On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:Yah, problem here is we can't make that safe. -- AndreiA tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
On 11/23/16 3:05 PM, Ilya Yaroshenko wrote:On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote:It's difficult in the general case for objective reasons (modular analysis). For this, we may be able to take an argument by ref, save its address inside, add some scope attributes. It's a project. -- AndreiOn 11/23/16 2:40 PM, Ilya Yaroshenko wrote:Can we improve D to make it safe?On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:Yah, problem here is we can't make that safe. -- AndreiA tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko wrote:Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- IlyaYou want to instantiate and seed Mt every time you call a function that may need random numbers?
Nov 24 2016
On Thursday, 24 November 2016 at 08:59:03 UTC, Kagamin wrote:On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko wrote:Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- IlyaYou want to instantiate and seed Mt every time you call a function that may need random numbers?A function can use a global RNG defined by a user or accept an RNG by reference. The range adaptor stores a pointer, not a value.
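A small sketch of those two calling conventions; the toy engine below exists only to make the example compile and is not a real generator.

// Toy opCall-style engine (a simple LCG), purely for illustration.
struct Engine
{
    ulong state = 1;
    ulong opCall() { state = state * 6364136223846793005UL + 1442695040888963407UL; return state; }
}

// A thread-local engine owned by the application.
Engine appGen;

// Option 1: a helper reaches for the global engine directly.
ulong rollDie()
{
    return appGen() % 6 + 1;   // toy example; real code would avoid modulo bias
}

// Option 2: a helper accepts any engine by reference, so the caller decides
// which state is advanced and nothing is ever copied.
ulong rollDie(G)(ref G gen)
{
    return gen() % 6 + 1;
}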
Nov 24 2016
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei Alexandrescu wrote:On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:I started with Engines as basis. The library will be very different comparing with Phobos and _any_ other RNG libraries in terms of floating point generation quality. All FP generation I have seen are not saturated (amount of possible unique FP values are very small comparing with ideal situation because of IEEE arithmetic). I have not found the idea described by others, so it may be an article in the future.Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API?Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable [a, b) were added. `rand` does not use IEEE arithmetic to generate a real random number. --Ilya
Nov 24 2016
On 23.11.2016 00:55, Andrei Alexandrescu wrote:On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:I would posit that it is not actually natural to model random numbers as ranges. Not every random object is a sequence. Different samples are (ideally) independent, so there is little conceptual motivation to group them together -- it just increases the logistic overhead needed to get at the single samples. I.e., the most natural way to use a PRNG range is as follows: PRNG range; auto sample(){ auto r=range.front; range.popFront(); return r; } PRNGs are an example where global state is actually desirable, because in contrast to most other programming tasks, usually less predictability is good.- `opCall` API instead of range interface is used (similar to C++)This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Nov 24 2016
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:[WIP] [WIP] [WIP]As we discussed in relation to Seb's project, I think this is a problematic conceptualization of the best way to structure functionality related to randomness. An arguably better way (as outlined in the C++11 standard) is to think in terms of: * random _generators_, i.e. sources of uniformly distributed random bits: - random _engines_ (seedable, pseudo-random algorithms) - random _devices_ (non-deterministic sources of uniformly distributed bits) * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution - note _this includes uniformly-distributed integers_! - also uniformly-distributed floats, enums, etc. - and also non-uniform distributions * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making - randomCover, randomSample, etc. The point of the above breakdown is that it gives a nice and clear separation of concerns that allows for easily replaceable components. Separating out stuff just by the ultimate result you want (integers vs. floats, uniform vs. non-uniform, etc.) isn't helpful in that way.
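For readers used to the C++11 layout, a skeletal D rendering of that separation might look like the sketch below. Every name here is hypothetical and the engine and distribution bodies are toys; the point is only the layering of generator, distribution, and algorithm.

import std.algorithm.mutation : swap;

// Generator / engine: a seedable source of uniformly distributed bits.
struct Engine
{
    this(ulong seed) { state = seed; }
    ulong opCall() { state = state * 6364136223846793005UL + 1442695040888963407UL; return state; }
    private ulong state = 1;
}

// Distribution: turns uniform bits into variates of some type and shape.
struct UniformIntVariable
{
    int a, b;   // inclusive bounds
    int opCall(G)(ref G gen) { return cast(int) (a + gen() % (b - a + 1)); } // toy: ignores modulo bias
}

// Algorithm: higher-level randomized behaviour built on the layers above.
void shuffleInPlace(T, G)(T[] data, ref G gen)
{
    if (data.length < 2) return;
    foreach_reverse (i; 1 .. data.length)
    {
        auto j = UniformIntVariable(0, cast(int) i)(gen);   // Fisher-Yates step
        swap(data[i], data[j]);
    }
}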
Nov 22 2016
On 11/22/16 7:36 PM, Joseph Rushton Wakeling wrote:* random _generators_, i.e. sources of uniformly distributed random bits: - random _engines_ (seedable, pseudo-random algorithms) - random _devices_ (non-deterministic sources of uniformly distributed bits) * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution - note _this includes uniformly-distributed integers_! - also uniformly-distributed floats, enums, etc. - and also non-uniform distributions * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making - randomCover, randomSample, etc.Yah, I think that would be nice. -- Andrei
Nov 22 2016
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:- 64-bit Mt19937 is default for 64-bit targetsThis means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).
Nov 22 2016
On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNGInteresting. Could you please add a couple of links about that? -- Andrei
Nov 22 2016
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote:On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:http://www.pcg-random.org/other-rngs.html ;-)These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNGInteresting. Could you please add a couple of links about that? -- Andrei
Nov 22 2016
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote:Interesting. Could you please add a couple of links about that? -- Andreihttp://xoroshiro.di.unimi.it/
Nov 23 2016
On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote:Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.cInteresting. Could you please add a couple of links about that? -- Andreihttp://xoroshiro.di.unimi.it/
Nov 24 2016
On Thursday, 24 November 2016 at 08:36:41 UTC, Andrea Fontana wrote:On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:Implementation in D, written during DConf 2016 ;-) https://github.com/WebDrake/dxorshiftOn Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei Alexandrescu wrote:Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.cInteresting. Could you please add a couple of links about that? -- Andreihttp://xoroshiro.di.unimi.it/
Nov 24 2016
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton Wakeling wrote:On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Comparing with all other libraries (any language) the basic uniform FP numbers will be generated in interval (-1, +1) and contains _all_ possible values including all subnormal numbers. 64 bit generators are 2 times faster for this task if you need to generate a 64 bit floating point number. Explanation of technique will be in my post/article. --Ilya- 64-bit Mt19937 is default for 64-bit targetsThis means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).
Nov 22 2016
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote:Mir Random is going to be a library with saturated uniform FP RNGs and almost saturated exponential FP RNGs. Comparing with all other libraries (any language) the basic uniform FP numbers will be generated in interval (-1, +1) and contains _all_ possible values including all subnormal numbers. 64 bit generators are 2 times faster for this task if you need to generate a 64 bit floating point number. Explanation of technique will be in my post/article. --IlyaAll of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.
Nov 23 2016
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton Wakeling wrote:On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko wrote:[...]All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.We have a Random alias. I think it is OK if it generates different numbers for different architectures and library versions. If a user wants exactly the same behaviour he can use explicit names like Mt19937_32 and Mt19937_64. Both are defined for all architectures. The 64-bit variant has no significant speed degradation on 64-bit machines for 32-bit random number generation. Maybe only a few %. So it is better for 64-bit machines. It is only a question of the `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
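In other words, reproducibility-sensitive code would pin the engine explicitly rather than rely on the target-dependent alias. A hedged sketch, assuming the engines can be constructed from a seed as in std.random (Mt19937_64 and Random are the names used in this thread; the eventual library API may differ):

void example()
{
    // Pinned engine: identical sequences on 32- and 64-bit builds.
    auto fixed = Mt19937_64(12345);
    auto a = fixed();

    // Default alias: the sequence may differ per target or library version.
    auto dflt = Random(12345);
    auto b = dflt();
}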
Nov 23 2016
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko wrote:It is only question of `Random` alias, which can be changed in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.I think we're at an impasse in terms of priorities, because that's exactly the reason that I think you should leave the Random alias pointing to the same generator, and let the people with speed/quality concerns select the alternative generator ;-)
Nov 23 2016
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:- 64-bit Mt19937 initialization is fixed - 64-bit Mt19937 is default for 64-bit targetsI wonder why Mersenne Twister is *always* chosen over other algorithms. My vote goes for CMWC, of course. Andrea
Nov 23 2016
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote:I wonder why Mersenne Twister is *always* choosen over other algorithms.The weight of history, I suspect. Mersenne Twister was the major new high-quality RNG back when people started getting really concerned about having good defaults, and when the Diehard Tests were the state of the art in tests of randomness. IIRC there's also a benefit in terms of dimensionality, which some more recent generators don't address, which can make it a safer "all-round default". Agree that there are probably better choices for today, though.
Nov 23 2016
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana wrote:On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko wrote:A PR for CMWC is highly welcome!- 64-bit Mt19937 initialization is fixed - 64-bit Mt19937 is default for 64-bit targetsI wonder why Mersenne Twister is *always* choosen over other algorithms. My vote goes for CMWC, of course. Andrea
Nov 23 2016