digitalmars.D - [OT] Algorithm question
- H. S. Teoh via Digitalmars-d (70/70) Apr 30 2017 I've been thinking about the following problem, and thought I'd pick the
- Jack Stouffer (5/8) Apr 30 2017 as the first element found in A will be chosen, and then each
- H. S. Teoh via Digitalmars-d (29/40) Apr 30 2017 Actually, this is exactly why it *does* satisfy the requirement.
- Ola Fosheim Grøstad (3/6) May 01 2017 O(N) given P(x)===false for all x...
- John Colvin (27/88) May 01 2017 You can recover O(n) calls to P by caching the results. You can
- Ola Fosheim Grøstad (11/19) May 01 2017 Yes, this is the standard text-book way of doing it, still O(N)
- Ola Fosheim Grøstad (31/33) May 01 2017 To avoid confusion: I didn't mean a literal "heap", but some kind
- H. S. Teoh via Digitalmars-d (36/56) May 01 2017 Sorry, I forgot to specify that P(x) can be different each time.
- Ola Fosheim Grøstad (14/16) May 01 2017 Does it exist? Yes, because you can build permutation tables for
- Ola Fosheim Grøstad (16/30) May 01 2017 Keep also in mind that you can split the original array in two,
- Ola Fosheim Grøstad (13/28) May 01 2017 But if you don't use a hardware random generator and are happy
- Ola Fosheim Grøstad (7/14) May 01 2017 Well, actually, just select the single cyclic generator that is
- Ola Fosheim Grøstad (6/8) May 01 2017 I take that back:
- Ola Fosheim Grøstad (4/4) May 01 2017 Ah, found a sensible search term: «pseudorandom permutation»
- Ola Fosheim Grøstad (9/13) May 01 2017 A readable paper for most people:
- Moritz Maxeiner (33/48) May 01 2017 Works only with a random access range and I haven't done the full
- Timon Gehr (6/49) May 01 2017 This deterministically chooses 0. (uniform is right-exclusive.) I assume...
- Moritz Maxeiner (5/15) May 01 2017 Dang. Serves me right for not doing the full analysis. Thanks for
- Mike B Johnson (12/15) May 01 2017 Since most real world problems would require selecting elements
- H. S. Teoh via Digitalmars-d (8/25) May 01 2017 [...]
- Andrea Fontana (6/24) May 01 2017 If you find an element that doesn't satisfy P(x) move it on top
- Andrea Fontana (2/5) May 01 2017 Whoops i mean uniform(1, r.length) of course :)
- Ola Fosheim Grøstad (16/21) May 01 2017 I guess you meant to the end, but if mutating the array is ok,
- H. S. Teoh via Digitalmars-d (8/31) May 01 2017 [...]
- Ivan Kazmenko (50/84) May 01 2017 I'd like to note here that, if you make use of the same P(x) many
- H. S. Teoh via Digitalmars-d (23/82) May 01 2017 Unfortunately, P(x) could be different each time (sorry, should have
- MysticZach (43/55) May 01 2017 Here's how I would do it:
- MysticZach (7/24) May 01 2017 The line above has a bug. Replace it with:
- MysticZach (6/16) May 01 2017 Now I'm questioning this, because if the positive cases are
- Ivan Kazmenko (8/24) May 02 2017 A pity; it sounded nice!
- Ivan Kazmenko (9/12) May 02 2017 I tried to implement a similar approach, but instead of a queue
- MysticZach (24/37) May 02 2017 Well, I thought of another idea that may not be technically
- MysticZach (39/52) May 02 2017 Taking this a step further, it occurred to me that you could use
- Timon Gehr (9/57) May 02 2017 Counterexample: [1,1,1,0,0]
- MysticZach (10/31) May 02 2017 skipLength is determined modulo hop, thus it can't be more than
- MysticZach (5/23) May 03 2017 Actually, I did botch the implementation. The hopping interval
- Timon Gehr (9/26) May 04 2017 It's not though. For example, [1,1,1,0,...,0] (length 29), you get 0 and...
- MysticZach (7/30) May 04 2017 Or how close it can get, depending on the range of intervals
- MysticZach (2/20) May 04 2017 Within a small range of error, I mean.
- H. S. Teoh via Digitalmars-d (13/16) May 12 2017 Of course there is. The random permutation approach works. In this
- Ola Fosheim Grostad (2/5) May 12 2017 I've told you all you need to know...
I've been thinking about the following problem, and thought I'd pick the brains of the bright people around these parts...

Given a set A of n elements (let's say it's a random-access range of size n, where n is relatively large), and a predicate P(x) that specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the relative proportion of elements that satisfy P(x) or not. Note that A is considered to be immutable (swapping elements or otherwise changing A is not allowed).

So far, I have come up with the following algorithms:

1) "Random shooting":

    auto r = ... /* elements of A */
    for (;;) {
        auto i = uniform(0, r.length);
        if (P(r[i])) return r[i];
    }

   Advantages: If a large percentage of elements in A satisfy P(x), then the loop will terminate within a small number of iterations; if the majority of elements satisfy P(x), this will terminate well below n iterations.

   Disadvantages: If only a small percentage of elements satisfy P(x), then this loop could take arbitrarily long to terminate, and it will not terminate at all if no elements satisfy P(x).

2) "Turtle walk":

    auto r = ... /* elements of A */
    int nSatisfy = 0;
    ElementType!A currentChoice;
    bool found = false;
    foreach (e; r) {
        if (P(e)) {
            nSatisfy++;    // increment first: uniform(0, 0) is invalid
            if (uniform(0, nSatisfy) == 0) {
                currentChoice = e;
                found = true;
            }
        }
    }
    if (found) return currentChoice;
    else ... /* no elements satisfy P(x) */

   Advantages: If there is any element at all in A that satisfies P(x), it will be found. The loop always terminates after n iterations.

   Disadvantages: Even if 99% of elements in A satisfy P(x), we still have to traverse the entire data set before we terminate.
(We cannot terminate before that, otherwise the probability of elements satisfying P(x) being selected will not be even.)

3) Permutation walk:

    auto r = ... /* elements of A */
    foreach (i; iota(0 .. r.length).randomPermutation) {
        if (P(r[i])) return r[i];
    }
    /* no elements satisfy P(x) */

   Advantages: if an element that satisfies P(x) is found early, the loop will terminate before n iterations. This seems like the best of both worlds of (1) and (2), except:

   Disadvantages: AFAIK, generating a random permutation of indices from 0 .. n requires at least O(n) time, so any advantage we may have had seems to be negated.

Is there an algorithm for *incrementally* generating a random permutation of indices? If there is, we could use that in (3) and thus achieve the best of both worlds between early termination if an element satisfying P(x) is found, and guaranteeing termination after n iterations if no element satisfying P(x) exists.

T

-- 
The early bird gets the worm. Moral: ewww...
Apr 30 2017
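As an editorial aside: the "turtle walk" above is reservoir sampling with a reservoir of size 1. A minimal Python sketch (function name hypothetical, for illustration only):

```python
import random

def turtle_walk(r, p, rng=random):
    """Reservoir sampling, k = 1: return a uniformly random element
    of r satisfying p, or None if no element satisfies it."""
    choice = None
    n_satisfy = 0
    for e in r:
        if p(e):
            n_satisfy += 1
            # Replace the current choice with probability 1/n_satisfy.
            if rng.randrange(n_satisfy) == 0:
                choice = e
    return choice
```

Note the count is incremented before the draw, so the first match is kept with probability 1 and the k-th match replaces the current choice with probability 1/k.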
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> 2) "Turtle walk":

I'd just like to point out that this algorithm doesn't satisfy

> such that elements that satisfy P(x) have equal probability of being chosen

as the first element found in A will be chosen, and then each subsequent element has a decreasing probability of replacing the first element of P = 1/nSatisfy.
Apr 30 2017
On Mon, May 01, 2017 at 05:03:17AM +0000, Jack Stouffer via Digitalmars-d wrote:
> I'd just like to point out that this algorithm doesn't satisfy "such that elements that satisfy P(x) have equal probability of being chosen", as the first element found in A will be chosen, and then each subsequent element has a decreasing probability of replacing the first element of P = 1/nSatisfy.

Actually, this is exactly why it *does* satisfy the requirement.

If there is exactly 1 element that satisfies P(x), then the probability of it being picked is 1. If there are 2 elements, the first element is picked with probability 1 initially, but the 2nd element has a 1/2 chance of replacing it, meaning the overall probability distribution is 1/2 and 1/2.

If there are 3 elements, then by the time the 2nd element is processed, the first 2 elements are equally likely to be currently chosen (with probability 1/2 for each); when the 3rd element is found, it has 1/3 chance of replacing the previous choice, meaning there is a 2/3 chance the previous choice prevails. In that case, the 2/3 chance is equally distributed between the first two elements (they were picked with 1/2 and 1/2 probability), so the effective probability is 1/3 each after the 3rd element is processed.

In general, by the time you process the n'th element, the previous elements would have been picked with 1/(n-1) chance each, and the n'th element would be picked with 1/n chance, meaning there is a (n-1)/n chance the previous choice prevails. That (n-1)/n chance is equally distributed among the previous (n-1) elements, so effectively they are picked with 1/n chance each.

The key here is that the probability that the n'th element is *not* picked is exactly (n-1)/n, which, by inductive hypothesis, must be evenly distributed among the previous (n-1) candidates, thereby making their *eventual* probability of remaining as the chosen element exactly 1/n.
T

-- 
Two wrongs don't make a right; but three rights do make a left...
Apr 30 2017
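The inductive argument above can also be checked empirically. A seeded Python simulation (illustrative only; names hypothetical): three of five elements satisfy the predicate, so each should be selected with probability about 1/3.

```python
import random
from collections import Counter

def reservoir_pick(items, pred, rng):
    """Reservoir sampling with a reservoir of size 1."""
    choice, n = None, 0
    for e in items:
        if pred(e):
            n += 1
            if rng.randrange(n) == 0:
                choice = e
    return choice

rng = random.Random(1)  # fixed seed for reproducibility
counts = Counter(
    reservoir_pick([0, 1, 2, 3, 4], lambda x: x % 2 == 0, rng)
    for _ in range(30000)
)
# Only 0, 2, 4 can be chosen; each should appear roughly 10000 times.
```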
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Which elements of A satisfy P(x) is not known ahead of time, nor is the relative proportion of elements that satisfy P(x) or not.

O(N) given P(x)===false for all x...

What you want is probably average case analysis, not worst case?
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> 1) "Random shooting":
>
>     auto r = ... /* elements of A */
>     for (;;) {
>         auto i = uniform(0, r.length);
>         if (P(r[i])) return r[i];
>     }
>
> [...] Disadvantages: If only a small percentage of elements satisfy P(x), then this loop could take arbitrarily long to terminate, and it will not terminate at all if no elements satisfy P(x).

You can recover O(n) calls to P by caching the results. You can recover O(n) for both P and rng by progressively removing elements from r or any other method that results in an easily samplable set of all elements of r not yet visited (and therefore possible candidates).

I like this one (warning, just thought of, untested):

    auto r = ... /* elements of A */
    auto nRemaining = r.length;
    while (nRemaining) {
        auto i = uniform(0, nRemaining);
        if (P(r[i])) return r[i];
        else swap(r[i], r[--nRemaining]);
    }

> 3) Permutation walk:
>
>     auto r = ... /* elements of A */
>     foreach (i; iota(0 .. r.length).randomPermutation) {
>         if (P(r[i])) return r[i];
>     }
>     /* no elements satisfy P(x) */
>
> [...] Is there an algorithm for *incrementally* generating a random permutation of indices?

Yes. As a matter of fact it would be the logical extension of my algorithm above for when you can't or don't want to change r (again, only just thought of this, untested...):

    auto r = ... /* elements of A */
    auto indices = iota(r.length).array;
    auto nRemaining = r.length;
    while (nRemaining) {
        auto i = uniform(0, nRemaining);
        if (P(r[indices[i]])) return r[indices[i]];
        else swap(indices[i], indices[--nRemaining]);
    }

You could also do the same thing but with an array of pointers to elements of r instead of indices.
May 01 2017
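The index-array variant above can be sketched in Python (assuming a mutable scratch array of indices is acceptable; the function name is hypothetical). Each draw is uniform over the not-yet-visited indices, and a rejected index is swapped out of the active region, so the loop lazily consumes a random permutation:

```python
import random

def pick_with_index_pool(r, pred, rng=random):
    """Draw elements of r in random order without repetition,
    returning the first one satisfying pred, or None."""
    indices = list(range(len(r)))
    remaining = len(indices)
    while remaining:
        i = rng.randrange(remaining)
        j = indices[i]
        if pred(r[j]):
            return r[j]
        # Retire index j: swap it past the end of the active region.
        remaining -= 1
        indices[i], indices[remaining] = indices[remaining], indices[i]
    return None  # no element satisfies pred
```

Allocating `indices` is still O(n) per call, which is the cost the thread goes on to discuss.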
On Monday, 1 May 2017 at 08:42:05 UTC, John Colvin wrote:
> I like this one (warning, just thought of, untested):
>
>     auto r = ... /* elements of A */
>     auto nRemaining = r.length;
>     while (nRemaining) {
>         auto i = uniform(0, nRemaining);
>         if (P(r[i])) return r[i];
>         else swap(r[i], r[--nRemaining]);
>     }

Yes, this is the standard text-book way of doing it, still O(N) of course. The other standard text-book way is to generate an array of indexes and mutate that instead, still O(N). If you cache in a heap you get O(N log N).

Anyway, this kind of worst case analysis doesn't really help. And neither does average case, which will be O(1) assuming 50% match P. So what you need is to specify what kind of average case analysis you want. For instance, expressed as f(N,S) where N is the number of elements and S is the number of elements that satisfy P.
May 01 2017
On Monday, 1 May 2017 at 09:01:33 UTC, Ola Fosheim Grøstad wrote:
> array of indexes and mutate that instead, still O(N). If you cache in a heap you get O(N log N).

To avoid confusion: I didn't mean a literal "heap", but some kind of search structure that tracks and draws from unused index ranges in a way that does not require O(N) initialization and can be implemented as a single array on the stack. I don't remember what it is called, but you draw on each node based on the number of children (kind of like the ones used in markov-chain type situations). I assume this can be maintained in O(log N) per iteration with O(1) initialization. E.g.

1. push_indices(0, N-1)               // data (0, N-1)
2. i = draw_and_remove_random_index() // data could now be (0, i-1), (i+1, N-1)
3. if (i == -1) return
4. dostuff(i)
5. goto 2

Even though it is O(N log N) it could outperform other solutions if you only want to draw a small number of elements. And a small array with a linear search (tracking used indices) would outperform that if you only want to draw say 8 elements. If you want to draw close to N elements then you would be better off just randomizing a full array (with elements from 0 to N-1). Or you could do all of these: start with linear search, then convert to a tree-like search, then convert to full enumeration.

In general I don't think asymptotic analysis will do much good. I think you need to look at actual costs for typical N, P and different distributions of satisfied P, e.g.

cost of loop iteration: 10
average cost of P(x) => false: 1
average cost of P(x) => true: 1000
probability of P(a[i]) for each i
May 01 2017
On Mon, May 01, 2017 at 08:42:05AM +0000, John Colvin via Digitalmars-d wrote:
[...]
> You can recover O(n) calls to P by caching the results.

Sorry, I forgot to specify that P(x) can be different each time. Also, the input A may also be different each time, albeit remain the same length (though the expectation is that it will be mostly the same -- not that it matters, though; I don't think any of the proposed solutions here count on that).

> You can recover O(n) for both P and rng by progressively removing elements from r or any other method that results in an easily samplable set of all elements of r not yet visited (and therefore possible candidates).
[...]

Yes, I realize that if r was mutable, there are existing simple solutions. The problem is that it's not mutable, or rather, should not be mutated by this selection algorithm.

[...]
> Yes. As a matter of fact it would be the logical extension of my algorithm above for when you can't or don't want to change r (again, only just thought of this, untested...):
>
>     auto r = ... /* elements of A */
>     auto indices = iota(r.length).array;
>     auto nRemaining = r.length;
>     while (nRemaining) {
>         auto i = uniform(0, nRemaining);
>         if (P(r[indices[i]])) return r[indices[i]];
>         else swap(indices[i], indices[--nRemaining]);
>     }
>
> You could also do the same thing but with an array of pointers to elements of r instead of indices.

The trouble with this is that you have to allocate an array of indices, and r.length is rather large, so creating this array is O(n) each time. But I think Ivan Kazmenko has an excellent idea to cache this index array. Since, by symmetry, it doesn't matter if the starting indices array was the identity permutation, we could just initialize this array once, and the next time just skip to the loop directly (reusing the previous (possibly partially) permuted indices as the starting point). It does require O(n) memory, which is unfortunate, but at least we only pay for that once.
I was hoping for something more along the lines of a random permutation algorithm with sublinear (if not O(1)) space complexity that can return elements of the permutation consecutively on demand, like a lazy range. Something along the lines of:

    auto seed = ... /* get random number */
    auto lazyRange = generatePermutation(seed, n);

where generatePermutation is some incremental algorithm that produces a unique permutation of 0 .. n per seed value. Ideally, this algorithm would only do as much work as is necessary to produce the first k elements of the permutation, so that if the main loop terminates early (we found an element satisfying P(x)) we don't waste time generating the rest of the permutation.

T

-- 
"No, John. I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Is there an algorithm for *incrementally* generating a random permutation of indices?

Does it exist? Yes, because you can build permutation tables for it, but you'll have to find the closed form for it that is efficient... Can you do that without selecting a specific N? It is easy for N=2 and N=3 at least.

E.g. for array length 2 you get the 2 permutations: 0 1, 1 0

Then you just select one of them at random (you know the number of permutations).

So if it exists you should probably do a search for permutations: "How do I efficiently generate permutation number N".

Of course, for smaller N you could generate a big array of bytes and compress it by having arrays of slices into it (as many permutations will share index sequences).
May 01 2017
On Monday, 1 May 2017 at 08:44:25 UTC, Ola Fosheim Grøstad wrote:
> So if it exists you should probably do a search for permutations: "How do I efficiently generate permutation number N".

Keep also in mind that you can split the original array in two, so it might be sufficient to have the permutations for various 2^M sizes if those are easier to generate (my guess, maybe through some xor magic?).

E.g. if N = 21 then it can be split into 16 + (4 + 1).

So you can draw like this:

1. randomly select group_16 over group_5 with 16/21 probability
2. if group_5 selected: randomly select group_4 over group_1 with 4/5 probability

etc.

That way you only need log(N) permutation sequences to keep track of. Which for all practical purposes is going to stay pretty low (<20?). Right?
May 01 2017
On Monday, 1 May 2017 at 10:51:46 UTC, Ola Fosheim Grøstad wrote:
> That way you only need log(N) permutation sequences to keep track of.

But if you don't use a hardware random generator and are happy with more sloppy pseudo-randomness then you would be better off ditching the whole concept of permutations and replacing it with some cyclic random generators instead, using the same technique I just outlined.

E.g. find a set of cyclic random generators and break down N into a sum of these cycle sizes, then just keep track of the seed for each. If they are 2^N then you could use xor to get more randomness between runs.

Also, in the algorithms above you need to change the probabilities each time you take away one index from a group (you don't want to draw from an empty group).
May 01 2017
On Monday, 1 May 2017 at 11:08:56 UTC, Ola Fosheim Grøstad wrote:
> E.g. find a set of cyclic random generators and break down N into a sum of these cycle sizes, then just keep track of the seed for each.

Well, actually, just select the single cyclic generator that is larger than or equal to N, then just redraw if it returns a value >= N. Duh! Sloppy thinking. I would think you should be able to find some prime-sized ones that will get the next index with an insignificant number of redraws.

But permutations is the way to go if you want scientific quality.
May 01 2017
On Monday, 1 May 2017 at 11:15:28 UTC, Ola Fosheim Grøstad wrote:
> But permutations is the way to go if you want scientific quality.

I take that back:

«Polynomial pseudo-random number generator via cyclic phase»
http://www.sciencedirect.com/science/article/pii/S0378475409001463

Might be what you are looking for, and if not it probably references other papers that fit the bill.
May 01 2017
Ah, found a sensible search term: «pseudorandom permutation»

http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/FCrypto/slides/LubyRackoff.pdf

If you go down this path, please share links. Useful stuff, actually.
May 01 2017
On Monday, 1 May 2017 at 11:57:32 UTC, Ola Fosheim Grøstad wrote:
> Ah, found a sensible search term: «pseudorandom permutation»

A readable paper for most people:

http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf

«some users would prefer a much stronger property analogous to uniformity: that over the full period of the generator, every possible k-tuple will occur, and it will occur the same number of times.»

PCG claims to have arbitrary cycle sizes. I haven't looked at it, but it is available as C/C++.
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Given a set A of n elements [...] what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen?

Works only with a random access range and I haven't done the full analysis (and I'm a bit rusty on probability), but my first thought was random divide and conquer:

ElementType!A* select(A,P)(A r)
{
    // Recursion anchor
    if (r.length == 1) {
        if (P(r[0])) return r[0];
        else return null;
    // Recurse randomly with p=0.5 into either the left, or right half of r
    } else {
        ElementType!A* e;
        ElementType!A[][2] half;
        half[0] = r[0..floor(r.length/2)];
        half[1] = r[ceil(r.length/2)..$];
        ubyte coinFlip = uniform(0,1) > 0;
        // Recurse in one half and terminate if e is found there
        e = select(half[coinFlip]);
        if (e) return e;
        // Otherwise, recurse into other half
        return select(half[1 - coinFlip]);
    }
}

As stated above, I haven't done the full analysis, but intuitively speaking (which can be wrong, of course), the p=0.5 recursion ought to satisfy the condition of elements satisfying P(x) being chosen uniformly; also intuitively, I'd expect the expected runtime for a uniform distribution of elements satisfying P(x) to be around O(log N). Worst-case would be that it has to inspect every element in r once => O(N)
May 01 2017
On 01.05.2017 11:51, Moritz Maxeiner wrote:
> ubyte coinFlip = uniform(0,1) > 0;

This deterministically chooses 0. (uniform is right-exclusive.) I assume you mean uniform(0,2).

> As stated above, I haven't done the full analysis, but intuitively speaking (which can be wrong, of course), the p=0.5 recursion ought to satisfy the condition of elements satisfying P(x) being chosen uniformly;

Counterexample: [1,0,1,1]. The first element is chosen with probability 1/2, which is not 1/3.

> also intuitively, I'd expect the expected runtime for a uniform distribution of elements satisfying P(x) to be around O(log N).

I don't understand this input model.

> Worst-case would be that it has to inspect every element in r once => O(N)
May 01 2017
On Monday, 1 May 2017 at 09:58:39 UTC, Timon Gehr wrote:
> This deterministically chooses 0. (uniform is right-exclusive.) I assume you mean uniform(0,2).

Yes.

> Counterexample: [1,0,1,1]. The first element is chosen with probability 1/2, which is not 1/3.

Dang. Serves me right for not doing the full analysis. Thanks for the counterexample and sorry for wasting your time.

> I don't understand this input model.

Since the idea was nonsense, it doesn't matter, anyway.
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> I've been thinking about the following problem, and thought I'd pick the brains of the bright people around these parts... [...]

Since most real world problems would require selecting elements more than once, it may be far more efficient to sort by P(x) (filter), then simply select a random element. e.g.,

    a = A.filter(x -> P(x)) // Creates a new set a where only the elements of A that satisfy P(x) are added
    ...
    e = a.random;

This is O(1) if you only have to filter once (you can create a container that always "sorts" on P(x), so to speak... like a sorted dictionary).
May 01 2017
On Mon, May 01, 2017 at 11:32:19AM +0000, Mike B Johnson via Digitalmars-d wrote:
[...]
> Since most real world problems would require selecting elements more than once, it may be far more efficient to sort by P(x) (filter), then simply select a random element.
[...]

Sorry, I forgot to specify that P(x) may change each time, as may the input A. So caching would be of little help in this case.

T

-- 
This is a tpyo.
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> 1) "Random shooting":
>
>     auto r = ... /* elements of A */
>     for (;;) {
>         auto i = uniform(0, r.length);
>         if (P(r[i])) return r[i];
>     }
[...]

If you find an element that doesn't satisfy P(x), move it to the top of the array (swap!) and then try again with uniform(1, r.length - 1) and so on. It terminates this way.

Andrea
May 01 2017
On Monday, 1 May 2017 at 12:31:48 UTC, Andrea Fontana wrote:
> If you find an element that doesn't satisfy P(x), move it to the top of the array (swap!) and then try again with uniform(1, r.length - 1) and so on.

Whoops, I mean uniform(1, r.length) of course :)
May 01 2017
On Monday, 1 May 2017 at 12:38:32 UTC, Andrea Fontana wrote:
> Whoops, I mean uniform(1, r.length) of course :)

I guess you meant to the end, but if mutating the array is ok, then you don't have to swap. Just move in the last element and chop 1 off the length.

Another cache-friendly mutation solution (assuming the PRNG is cheap) is to have two phases:

Phase 1: linear walk over the array; keep elements with P(a[i])==true with probability 1/N, using them to overwrite the ones with P(a[i])==false. When done, truncate the array.

Phase 2: regular random selection from the array where all elements satisfy P.

The OP underspecified the requirements... There is a gazillion variations... :-P
May 01 2017
On Mon, May 01, 2017 at 12:31:48PM +0000, Andrea Fontana via Digitalmars-d wrote:
> If you find an element that doesn't satisfy P(x), move it to the top of the array (swap!) and then try again with uniform(1, r.length - 1) and so on. It terminates this way.
[...]

The problem is that swapping elements is not allowed for this particular problem. If swapping was allowed this would be a much easier problem to solve. :-)

T

-- 
Chance favours the prepared mind. -- Louis Pasteur
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Given a set A of n elements (let's say it's a random-access range of size n, where n is relatively large), and a predicate P(x) that specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy P(x) are disregarded.)

I'd like to note here that, if you make use of the same P(x) many times (instead of different predicates on each call), it makes sense to spend O(n) time and memory filtering by that predicate and storing the result, and then answer each query in O(1).

> 3) Permutation walk:
>
> 	auto r = ... /* elements of A */
> 	foreach (i; iota(0 .. r.length).randomPermutation) {
> 		if (P(r[i])) return r[i];
> 	}
> 	/* no elements satisfy P(x) */
>
> Advantages: if an element that satisfies P(x) is found early, the loop will terminate before n iterations. This seems like the best of both worlds of (1) and (2), except:
>
> Disadvantages: AFAIK, generating a random permutation of indices from 0 .. n requires at least O(n) time, so any advantage we may have had seems to be negated.
>
> Is there an algorithm for *incrementally* generating a random permutation of indices? If there is, we could use that in (3) and thus achieve the best of both worlds between early termination if an element satisfying P(x) is found, and guaranteeing termination after n iterations if no elements satisfying P(x) exist.

Yes, there is. There are actually two variations of the Fisher-Yates shuffle (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle):

1.
	auto p = n.iota.array;
	foreach (pos; 0..n) {
		auto otherPos = uniform (0, pos + 1);
		swap (p[pos], p[otherPos]);
	}

When we look at this after the k-th iteration, the first k elements are randomly and uniformly permuted, and the rest (n-k) are left untouched.

2.
	auto p = n.iota.array;
	foreach (pos; 0..n) {
		auto otherPos = uniform (pos, n);
		swap (p[pos], p[otherPos]);
	}

When we look at this after the k-th iteration, the first k elements are a random combination of all n elements, and this combination is randomly and uniformly permuted.

So, the second variation is what we need: each new element is randomly and uniformly selected from all the elements left. Once we get the first element satisfying the predicate, we can just terminate the loop. If there are m out of n elements satisfying the predicate, the average number of steps is n/m.

Now, the problem is that both of these allocate n "size_t"-s of memory to start with. And your problem does not allow shuffling the elements of the original array in place, so we do need an external permutation for these algorithms. However, there are at least two ways to mitigate that:

(I) We can allocate the permutation once using n time and memory, and then, on every call, just reuse it in its current state in n/m time. It does not matter if the permutation is not the identity permutation: by a symmetry argument, any starting permutation will do just fine.

(II) We can store the permutation p in an associative array instead of a regular array, actually storing only the elements accessed at least once, and assuming all other elements to satisfy the identity p[x] = x. So, if we finish in n/m steps on average, the time and extra memory used will be O(n/m) too.

I can put together an example implementation if this best satisfies your requirements.

Ivan Kazmenko.
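To make variation (II) concrete, here is a sketch (in Python for brevity; the names are mine) of the second Fisher-Yates variation run lazily over an associative array, so a query that stops after k steps stores only O(k) entries:

```python
import random

def random_satisfying(a, pred, rng=random):
    """Variation 2 of Fisher-Yates, performed lazily: the permutation p
    is a sparse dict where a missing key x stands for p[x] == x.  Walk
    the permutation until an element satisfying pred turns up."""
    n = len(a)
    p = {}  # sparse permutation over 0..n-1
    for pos in range(n):
        other = rng.randrange(pos, n)
        # swap p[pos] and p[other], materializing entries on demand
        p[pos], p[other] = p.get(other, other), p.get(pos, pos)
        if pred(a[p[pos]]):
            return p[pos]
    return -1  # no element satisfies pred
```

Each satisfying index comes back with equal probability, because the walk visits indices in the order of a uniformly random permutation.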
May 01 2017
On Mon, May 01, 2017 at 02:38:09PM +0000, Ivan Kazmenko via Digitalmars-d wrote:
> On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
>> Given a set A of n elements [...] what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen?
>
> I'd like to note here that, if you make use of the same P(x) many times (instead of different predicates on each call), it makes sense to spend O(n) time and memory filtering by that predicate and storing the result, and then answer each query in O(1).

Unfortunately, P(x) could be different each time (sorry, I should have stated this explicitly), and A may also change in between calls to this random selection algorithm. So filtering would not be a good approach.

[...]
> There are actually two variations of the Fisher-Yates shuffle (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle):
> [...]
> If there are m out of n elements satisfying the predicate, the average number of steps is n/m.

But wouldn't creating the array p already be O(n)? Albeit somewhat faster than n iterations of the main loop, since copying iota ought to be somewhat cheaper than generating a random number and swapping two elements.

> Now, the problem is that both of these allocate n "size_t"-s of memory to start with. And your problem does not allow shuffling the elements of the original array in place, so we do need an external permutation for these algorithms. However, there are at least two ways to mitigate that:
>
> (I) We can allocate the permutation once using n time and memory, and then, on every call, just reuse it in its current state in n/m time. It does not matter if the permutation is not the identity permutation: by a symmetry argument, any starting permutation will do just fine.

I like this idea very much. This lets us pay only once for creating the index array, and reuse it almost "for free" thereafter. Still, the O(n) space requirement could potentially be onerous, since n is expected to be rather large.

> (II) We can store the permutation p in an associative array instead of a regular array, actually storing only the elements accessed at least once, and assuming all other elements to satisfy the identity p[x] = x. So, if we finish in n/m steps on average, the time and extra memory used will be O(n/m) too.
>
> I can put together an example implementation if this best satisfies your requirements.
[...]

This is interesting, and relaxes the O(n) space requirement initially. After enough queries, though, I'd expect the AA to eventually grow to n elements (as there is a chance some queries will end up finding no elements that satisfy P(x), so they would generate the entire permutation).

But I like your idea of reusing the index array. It's definitely a possibility!


T

-- 
Your inconsistency is the only consistent thing about you! -- KD
May 01 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Given a set A of n elements (let's say it's a random-access range of size n, where n is relatively large), and a predicate P(x) that specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy P(x) are disregarded.)

Here's how I would do it:

	// returns a random index of array satisfying P(x), -1 if not found
	int randomlySatisfy(A[] array)
	{
		if (array.length == 0)
			return -1;
		int i = uniform(0, array.length);
		return randomlySatisfyImpl(array, i);
	}

	// recursive function
	private int randomlySatisfyImpl(A[] da, int i)
	{
		if (P(da[i]))
			return i;
		if (da.length == 1)
			return -1;

		// choose a random partition proportionally
		auto j = uniform(da.length - 1);
		if (j < i) {
			// the lower partition
			int a = randomlySatisfyImpl(da[0..i], j);
			if (a != -1)
				return a;
			else
				return randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
		}
		else {
			// higher partition, investigate in reverse order
			int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
			if (a != -1)
				return i + 1 + a;
			else
				return i + 1 + randomlySatisfyImpl(da[0..i], j);
		}
	}

The goal is to have the first hit be the one you return. The method: if a random pick doesn't satisfy, randomly choose the partition greater than or less than it based on uniform(0 .. array.length - 1), and do the same procedure on that partition, reusing the random index to avoid having to call uniform twice on each recursion (once to choose a partition and once to choose an index within that partition).

If the probability of choosing a partition is precisely proportional to the number of elements in that partition, it seems to me that the result will be truly random, but I'm not sure.
May 01 2017
On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
> 	// choose a random partition proportionally
> 	auto j = uniform(da.length - 1);
> 	if (j < i) {
> 		// the lower partition
> 		int a = randomlySatisfyImpl(da[0..i], j);
> 		if (a != -1)
> 			return a;
> 		else
> 			return randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
> 	}
> 	else {
> 		// higher partition, investigate in reverse order
> 		int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
> 		if (a != -1)
> 			return i + 1 + a;
> 		else
> 			return i + 1 + randomlySatisfyImpl(da[0..i], j);

The line above has a bug. Replace it with:

	else {
		a = randomlySatisfyImpl(da[0..i], j);
		return (a == -1) ? -1 : i + 1 + a;
	}

> 	}
> }

But the idea's the same. Hopefully it's clear.
May 01 2017
On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
> The goal is to have the first hit be the one you return. The method: if a random pick doesn't satisfy, randomly choose the partition greater than or less than it based on uniform(0 .. array.length - 1), and do the same procedure on that partition, reusing the random index to avoid having to call uniform twice on each recursion (once to choose a partition and once to choose an index within that partition).
>
> If the probability of choosing a partition is precisely proportional to the number of elements in that partition, it seems to me that the result will be truly random, but I'm not sure.

Now I'm questioning this, because if the positive cases are unevenly distributed, e.g. [111111000100], the last one has about a 50% chance of getting picked instead of a 1 in 7 chance with my method. I guess you'd need to build a whole new array like the others are saying.
May 01 2017
On Monday, 1 May 2017 at 21:54:43 UTC, MysticZach wrote:
> On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
>> [...] If the probability of choosing a partition is precisely proportional to the number of elements in that partition, it seems to me that the result will be truly random, but I'm not sure.
>
> Now I'm questioning this, because if the positive cases are unevenly distributed, e.g. [111111000100], the last one has about a 50% chance of getting picked instead of a 1 in 7 chance with my method. I guess you'd need to build a whole new array like the others are saying.

A pity; it sounded nice! But yeah, once we pick the right ~half, we have to completely traverse it before paying any attention to the left ~half.

I hope some part of the idea is still salvageable. For example, what if we put the intervals in a queue instead of a stack?

Ivan Kazmenko.
May 02 2017
On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
> I hope some part of the idea is still salvageable. For example, what if we put the intervals in a queue instead of a stack?

I tried to implement a similar approach, but instead of a queue or a stack, I used a random-access array of intervals. Sadly, it is still far from uniform, since it does not take interval lengths into account, and I don't see how to do that without at least O(log n) time per interval insertion or deletion.

Implementation and empiric frequencies for n=5 elements in a permutation: http://ideone.com/3zSxLN

Ivan Kazmenko.
May 02 2017
On Tuesday, 2 May 2017 at 11:27:17 UTC, Ivan Kazmenko wrote:
> On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
>> I hope some part of the idea is still salvageable. For example, what if we put the intervals in a queue instead of a stack?
>
> I tried to implement a similar approach, but instead of a queue or a stack, I used a random-access array of intervals. Sadly, it is still far from uniform, since it does not take interval lengths into account, and I don't see how to do that without at least O(log n) time per interval insertion or deletion.
>
> Implementation and empiric frequencies for n=5 elements in a permutation: http://ideone.com/3zSxLN

Well, I thought of another idea that may not be technically random, but which I imagine would get pretty close in real world use cases:

1. Test a random element in the array. If it satisfies P, return it.

2. Choose a hopper interval, h, that is relatively prime to the number of elements in the array, n. You could do this by randomly selecting from a pre-made list of primes of various orders of magnitude larger than n, with h = the prime % n.

3. Hop along the array, testing each element as you go. Increment a counter. If you reach the end of the array, cycle back to the beginning, starting with the remainder of h that you didn't use. I think that h being relatively prime to n means it will thereby hit every element in the array.

4. Return the first hit. If the counter reaches n, return failure.

The way I see it, with a random start position and a *mostly* random interval, the only way to defeat the randomness of the result would be if the elements that satisfy P were dispersed precisely according to the random interval specified for that particular invocation of the function, in which case the first in the dispersed set would be chosen more often. This would require a rare convergence depending on both n and h, and would not repeat from one call to the next.
May 02 2017
On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
> 1. Test a random element in the array. If it satisfies P, return it.
>
> 2. Choose a hopper interval, h, that is relatively prime to the number of elements in the array, n. [...]
>
> 3. Hop along the array, testing each element as you go. [...]
>
> 4. Return the first hit. If the counter reaches n, return failure.

Taking this a step further, it occurred to me that you could use *any* hopping interval from 1 to array.length, if the length of the array were prime. So artificially extend the array and just skip a jump when you land in the extended area. And since you'd skip lots of jumps if the extension were any bigger than the hopping interval, reduce its length to modulo the hopping interval.

	// returns a random index of array satisfying P(x), -1 if not found
	int randomlySatisfy(A[] array)
	{
		auto length = array.length;
		if (!length)
			return -1;

		auto i = uniform(0, length);
		auto hop = uniform(1, length);

		// greatest prime < 2^^31, for simplicity
		enum prime = 2147483477;
		assert (length <= prime);
		auto skipLength = ((prime - length) % hop) + length;
		auto count = 0;

		for (;;) {
			if (P(a[i]))
				return i;
			++count;
			if (count == length)
				return -1;
			i += hop;
			if (i < length)
				continue;
			if (i < skipLength)
				i += hop;
			i -= skipLength;
		}
		return -1;
	}

This solution is stupidly simple and I haven't tested it. But I believe it's truly random, since the hopping interval is arbitrary. Who knows, it might work.
May 02 2017
On 02.05.2017 22:42, MysticZach wrote:
> Taking this a step further, it occurred to me that you could use *any* hopping interval from 1 to array.length, if the length of the array were prime. [...]
>
> 	for (;;) {
> 		if (P(a[i]))
> 			return i;
> 		++count;
> 		if (count == length)
> 			return -1;
> 		i += hop;
> 		if (i < length)
> 			continue;
> 		if (i < skipLength)
> 			i += hop;

(I guess this should be 'while'.)

> 		i -= skipLength;
> 	}
> 	return -1;
> }
>
> This solution is stupidly simple and I haven't tested it. But I believe it's truly random, since the hopping interval is arbitrary. Who knows, it might work.

Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with probability 6/20, and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot work for this case: it picks one of 20 numbers at random. The resulting probability mass cannot be evenly distributed among the three elements, because 20 is not divisible by 3.
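(The 7/20, 6/20, 7/20 figures are easy to check mechanically: the algorithm is deterministic once the start index and hop are fixed, so one can enumerate all n*(n-1) equally likely (start, hop) pairs. A quick brute-force check in Python; the helper names are mine, and the walk mirrors the posted logic as written, quirks included.)

```python
from fractions import Fraction

PRIME = 2147483477  # the "greatest prime < 2^^31" constant from the snippet

def first_hit(array, i, hop):
    """Deterministic core of the posted hop-walk for a fixed start and hop."""
    n = len(array)
    skip = ((PRIME - n) % hop) + n
    for _ in range(n):
        if array[i]:
            return i
        i += hop
        if i >= n:           # landed in the artificial extension
            if i < skip:
                i += hop
            i -= skip
    return -1

def distribution(array):
    """Probability of each returned index over all (start, hop) pairs."""
    n = len(array)
    counts = {}
    for start in range(n):
        for hop in range(1, n):
            j = first_hit(array, start, hop)
            counts[j] = counts.get(j, 0) + 1
    return {j: Fraction(c, n * (n - 1)) for j, c in counts.items()}

# distribution([1, 1, 1, 0, 0]) reproduces the 7/20, 6/20, 7/20 split
```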
May 02 2017
On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
> On 02.05.2017 22:42, MysticZach wrote:
>> 	if (i < skipLength)
>> 		i += hop;
>
> (I guess this should be 'while'.)

skipLength is determined modulo hop, thus it can't be more than one hop away.

> Counterexample: [1,1,1,0,0]
>
> Your algorithm returns 0 with probability 7/20, 1 with probability 6/20, and 2 with probability 7/20.
>
> Note that there is a simple reason why your algorithm cannot work for this case: it picks one of 20 numbers at random. The resulting probability mass cannot be evenly distributed among the three elements, because 20 is not divisible by 3.

That's true. Two points, though: 1. If the range of error is within 1/(n*(n-1)), with array length n, it may be close enough for a given real world use case. 2. n is known to be large, which makes 1/(n*(n-1)) even smaller.

You'd probably have to trade accuracy for performance. I wonder how much less performant a truly accurate algorithm would be. Also, whether there's a way to make an algorithm of this style completely accurate.
May 02 2017
On Tuesday, 2 May 2017 at 23:09:54 UTC, MysticZach wrote:
> On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
>> (I guess this should be 'while'.)
>
> skipLength is determined modulo hop, thus it can't be more than one hop away.

Actually, I did botch the implementation. The hopping interval must be based on the prime rather than on n. But you could still short-circuit the while loop, I think:

	if (prime - n > hop) {
		skipLength = n + ((prime - n) % hop);
	}
May 03 2017
On 03.05.2017 01:09, MysticZach wrote:
>> Counterexample: [1,1,1,0,0]
>> [...]
>> The resulting probability mass cannot be evenly distributed among the three elements, because 20 is not divisible by 3.
>
> That's true. Two points, though: 1. If the range of error is within 1/(n*(n-1)), with array length n,

It's not though. For example, with [1,1,1,0,...,0] (length 29), you get 0 and 2 each with probability 43/116, but 1 only with probability 30/116. It might be interesting to figure out how far from uniform the distribution can get.

> it may be close enough for a given real world use case. 2. n is known to be large, which makes 1/(n*(n-1)) even smaller. You'd probably have to trade accuracy for performance. I wonder how much less performant a truly accurate algorithm would be. Also, whether there's a way to make an algorithm of this style completely accurate.

For an accurate algorithm based on (unconditional) uniformly random sampling, the number of outcomes from sampling needs to be divisible by all numbers from 1 to n. (Because each of those could conceivably be the number of elements satisfying the predicate.)
May 04 2017
On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
> On 03.05.2017 01:09, MysticZach wrote:
>> That's true. Two points, though: 1. If the range of error is within 1/(n*(n-1)), with array length n,
>
> It's not though. For example, with [1,1,1,0,...,0] (length 29), you get 0 and 2 each with probability 43/116, but 1 only with probability 30/116. It might be interesting to figure out how far from uniform the distribution can get.

Or how close it can get, depending on the range of intervals used. My math skill is shaky here. Maybe there's no way to deterministically jump to every element of an array with equal probability of hitting any given element satisfying a given predicate first. It sure would be cool if there were.
May 04 2017
On Thursday, 4 May 2017 at 13:19:43 UTC, MysticZach wrote:
> On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
>> It might be interesting to figure out how far from uniform the distribution can get.
>
> Or how close it can get, depending on the range of intervals used. My math skill is shaky here. Maybe there's no way to deterministically jump to every element of an array with equal probability of hitting any given element satisfying a given predicate first. It sure would be cool if there were.

Within a small range of error, I mean.
May 04 2017
On Thu, May 04, 2017 at 01:19:43PM +0000, MysticZach via Digitalmars-d wrote:
[...]
> Maybe there's no way to deterministically jump to every element of an array with equal probability of hitting any given element satisfying a given predicate first. It sure would be cool if there were.

Of course there is. The random permutation approach works. In this case, the data itself is not allowed to be permuted, but equivalent semantics can be obtained by permuting an array of indices into the data.

But the challenge is how efficiently you can generate a random permutation, so that the cost of generating one doesn't outweigh the O(n) linear scan algorithm. I'm surprised there are no (known) incremental algorithms for generating a random permutation of 0..n that require less than O(n) space.


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL
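(For what it's worth, such constructions do exist in the literature under the name «pseudorandom permutation», which was mentioned earlier in the thread: run a small Feistel network over the index domain and "cycle-walk" back into [0, n). This yields the i-th element of a key-dependent permutation of 0..n-1 in O(1) space per query. A toy Python sketch; the SHA-256 round function, round count, and names are illustrative only, nowhere near cryptographic quality.)

```python
import hashlib

def feistel_perm(x, key, bits, rounds=4):
    """A toy balanced Feistel network: a bijection on [0, 2**bits)
    determined by `key`.  The round function is purely illustrative."""
    half = bits // 2
    mask = (1 << half) - 1
    l, r = x >> half, x & mask
    for rnd in range(rounds):
        digest = hashlib.sha256(f"{key}:{rnd}:{r}".encode()).digest()
        f = int.from_bytes(digest[:4], "big") & mask
        l, r = r, l ^ f  # one Feistel round; invertible by construction
    return (l << half) | r

def perm_index(i, n, key):
    """The i-th element of a key-dependent pseudorandom permutation of
    0..n-1, computed incrementally in O(1) space per query.
    Cycle-walking: apply the bijection on the smallest power-of-four
    domain covering n until the value lands back inside [0, n)."""
    bits = 2
    while (1 << bits) < n:
        bits += 2
    x = feistel_perm(i, key, bits)
    while x >= n:
        x = feistel_perm(x, key, bits)
    return x
```

Walking i = 0, 1, 2, ... through perm_index visits 0..n-1 in a shuffled order, so the permutation-walk of option (3) needs no index array at all. The catch: the result is only as uniform as the underlying cipher is pseudorandom, and each query costs a few hash evaluations (fewer than four cycle-walk steps on average, since the padded domain is below 4n).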
May 12 2017
On Friday, 12 May 2017 at 18:43:53 UTC, H. S. Teoh wrote:
> I'm surprised there are no (known) incremental algorithms for generating a random permutation of 0..n that require less than O(n) space.

I've told you all you need to know...
May 12 2017