digitalmars.D - [OT] Algorithm question

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```I'm been thinking about the following problem, and thought I'd pick the
brains of the bright people around these parts...

Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's
the best algorithm (in terms of big-O time complexity) for selecting a
random element x satisfying P(x), such that elements that satisfy P(x)
have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.

Note that A is considered to be immutable (swapping elements or
otherwise changing A is not allowed).

So far, I have come up with the following algorithms:

1) "Random shooting":

auto r = ... /* elements of A */
for (;;) {
auto i = uniform(0, r.length);
if (P(r[i])) return r[i];
}

Advantages: If a large percentage of elements in A satisfy P(x), then
the loop will terminate within a small number of iterations; if the
majority of elements satisfy P(x), this will terminate well below n
iterations.

Disadvantages: If only a small percentage of elements satisfy P(x),
then this loop could take arbitrarily long to terminate, and it will
not terminate at all if no elements satisfy P(x).

2) "Turtle walk":

auto r = ... /* elements of A */
int nSatisfy = 0;
ElementType!A currentChoice;
bool found = false;
foreach (e; r) {
if (P(e)) {
if (uniform(0, nSatisfy) == 0)
{
currentChoice = e;
found = true;
}
nSatisfy++;
}
}
if (found) return currentChoice;
else ... /* no elements satisfy P(x) */

Advantages: If there is any element at all in A that satisfies P(x),
it will be found. The loop always terminates after n iterations.

Disadvantages: Even if 99% of elements in A satisfies P(x), we still
have to traverse the entire data set before we terminate. (We cannot
terminate before that, otherwise the probability of elements
satisfying P(x) being selected will not be even.)

3) Permutation walk:

auto r = ... /* elements of A */
foreach (i; iota(0 .. r.length).randomPermutation) {
if (P(r[i])) return r[i];
}
/* no elements satisfy P(x) */

Advantages: if an element that satisfies P(x) is found early, the
loop will terminate before n iterations. This seems like the best of
both worlds of (1) and (2), except:

Disadvantages: AFAIK, generating a random permutation of indices from
0 .. n requires at least O(n) time, so any advantage we may have had
seems to be negated.

Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3) and thus
achieve the best of both worlds between early termination if an element
satisfying P(x) is found, and guaranteeing termination after n
iterations if no elements satisfying P(x) exists.

T

--
The early bird gets the worm. Moral: ewww...
```
Apr 30 2017
Jack Stouffer <jack jackstouffer.com> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
2) "Turtle walk":

I'd just like to point out that this algorithm doesn't satisfy

such that elements that satisfy P(x) have equal probability of
being chosen

as the first element found in A will be chosen, and then each
subsequent element has a decreasing probability of replacing the
first element of P = 1/nSatisfy.
```
Apr 30 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Mon, May 01, 2017 at 05:03:17AM +0000, Jack Stouffer via Digitalmars-d wrote:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
2) "Turtle walk":

I'd just like to point out that this algorithm doesn't satisfy

such that elements that satisfy P(x) have equal probability of being
chosen

as the first element found in A will be chosen, and then each
subsequent element has a decreasing probability of replacing the first
element of P = 1/nSatisfy.

Actually, this is exactly why it *does* satisfy the requirement.

If there is exactly 1 element that satisfies P(x), then the probability
of it being picked is 1.

If there are 2 elements, the first element is picked with probability 1
initially, but the 2nd element has a 1/2 chance of replacing it, meaning
the overall probability distribution is 1/2 and 1/2.

If there are 3 elements, then by the time the 2nd element is processed,
the first 2 elements are equally likely to be currently chosen (with
probability 1/2 for each); when the 3rd element is found, it has 1/3
chance of replacing the previous choice, meaning there is a 2/3 chance
the previous choice prevails. In that case, the 2/3 chance is equally
distributed between the first two elements (they were picked with 1/2
and 1/2 probability), so the effective probability is 1/3 each after the
3rd element is processed.

In general, by the time you process the n'th element, the previous
elements would have had been picked with 1/(n-1) chance each, and the
n'th element would be picked with 1/n chance, meaning there is a (n-1)/n
chance the previous choice prevails. That (n-1)/n chance is equally
distributed among the previous (n-1) elements, so effectively they are
picked with 1/n chance each.

The key here is that the probability that the n'th element is *not*
picked is exactly (n-1)/n, which, by inductive hypothesis, must be
evenly distributed among the previous (n-1) candidates, thereby making
their *eventual* probability of remaining as the chosen element exactly
1/n.

T

--
Two wrongs don't make a right; but three rights do make a left...
```
Apr 30 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Which elements of A satisfy P(x) is not known ahead of time,
nor is the
relative proportion of elements that satisfy P(x) or not.

O(N) given P(x)===false for all x...

What you want is probably average case analysis, not worst case?
```
May 01 2017
John Colvin <john.loughran.colvin gmail.com> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
I'm been thinking about the following problem, and thought I'd
pick the brains of the bright people around these parts...

Given a set A of n elements (let's say it's a random-access
range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested
in, what's
the best algorithm (in terms of big-O time complexity) for
selecting a
random element x satisfying P(x), such that elements that
satisfy P(x)
have equal probability of being chosen? (Elements that do not
satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time,
nor is the
relative proportion of elements that satisfy P(x) or not.

Note that A is considered to be immutable (swapping elements or
otherwise changing A is not allowed).

So far, I have come up with the following algorithms:

1) "Random shooting":

auto r = ... /* elements of A */
for (;;) {
auto i = uniform(0, r.length);
if (P(r[i])) return r[i];
}

Advantages: If a large percentage of elements in A satisfy
P(x), then
the loop will terminate within a small number of iterations;
if the
majority of elements satisfy P(x), this will terminate well
below n
iterations.

Disadvantages: If only a small percentage of elements
satisfy P(x),
then this loop could take arbitrarily long to terminate, and
it will
not terminate at all if no elements satisfy P(x).

You can recover O(n) calls to P by caching the results. You can
recover O(n) for both P and rng by progressively removing
elements from r or any other method that results in an easily
samplable set of all elements of r not yet visited (and therefore
possible candidates).

I like this one (warning, just thought of, untested):

auto r = ... /* elements of A */
auto nRemaining = r.length;
while (nRemaining) {
auto i = uniform(0, nRemaining);
if (P(r[i])) return r[i];
else swap(r[i], r[--nRemaining]);
}

3) Permutation walk:

auto r = ... /* elements of A */
foreach (i; iota(0 .. r.length).randomPermutation) {
if (P(r[i])) return r[i];
}
/* no elements satisfy P(x) */

Advantages: if an element that satisfies P(x) is found
early, the
loop will terminate before n iterations. This seems like the
best of
both worlds of (1) and (2), except:

Disadvantages: AFAIK, generating a random permutation of
indices from
0 .. n requires at least O(n) time, so any advantage we may
seems to be negated.

Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3)
and thus achieve the best of both worlds between early
termination if an element satisfying P(x) is found, and
guaranteeing termination after n iterations if no elements
satisfying P(x) exists.

T

Yes. As a matter of fact it would be the logical extension of my
algorithm above for when you can't or don't want to change r
(again, only just thought of this, untested...):

auto r = ... /* elements of A */
auto indices = iota(r.length).array;
auto nRemaining = r.length;
while (nRemaining) {
auto i = uniform(0, nRemaining);
if (P(r[indices[i]])) return r[indices[i]];
else swap(indices[i], indices[--nRemaining]);
}

You could also do the same thing but with an array of pointers to
elements of r instead of indices.
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 08:42:05 UTC, John Colvin wrote:
I like this one (warning, just thought of, untested):

auto r = ... /* elements of A */
auto nRemaining = r.length;
while (nRemaining) {
auto i = uniform(0, nRemaining);
if (P(r[i])) return r[i];
else swap(r[i], r[--nRemaining]);
}

Yes, this is the standard text-book way of doing it, still O(N)
of course. The other standard-text-book way is to generate an
array of indexes and mutate that instead, still O(N). If you
cache in a heap you get O(N log N).

Anyway, this kind of worst case analysis doesn't really help. And
neither does average case, which will be O(1) assuming 50% match
P.

So you need is to specify what kind of average case analysis you
want. For instance expressed as f(N,S)  where N is the number of
elements and S is the number of elements that satisfies P.
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 09:01:33 UTC, Ola Fosheim Grøstad wrote:
array of indexes and mutate that instead, still O(N). If you
cache in a heap you get O(N log N).

To avoid confusion: I didn't mean a literal "heap", but some kind
of search structure that tracks and draws from unused indice
ranges in a way that does not require O(N) initialization and can
be implemented as a single array on the stack. I don't remember
what it is called, but you draw on each node based on the number
of children (kind of like the ones used in markov-chain type
situations). I assume this can be maintained in O(log N) per
iteration with O(1) initialization.

E.g.

1. push_indices(0,N-1) //data (0,N-1)
2. i = draw_and_remove_random_index() // data could now be
(0,i-1), (i+1,N-1)
3. if( i == -1 ) return
4. dostuff(i)
5. goto 2

Even though it is O(N log N) it could outperform other solutions
if you only want to draw a small number of elements. And a small
array with a linear search (tracking used indices) would
outperform that if you only want to draw say 8 elements. If you
want to draw close to N elements then you would be better off
just randomizing a full array (with elements from 0 to N-1). Or
you could do all of these, start with linear search, then convert
to a tree like search, then convert to full enumeration.

In general I don't think asymptotical analysis will do much good.
I think you need to look on actual costs for typical N, P and
different distributions of satisfied P, e.g.

cost of loop iteration: 10
average cost of P(x)=>false: 1
average cost of P(x)=>true: 1000
probability for P(a[i]) for each i
```
May 01 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Mon, May 01, 2017 at 08:42:05AM +0000, John Colvin via Digitalmars-d wrote:
[...]
You can recover O(n) calls to P by caching the results.

Sorry, I forgot to specify that P(x) can be different each time.

Also, the input A may also be different each time, albeit remain the
same length (though the expectation is that it will be mostly the same
-- not that it matters, though, I don't think any of the proposed
solutions here count on that).

You can recover O(n) for both P and rng by progressively removing
elements from r or any other method that results in an easily
samplable set of all elements of r not yet visited (and therefore
possible candidates).

[...]

Yes, I realize that if r was mutable, there are existing simple
solutions. The problem is that it's not mutable, or rather, should not
be mutated by this selection algorithm.

[...]
Yes. As a matter of fact it would be the logical extension of my
algorithm above for when you can't or don't want to change r (again,
only just thought of this, untested...):

auto r = ... /* elements of A */
auto indices = iota(r.length).array;
auto nRemaining = r.length;
while (nRemaining) {
auto i = uniform(0, nRemaining);
if (P(r[indices[i]])) return r[indices[i]];
else swap(indices[i], indices[--nRemaining]);
}

You could also do the same thing but with an array of pointers to
elements of r instead of indices.

The trouble with this is that you have to allocate an array of indices,
and r.length is rather large, so creating this array is O(n) each time.
But I think Ivan Kazmenko has an excellent idea to cache this index
array. Since, by symmetry, it doesn't matter if the starting indices
array was the identity permutation, we could just initialize this array
once, and the next time just skip to the loop directly (reusing the
previous (possibly partially) permuted indices as the starting point).
It does require O(n) memory, which is unfortunate, but at least we only
pay for that once.

I was hoping for something more along the lines of a random permutation
algorithm with sublinear (if not O(1)) space complexity that can return
elements of the permutation consecutively on-demand, like a lazy range.
Something along the lines of:

auto seed = ... /* get random number */
auto lazyRange = generatePermutation(seed, n);

where generatePermutation is some incremental algorithm that produces a
unique permutation of 0 .. n per seed value. Ideally, this algorithm
would only do as much work as is necessary to produce the first k
elements of the permutation, so that if the main loop terminates early
(we found an element satisfying P(x)) we don't waste time generating the
rest of the permutation.

T

--
"No, John.  I want formats that are actually useful, rather than over-featured
megaliths that address all questions by piling on ridiculous internal links in
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Is there an algorithm for *incrementally* generating a random
permutation of indices?

Does it exist? Yes, because you can build permutation tables for
it, but you'll have to find the closed form for it that is
efficient... Can you do that without selecting a specific N? It
is easy for N=2 and N=3 at least.

E.g.

For array length 2 you get the 2 permutations: 0 1, 1 0
Then you just select one of them at random (you know the number
of permutations)

So if it exists you'll should probably do a search for
permutations. "How do I efficiently generate permutation number N"

Of course for smaller N you could generate a big array of bytes
and compress it by having arrays of slices into it (as many
permutations will share index sequences).
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 08:44:25 UTC, Ola Fosheim Grøstad wrote:
Does it exist? Yes, because you can build permutation tables
for it, but you'll have to find the closed form for it that is
efficient... Can you do that without selecting a specific N? It
is easy for N=2 and N=3 at least.

E.g.

For array length 2 you get the 2 permutations: 0 1, 1 0
Then you just select one of them at random (you know the number
of permutations)

So if it exists you'll should probably do a search for
permutations. "How do I efficiently generate permutation number
N"

Of course for smaller N you could generate a big array of bytes
and compress it by having arrays of slices into it (as many
permutations will share index sequences).

Keep also in mind that you can split the original array in two,
so it might be sufficient to  have the permutations for various
2^M sizes if those are easier to generate (my guess, maybe
through some xor magic?)

E.g. if N = 21 then it can be split into 16 + (4 + 1)

So you can draw like this:

1. randomly select group_16 over group_5 based with 16/21
probability)

2. if group_5 selected: ramdomly select group_4 over group 1 on
4/5 probability

etc.

That way you only need log(N) permutation sequences to keep track
of. Which for all practical purposes are going to stay pretty low
(<20?)

Right?
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 10:51:46 UTC, Ola Fosheim Grøstad wrote:
Keep also in mind that you can split the original array in two,
so it might be sufficient to  have the permutations for various
2^M sizes if those are easier to generate (my guess, maybe
through some xor magic?)

E.g. if N = 21 then it can be split into 16 + (4 + 1)

So you can draw like this:

1. randomly select group_16 over group_5 based with 16/21
probability)

2. if group_5 selected: ramdomly select group_4 over group 1 on
4/5 probability

etc.

That way you only need log(N) permutation sequences to keep
track of. Which for all practical purposes are going to stay
pretty low (<20?)

Right?

But if you don't use a hardware random generator and are happy
with more sloppy pseudo-randomness then you would be better of
ditching the whole concept of permutations and replace it with
some cylic random generators intead using the same technique I
just outlined.

E.g. find a set of cyclic random generators and break down N into
a sum of these cycle-sizes, then just keep track of the seed for
each. If they are 2^N then you could use xor to get more
randomness between runs.

Also in the algorithms above you need to change the probabilities
each time you take away one index from a group (you don't want to
draw from an empty group).
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 11:08:56 UTC, Ola Fosheim Grøstad wrote:
E.g. find a set of cyclic random generators and break down N
into a sum of these cycle-sizes, then just keep track of the
seed for each. If they are 2^N then you could use xor to get
more randomness between runs.

Also in the algorithms above you need to change the
probabilities each time you take away one index from a group
(you don't want to draw from an empty group).

Well, actually, just select the single cyclic generator that is
larger or equal to N, then just redraw if it returns a value >=
N. Duh! Sloppy thinking. I would think you should be able to find
some prime sized ones that will get the next index with an
insignificant number of redraws.

But permutations is the way to go if you want scientific quality.
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 11:15:28 UTC, Ola Fosheim Grøstad wrote:
But permutations is the way to go if you want scientific
quality.

I take that back:

«Polynomial pseudo-random number generator via cyclic phase»
http://www.sciencedirect.com/science/article/pii/S0378475409001463

Might be what you are looking for and if not it probably
references other papers that fits the bill.
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```Ah, found a sensible search term: «pseudorandom permutation»

http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/FCrypto/slides/LubyRackoff.pdf

actually.
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 11:57:32 UTC, Ola Fosheim Grøstad wrote:
Ah, found a sensible search term: «pseudorandom permutation»

http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/FCrypto/slides/LubyRackoff.pdf

actually.

A readable paper for most people:
http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf

«some users would prefer a much stronger property analogous to
uniformity: that over the full period of the generator, every
possible k-tuple will occur,
and it will occur the same number of times.»

PCG claims to have arbitrary cycle sizes.

I haven't looked at it, but it is available as C/C++.
```
May 01 2017
Moritz Maxeiner <moritz ucworks.org> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access
range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested
in, what's
the best algorithm (in terms of big-O time complexity) for
selecting a
random element x satisfying P(x), such that elements that
satisfy P(x)
have equal probability of being chosen? (Elements that do not
satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time,
nor is the
relative proportion of elements that satisfy P(x) or not.

Works only with a random access range and I haven't done the full
analysis (and I'm a bit rusty on probability), but my first
thought was random divide and conquer:

ElementType!A* select(A,P)(A r)
{
// Recursion anchor
if (r.length == 1) {
if (P(r[0])) return r[0];
else return null;
// Recurse randomly with p=0.5 into either the left, or right
half of r
} else {
ElementType!A* e;
ElementType!A[][2] half;
half[0] = r[0..floor(r.length/2)];
half[1] = r[ceil(r.length/2)..\$];

ubyte coinFlip = uniform(0,1) > 0;
// Recurse in one half and terminate if e is found there
e = select(half[coinFlip]);
if (e) return e;
// Otherwise, recurse into other half
return select(half[1 - coinFlip]);
}
}

As stated above, I haven't done the full analysis, but
intuitively speaking (which can be wrong, of course), the p=0.5
recursion ought satisfy the condition of elements satisfying P(x)
being chosen uniformly; also intuitively, I'd expect the expected
runtime for a uniform distribution of elements satisfying P(x) to
be around O(log N).
Worst-case would be that it has to inspect every element in r
once => O(N)
```
May 01 2017
Timon Gehr <timon.gehr gmx.ch> writes:
```On 01.05.2017 11:51, Moritz Maxeiner wrote:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's
the best algorithm (in terms of big-O time complexity) for selecting a
random element x satisfying P(x), such that elements that satisfy P(x)
have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.

Works only with a random access range and I haven't done the full
analysis (and I'm a bit rusty on probability), but my first thought was
random divide and conquer:

ElementType!A* select(A,P)(A r)
{
// Recursion anchor
if (r.length == 1) {
if (P(r[0])) return r[0];
else return null;
// Recurse randomly with p=0.5 into either the left, or right half of r
} else {
ElementType!A* e;
ElementType!A[][2] half;
half[0] = r[0..floor(r.length/2)];
half[1] = r[ceil(r.length/2)..\$];

ubyte coinFlip = uniform(0,1) > 0;

This deterministically chooses 0. (uniform is right-exclusive.) I assume
you mean uniform(0,2).

// Recurse in one half and terminate if e is found there
e = select(half[coinFlip]);
if (e) return e;
// Otherwise, recurse into other half
return select(half[1 - coinFlip]);
}
}

As stated above, I haven't done the full analysis, but intuitively
speaking (which can be wrong, of course), the p=0.5 recursion ought
satisfy the condition of elements satisfying P(x) being chosen
uniformly;

Counterexample: [1,0,1,1].

The first element is chosen with probability 1/2, which is not 1/3.

also intuitively, I'd expect the expected runtime for a
uniform distribution of elements satisfying P(x)  to be around O(log N).

I don't understand this input model.

Worst-case would be that it has to inspect every element in r once => O(N)

```
May 01 2017
Moritz Maxeiner <moritz ucworks.org> writes:
```On Monday, 1 May 2017 at 09:58:39 UTC, Timon Gehr wrote:
On 01.05.2017 11:51, Moritz Maxeiner wrote:
[...]

This deterministically chooses 0. (uniform is right-exclusive.)
I assume you mean uniform(0,2).

Yes.

[...]

Counterexample: [1,0,1,1].

The first element is chosen with probability 1/2, which is not
1/3.

Dang. Serves me right for not doing the full analysis. Thanks for
the counter example and sorry for wasting your time.

[...]

I don't understand this input model.

Since the idea was nonsense, it doesn't matter, anyway.
```
May 01 2017
Mike B Johnson <Mikey Ikes.com> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
I'm been thinking about the following problem, and thought I'd
pick the brains of the bright people around these parts...

[...]

Since most real world problems would require selecting elements
more than once it may be far more efficient to sort by
P(x)(filter) then simply select a random element.

e.g.,

a = A.filter(x->P(x))  // Creates a new set a where only the
elements of A that satisfy P(x) are added

...

e = a.random;

this is O(1) if you only have to filter once(you can create a
container that always "sorts" on P(x) so to speak.. like a sorted
dictionary).
```
May 01 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Mon, May 01, 2017 at 11:32:19AM +0000, Mike B Johnson via Digitalmars-d
wrote:
[...]
Since most real world problems would require selecting elements more
than once it may be far more efficient to sort by P(x)(filter) then
simply select a random element.

e.g.,

a = A.filter(x->P(x))  // Creates a new set a where only the elements
of A that satisfy P(x) are added

...

e = a.random;

this is O(1) if you only have to filter once(you can create a
container that always "sorts" on P(x) so to speak.. like a sorted
dictionary).

[...]

Sorry, I forgot to specify that P(x) may change each time, as may the
input A. So caching would be of little help in this case.

T

--
This is a tpyo.
```
May 01 2017
Andrea Fontana <nospam example.com> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
1) "Random shooting":

auto r = ... /* elements of A */
for (;;) {
auto i = uniform(0, r.length);
if (P(r[i])) return r[i];
}

Advantages: If a large percentage of elements in A satisfy
P(x), then
the loop will terminate within a small number of iterations;
if the
majority of elements satisfy P(x), this will terminate well
below n
iterations.

Disadvantages: If only a small percentage of elements
satisfy P(x),
then this loop could take arbitrarily long to terminate, and
it will
not terminate at all if no elements satisfy P(x).

If you find an element that doesn't satisfy P(x) move it on top
of array (swap!) and then try again with uniform(1, r.length - 1)
and so on.

It terminates in this way.

Andrea
```
May 01 2017
Andrea Fontana <nospam example.com> writes:
```On Monday, 1 May 2017 at 12:31:48 UTC, Andrea Fontana wrote:
If you find an element that doesn't satisfy P(x) move it on top
of array (swap!) and then try again with uniform(1, r.length -
1) and so on.

Whoops i mean uniform(1, r.length) of course :)
```
May 01 2017
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
```On Monday, 1 May 2017 at 12:38:32 UTC, Andrea Fontana wrote:
On Monday, 1 May 2017 at 12:31:48 UTC, Andrea Fontana wrote:
If you find an element that doesn't satisfy P(x) move it on
top of array (swap!) and then try again with uniform(1,
r.length - 1) and so on.

Whoops i mean uniform(1, r.length) of course :)

I guess you meant to the end, but if mutating the array is ok,
then you don't have to swap. Just move in the last element and
chop 1 off the length.

Another cache-friendly mutation solution (assuming PRNG is cheap)
is to have two phases:

Phase 1:
linear walkover the array and select elements with P(a[i])==true
with probability 1/N
the overwrite the ones with P(a[i]) = false.

when done truncate the array.

Phase 2:
regular random selection from the array were all elements satisfy
P.

The OP underspecified the requirements... There is a gazillion
variations... :-P
```
May 01 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Mon, May 01, 2017 at 12:31:48PM +0000, Andrea Fontana via Digitalmars-d
wrote:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
1) "Random shooting":

auto r = ... /* elements of A */
for (;;) {
auto i = uniform(0, r.length);
if (P(r[i])) return r[i];
}

Advantages: If a large percentage of elements in A satisfy P(x), then
the loop will terminate within a small number of iterations; if the
majority of elements satisfy P(x), this will terminate well below n
iterations.

Disadvantages: If only a small percentage of elements satisfy P(x),
then this loop could take arbitrarily long to terminate, and it will
not terminate at all if no elements satisfy P(x).

If you find an element that doesn't satisfy P(x) move it on top of array
(swap!) and then try again with uniform(1, r.length - 1) and so on.

It terminates in this way.

[...]

The problem is that swapping elements is not allowed for this particular
problem.  If swapping was allowed this would be a much easier problem to
solve. :-)

T

--
Chance favours the prepared mind. -- Louis Pasteur
```
May 01 2017
Ivan Kazmenko <gassa mail.ru> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access
range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested
in, what's
the best algorithm (in terms of big-O time complexity) for
selecting a
random element x satisfying P(x), such that elements that
satisfy P(x)
have equal probability of being chosen? (Elements that do not
satisfy
P(x) are disregarded.)

I'd like to note here that, if you make use of the same P(x) many
times (instead of different predicates on each call), it makes
sense to spend O(n) time and memory filtering by that predicate
and storing the result, and then answer each query in O(1).

3) Permutation walk:

auto r = ... /* elements of A */
foreach (i; iota(0 .. r.length).randomPermutation) {
if (P(r[i])) return r[i];
}
/* no elements satisfy P(x) */

Advantages: if an element that satisfies P(x) is found
early, the
loop will terminate before n iterations. This seems like the
best of
both worlds of (1) and (2), except:

Disadvantages: AFAIK, generating a random permutation of
indices from
0 .. n requires at least O(n) time, so any advantage we may
seems to be negated.

Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3)
and thus achieve the best of both worlds between early
termination if an element satisfying P(x) is found, and
guaranteeing termination after n iterations if no elements
satisfying P(x) exists.

Yes, there is.

There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)

1.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (0, pos + 1);
swap (p[pos], p[otherPos]);
}

When we look at this after k-th iteration, the first k elements
are randomly and uniformly permuted, and the rest (n-k) are left
untouched.

2.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (pos, n);
swap (p[pos], p[otherPos]);
}

When we look at this after k-th iteration, the first k elements
are a random combination of all n elements, and this combination
is randomly and uniformly permuted.  So, the second variation is
what we need: each new element is randomly and uniformly selected
from all the elements left.  Once we get the first element
satisfying the predicate, we can just terminate the loop.  If
there are m out of n elements satisfying the predicate, the
average number of steps is n/m.

Now, the problem is that both of these allocate n "size_t"-s of
the elements of the original array in place, so we do need an
external permutation for these algorithms.  However, there are at
least two ways to mitigate that:

(I)
We can allocate the permutation once using n time and memory, and
then, on every call, just reuse it in its current state in n/m
time.  It does not matter if the permutation is not identity
permutation: by symmetry argument, any starting permutation will
do just fine.

(II)
We can store the permutation p in an associative array instead of
a regular array, actually storing only the elements accessed at
least once, and assuming other elements to satisfy the identity
p[x] = x.  So, if we finish in n/m steps on average, the time and
extra memory used will be O(n/m) too.  I can put together an
example implementation if this best satisfies your requirements.

Ivan Kazmenko.
```
May 01 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Mon, May 01, 2017 at 02:38:09PM +0000, Ivan Kazmenko via Digitalmars-d wrote:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in,
what's the best algorithm (in terms of big-O time complexity) for
selecting a random element x satisfying P(x), such that elements
that satisfy P(x) have equal probability of being chosen? (Elements
that do not satisfy P(x) are disregarded.)

I'd like to note here that, if you make use of the same P(x) many
times (instead of different predicates on each call), it makes sense
to spend O(n) time and memory filtering by that predicate and storing
the result, and then answer each query in O(1).

Unfortunately, P(x) could be different each time (sorry, should have
stated this explicitly), and A may also change in between calls to this
random selection algorithm.  So filtering would not be a good approach.

[...]
There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)

1.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (0, pos + 1);
swap (p[pos], p[otherPos]);
}

When we look at this after k-th iteration, the first k elements are
randomly and uniformly permuted, and the rest (n-k) are left
untouched.

2.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (pos, n);
swap (p[pos], p[otherPos]);
}

When we look at this after k-th iteration, the first k elements are a
random combination of all n elements, and this combination is randomly
and uniformly permuted.  So, the second variation is what we need:
each new element is randomly and uniformly selected from all the
elements left.  Once we get the first element satisfying the
predicate, we can just terminate the loop.  If there are m out of n
elements satisfying the predicate, the average number of steps is n/m.

But wouldn't creating the array p already be O(n)? Albeit, somewhat
faster than n iterations of the main loop since copying iota ought to be
somewhat cheaper than generating a random number and swapping two
elements.

Now, the problem is that both of these allocate n "size_t"-s of memory
elements of the original array in place, so we do need an external
permutation for these algorithms.  However, there are at least two
ways to mitigate that:

(I)
We can allocate the permutation once using n time and memory, and
then, on every call, just reuse it in its current state in n/m time.
It does not matter if the permutation is not identity permutation: by
symmetry argument, any starting permutation will do just fine.

I like this idea very much. This lets us only pay once for creating the

Still, the O(n) space requirement could potentially be onerous, since n
is expected to be rather large.

(II)
We can store the permutation p in an associative array instead of a
regular array, actually storing only the elements accessed at least
once, and assuming other elements to satisfy the identity p[x] = x.
So, if we finish in n/m steps on average, the time and extra memory
used will be O(n/m) too.  I can put together an example implementation
if this best satisfies your requirements.

[...]

This is interesting, and relaxes the O(n) space requirement initially.
After enough queries, though, I'd expect the AA to eventually grow to n
elements (as there is a chance some queries will end up finding no
elements that satisfy P(x) so it would generate the entire permutation).

But I like your idea of reusing the index array.  It's definitely a
possibility!

T

--
```
May 01 2017
MysticZach <reachzach ggmail.com> writes:
```On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access
range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested
in, what's
the best algorithm (in terms of big-O time complexity) for
selecting a
random element x satisfying P(x), such that elements that
satisfy P(x)
have equal probability of being chosen? (Elements that do not
satisfy
P(x) are disregarded.)

Here's how I would do it:

// returns a random index of array satisfying P(x), -1 if not
found
int randomlySatisfy(A[] array) {
if (array.length == 0)
return -1;
int i = uniform(0, array.length);
return randomlySatisfyImpl(array, i);
}

// recursive function
private int randomlySatisfyImpl(A[] da, int i) {
if (P(da[i]))
return i;
if (da.length == 1)
return -1;

// choose a random partition proportionally
auto j = uniform(da.length - 1);
if (j < i) {
// the lower partition
int a = randomlySatisfyImpl(da[0..i], j);
if (a != -1) return a;
else return randomlySatisfyImpl(da[i+1 .. da.length], j -
(i + 1));
}
else {
// higher partition, investigate in reverse order
int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i +
1));
if (a != -1) return i +1 + a;
else return i + 1 + randomlySatisfyImpl(da[0..i], j);
}
}

The goal is to have the first hit be the one you return. The
method: if a random pick doesn't satisfy, randomly choose the
partition greater than or less than based on
uniform(0..array.length-1), and do the same procedure on that
partition, reusing the random index to avoid having to call
uniform twice on each recursion (one to choose a partition and
one to choose an index within that partition). If the probability
of choosing a partition is precisely proportional to the number
of elements in that partition, it seems to me that the result
will be truly random, but I'm not sure.
```
May 01 2017
MysticZach <reachzach ggmail.com> writes:
```On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
// choose a random partition proportionally
auto j = uniform(da.length - 1);
if (j < i) {
// the lower partition
int a = randomlySatisfyImpl(da[0..i], j);
if (a != -1) return a;
else return randomlySatisfyImpl(da[i+1 .. da.length], j -
(i + 1));
}
else {
// higher partition, investigate in reverse order
int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i
+ 1));
if (a != -1) return i +1 + a;
else return i + 1 + randomlySatisfyImpl(da[0..i], j);

The line above has a bug. Replace it with:

else {
a = randomlySatisfyImpl(da[0..i], j);
return (a == -1) ? -1 : i + 1 + a;
}

}
}

But the idea's the same. Hopefully it's clear.
```
May 01 2017
MysticZach <reachzach ggmail.com> writes:
```On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
The goal is to have the first hit be the one you return. The
method: if a random pick doesn't satisfy, randomly choose the
partition greater than or less than based on
uniform(0..array.length-1), and do the same procedure on that
partition, reusing the random index to avoid having to call
uniform twice on each recursion (one to choose a partition and
one to choose an index within that partition). If the
probability of choosing a partition is precisely proportional
to the number of elements in that partition, it seems to me
that the result will be truly random, but I'm not sure.

Now I'm questioning this, because if the positive cases are
unevenly distributed, i.e., [111111000100], the last one has
about 50% chance to get picked instead of a 1 in 7 chance with my
method. I guess you'd need to build a whole new array like the
others are saying.
```
May 01 2017
Ivan Kazmenko <gassa mail.ru> writes:
```On Monday, 1 May 2017 at 21:54:43 UTC, MysticZach wrote:
On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
The goal is to have the first hit be the one you return. The
method: if a random pick doesn't satisfy, randomly choose the
partition greater than or less than based on
uniform(0..array.length-1), and do the same procedure on that
partition, reusing the random index to avoid having to call
uniform twice on each recursion (one to choose a partition and
one to choose an index within that partition). If the
probability of choosing a partition is precisely proportional
to the number of elements in that partition, it seems to me
that the result will be truly random, but I'm not sure.

Now I'm questioning this, because if the positive cases are
unevenly distributed, i.e., [111111000100], the last one has
about 50% chance to get picked instead of a 1 in 7 chance with
my method. I guess you'd need to build a whole new array like
the others are saying.

A pity; it sounded nice!
But yeah, once we pick the right ~half, we have to completely
traverse it before paying any attention to the left ~half.

I hope some part of the idea is still salvageable.
For example, what if we put the intervals in a queue instead of a
stack?

Ivan Kazmenko.
```
May 02 2017
Ivan Kazmenko <gassa mail.ru> writes:
```On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
I hope some part of the idea is still salvageable.
For example, what if we put the intervals in a queue instead of
a stack?

I tried to implement a similar approach, but instead of a queue
or a stack, I used a random-access array of intervals.

Sadly, it is still far from uniform, since it does not take
interval lengths into account, and I don't see how to do that
without at least O(log n) time per interval insertion or deletion.

Implementation and empiric frequencies for n=5 elements in a
permutation: http://ideone.com/3zSxLN

Ivan Kazmenko.
```
May 02 2017
MysticZach <reachzach ggmail.com> writes:
```On Tuesday, 2 May 2017 at 11:27:17 UTC, Ivan Kazmenko wrote:
On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
I hope some part of the idea is still salvageable.
For example, what if we put the intervals in a queue instead
of a stack?

I tried to implement a similar approach, but instead of a queue
or a stack, I used a random-access array of intervals.

Sadly, it is still far from uniform, since it does not take
interval lengths into account, and I don't see how to do that
without at least O(log n) time per interval insertion or
deletion.

Implementation and empiric frequencies for n=5 elements in a
permutation: http://ideone.com/3zSxLN

Ivan Kazmenko.

Well, I thought of another idea that may not be technically
random, but which I imagine would get pretty close in real world
use cases:

1. Test a random element in the array. If it satisfies P, return
it.
2. Choose a hopper interval, h, that is relatively prime to the
number of elements in the array, n. You could do this by randomly
selecting from a pre-made list of primes of various orders of
magnitude larger than n, with h = the prime % n.
3. Hop along the array, testing each element as you go. Increment
a counter. If you reach the end of the array, cycle back to the
beginning starting with the remainder of h that you didn't use. I
think that h being relatively prime means it will thereby hit
every element in the array.
4. Return the first hit. If the counter reaches n, return failure.

The way I see it, with a random start position, and a *mostly*
random interval, the only way to defeat the randomness of the
result would be if the elements that satisfy P were dispersed
precisely according to the random interval specified for that
particular evocation of the function — in which case the first in
the dispersed set would be chosen more often. This would require
a rare convergence depending on both n and h, and would not
repeat from one call to the next.
```
May 02 2017
MysticZach <reachzach ggmail.com> writes:
```On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
1. Test a random element in the array. If it satisfies P,
return it.
2. Choose a hopper interval, h, that is relatively prime to the
number of elements in the array, n. You could do this by
randomly selecting from a pre-made list of primes of various
orders of magnitude larger than n, with h = the prime % n.
3. Hop along the array, testing each element as you go.
Increment a counter. If you reach the end of the array, cycle
back to the beginning starting with the remainder of h that you
didn't use. I think that h being relatively prime means it will
thereby hit every element in the array.
4. Return the first hit. If the counter reaches n, return
failure.

Taking this a step further, it occurred to me that you could use
*any* hopping interval from 1 to array.length, if the length of
the array were prime. So artificially extend the array and just
skip a jump when you land in the extended area. And since you'd
skip lots of jumps if the extension were any bigger that the
hopping interval, reduce its length to modulo the hopping
interval.

// returns a random index of array satisfying P(x), -1 if not
found
int randomlySatisfy(A[] array) {
auto length = array.length;
if (!length)
return -1;
auto i = uniform(0, length);
auto hop = uniform(1, length);

// greatest prime < 2^^31, for simplicity
enum prime = 2147483477;
assert (length <= prime);

auto skipLength = ((prime - length) % hop) + length;
auto count = 0;

for (;;) {
if (P(a[i]))
return i;
++count;
if (count == length)
return -1;
i += hop;
if (i < length)
continue;
if (i < skipLength)
i += hop;
i -= skipLength;
}
return -1;
}

This solution is stupidly simple and I haven't tested it. But I
believe it's truly random, since the hopping interval is
arbitrary. Who knows, it might work.
```
May 02 2017
Timon Gehr <timon.gehr gmx.ch> writes:
```On 02.05.2017 22:42, MysticZach wrote:
On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
1. Test a random element in the array. If it satisfies P, return it.
2. Choose a hopper interval, h, that is relatively prime to the number
of elements in the array, n. You could do this by randomly selecting
from a pre-made list of primes of various orders of magnitude larger
than n, with h = the prime % n.
3. Hop along the array, testing each element as you go. Increment a
counter. If you reach the end of the array, cycle back to the
beginning starting with the remainder of h that you didn't use. I
think that h being relatively prime means it will thereby hit every
element in the array.
4. Return the first hit. If the counter reaches n, return failure.

Taking this a step further, it occurred to me that you could use *any*
hopping interval from 1 to array.length, if the length of the array were
prime. So artificially extend the array and just skip a jump when you
land in the extended area. And since you'd skip lots of jumps if the
extension were any bigger that the hopping interval, reduce its length
to modulo the hopping interval.

// returns a random index of array satisfying P(x), -1 if not found
int randomlySatisfy(A[] array) {
auto length = array.length;
if (!length)
return -1;
auto i = uniform(0, length);
auto hop = uniform(1, length);

// greatest prime < 2^^31, for simplicity
enum prime = 2147483477;
assert (length <= prime);

auto skipLength = ((prime - length) % hop) + length;
auto count = 0;

for (;;) {
if (P(a[i]))
return i;
++count;
if (count == length)
return -1;
i += hop;
if (i < length)
continue;
if (i < skipLength)
i += hop;

(I guess this should be 'while'.)

i -= skipLength;
}
return -1;
}

This solution is stupidly simple and I haven't tested it. But I believe
it's truly random, since the hopping interval is arbitrary. Who knows,
it might work.

Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with probability 6/20
and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot work for
this case: it picks one of 20 numbers at random. The resulting
probability mass cannot be evenly distributed among the three elements,
because 20 is not divisible by 3.
```
May 02 2017
MysticZach <reachzach ggmail.com> writes:
```On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
On 02.05.2017 22:42, MysticZach wrote:
On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
for (;;) {
if (P(a[i]))
return i;
++count;
if (count == length)
return -1;
i += hop;
if (i < length)
continue;
if (i < skipLength)
i += hop;

(I guess this should be 'while'.)

skipLength is determined modulo hop, thus it can't be more than
one hop away.

Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with
probability 6/20 and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot
work for this case: it picks one of 20 numbers at random. The
resulting probability mass cannot be evenly distributed among
the three elements, because 20 is not divisible by 3.

That's true. Two points, though: If the range of error is within
1/(n*(n-1)), with array length n, it may be close enough for a
given real world use case. 2. n is known to be large, which makes
1/(n*(n-1)) even smaller. You'd probably have to trade accuracy
for performance. I wonder how much less performant a truly
accurate algorithm would be. Also, whether there's a way to make
an algorithm of this style completely accurate.
```
May 02 2017
MysticZach <reachzach ggmail.com> writes:
```On Tuesday, 2 May 2017 at 23:09:54 UTC, MysticZach wrote:
On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
On 02.05.2017 22:42, MysticZach wrote:
On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
for (;;) {
if (P(a[i]))
return i;
++count;
if (count == length)
return -1;
i += hop;
if (i < length)
continue;
if (i < skipLength)
i += hop;

(I guess this should be 'while'.)

skipLength is determined modulo hop, thus it can't be more than
one hop away.

Actually, I did botch the implementation. The hopping interval
must be based on the prime rather than on n. But you could still
short circuit the while loop, I think. `if (prime - n > hop) {
skipLength = n + ((prime - n) % hop); }
```
May 03 2017
Timon Gehr <timon.gehr gmx.ch> writes:
```On 03.05.2017 01:09, MysticZach wrote:
Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with probability
6/20 and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot work for
this case: it picks one of 20 numbers at random. The resulting
probability mass cannot be evenly distributed among the three
elements, because 20 is not divisible by 3.

That's true. Two points, though: If the range of error is within
1/(n*(n-1)), with array length n,

It's not though. For example, [1,1,1,0,...,0] (length 29), you get 0 and
2 each with probability 43/116, but 1 only with probability 30/116.

It might be interesting to figure out how far from uniform the
distribution can get.

it may be close enough for a given
real world use case. 2. n is known to be large, which makes 1/(n*(n-1))
even smaller.  You'd probably have to trade accuracy for performance. I
wonder how much less performant a truly accurate algorithm would be.
Also, whether there's a way to make an algorithm of this style
completely accurate.

For an accurate algorithm based on (unconditional) uniformly random
sampling, the number of outcomes from sampling needs to be divisible by
all numbers from 1 to n. (Because each of those could conceivably be the
number of elements satisfying the predicate.)
```
May 04 2017
MysticZach <reachzach ggmail.com> writes:
```On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
On 03.05.2017 01:09, MysticZach wrote:
Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with
probability
6/20 and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot
work for
this case: it picks one of 20 numbers at random. The resulting
probability mass cannot be evenly distributed among the three
elements, because 20 is not divisible by 3.

That's true. Two points, though: If the range of error is
within
1/(n*(n-1)), with array length n,

It's not though. For example, [1,1,1,0,...,0] (length 29), you
get 0 and 2 each with probability 43/116, but 1 only with
probability 30/116.

It might be interesting to figure out how far from uniform the
distribution can get.

Or how close it can get, depending on the range of intervals
used. My math skill is shaky here.

of an array with equal probability of hitting any given element
satisfying a given predicate first. It sure would be cool if
there were.
```
May 04 2017
MysticZach <reachzach ggmail.com> writes:
```On Thursday, 4 May 2017 at 13:19:43 UTC, MysticZach wrote:
On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
On 03.05.2017 01:09, MysticZach wrote:
That's true. Two points, though: If the range of error is
within
1/(n*(n-1)), with array length n,

It's not though. For example, [1,1,1,0,...,0] (length 29), you
get 0 and 2 each with probability 43/116, but 1 only with
probability 30/116.

It might be interesting to figure out how far from uniform the
distribution can get.

Or how close it can get, depending on the range of intervals
used. My math skill is shaky here.

of an array with equal probability of hitting any given element
satisfying a given predicate first. It sure would be cool if
there were.

Within a small range of error, I mean.
```
May 04 2017
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
```On Thu, May 04, 2017 at 01:19:43PM +0000, MysticZach via Digitalmars-d wrote:
[...]
Maybe there's no way to deterministically jump to every element of an
array with equal probability of hitting any given element satisfying a
given predicate first. It sure would be cool if there were.

Of course there is.  The random permutation approach works. In this
case, the data itself is not allowed to be permuted, but equivalent
semantics can be obtained by permuting an array of indices into the
data. But the challenge is how efficiently you can generate a random
permutation so that the cost of generating one doesn't outweigh the O(n)
linear scan algorithm.

I'm surprised there are no (known) incremental algorithms for generating
a random permutation of 0..n that requires less than O(n) space.

T

--
Life is complex. It consists of real and imaginary parts. -- YHL
```
May 12 2017
```On Friday, 12 May 2017 at 18:43:53 UTC, H. S. Teoh wrote: