
digitalmars.D - And here's another interesting algorithm/structure: Randomized Slide to Front

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Now that we got talking about searching in arrays, allow me to also 
share an idea I had a short while ago.

(Again, we're in the "I'd prefer to use an array if at all possible" 
mindset. So let's see how we can help searching an array with as little 
work as possible.)

One well-known search strategy is "Bring to front" (described by Knuth 
in TAoCP). A BtF-organized linear data structure is searched with the 
classic linear algorithm. The difference is what happens after the 
search: whenever the search is successful, the found element is brought 
to the front of the structure. If we're looking most often for a handful 
of elements, in time these will be near the front of the searched structure.

For a linked list, bringing an element to the front is O(1) (just rewire 
the pointers). For an array, things are not so pleasant - rotating the 
found element to the front of the array is O(n).

So let's see how we can implement a successful BtF for arrays.

The first idea is to just swap the found element with the first element 
of the array. That's O(1) but has many disadvantages - if you search 
e.g. for two elements, they'll compete for the front of the array and 
they'll go back and forth without making progress.

Another idea is to just swap the found element with the one just before 
it. The logic is, each successful find will shift the element closer to 
the front, in a bubble sort manner. In time, the frequently searched 
elements will slowly creep toward the front. The resulting performance 
is not appealing - you need O(n) searches to bring a given element to 
the front, for a total of O(n * n) steps spent in the n searches. Meh.

So let's improve on that: whenever an element is found in position k, 
pick a random number i in the range 0, 1, 2, ..., k inclusive. Then swap 
the array elements at indexes i and k. This is the Randomized Slide to 
Front strategy.
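
For concreteness, here is a minimal sketch of that search step in D (the 
name rstfFind is illustrative, not from the post; std.random.uniform with 
the "[]" bounds spec draws from a closed range):

import std.algorithm.mutation : swap;
import std.random : uniform;

// Linear search with Randomized Slide to Front: on a hit at index k,
// swap the found element with a uniformly random index i in [0, k].
// Returns the element's new index, or -1 for an unsuccessful search.
ptrdiff_t rstfFind(T)(T[] haystack, T needle)
{
    foreach (k; 0 .. haystack.length)
    {
        if (haystack[k] == needle)
        {
            immutable i = uniform!"[]"(size_t(0), k); // k inclusive
            swap(haystack[i], haystack[k]);
            return i;
        }
    }
    return -1;
}

Because the range is closed, i == k is possible and the element may stay 
put; that detail comes up again below.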

With RStF, worst case search time remains O(n), as is the unsuccessful 
search. However, frequently searched elements migrate quickly to the 
front - it only takes O(log n) searches to bring a given value to the 
front of the array.

Insertion and removal are both a sweet O(1), owing to the light 
structuring: to insert, just append the element (and perhaps swap it into 
a random position of the array to prime searching for it). Removal by 
position simply swaps the last element into the position to be removed 
and then reduces the size of the array.
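
A sketch of those two operations under the same assumptions (helper names 
are illustrative):

import std.algorithm.mutation : swap;
import std.random : uniform;

void rstfInsert(T)(ref T[] a, T value)
{
    a ~= value; // append; O(1) amortized
    if (a.length > 1) // optional priming swap into a random slot
        swap(a[uniform(0, a.length)], a[$ - 1]);
}

void rstfRemoveAt(T)(ref T[] a, size_t pos)
{
    a[pos] = a[$ - 1]; // move the last element into the hole...
    a.length -= 1;     // ...and shrink; element order is not preserved
}

Neither operation preserves order, but RStF never relies on order to 
begin with.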

So the RStF is suitable in all cases where BtF would be recommended, but 
allows an array layout without considerable penalty.

Related work: Theodoulos Garefalakis' Master's thesis "A Family of 
Randomized Algorithms for List Accessing" describes Markov Move to 
Front, which brings the searched element to front according to a Markov 
chain schedule; and also Randomized Move to Front, which decides whether 
a found element is brought to front depending on a random choice. These 
approaches are similar in that they both use randomization, but 
different because neither has good complexity on array storage.


Andrei
Nov 30 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/30/15 4:33 PM, Andrei Alexandrescu wrote:
[snip]

I just posted to reddit: 
https://www.reddit.com/r/programming/comments/3uwp42/its_my_birthday_so_heres_some_cake_for_thought/

Andrei
Nov 30 2015
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Nov 30, 2015 at 04:33:27PM -0500, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 One well-known search strategy is "Bring to front" (described by Knuth
 in TAoCP). A BtF-organized linear data structure is searched with the
 classic linear algorithm. The difference is what happens after the
 search: whenever the search is successful, the found element is
 brought to the front of the structure. If we're looking most often for
 a handful of elements, in time these will be near the front of the
 searched structure.
[...]
 So let's see how we can implement a successful BtF for arrays.
 
 The first idea is to just swap the found element with the first
 element of the array. That's O(1) but has many disadvantages - if you
 search e.g. for two elements, they'll compete for the front of the
 array and they'll go back and forth without making progress.
 
 Another idea is to just swap the found element with the one just
 before it.  The logic is, each successful find will shift the element
 closer to the front, in a bubble sort manner. In time, the frequently
 searched elements will slowly creep toward the front. The resulting
 performance is not appealing - you need O(n) searches to bring a given
 element to the front, for a total of O(n * n) steps spent in the n
 searches. Meh.
 
 So let's improve on that: whenever an element is found in position k,
 pick a random number i in the range 0, 1, 2, ..., k inclusive. Then
 swap the array elements at indexes i and k. This is the Randomized
 Slide to Front strategy.
What about when element i is matched, swap it with the (i/2)'th element?

Then it will take just log(n) searches to bring it to the front of the
array, but it won't (immediately) compete with whatever's currently the
most popular item in the array. Furthermore, when it does compete with
it, it will already have been moved closer to the front of the array, so
the previous most-popular element won't be pushed far back into the deep
recesses of the array, but remain close to the front where it will be
quickly found.

More generally, you could pick the (i/k)'th element for swapping when
you've matched an element, with k being a constant, chosen parameter.
You may be able to optimize for certain use cases (e.g., if you knew
beforehand the average number of "popular" elements) by choosing an
appropriate value of k.

T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum
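
For comparison with the randomized version, a minimal D sketch of this 
deterministic halving rule (the name halvingFind is illustrative):

import std.algorithm.mutation : swap;

// On a hit at index k, always swap toward index k/2, so a repeatedly
// searched element reaches the front in about log2(n) successful hits.
ptrdiff_t halvingFind(T)(T[] haystack, T needle)
{
    foreach (k; 0 .. haystack.length)
    {
        if (haystack[k] == needle)
        {
            swap(haystack[k / 2], haystack[k]);
            return k / 2;
        }
    }
    return -1;
}
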
Nov 30 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the (i/2)'th element?
Randomization is essential - without it you have thrashing if you search for 2 elements in alternation. -- Andrei
Nov 30 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
 On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the (i/2)'th element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei
What about selecting a random element in 0..k/2 instead of 0..k-1? -Steve
Nov 30 2015
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Nov 30, 2015 at 04:58:16PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
What about when element i is matched, swap it with the (i/2)'th element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei

 What about selecting a random element in 0..k/2 instead of 0..k-1?

[...]

Or selecting the (i/k)'th element for k = uniform(1..i)?

T

-- 
People walk. Computers run.
Nov 30 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/30/15 4:58 PM, Steven Schveighoffer wrote:
 On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
 On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the (i/2)'th element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei

 What about selecting a random element in 0..k/2 instead of 0..k-1?

I think complexity would stay the same. Choosing a tighter range puts a 
greater weight on the last search than on recent searches.

One thing I like is that I choose 0..k, not 0..k-1, which means it's 
possible that the element stays put (no change in position). That 
reduces thrashing for the top (most frequently searched) few elements.

Andrei
Nov 30 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/30/15 5:07 PM, Andrei Alexandrescu wrote:
 On 11/30/15 4:58 PM, Steven Schveighoffer wrote:
 What about selecting a random element in 0..k/2 instead of 0..k-1?
 I think complexity would stay the same. Choosing a tighter range puts 
 a greater weight on the last search than on recent searches.

In the case where you search for a very small number of elements, it 
puts an upper bound on how soon they make it to the front (log(n) 
instead of n).
 One thing I like is that I choose 0..k, not 0..k-1, which means it's
 possible that the element stays put (no change in position). That
 reduces thrashing for the top (most frequently searched) few elements.
I think insignificantly, depending on the number of "frequently 
searched" elements. But you could tune it, let's say to 8 elements:

const upperBound = max(k/2, min(8, k));

There are a lot of options for tuning that can be played with; probably 
the best way to "prove" what is best is to just try some experiments :)

-Steve
Dec 01 2015
Chris Wright <dhasenan gmail.com> writes:
On Mon, 30 Nov 2015 16:58:16 -0500, Steven Schveighoffer wrote:

 On 11/30/15 4:50 PM, Andrei Alexandrescu wrote:
 On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the (i/2)'th
 element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei

 What about selecting a random element in 0..k/2 instead of 0..k-1?

 -Steve
You can use that to put a hard upper bound of O(log n), and maybe you'll want to use that. However, that also means you have greater odds of a single rare query making it to the front of the stack. If you want to prevent that and still get a guarantee of O(log n), you could choose a range of floor(sqrt(k))..k/2.
Nov 30 2015
deadalnix <deadalnix gmail.com> writes:
On Monday, 30 November 2015 at 21:50:09 UTC, Andrei Alexandrescu 
wrote:
 On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the 
 (i/2)'th element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei
You'd end up swapping the 2 elements in front, but keep them both in 
front, so that sounds like it would have the same behavior as the 
randomized algorithm.

Where it gets hairy is when you access 2 elements in the array that 
would swap each other without getting in the front (because they are at 
2n and 2n + 1, with n big).
Nov 30 2015
Denis Koroskin <2korden gmail.com> writes:
On Monday, 30 November 2015 at 22:11:09 UTC, deadalnix wrote:
 On Monday, 30 November 2015 at 21:50:09 UTC, Andrei 
 Alexandrescu wrote:
 On 11/30/15 4:41 PM, H. S. Teoh via Digitalmars-d wrote:
 What about when element i is matched, swap it with the 
 (i/2)'th element?
 Randomization is essential - without it you have thrashing if you 
 search for 2 elements in alternation. -- Andrei

 You'd end up swapping the 2 elements in front, but keep them both in 
 front, so that sounds like it would have the same behavior as the 
 randomized algorithm. Where it gets hairy is when you access 2 elements 
 in the array that would swap each other without getting in the front 
 (because they are at 2n and 2n + 1, with n big).
Imagine that there are 1000 elements, the 500th element is X and the 
1000th element is Y.

1) search for Y: Y is last, takes 1000 iterations, swaps X <-> Y
2) search for X: X is last, takes 1000 iterations, swaps X <-> Y
3) back to 1
Nov 30 2015
Ali Çehreli <acehreli yahoo.com> writes:
On 11/30/2015 03:15 PM, Denis Koroskin wrote:

Forget the algorithms! Denis is back... :)

Ali
Nov 30 2015
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 1 December 2015 at 00:21, Ali Çehreli <digitalmars-d puremagic.com> wrote:

 On 11/30/2015 03:15 PM, Denis Koroskin wrote:

 Forget the algorithms! Denis is back... :)
Hurray!
Dec 01 2015
deadalnix <deadalnix gmail.com> writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
 Now that we got talking about searching in arrays, allow me to 
 also share an idea I had a short while ago.
 [...]
What is the advantage compared to, let's say, a ring buffer? On find you 
can put the element to the front, and swap the old element with the new 
element?

I guess randomizing would avoid hitting pathological cases too often, 
but would converge more slowly?
Nov 30 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/30/15 4:55 PM, deadalnix wrote:
 I guess randomizing would avoid hitting pathological cases too often,
 but would converge more slowly ?
That's it. The problem is that with deterministic approaches, 
pathological cases are easy to hit and relatively common. -- Andrei
Nov 30 2015
"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Nov 30, 2015 at 01:41:12PM -0800, H. S. Teoh via Digitalmars-d wrote:
[...]
 What about when element i is matched, swap it with the (i/2)'th
 element?
 
 Then it will take just log(n) searches to bring it to the front of the
 array, but it won't (immediately) compete with whatever's currently
 the most popular item in the array. Furthermore, when it does compete
 with it, it will already have been moved closer to the front of the
 array, so the previous most-popular element won't be pushed far back
 into the deep recesses of the array, but remain close to the front
 where it will be quickly found.
In fact, it's probably provable that if there are 2 most popular items 
in the array, they will eventually migrate to the 1st two positions of 
the array.

Not so sure about the case of n most popular items for n > 2, as 
position 3 is a kind of odd case where it gets displaced only by 
elements from indices that aren't a power of 2, but it would seem at a 
cursory glance that the 3 most popular items would tend to settle around 
the first 4 elements of the array.

Hmm... it seems that in the worst case (the most popular n items all lie 
precisely at indices of the form 2^j) the most popular items will end up 
within the first 2^n positions of the array. Not sure how to compute the 
average case; intuitively at least it seems that it should lie somewhere 
between the first n positions and the first 2^n positions.

T

-- 
Любишь кататься - люби и саночки возить.
Nov 30 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/30/15 4:53 PM, H. S. Teoh via Digitalmars-d wrote:
 [...]
 In fact, it's probably provable that if there are 2 most popular items
 in the array, they will eventually migrate to the 1st two positions of
 the array. [...]
With RStF it's easy to prove (e.g. by reductio ad absurdum) that if you 
search only for k items out of n, they will end up in the top k 
positions of the array. Then they'll churn there :o). The key to the 
proof is that in the stationary state no element migrates into or out of 
the top k slots. I think it would be difficult to achieve this property 
with a deterministic approach.

The more interesting question would be what the element distribution is 
if both elements and searches are Gaussian-distributed (probably a 
frequent case in practice).

Andrei
Nov 30 2015
Ola Fosheim Grøstad writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
 With RStF, worst case search time remains O(n), as is the 
 unsuccessful search. However, frequently searched elements
If you just do a linear search, then shifting down the array in another 
pass won't change the complexity: O(2N) == O(N).

But you could also just shift down the array while searching, since if 
the elements are less than the cache-line size then you already have 
everything in registers/first-level cache. (The write-back cost from 
cache to memory is contextual and depends on many factors.)
Nov 30 2015
Ola Fosheim Grøstad writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
 So let's improve on that: whenever an element is found in 
 position k, pick a random number i in the range 0, 1, 2, ..., k 
 inclusive. Then swap the array elements at indexes i and k. 
 This is the Randomized Slide to Front strategy.

 With RStF, worst case search time remains O(n), as is the 
 unsuccessful search. However, frequently searched elements 
 migrate quickly to the front - it only takes O(log n) searches 
 to bring a given value to the front of the array.
Something is wrong with the math here. The randomization means that you 
must assume that you get element k-1 in the worst case, so if you 
repeatedly search for the same element you need O(N) repeats to move it 
to the front, so you get O(N^2) complexity for moving any element to the 
front. Right?

You are probably thinking about the average case analysis, which is a 
more sensible theoretical concept for randomized algorithms than worst 
case, but then you need a model for what is typical.
Nov 30 2015
Chris Wright <dhasenan gmail.com> writes:
On Mon, 30 Nov 2015 23:27:24 +0000, Ola Fosheim Grøstad wrote:

 On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu wrote:
 So let's improve on that: whenever an element is found in position k,
 pick a random number i in the range 0, 1, 2, ..., k inclusive. Then
 swap the array elements at indexes i and k. This is the Randomized
 Slide to Front strategy.

 With RStF, worst case search time remains O(n), as is the unsuccessful
 search. However, frequently searched elements migrate quickly to the
 front - it only takes O(log n) searches to bring a given value to the
 front of the array.
 Something is wrong with the math here. [...] You are probably thinking 
 about the average case analysis, which is a more sensible theoretical 
 concept for randomized algorithms than worst case, but then you need a 
 model for what is typical.
People typically use lax terminology. Here, when someone doesn't specify 
whether they're talking about average or worst case complexity for an 
algorithm, they're probably talking about average case. Similarly for 
amortized algorithms.

Andrei is no doubt aware of the worst case (which in this case is 
O(inf), not O(N)) and has responded to people talking about ways to 
reduce the worst case. This doesn't mean Andrei was wrong or misspoke; 
it means that he could have been marginally clearer.

Most people instantly grok the intent, but some who are blessed with 
pedantry don't. Identifying these cases is a skill that took me years to 
learn.
Nov 30 2015
Ola Fosheim Grøstad writes:
On Tuesday, 1 December 2015 at 01:12:39 UTC, Chris Wright wrote:
 People typically use lax terminology. Here, when someone 
 doesn't specify whether they're talking about average or worst 
 case complexity for an algorithm, they're probably talking 
 about average case.
Don't use lax terminology when doing complexity analysis. Average case analysis is much much much harder to do than worst case/amortized.
Nov 30 2015
Marcelo Juchem <juchem gmail.com> writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
[...]
 One well-known search strategy is "Bring to front" (described 
 by Knuth in TAoCP). A BtF-organized linear data structure is 
 searched with the classic linear algorithm. The difference is 
 what happens after the search: whenever the search is 
 successful, the found element is brought to the front of the 
 structure. If we're looking most often for a handful of 
 elements, in time these will be near the front of the searched 
 structure.
[...]
 Another idea is to just swap the found element with the one 
 just before it. The logic is, each successful find will shift 
 the element closer to the front, in a bubble sort manner. In 
 time, the frequently searched elements will slowly creep toward 
 the front. The resulting performance is not appealing - you 
 need O(n) searches to bring a given element to the front, for a 
 total of O(n * n) steps spent in the n searches. Meh.

 So let's improve on that: whenever an element is found in 
 position k, pick a random number i in the range 0, 1, 2, ..., k 
 inclusive. Then swap the array elements at indexes i and k. 
 This is the Randomized Slide to Front strategy.
[...]
 Insertion and removal are both a sweet O(1), owing to the light 
 structuring: to insert, just append the element (and perhaps 
 swap it into a random position of the array to prime searching 
 for it). Removal by position simply swaps the last element into 
 the position to be removed and then reduces the size of the 
 array.
[...]
 Andrei
It seems to me you're trying to implement the array-based equivalent of 
Splay Trees (Splay Array rhymes, btw). Would that be a close enough 
description? I'm assuming you're trying to optimize for some 
distribution where a minority of the elements account for the majority 
of queries (say, Zipfian).

Here are some ideas that come to mind. I haven't thought through them 
too much so everyone's welcome to destroy me.

Rather than making index 0 always the front, use some rotating technique 
similar to what ring buffers do. Say we initially have elements ABCDE 
(front at 0) and we search for C. We swap the element to the left of the 
front (cycling back to the end of the array, thus index 4) with the 
found element, which becomes the new front. We now have the following 
array at hand: ABEDC, front at 4 (logically CABED). Obviously we 
shouldn't change the front if the queried element is already it.

An immediate problem with this technique is that we'll frequently 
pollute the front of the array with infrequent items. Say these are the 
number of queries made so far for each element: A:7, B:5, C:2, all 
others 0. Also, assume that this is the state of the array at this 
point: DEABC, front at 2. Say we now query for B. This is the new state: 
DBAEC, front at 1 (logically BAECD). Having E in front of C is 
undesirable, so we need a way to avoid that.

From now on I'll refer to indexes as the logical index. That is, let i 
be (front + index) % size. For the sake of brevity, let d be the 
distance between the element and the front, d = i - front. Let q be the 
number of successful queries performed so far.

What I have in mind boils down to deciding between:

- move a newly queried element at logical position i to the left of the 
front (technique above). Let's call it move-pre-front for lack of a 
better name;
- bubble the element up to some position between [0, i), not necessarily 
max(0, i - 1).

Augmenting the array with the number of queries for each element would 
tremendously help the decision making, but I'm assuming that's 
undesirable for a few reasons like:

- the array can't be transparently used in algorithms unaware of the 
structure;
- alignment;
- data bloating.

My immediate thought is to use some heuristic. For instance, say we have 
some threshold k. If d <= k, we bubble up s <= d positions to the left, 
where s could be computed using some deterministic formula taking d, q 
and/or k into account, or just randomly (Andrei's RStF). If d > k, we 
move-pre-front the element. The threshold k could be computed as a 
factor of q. Say, sqrt(q), log q or log^2 q (logarithm base 2).

Thoughts?

Marcelo
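
A rough D sketch of the move-pre-front bookkeeping described above (type 
and method names are illustrative); it reproduces the example: promoting 
C from physical index 2 in ABCDE with front at 0 yields ABEDC with front 
at 4, logically CABED:

import std.algorithm.mutation : swap;

struct RotatingFront(T)
{
    T[] items;
    size_t front; // physical index of the logical front

    // Swap the element at physical index phys into the slot just left
    // of the current front (wrapping around), and make that slot the
    // new front. O(1), and the old front keeps its logical rank.
    void movePreFront(size_t phys)
    {
        if (phys == front) return; // already the front
        immutable newFront = (front + items.length - 1) % items.length;
        swap(items[phys], items[newFront]);
        front = newFront;
    }
}
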
Nov 30 2015
Richard Hodges <hodges.r gmail.com> writes:
I had a little think about the pathological case of most searches 
being for one of a few elements.

I'm sure my idea is not new, but it seems to me that keeping a 
'hit count' for each element solves this. The decision of whether 
to swap then becomes:

++ Frequency[I]
if I >= Threshold1 then
    choose an index J that is [either I/2 or rand(0...I-1), depending on preference]
    if (Frequency[I] - Frequency[J]) >= Threshold2 then
        swap Item[I] and Item[J]
        swap Frequency[I] and Frequency[J]
        I = J
    endif
endif
return I

I wrote a little test in C++ (sorry guys, old habits die hard):

#include <algorithm>  // std::find_if
#include <cstddef>    // std::size_t
#include <iterator>   // std::distance, std::next
#include <utility>    // std::move, std::forward, std::swap
#include <vector>

template<class T>
struct mrucache
{
    using vector_type = std::vector<T>;
    using iterator = typename vector_type::iterator;
    using const_iterator = typename vector_type::const_iterator;

    void add(T item) {
        _items.push_back(std::move(item));
        _frequency.push_back(0);
    }

    // Linear search; on a hit, bump the element's hit count and
    // possibly promote it toward the front.
    template<class Pred>
    iterator find_if(Pred&& pred)
    {
        using std::begin;
        using std::end;
        auto iter = std::find_if(begin(_items),
                                 end(_items),
                                 std::forward<Pred>(pred));
        if (iter != end(_items))
        {
            auto i = std::distance(_items.begin(), iter);
            ++_frequency[i];
            i = maybe_swap(i);
            iter = std::next(begin(_items), i);
        }
        return iter;
    }

    // Swap the element at i with the one at i/2, but only when it is
    // far enough from the front (Threshold1) and its hit count exceeds
    // the candidate's by enough (Threshold2). Returns the new index.
    std::size_t maybe_swap(std::size_t i)
    {
        using std::swap; // allow ADL, with std::swap as the fallback
        if (i >= closeness_threshold)
        {
            auto candidate_i = i / 2;
            if ((_frequency[i] - _frequency[candidate_i]) >= difference_threshold)
            {
                swap(_items[i], _items[candidate_i]);
                swap(_frequency[i], _frequency[candidate_i]);
                i = candidate_i;
            }
        }
        return i;
    }

    auto begin() const { return _items.begin(); }
    auto end() const { return _items.end(); }

    static const std::size_t closeness_threshold = 4;
    static const int difference_threshold = 1;

    std::vector<T> _items;
    std::vector<int> _frequency;
};
Dec 01 2015
CraigDillabaugh <craig.dillabaugh gmail.com> writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
 Now that we got talking about searching in arrays, allow me to 
 also share an idea I've had a short while ago.

 [...]
Perhaps some strategy similar to Working Sets:

https://en.wikipedia.org/wiki/Iacono's_working_set_structure

would work (or one inspired by the same idea). You move the element from 
where it is found to T_1, move a random element from T_1 to T_2, from 
T_2 to T_3, and so on to T_i. In this case, rather than trees, you would 
have lists.

Maybe that has poor cache locality properties though.
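
A rough D sketch of that tiered idea (names are illustrative; the 
demotion step is best-effort when a tier runs empty):

import std.random : uniform;

struct TieredLists(T)
{
    T[][] tiers; // tiers[0] is T_1, the hottest tier

    // On a hit in tiers[t] at index idx: move the element to tiers[0],
    // then demote one random element from each tier j into tiers[j + 1]
    // for j = 0 .. t, keeping the tier sizes stable.
    void promote(size_t t, size_t idx)
    {
        T hit = tiers[t][idx];
        tiers[t][idx] = tiers[t][$ - 1]; // remove by swap-with-last
        tiers[t].length -= 1;
        foreach (j; 0 .. t)
        {
            if (tiers[j].length == 0) continue; // nothing to demote
            immutable victim = uniform(0, tiers[j].length);
            tiers[j + 1] ~= tiers[j][victim];
            tiers[j][victim] = tiers[j][$ - 1];
            tiers[j].length -= 1;
        }
        tiers[0] ~= hit; // the hit lands in the front tier
    }
}
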
Dec 01 2015
Jakob Ovrum <jakobovrum gmail.com> writes:
On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
 Now that we got talking about searching in arrays, allow me to 
 also share an idea I've had a short while ago.
I wrote a range-based implementation to see how it would look:

https://gist.github.com/JakobOvrum/45a37f55ba5c9a7501d6

The tests are very thin but it seems to do the trick.
Dec 08 2015