
digitalmars.D - Worst-case performance of quickSort / getPivot

reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
I've been investigating an instance that I ran into while processing some data, in which multiSort had apparently stalled whereas two successive sort calls processed the data quickly. After reducing the code and the 20 million data points, I've narrowed the problem down to this test case:

enum N=1000000;
N.iota.retro.chain(N.only).array.sort();

Apparently, this simple case exhibits worst-case (quadratic) complexity in our sort implementation. The .retro isn't necessary - this also exhibits quadratic complexity (although the total number of predicate calls is halved), because the first quickSort cycle will flip the order of elements during partitioning:

N.iota.chain(0.only).array.sort();

Closer inspection shows that the problem stems from how the pivot is chosen. Currently, std.algorithm's getPivot uses the median-of-three method (the median value of the first, middle and last elements[1]). However, in this case, the median element will always be the first element - this remains true in all quickSort iterations.

Here is an illustrated example with N=9. Note that at the same 
time as choosing the pivot, getPivot also orders the candidates 
(first, middle and last range elements) according to the sort 
predicate. After calling getPivot, quickSortImpl moves the pivot 
to the end of the range by swapping it with the last element.

getPivot(0..10)
8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
getPivot(2..10)
6,5,4,3,2,1,0,7 <- getPivot - before swap
7,5,4,3,6,1,0,2 <- getPivot - after swap
7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
(...)

The algorithm gets stuck on "slopes" in sorted data, which are not uncommon in real-world data (my data resembled a 1D heightmap). Although the median-of-three way of picking a pivot is recommended by some sources (notably Sedgewick), perhaps a different method would be better suited for Phobos:

* Many sources recommend using a random element as a pivot. According to [2], "Randomized quicksort, for any input, it requires only O(n log n) expected time (averaged over all choices of pivots)". Also, if it is not possible to predict the pivot choice, it is impossible to craft worst-case input, which is a plus from a security point of view[3]. However, I'm not sure if making the behavior of std.algorithm's sort nondeterministic is desirable.
* It looks like many STL implementations use Introsort[4] - a hybrid sorting algorithm which starts with QuickSort, but switches to HeapSort if the recursion depth exceeds a threshold.
* I found a recent article[5] which describes an optimal algorithm for picking a pivot, however it is behind a paywall.

[1]: Apparently the term is also used to refer to choosing three 
points *randomly*, instead of first/middle/last: 
http://stackoverflow.com/a/164183/21501
[2]: 
http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort
[3]: 
https://www.google.com/search?q=quicksort+worst+case+denial+of+service
[4]: http://en.wikipedia.org/wiki/Introsort
[5]: https://news.ycombinator.com/item?id=6629117
Nov 15 2013
next sibling parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev
wrote:
clip
 The algorithm gets stuck on "slopes" in sorted data, which are 
 not uncommon in real-world data (my data resembled a 1D 
 heightmap). Although the median-of-three way of picking a pivot 
 is recommended by some sources (notably Sedgewick), perhaps a 
 better method would be better suitable for Phobos:
If you use median-of-five (or seven) then you would start getting reasonable pivots in such an instance, though I am sure someone could craft an input set that defeats that. Does the Sedgewick source mention why 3 is picked? Easy to implement I assume.
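
For illustration, a minimal median-of-five sketch (the sample positions and the name medianOf5Index are made up; this is not Phobos code and not Sedgewick's):

---
// Illustration only: pick the pivot as the median of five sampled positions.
// The five samples are sorted with the same predicate, so the extra cost per
// partition is a small constant number of comparisons.
import std.algorithm : sort;

size_t medianOf5Index(T)(T[] r)
{
    assert(r.length >= 5);
    size_t[5] idx = [0, r.length / 4, r.length / 2, (3 * r.length) / 4, r.length - 1];
    sort!((a, b) => r[a] < r[b])(idx[]);
    return idx[2]; // index of the median sample
}

unittest
{
    auto data = [9, 1, 8, 2, 7, 3, 6, 4, 5];
    assert(data[medianOf5Index(data)] == 7); // median of the samples 9, 8, 7, 6, 5
}
---
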
 * Many sources recommend using a random element as a pivot. 
 According to [2], "Randomized quicksort, for any input, it 
 requires only O(n log n) expected time (averaged over all 
 choices of pivots)". Also, if it is not possible to predict the 
 pivot choice, it is impossible to craft worst-case input, which 
 is a plus from a security point[3]. However, I'm not sure if 
 making the behavior of std.algorithm's sort nondeterministic is 
 desirable.
What are the arguments against using a randomized algorithm?
 * It looks like many STL implementations use Introsort[4] - a 
 hybrid sorting algorithm which starts with QuickSort, but 
 switches to HeapSort if the recursion limit exceeds a threshold.
 * I found a recent article[5] which describes an optimal 
 algorithm of picking a pivot, however it is behind a paywall.

 [1]: Apparently the term is also used to refer to choosing 
 three points *randomly*, instead of first/middle/last: 
 http://stackoverflow.com/a/164183/21501
 [2]: 
 http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort
 [3]: 
 https://www.google.com/search?q=quicksort+worst+case+denial+of+service
 [4]: http://en.wikipedia.org/wiki/Introsort
 [5]: https://news.ycombinator.com/item?id=6629117
Nov 15 2013
next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 15 November 2013 at 22:56:58 UTC, Craig Dillabaugh 
wrote:
 What are the arguments against using a randomized algorithm?
- Inconsistent execution time - someone trying to benchmark Phobos sort functions might get really confused;

- QuickSort is an unstable sort. If there is more to the data than the items being compared, the data will end up in a different order on different program runs.

- std.algorithm's sort allows specifying arbitrary code as a predicate. If that code has side effects, the program will behave differently on different runs.
Nov 15 2013
prev sibling parent "Xinok" <xinok live.com> writes:
On Friday, 15 November 2013 at 22:56:58 UTC, Craig Dillabaugh 
wrote:
 What are the arguments against using a randomized algorithm?
(1) Sort is capable of being marked pure, depending on the type being sorted and the predicate. But choosing random pivots means introducing side effects.

(2) Random pivots invoke average performance, whereas choosing the pivot from a median of N has the potential to achieve better performance on partially sorted lists (fewer reads and writes).

(3) I've tested random pivots and found that choosing from a median of three or five is often the better choice.
Nov 16 2013
prev sibling next sibling parent "Chris Cain" <clcain uncg.edu> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
wrote:
 I've been investigating an instance that I ran into while processing some data, in which multiSort had apparently stalled whereas two successive sort calls processed the data quickly. After reducing the code and 20 million points, I've reduced the problem to this test case:

 enum N=1000000;
 N.iota.retro.chain(N.only).array.sort();
clip
Improving quicksort is probably a good idea, but IIRC, the stable sort has been changed to Timsort, which I find to be faster in most cases. Probably what should be done is change the default to stable sort and improve the unstable version to be faster in some way.

Presumably you could use a "randomized version," but with a constant key. That way the algorithm would be deterministic but likely to have good characteristics in normal inputs. Unfortunately that wouldn't solve the issue of an attacker getting to choose an input and causing worst case performance (it would just make it harder to come up with such an input).

Ultimately, using the stable version (Timsort) is safer for security use (hence why I'd suggest it be the default).
Nov 15 2013
prev sibling next sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
wrote:
 I've been investigating an instance that I ran into while 
 processing some data, in which multiSort had apparently stalled 
 whereas two successive sort calls processed the data quickly. 
 After reducing the code and 20 million points, I've reduced the 
 problem to this test case:
I've been hit by this case too, half a year ago. See the relevant discussion on D.learn:
http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org

The fact that the issue is raised once again perhaps means it is worth a bugreport. What do you guys think?

Ivan Kazmenko.
Nov 16 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 3:29 AM, Ivan Kazmenko wrote:
 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
 I've been investigating an instance that I ran into while processing
 some data, in which multiSort had apparently stalled whereas two
 successive sort calls processed the data quickly. After reducing the
 code and 20 million points, I've reduced the problem to this test case:
I've been hit by this case too, half a year ago. See the relevant discussion on D.learn: http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org The fact that the issue is raised once again perhaps means it is worth a bugreport. What do you guys think? Ivan Kazmenko.
I think a median of five pivot selection is in order. Andrei
Nov 16 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 3:29 AM, Ivan Kazmenko wrote:
 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
 I've been investigating an instance that I ran into while processing
 some data, in which multiSort had apparently stalled whereas two
 successive sort calls processed the data quickly. After reducing the
 code and 20 million points, I've reduced the problem to this test case:
I've been hit by this case too, half a year ago. See the relevant discussion on D.learn: http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org The fact that the issue is raised once again perhaps means it is worth a bugreport. What do you guys think? Ivan Kazmenko.
Yah, I thought one is in place already. Andrei
Nov 16 2013
prev sibling next sibling parent reply "Jean Christophe" <cybrarian live.fr> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
wrote:

 getPivot(0..10)
 8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
 9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
 9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
 9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
 getPivot(2..10)
 6,5,4,3,2,1,0,7 <- getPivot - before swap
 7,5,4,3,6,1,0,2 <- getPivot - after swap
 7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
 7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
 (...)
One possible implementation suggests to swap Left and Right immediately after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?

As the efficiency of Quicksort is known to be bad in sorting a small number of elements, i.e. < 10, it might be nice to implement an option to automatically switch to a more appropriate algorithm if it's relevant to do so.
 * Many sources recommend using a random element as a pivot. 
 According to [2], "Randomized quicksort, for any input, it 
 requires only O(n log n) expected time (averaged over all 
 choices of pivots)".
IMO it would be costly and not so relevant if the goal is to be fast.
 Also, if it is not possible to predict the pivot choice, it is 
 impossible to craft worst-case input, which is a plus from a 
 security point[3]. However, I'm not sure if making the behavior 
 of std.algorithm's sort nondeterministic is desirable.
I think it's not desirable.

-- Quicksorting a collection of Objects?

BTW I'm very interested in finding a library which could Quicksort an array of pointers, where each pointer points to a class object (or a structure) address. The library would make it possible, for example, to sort the `class objects` using one of their members as the key. Because the swaps are done on the actual pointers (and not the Objects pointed to), the Quicksort should be very efficient. However, the algorithm wouldn't be so trivial to implement because each comparison shall be done using the Object's member's value:

eg. Obj1.foo < Obj2.foo.

Of course, the programmer can choose any relevant member property to sort the collection. And it should be as easy to use as:

class SomeClass { string foo; double bar; }
SomeClass[] a;
// put 100 000 objects into a
a.sort("foo");

Do we already have something like that available somewhere or is it possible to make one eventually?

-- JC
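
Something along these lines is what I have in mind (sortByMember is just a hypothetical name, not an existing Phobos API; only the class references get swapped):

---
// Hedged sketch, not an existing Phobos API: sort a range of class references
// by a named member; only the references in the array are moved.
import std.algorithm : sort;

auto sortByMember(string member, R)(R r)
{
    return r.sort!((a, b) => mixin("a." ~ member ~ " < b." ~ member));
}

class SomeClass
{
    string foo;
    double bar;
    this(string foo, double bar) { this.foo = foo; this.bar = bar; }
}

void main()
{
    auto a = [new SomeClass("carol", 1), new SomeClass("alice", 2), new SomeClass("bob", 3)];
    a.sortByMember!"foo";
    assert(a[0].foo == "alice" && a[2].foo == "carol");
}
---
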
Nov 16 2013
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 16 November 2013 at 14:20:32 UTC, Jean Christophe 
wrote:
 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
 wrote:

 getPivot(0..10)
 8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
 9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
 9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
 9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
 getPivot(2..10)
 6,5,4,3,2,1,0,7 <- getPivot - before swap
 7,5,4,3,6,1,0,2 <- getPivot - after swap
 7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
 7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
 (...)
One possible implementation suggests to swap Left and Right immediatly after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?
getPivot sorts the first, middle and last element in the range right after it figures out their order. Is there any advantage to your suggestion over this?
 As the efficiency of Quicksort is known to be bad in sorting a 
 small number of elements, ie. < 10, it might be nice to 
 implement an option to automatically switch to a more 
 appropriate algorithm if it's relevant to do so.
It does, actually - it switches to optimisticInsertionSort for ranges of 25 elements or shorter. I disabled that behavior for illustration.
 IMO it would be costly and not so relevant if the goal is to be 
 fast.
The impact shouldn't be big if a simple RNG, like LCG or XorShift, was used.
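
For instance (a rough sketch, not a patch to quickSortImpl; randomPivotIndex is a made-up name):

---
// Hedged sketch: a cheap Xorshift-based pivot choice, so the pivot index is
// not a fixed function of the range contents.
import std.random : Xorshift, uniform, unpredictableSeed;

size_t randomPivotIndex(T)(T[] r, ref Xorshift rng)
{
    return uniform(0, r.length, rng);
}

void main()
{
    auto rng = Xorshift(unpredictableSeed);
    auto data = [5, 3, 8, 1, 9, 2, 7];
    auto p = randomPivotIndex(data, rng);
    assert(p < data.length);
    // quickSortImpl would then swap data[p] to the end and partition as usual
}
---
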
 Quicksorting a collection of Objects?
?
 BTW I'm very interested in finding a library which could 
 Quicksort an array of pointers, where each pointer points to a 
 class object (or a structure) address. The library would make 
 possible, for example, to sort the `class objects` using one of 
 their members as the key. Because the swaps are done on the 
 actual pointers (and not the Objects pointed), the Quicksort 
 should be very efficient. However, the algorithm woulnd't be so 
 trivial to implement because each comparison shall be done 
 using the Object's member's value :

 eg. Obj1.foo < Obj2.foo.

 Of course, the programmer can choose any relevant member 
 property to sort the collection. And it should be as easy to 
 use as:

 class SomeClass { string foo; double bar;}
 SomeClass[] a;
 // put 100 000 objects into a
 a.sort("foo");

 Do we already have something like that available somewhere or 
 is it possible to make one eventually?
You mean, sort!`a.foo < b.foo` ?
Nov 16 2013
parent reply "Jean Christophe" <cybrarian live.fr> writes:
 BTW I'm very interested in finding a library which could 
 Quicksort an array of pointers, where each pointer points to a 
 class object (or a structure) address. The library would make 
 possible, for example, to sort the `class objects` using one 
 of their members as the key. Because the swaps are done on the 
 actual pointers (and not the Objects pointed), the Quicksort 
 should be very efficient. However, the algorithm woulnd't be 
 so trivial to implement because each comparison shall be done 
 using the Object's member's value :

 eg. Obj1.foo < Obj2.foo.

 Of course, the programmer can choose any relevant member 
 property to sort the collection. And it should be as easy to 
 use as:

 class SomeClass { string foo; double bar;}
 SomeClass[] a;
 // put 100 000 objects into a
 a.sort("foo");

 Do we already have something like that available somewhere or 
 is it possible to make one eventually?
You mean, sort!`a.foo < b.foo` ?
Yes. An indirect sorting, assuming a and b to be objects of class SomePotentiallyLargeClass.

Because the array to sort contains pointers only, all the data movement is essentially the same as if we were sorting integers.

-- JC
Nov 16 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 9:30 PM, Jean Christophe wrote:
 An indirect sorting, assuming a and b to be ojects of class
 SomePotentialyLargeClass.

 Because the array to sort contains pointers only, all the data movement
 is essentially the same as if we were sorting integer.
Maybe makeIndex may be of help. http://dlang.org/phobos/std_algorithm.html#makeIndex Andrei
Nov 16 2013
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 17 November 2013 at 05:30:24 UTC, Jean Christophe 
wrote:
 You mean, sort!`a.foo < b.foo` ?
Yes. An indirect sorting, assuming a and b to be ojects of class SomePotentialyLargeClass. Because the array to sort contains pointers only, all the data movement is essentially the same as if we were sorting integer.
If the range elements are reference types, that's what will happen (unless they overload opAssign or some such). Otherwise, there's makeIndex (already mentioned by Andrei), or you could also do it by hand:

1. r.length.iota.array.sort!((a, b) => r[a]<r[b]);
2. r.length.iota.map!(a => &r[a]).array.sort!((a, b) => *a<*b);

r.map!`&a` doesn't work though, because we don't get a reference to the range element in the predicate.
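
For completeness, a small makeIndex usage sketch (SomeClass and the data are invented for the example):

---
// Sketch of the makeIndex route: the index array is sorted, the class
// references in `a` are never moved.
import std.algorithm : makeIndex;
import std.stdio : writeln;

class SomeClass
{
    string foo;
    this(string foo) { this.foo = foo; }
}

void main()
{
    auto a = [new SomeClass("carol"), new SomeClass("alice"), new SomeClass("bob")];

    auto index = new size_t[a.length];
    makeIndex!((x, y) => x.foo < y.foo)(a, index);

    foreach (i; index)
        writeln(a[i].foo); // alice, bob, carol
}
---
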
Nov 17 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 6:20 AM, Jean Christophe wrote:
 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:

 getPivot(0..10)
 8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
 9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
 9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
 9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
 getPivot(2..10)
 6,5,4,3,2,1,0,7 <- getPivot - before swap
 7,5,4,3,6,1,0,2 <- getPivot - after swap
 7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
 7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
 (...)
One possible implementation suggests to swap Left and Right immediatly after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?
That may help this particular situation, but does it do anything interesting in general?
 As the efficiency of Quicksort is known to be bad in sorting a small
 number of elements, ie. < 10, it might be nice to implement an option to
 automatically switch to a more appropriate algorithm if it's relevant to
 do so.
That's done already.
 * Many sources recommend using a random element as a pivot. According
 to [2], "Randomized quicksort, for any input, it requires only O(n log
 n) expected time (averaged over all choices of pivots)".
IMO it would be costly and not so relevant if the goal is to be fast.
 Also, if it is not possible to predict the pivot choice, it is
 impossible to craft worst-case input, which is a plus from a security
 point[3]. However, I'm not sure if making the behavior of
 std.algorithm's sort nondeterministic is desirable.
I think it's not desirable.
Vladimir also enumerated a few disadvantages of random pivot selection. I think those notwithstanding, random pivot is an option we should consider very seriously. It's easy to use a fast random number generator (randomness doesn't need to be very good). The only reason for which I've hesitated to add random pivot selection is that it's less usual and might surprise some. It is _very_ solidly motivated.
 Quicksorting a collection of Objects?

 BTW I'm very interested in finding a library which could Quicksort an
 array of pointers, where each pointer points to a class object (or a
 structure) address. The library would make possible, for example, to
 sort the `class objects` using one of their members as the key. Because
 the swaps are done on the actual pointers (and not the Objects pointed),
 the Quicksort should be very efficient. However, the algorithm woulnd't
 be so trivial to implement because each comparison shall be done using
 the Object's member's value :

 eg. Obj1.foo < Obj2.foo.

 Of course, the programmer can choose any relevant member property to
 sort the collection. And it should be as easy to use as:

 class SomeClass { string foo; double bar;}
 SomeClass[] a;
 // put 100 000 objects into a
 a.sort("foo");

 Do we already have something like that available somewhere or is it
 possible to make one eventually?
I had a diff at a point that added overloads to sort allowing the caller to only specify the key. It turned out to be confusing, and all it saved was a tad of duplication: "expr" instead of "expr(a) < expr(b)".

Andrei
Nov 16 2013
parent "Jean Christophe" <cybrarian live.fr> writes:
On Saturday, 16 November 2013 at 16:10:56 UTC, Andrei 
Alexandrescu wrote:
 On 11/16/13 6:20 AM, Jean Christophe wrote:
 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir 
 Panteleev wrote:

 getPivot(0..10)
 8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
 9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
 9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
 9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
 getPivot(2..10)
 6,5,4,3,2,1,0,7 <- getPivot - before swap
 7,5,4,3,6,1,0,2 <- getPivot - after swap
 7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
 7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
 (...)
One possible implementation suggests to swap Left and Right immediatly after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?
That may help this particular situation, but does it do anything interesting in general?
Yes. This has the extra advantage that the smallest of the three winds up in A[left], which is where the partitioning routine would put it anyway. The largest winds up at A[right], which is also the correct place, since it is larger than the Pivot. Therefore you can place the Pivot in A[right-1] and initialize i and j to (left+1) and (right-2) in the partition phase.

Another benefit is that because A[left] is smaller than the Pivot, it will act as a sentinel for j. We do not need to worry about j running past the end. Same for i. Thus you assert: A[left] <= A[center] <= A[right] even before you hide the Pivot.

-- JC
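
In code, the scheme looks roughly like this (a textbook-style sketch, not Phobos's quickSortImpl; small ranges are simply handed to std.algorithm.sort to keep it short):

---
// Median-of-three with the pivot hidden at A[right-1]; A[left] and the pivot
// act as sentinels for the two scans (illustration only).
import std.algorithm : swap, sort, isSorted;

T medianOf3(T)(T[] a, size_t left, size_t right)
{
    auto center = (left + right) / 2;
    if (a[center] < a[left])   swap(a[left], a[center]);
    if (a[right]  < a[left])   swap(a[left], a[right]);
    if (a[right]  < a[center]) swap(a[center], a[right]);
    // now a[left] <= a[center] <= a[right]
    swap(a[center], a[right - 1]);      // hide the pivot at right-1
    return a[right - 1];
}

void quickSort(T)(T[] a, size_t left, size_t right)
{
    if (left + 10 > right)              // small range: any simple sort will do
    {
        sort(a[left .. right + 1]);
        return;
    }
    auto pivot = medianOf3(a, left, right);
    auto i = left, j = right - 1;
    for (;;)
    {
        while (a[++i] < pivot) {}       // a[right-1] (the pivot) stops i
        while (pivot < a[--j]) {}       // a[left] stops j
        if (i < j) swap(a[i], a[j]); else break;
    }
    swap(a[i], a[right - 1]);           // restore the pivot to its final slot
    quickSort(a, left, i - 1);
    quickSort(a, i + 1, right);
}

void main()
{
    auto a = [5, 3, 9, 1, 7, 2, 8, 0, 6, 4, 13, 11, 12, 10];
    quickSort(a, 0, a.length - 1);
    assert(a.isSorted);
}
---
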
Nov 16 2013
prev sibling parent "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Saturday, 16 November 2013 at 14:20:32 UTC, Jean Christophe
wrote:
clip
 BTW I'm very interested in finding a library which could 
 Quicksort an array of pointers, where each pointer points to a 
 class object (or a structure) address. The library would make 
 possible, for example, to sort the `class objects` using one of 
 their members as the key. Because the swaps are done on the 
 actual pointers (and not the Objects pointed), the Quicksort 
 should be very efficient. However, the algorithm woulnd't be so 
 trivial to implement because each comparison shall be done 
 using the Object's member's value :

 eg. Obj1.foo < Obj2.foo.

 Of course, the programmer can choose any relevant member 
 property to sort the collection. And it should be as easy to 
 use as:

 class SomeClass { string foo; double bar;}
 SomeClass[] a;
 // put 100 000 objects into a
 a.sort("foo");

 Do we already have something like that available somewhere or 
 is it possible to make one eventually?

 -- JC
This is sort of an interesting problem that you propose, because while sorting just the pointers means moving things around less, you would also get absolutely horrible locality of reference/cache hit performance. Have you thought about ways to deal with that?
Nov 16 2013
prev sibling next sibling parent reply "Xinok" <xinok live.com> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
wrote:
 ...
I'm a horrible procrastinator, I should have had this fixed over a year ago, lol. Anyways, I posted a bug report about this problem long ago and a working solution has been gathering dust waiting to be implemented in Phobos.

Bug report (poorly titled):
http://d.puremagic.com/issues/show_bug.cgi?id=7767

Solution:
https://github.com/Xinok/XSort/blob/master/unstablesort.d

* It's an introsort which falls back to a bottom-up binary heap sort after so many recursions to guarantee O(n log n) performance in the worst-case.

* It chooses the pivot from a median-of-five. It adds nearly zero overhead and results in a noticeable decrease in comparisons (predicate calls). I see no reason why not to add this.

* My solution uses a binary insertion sort for sublists up to 32 elements long. This reduces comparisons as well but also adds a fair amount of overhead. It's about 15-20% slower when sorting an array of integers. My idea is to use this by default and specialize the linear insertion sort when the element type and predicate are known.

* Regardless of these improvements, I think Timsort should be the default sorting algorithm. It's the better choice in many (most?) cases and, well, it's stable. Quick sort would still be available for those cases in which it's faster and stable sorting isn't needed.

Well, it's Saturday and I have no plans. Let's see if I can't get a pull request done by the end of the day...
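
For reference, here is a compact, self-contained sketch of the introsort guard itself. This is not the linked unstablesort.d; the names, cutoffs and the deliberately naive last-element pivot are made up so the depth limit actually has something to do:

---
// Sketch of the introsort guard: quicksort recursion is cut off after roughly
// 2*log2(n) levels and the remaining segment is finished with heap sort, so
// the whole thing is O(n log n) even on killer inputs.
import std.algorithm : swap;
import core.bitop : bsr;

void guardedSort(T)(T[] r)
{
    if (r.length < 2) return;
    quickSortGuarded(r, 2 * (bsr(r.length) + 1));
}

void quickSortGuarded(T)(T[] r, size_t depthLimit)
{
    while (r.length > 25)               // small-range cutoff, as in Phobos
    {
        if (depthLimit == 0)
        {
            heapSort(r);                // escape hatch: guaranteed O(n log n)
            return;
        }
        --depthLimit;
        auto p = lomutoPartition(r);    // naive last-element pivot on purpose
        quickSortGuarded(r[0 .. p], depthLimit);
        r = r[p + 1 .. $];              // iterate on the right part
    }
    insertionSort(r);
}

size_t lomutoPartition(T)(T[] r)
{
    auto pivot = r[$ - 1];
    size_t store = 0;
    foreach (i; 0 .. r.length - 1)
        if (r[i] < pivot) swap(r[i], r[store++]);
    swap(r[store], r[$ - 1]);
    return store;
}

void heapSort(T)(T[] r)
{
    void siftDown(size_t root, size_t end)
    {
        while (2 * root + 1 < end)
        {
            auto child = 2 * root + 1;
            if (child + 1 < end && r[child] < r[child + 1]) ++child;
            if (!(r[root] < r[child])) return;
            swap(r[root], r[child]);
            root = child;
        }
    }
    foreach_reverse (i; 0 .. r.length / 2) siftDown(i, r.length);
    foreach_reverse (i; 1 .. r.length) { swap(r[0], r[i]); siftDown(0, i); }
}

void insertionSort(T)(T[] r)
{
    foreach (i; 1 .. r.length)
        for (auto j = i; j > 0 && r[j] < r[j - 1]; --j)
            swap(r[j], r[j - 1]);
}

void main()
{
    import std.range : iota, retro, chain, only;
    import std.array : array;
    import std.algorithm : isSorted;

    auto a = 100_000.iota.retro.chain(100_000.only).array; // the killer input
    guardedSort(a);
    assert(a.isSorted);
}
---
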
Nov 16 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 11:46 AM, Xinok wrote:
 * Regardless of these improvements, I think Timsort should be the
 default sorting algorithm. It's the better choice in many (most?) cases
 and, well, it's stable. Quick sort would still be available for those
 cases in which it's faster and stable sorting isn't needed.
There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default. Andrei
Nov 16 2013
parent reply "Xinok" <xinok live.com> writes:
On Saturday, 16 November 2013 at 20:35:21 UTC, Andrei 
Alexandrescu wrote:
 On 11/16/13 11:46 AM, Xinok wrote:
 * Regardless of these improvements, I think Timsort should be 
 the
 default sorting algorithm. It's the better choice in many 
 (most?) cases
 and, well, it's stable. Quick sort would still be available 
 for those
 cases in which it's faster and stable sorting isn't needed.
There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default. Andrei
Timsort is an "adaptive" sort. It's able to achieve better performance in many cases by exploiting patterns in the data.

I decided to make a nice little test using my benchmark code here:
https://github.com/Xinok/XSort/blob/master/benchsort.d

This is the permutation I designed for the test:

static uint[] base;
base.length = 1024 * 1024;
foreach(i, ref v; base) v = i;

foreach(i; 0..1024)
{
    auto rand()
    {
        rndGen.popFront();
        return rndGen.front % base.length;
    }

    switch(rand() % 3)
    {
        // Swap two ranges of elements
        case 0:
        {
            auto arr = [rand(), rand(), rand(), rand()];
            sort(arr);
            swapRanges(base[arr[0]..arr[1]], base[arr[2]..arr[3]]);
        } break;

        // Reverse a range of elements
        case 1:
        {
            auto a = rand(), b = rand();
            if(a > b) swap(a, b);
            reverse(base[a..b]);
        } break;

        // Shuffle a small range of elements
        case 2:
        {
            auto a = rand();
            while(a < 1024) a = rand();
            assert(a >= 1024);
            randomShuffle(base[a - 1024 .. a]);
        } break;

        default: assert(0);
    }
}

And the results (last number is predicate calls):

Current Unstable Sort   50ms   32783474
New Unstable Sort       69ms   21503542
Timsort                 35ms    3905887

That's why I suggest making Timsort the default sorting algorithm.
Nov 16 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 2:11 PM, Xinok wrote:
 On Saturday, 16 November 2013 at 20:35:21 UTC, Andrei Alexandrescu wrote:
 On 11/16/13 11:46 AM, Xinok wrote:
 * Regardless of these improvements, I think Timsort should be the
 default sorting algorithm. It's the better choice in many (most?) cases
 and, well, it's stable. Quick sort would still be available for those
 cases in which it's faster and stable sorting isn't needed.
There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default. Andrei
Timsort is an "adaptive" sort. It's able to achieve better performance in many cases by exploiting patterns in the data.
This is in response to what? Are you trying to talk me out of the pigeonhole principle?
 I decided to make a nice little test using my benchmark code here:
 https://github.com/Xinok/XSort/blob/master/benchsort.d
I understand and appreciate that Timsort is a nicely optimized algorithm. This has nothing to do with it doing more work than an unstable sort that is just as nicely optimized.

At the end of the day whatever stable sorting algorithms are out there, their set is included in the set of all sorting algorithms. In order to preserve stability, stable sorting algorithms need to do nonnegative extra work.

Andrei
Nov 16 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Saturday, 16 November 2013 at 23:40:39 UTC, Andrei 
Alexandrescu wrote:
 This is in response to what? Are you trying to talk me out of 
 the pigeonhole principle?
...
 I understand and appreciate that Timsort is a nicely optimized 
 algorithm. This has nothing to do with it doing more work than 
 an unstable sort that is just as nicely optimized.

 At the end of the day whatever stable sorting algorithms are 
 out there, their set is included in the set of all sorting 
 algorithms. In order to preserve stability, stable sorting 
 algorithms need to do nonnegative extra work.


 Andrei
I think it's more complicated than that. Let's assume for a moment that you've proven that such an unstable sort must exist that is faster (I'm not convinced that it necessarily must take extra work to maintain stability). You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences). If it takes 2 years to come up with an unstable sort faster than Timsort, then it seems like a long time to be using something that isn't the best that we currently have. Especially if D is to be used in benchmarks against other languages.

Timsort is implemented now and has no real cost to use as default. If, at some point, an unstable sort is discovered that is faster on average than Timsort AND (extremely important) it has no potential for an attacker to be able to choose a sequence to sort that causes poor performance, then unstable could be switched back as the default at some point.

Right now, this could, in fact, cause a security hole (DOS-type attack) depending on how it is used. I think it's better to use a safer default in this case, even if we could make it faster than Timsort somehow.
Nov 16 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 4:18 PM, Chris Cain wrote:
 I think it's more complicated than that. Let's assume for a moment that
 you've proven that such an unstable sort must exist that is faster (I'm
 not convinced that it necessarily must take extra work to maintain
 stability).
Very simple. Any sequence with a large number of equivalent elements will cause timsort to work more than an unstable algorithm, because it would need to shift around those elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion then shifts to how many equivalent elements are in the typical range etc.
 You have not shown how much faster it might be (it could be
 only 1% faster) nor how much work it would take to discover (even an
 ideal pivot choice for quicksort actually cannot be as fast as Timsort
 on an already sorted sequence, and quicksort is not an appropriate
 algorithm for accepting presorted subsequences).
There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.
 If it takes 2 years to
 come up with an unstable sort faster than Timsort, then it seems like a
 long time to be using something that isn't the best that we currently
 have. Especially if D is to be used in benchmarks against other languages.

 Timsort is implemented now and has no real cost to use as default. If,
 at some point, an unstable sort is discovered that is faster on average
 than Timsort AND (extremely important) it has no potential for an
 attacker to be able to choose a sequence to sort that causes poor
 performance, then unstable could be switched back as the default at some
 point.
We could do that, but that would be masking a bug instead of fixing it. We should instead focus on fixing unstable sort. One additional issue with making stable sort the default for a while is that people will grow to assume sort() is stable, and will have broken code when we switch back. Let's fix unstable sort. If it's slower on any other data than Timsort's happy case, we have a bug. Andrei
Nov 16 2013
next sibling parent "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu 
wrote:
 Very simple. Any sequence with a large number of equivalent 
 elements will cause timsort to work more than an unstable 
 algorithm, because it would need to shift around those 
 elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion 
 then shifts to how many equivalent elements are in the typical 
 range etc.
Ah, very nicely described. So a stable sort can take more time than a theoretical unstable sort when there is more than one equal element.
 There are well known tweaks to quicksort that make it operate 
 well on sorted or almost sorted sequences.
I'd have to see them to comment on them. I'd doubt that they only use quicksort if they support taking advantage of large parts of the array being sorted. I'd expect some sort of merging to occur in that case.
 We could do that, but that would be masking a bug instead of 
 fixing it. We should instead focus on fixing unstable sort. One 
 additional issue with making stable sort the default for a 
 while is that people will grow to assume sort() is stable, and 
 will have broken code when we switch back.

 Let's fix unstable sort. If it's slower on any other data than 
 Timsort's happy case, we have a bug.
Sure, but could we at least agree on a timescale? If we can't come up with something significantly better than Timsort in the next year, it seems pointless to hold out for something that may never come. Hidden bug or not, deliberately exposing a bug that has no effective plan to be fixed isn't a good tactic either. Especially if D is expected to be competitive with other native languages in speed. Seeing D falter in the "sort" column of a shootout could be disconcerting to some, and we can't count on everyone knowing to specify stable sort in those head-to-heads until we work out improving unstable sort.

That said, your point that people might grow comfortable with sort being stable is a good one. Unfortunately, I don't have an answer for that. Documentation detailing that sort is not guaranteed a particular stability is about the best that could be done. Perhaps a variant of Timsort could be prepared that deliberately changes the order of equal elements. Designed instability could suffice as a stopgap against that (at least until the unstable sort is worked out).
Nov 16 2013
prev sibling parent reply "Xinok" <xinok live.com> writes:
On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu 
wrote:
 On 11/16/13 4:18 PM, Chris Cain wrote:
 You have not shown how much faster it might be (it could be
 only 1% faster) nor how much work it would take to discover 
 (even an
 ideal pivot choice for quicksort actually cannot be as fast as 
 Timsort
 on an already sorted sequence, and quicksort is not an 
 appropriate
 algorithm for accepting presorted subsequences).
There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.
If you tweak for one scenario, you're only losing in other scenarios. Timsort already performs ideally in these cases, as I've demonstrated. Rather than optimizing quicksort for cases in which Timsort will still be the victor, we should optimize for those cases in which quicksort is already faster.
Nov 16 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 6:42 PM, Xinok wrote:
 On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu wrote:
 On 11/16/13 4:18 PM, Chris Cain wrote:
 You have not shown how much faster it might be (it could be
 only 1% faster) nor how much work it would take to discover (even an
 ideal pivot choice for quicksort actually cannot be as fast as Timsort
 on an already sorted sequence, and quicksort is not an appropriate
 algorithm for accepting presorted subsequences).
There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.
If you tweak for one scenario, you're only losing in other scenarios.
I am not so sure about that.
 Timsort already performs ideally in these cases, as I've demonstrated.
Yes, it is also my understanding that Timsort does well on almost sorted data. The issue is one does not know a priori what the distribution of the data is.
 Rather than optimizing quicksort for cases in which Timsort will still
 be the victor, we should optimize for those cases in which quicksort is
 already faster.
We should have a "good unstable sort" in addition to the "good stable sort" we already have. Seeing this as a competition between quicksort and timsort would be the wrong frame. Probably a good step forward would be to hook a sort benchmarking corpus to our unittests. What are the consecrated corpora? Andrei
Nov 16 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 04:14:22 UTC, Andrei Alexandrescu 
wrote:
 Probably a good step forward would be to hook a sort 
 benchmarking corpus to our unittests. What are the consecrated 
 corpora?
I'll kick off the suggestions with this:

File to provide "real" data:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.fna

From http://www.genome.jp/kegg-bin/show_organism?org=eco -> RefSeq -> NC_000913.fna

(despite this being real data, typically you don't actually sort genomes like this ... it's simply "more interesting" than a pseudo-randomized dataset)

Code:
---
import std.range, std.algorithm, std.stdio, std.array, std.conv, std.datetime;

enum strategy = SwapStrategy.stable;

void main()
{
    auto arr = File("NC_000913.fna").byLine
        .drop(1) // Trim the first line because it's just metadata
        /*.take(50)*/
        .join
        .chunks(50)
        .map!text
        .array;

    StopWatch sw = StopWatch(AutoStart.yes);
    arr.sort!("a < b", strategy)();
    sw.stop();

    writeln(text(arr.length, " elements sorted in ", sw.peek.msecs, " msecs"));
}
---

On my system, this shows unstable sort to be ~2x faster than stable sort (which is to be expected of data with no discernible patterns). The "key" thing this is testing is the effectiveness of the sort on data that often takes longer to compare (with such a small alphabet, many strings have a common prefix).

That said, it might also be reproduced "well enough" using a random generator to create similar strings to sort, but the basic idea is there. I just like using real genomes for performance testing things :)
Nov 16 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 9:21 PM, Chris Cain wrote:
 That said, it might also be reproduced "well enough" using a random
 generator to create similar strings to sort, but the basic idea is
 there. I just like using real genomes for performance testing things :)
I am hoping for some more representative corpora, along the lines of http://sortbenchmark.org/. Some data that we can use as good proxies for typical application usage. Andrei
Nov 16 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 07:19:26 UTC, Andrei Alexandrescu 
wrote:
 On 11/16/13 9:21 PM, Chris Cain wrote:
 That said, it might also be reproduced "well enough" using a 
 random
 generator to create similar strings to sort, but the basic 
 idea is
 there. I just like using real genomes for performance testing 
 things :)
I am hoping for some more representative corpora, along the lines of http://sortbenchmark.org/. Some data that we can use as good proxies for typical application usage. Andrei
I think I get what you're saying, but sortbenchmark.org uses completely pseudorandom (but reproducible) entries that I don't think are representative of real data either (using gensort -a minus the verification columns):

---
AsfAGHM5om
~sHd0jDv6X
uI^EYm8s=|
Q)JN)R9z-L
o4FoBkqERn
*}-Wz1;TD-
0fssx}~[oB
...
---

Most places use very fake data as proxies for real data. It's better to have something somewhat structured and choose data that, despite not being real data, stresses the benchmark in a unique way.

I'm not suggesting my benchmark be the only one; if we're going to use pseudorandom data (I'm not certain we could actually get "realistic data" that would serve us that much better) we might as well have different test cases that stress the sort routine in different ways. Obviously, using the real genome would be preferable to generating some (since it's actually truly "real" data, just used in an unorthodox way) but there's a disadvantage to attaching a 4.6MB file to a benchmarking setup. Especially if more might come.

Anyway, it's a reasonable representation of "data that has no discernible order that can occasionally take some time to compare." Think something like sorting a list of customer records by name. If they're ordered by ID, then the names would not likely have a discernible pattern and the comparison between names might be "more expensive" because some names can be common.

We can do "more realistic" for that type of scenario, if you'd like. I could look at a distribution for last names/first names and generate fake names to sort in a reasonable approximation of a distribution of real names. I'm not certain the outcome would change that much.
Nov 17 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/17/13 6:20 AM, Chris Cain wrote:
 I'm not suggesting my benchmark be the only one; if we're going to use
 pseudorandom data (I'm not certain we could actually get "realistic
 data" that would serve us that much better) we might as well have
 different test cases that stress the sort routine in different ways.
 Obviously, using the real genome would be preferable to generating some
 (since it's actually truly "real" data, just used in an unorthodox way)
 but there's a disadvantage to attaching a 4.6MB file to a benchmarking
 setup. Especially if more might come.
OK, since I see you have some interest... You said nobody would care to actually sort genome data. I'm aiming for data that's likely to be a good proxy for tasks people are interested in doing.

For example, people may be interested in sorting floating-point numbers resulting from sales, measurements, frequencies, probabilities, and whatnot. Since most of those have a Gaussian distribution, a corpus with Gaussian-distributed measurements would be nice.

Then, people may want to sort things by date/time. Depending on the scale the distribution is different - diurnal cycle, weekly cycle, seasonal cycle, secular ebbs and flows etc. I'm unclear on what would be a good set of data. For sub-day time ranges uniform distribution may be appropriate.

Then, people may want to sort records by e.g. Lastname, Firstname, or index a text by words. For names we'd need some census data or phonebook. For general text sorting we can use classic texts such as Alice in Wonderland or the King James Bible (see http://corpus.canterbury.ac.nz/descriptions/). Sorting by word length is a possibility (word lengths are probably Gaussian-distributed).

Uniform random data is also a baseline, not terribly representative, but worth keeping an eye on. In fact uniform data is unfairly rigged in quicksort's favor: any pivot is likely to be pretty good, and there are no sorted runs that often occur in real data.

Andrei
Nov 17 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 16:57:19 UTC, Andrei Alexandrescu 
wrote:
 Then, people may want to sort records by e.g. Lastname, 
 Firstname, or index a text by words. For names we'd need some 
 census data or phonebook.
Sure. As a source of data, app_c.csv from http://www.census.gov/genealogy/www/data/2000surnames/ -> File B: Surnames...

Test case:
---
import std.stdio, std.algorithm, std.random, std.range, std.conv, std.datetime;

// Name generator constants
enum NumberOfPossibleNames = 1_000_000;
enum NumberOfRandomNames = 1_000_000;
enum SortStrategy = SwapStrategy.unstable;

void main()
{
    auto data = File("app_c.csv").byLine
        .drop(1) // First line is just column names
        .take(NumberOfPossibleNames)
        .map!(e => e.text.split(","))
        .array;

    auto names = data.map!(e => e[0]).array;
    auto proportions = data.map!(e => e[2].to!size_t).array;

    auto rnd = Random(50);
    auto randomNames = fastDice(rnd, proportions)
        .take(NumberOfRandomNames)
        .map!(i => names[i])
        .array;

    StopWatch sw = AutoStart.yes;
    randomNames.sort!("a < b", SortStrategy)();
    sw.stop();

    writeln(randomNames.length, " names sorted in ", sw.peek.msecs,
            " msecs using ", SortStrategy, " sort");
}

struct FastDice(Rng, SearchPolicy pol = SearchPolicy.gallop)
{
    SortedRange!(size_t[]) _propCumSumSorted;
    size_t _sum;
    size_t _front;
    Rng* _rng;

    this(ref Rng rng, size_t[] proportions)
    {
        size_t[] _propCumSum = proportions.save.array;
        _rng = &rng;

        size_t mass = 0;
        foreach(ref e; _propCumSum)
        {
            mass += e;
            e = mass;
        }

        _sum = _propCumSum.back;
        _propCumSumSorted = assumeSorted(_propCumSum);
        popFront();
    }

    void popFront()
    {
        immutable point = uniform(0, _sum, *_rng);
        assert(point < _sum);
        _front = _propCumSumSorted.lowerBound!pol(point).length;
    }

    enum empty = false;

    @property auto front()
    {
        return _front;
    }
}

auto fastDice(Rng)(ref Rng rng, size_t[] proportions)
{
    return FastDice!(Rng)(rng, proportions);
}

auto fastDice(size_t[] proportions)
{
    return fastDice(rndGen, proportions);
}
---

Results (using `-O -inline -release -noboundschecks` on my computer):

unstable sort: 738 msecs
stable sort: 1001 msecs

Also, to do this (in reasonable time) I had to create a new range which I called "FastDice" ... it does the same as std.random.dice, but is intended for cases where you'll be throwing dice throws often on the same data, so it does a bit of precomputation (creating a cumulative sum array) and allows for binary search/gallop to reduce the time to come up with each throw. I opted for gallop in this since the data is sorted in such a way that most common names come first.

It needs a bit of work to make it actually generic, but it's a good start and I'll just say it's WTFPL code, so use it for whatever. It'll be especially good for generating test cases like I have done above.

FWIW, FastDice takes ~400ms to generate all those dice throws. I did a back-of-the-envelope calculation on using dice, and just the time saved caching the sum saved maybe 30 minutes (No, I didn't wait that long, I stopped after 5 and wrote FastDice) of time each run. And the time saved using a gallop search instead of linear search is pretty significant as well (difference between n and log n time).
Nov 17 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/17/13 1:23 PM, Chris Cain wrote:
 On Sunday, 17 November 2013 at 16:57:19 UTC, Andrei Alexandrescu wrote:
 Then, people may want to sort records by e.g. Lastname, Firstname, or
 index a text by words. For names we'd need some census data or phonebook.
Sure. As a source of data, app_c.csv from http://www.census.gov/genealogy/www/data/2000surnames/ -> File B: Surnames...
[snip]
 It needs a bit of work to make it actually generic, but it's a good
 start and I'll just say it's WTFPL code, so use it for whatever. It'll
 be especially good for generating test cases like I have done above.

 FWIW, FastDice takes ~400ms to generate all those dice throws. I did a
 back-of-the-envelope calculation on using dice, and just the time saved
 caching the sum saved maybe 30 minutes (No, I didn't wait that long, I
 stopped after 5 and wrote FastDice) of time each run. And the time saved
 using a gallop search instead of linear search is pretty significant as
 well (difference between n and log n time).
I think that's a terrific start! (Not sure I understand where the 30 minutes come from...) Andrei
Nov 17 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Monday, 18 November 2013 at 02:36:40 UTC, Andrei Alexandrescu 
wrote:
 I think that's a terrific start!
Thanks :)
 (Not sure I understand where the 30 minutes come from...)
On my computer (`-release -inline -noboundscheck` ... `-O` is omitted because it removes the summation altogether since it isn't used anywhere):

---
auto benchmarkSumming()
{
    auto proportions = iota(150_000);
    // copied from std.random.dice:
    auto sum = reduce!("(assert(b >= 0), a + b)")(0.0, proportions.save);
}

auto bench = benchmark!benchmarkSumming(1_000);
writeln(bench.front.msecs, " ms");
---

Gives me 1618 ms. ~150_000 is the number of entries in the probability table (number of surnames), and generating the sum for 1_000 names would take about that long. I really wanted 1 million names to sort, so scaling that up would mean 1618 seconds, which is nearly 27 minutes. That summation doesn't change, hence the importance of caching it for repeated runs. That's about 27 minutes of completely wasted work.

That all isn't to say that std.random.dice is fundamentally flawed (it's fine and perfectly suited for one-shot dice rolls and is, in fact, a bit better than a DiceRange for that task) but a DiceRange is really needed to efficiently do repeated rolls like this.

Jeez, we're talking about maximizing sorting efficiency, and now I've gone off talking about making dice rolls faster too! :)
Nov 17 2013
parent "David Nadlinger" <code klickverbot.at> writes:
On Monday, 18 November 2013 at 05:26:30 UTC, Chris Cain wrote:
 On my computer (`-release -inline -noboundscheck` ... `-O` is 
 omitted because it removes the summation all together since it 
 isn't used anywhere):
The better way to avoid this problem is usually to compile the operation to be benchmarked separately, in such a way that the result can't be ignored (e.g. by making it read the parameters from function arguments and returning the result as the return value). Comparing non-optimized (i.e. without -O) builds for execution speed is of questionable relevance for most cases.

David
Dec 04 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Chris Cain:

 Also, to do this (in reasonable time) I had to create a new 
 range which I called "FastDice" ... it does the same as 
 std.random.dice, but is intended for cases where you'll be 
 throwing dice throws often on the same data, so it does a bit 
 of precomputation (creating a cumulative sum array)
See also: http://d.puremagic.com/issues/show_bug.cgi?id=5849 Bye, bearophile
Nov 17 2013
parent "Chris Cain" <clcain uncg.edu> writes:
On Monday, 18 November 2013 at 03:10:22 UTC, bearophile wrote:
 See also:
 http://d.puremagic.com/issues/show_bug.cgi?id=5849

 Bye,
 bearophile
Nice one, bearophile. You called it.

I've done a bit of testing with the solution you've attached to that bug report. It's 10-50% faster than my solution depending on how many proportions are provided (I've tested as much as 500_000). Despite mine being O(log(n)) to find the next dice roll and your solution being O(1), there really isn't that much of a difference. Strangely (to me), your solution actually seems to grow faster than mine as the number of proportions grows larger despite the proof claiming it should grow slower (although testing for > 500_000 seems pointless until actual data demands it).

As far as integer proportions go, I think my solution might be better overall because it's effectively equivalent to the current dice implementation (meaning it could be used as a near drop-in replacement and get precisely the same results, as long as the proportions are floating point values) and the code is simpler and avoids hiding any real bugs. (That said, the original implementation has a bug that was easy to spot on my second comb-through)

---
in popFront:

// Return the number of elements that are not strictly greater than point
_front = _propCumSumSorted.length - _propCumSumSorted.upperBound!pol(point).length;
---

On the other hand, a cumulative sum array approach (like mine and the std.random.dice implementation) will likely suffer major precision issues with floating point numbers. Especially if the first proportions are very large and the later proportions are very small. I haven't closely looked through the algorithm you're using to see whether it fares better, but I can't imagine it doing much worse.
Nov 17 2013
prev sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:
 I think it's more complicated than that. Let's assume for a 
 moment that you've proven that such an unstable sort must exist 
 that is faster (I'm not convinced that it necessarily must take 
 extra work to maintain stability). You have not shown how much 
 faster it might be (it could be only 1% faster) nor how much 
 work it would take to discover (even an ideal pivot choice for 
 quicksort actually cannot be as fast as Timsort on an already 
 sorted sequence, and quicksort is not an appropriate algorithm 
 for accepting presorted subsequences). If it takes 2 years to 
 come up with an unstable sort faster than Timsort, then it 
 seems like a long time to be using something that isn't the 
 best that we currently have. Especially if D is to be used in 
 benchmarks against other languages.
Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why.

Let us measure quicksort performance as the number of comparisons it makes.

Let us make some assumptions: quicksort goes as
--(1) choose a pivot p in O(1) comparisons;
--(2) partition the segment into two;
--(3) run quicksort recursively on the two resulting segments.

Now, (2) is effectively done by comparing every element with p, thus in Theta(n) (on the order of n) comparisons, where n is the length of the current segment.

Under these assumptions, we can construct the killer array for an arbitrary quicksort of such kind, even if we don't know how exactly the pivot is chosen (say, closed source or obfuscation), but we have the following interface to it: instead of accessing the array a[], quicksort calls the function less(i,j) which tells it whether a[i] < a[j], and we control that function.

Now, we start with all values in array a[] undefined. With each a[i], we associate a counter c[i]: how many times it took part in a comparison. As we can see from the above, during a single call to quicksort, in steps (1) and (2) the pivot will eventually get Theta(n) comparisons, while other elements will all get only O(1) comparisons each. So here's what we'll do.

1. When a comparison asks to relate two undefined values, we pick one of them with the larger c[i] and make it the lowest number possible. That is, the first fixed value will be 0, the next 1, and so on. (In the end, a[] will be a permutation of 0, 1, ..., n-1.)

2. When we are asked to relate a defined value to an undefined one, the defined value will always be lower (because, when we eventually define the undefined one, it will be higher than all the values defined before it).

3. When we are asked about two known values, just tell the truth.

This simple algorithm ensures that, on each call to quicksort, we quickly fix the pivot as one of the O(1) lowest values on the segment. So, one of the next two segments will have length n-O(1), and the recursion gives us Theta(n) segments of linearly decreasing lengths, and thus Theta(n^2) total running time.

Now, the assumption of picking a pivot in O(1) comparisons covers a broad variety of pivot choices, including first/last/middle/random element, median of three or five, median of medians, or any combination of these. The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced. Double-pivot quicksort can be forced to go quadratic in a similar fashion.

Now, introsort escapes this fate by switching to a guaranteed-n-log-n algorithm instead of (3) when it goes too deep. This comes at a (small) cost of checking the current depth on every call to quicksort.

The above is just my retelling of a great short article, "A Killer Adversary for Quicksort" by M. D. McIlroy:
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
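If anyone wants to play with it, here is a rough sketch in D of such an adversary driving std.algorithm.sort (my own toy code, not from the article; the size and names are arbitrary):

---
import std.algorithm : sort;
import std.array : array;
import std.range : iota;
import std.stdio : writefln;

void main()
{
    enum n = 10_000;
    auto idx = iota(0, n).array;  // the sort only ever sees these indices
    auto val = new int[n];
    val[] = -1;                   // -1 means "still undefined"
    auto cnt = new long[n];       // comparison counters c[i]
    int next = 0;                 // next value to hand out (0, 1, 2, ...)
    size_t comparisons = 0;

    bool less(int i, int j)
    {
        ++comparisons;
        ++cnt[i];
        ++cnt[j];
        if (val[i] < 0 && val[j] < 0)
        {
            // Rule 1: freeze the more-compared element as the lowest unused value.
            if (cnt[i] >= cnt[j]) val[i] = next++;
            else                  val[j] = next++;
        }
        // Rules 2 and 3: a still-undefined value compares greater than any
        // defined one; two defined values are compared truthfully.
        long vi = val[i] < 0 ? n + cast(long) i : val[i];
        long vj = val[j] < 0 ? n + cast(long) j : val[j];
        return vi < vj;
    }

    sort!(less)(idx);             // drive the library sort through the adversary

    foreach (i; 0 .. n)           // slots never fixed get the largest values
        if (val[i] < 0) val[i] = next++;

    writefln("%s comparisons for n = %s; val[] is the killer permutation",
             comparisons, n);
    // Feeding val[] back into the same deterministic sort reproduces the blow-up.
}
---

Ivan Kazmenko.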
Nov 16 2013
next sibling parent "Ivan Kazmenko" <gassa mail.ru> writes:
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:
 Now, the assumption of picking a pivot in O(1) comparisons 
 covers a broad variety of pivot choices, including 
 first/last/middle/random element, median of three or five, 
 median of medians, or any combination of these.  The random 
 pick fails in the following sense: if we seed the RNG, 
 construct a killer case, and then start with the same seed, we 
 get Theta(n^2) behavior reproduced.
Correction: this does *not* cover the median-of-medians case [1]. Median of medians does not fall under our assumption that selecting a pivot takes O(1) comparisons. It actually guarantees O(n log n) worst-case performance, but of course at the cost of being slower than a typical pivot selection in the average case.

[1] http://en.wikipedia.org/wiki/Median_of_medians
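For illustration, here is a rough sketch of that selection (names are mine; allocation-happy and not tuned at all). Using it to pick the true median as the pivot is what yields the worst-case guarantee:

---
import std.algorithm : sort;

// k-th smallest element (0-based) via the classic median-of-medians selection.
// Worst-case linear time, but with a noticeable constant factor.
int kthSmallest(int[] a, size_t k)
{
    if (a.length <= 5)
    {
        auto tmp = a.dup;
        sort(tmp);
        return tmp[k];
    }

    // Median of every group of (up to) five elements.
    auto medians = new int[(a.length + 4) / 5];
    foreach (g; 0 .. medians.length)
    {
        auto lo = g * 5;
        auto hi = lo + 5 < a.length ? lo + 5 : a.length;
        auto group = a[lo .. hi].dup;
        sort(group);
        medians[g] = group[$ / 2];
    }

    // The pivot is the median of those medians, found recursively.
    auto pivot = kthSmallest(medians, medians.length / 2);

    // Three-way partition (out of place, purely for clarity).
    int[] smaller, equal, larger;
    foreach (x; a)
    {
        if (x < pivot) smaller ~= x;
        else if (x > pivot) larger ~= x;
        else equal ~= x;
    }

    if (k < smaller.length) return kthSmallest(smaller, k);
    if (k < smaller.length + equal.length) return pivot;
    return kthSmallest(larger, k - smaller.length - equal.length);
}

// Median as a pivot: kthSmallest(a, a.length / 2).
---

Ivan Kazmenko.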
Nov 16 2013
prev sibling next sibling parent reply "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:
 Regarding an ideal pivot choice for quicksort, I'd like to 
 emphasize that it is simply non-existent.  Here's why.
Oh, of course. I did that in an algorithms class taught by a professor who is into cryptography. I agree that there is essentially no way to pick pivots that avoids worst-case performance when an attacker is involved. When I mentioned an "ideal pivot", I meant an actually good one on a sorted sequence.

Given [1,2,3,4,5,6,7], picking the best pivot, 4, would result in scanning the entire array to assure that it is partitioned appropriately around the 4 (if a quicksort were designed wise to this sort of trick, but most, including D's quicksort, would actually shuffle everything around). By that point, Timsort is already done and drinking the victory champagne. And the quicksort still has to recur on the subsequences to sort them as well!
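If anyone wants to see it for themselves, here's a quick-and-dirty timing sketch on already-sorted input (timings will obviously vary with machine and flags):

---
import std.algorithm : sort, SwapStrategy;
import std.array : array;
import std.datetime : benchmark;
import std.range : iota;
import std.stdio : writefln;

int[] data;

void runUnstable() { sort!("a < b", SwapStrategy.unstable)(data.dup); }
void runStable()   { sort!("a < b", SwapStrategy.stable)(data.dup); }  // Timsort

void main()
{
    data = iota(1_000_000).array;   // already-sorted input
    auto results = benchmark!(runUnstable, runStable)(10);
    writefln("unstable: %s ms, stable (Timsort): %s ms",
             results[0].msecs, results[1].msecs);
}
---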
Nov 16 2013
next sibling parent "Chris Cain" <clcain uncg.edu> writes:
On Sunday, 17 November 2013 at 01:39:43 UTC, Chris Cain wrote:
 (if a quicksort were designed wise to this sort of trick, but 
 most, including D's quicksort, would actually shuffle 
 everything around)
Actually, not everything. It'd swap the pivot and the last element (4 and 7 for the first iteration) and then swap it back after it figures out the two partitions are good. There was a case where it ended up swapping everything around unnecessarily, but it wasn't this case... But I digress, the point remains that it would be slower than Timsort despite having a good pivot.
Nov 16 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 5:39 PM, Chris Cain wrote:
 Given [1,2,3,4,5,6,7]
 Picking the best pivot, 4 would result in scanning the entire array to
 assure that it is partitioned appropriately around the 4 (if a quicksort
 were designed wise to this sort of trick, but most, including D's
 quicksort, would actually shuffle everything around).
I keep on thinking of a sort with delayed pivot selection that starts scanning the array from left and right and only stops when it figures it's unpartitioned. At that point it proceeds with pivot selection, but if the array is already sorted there's no need to partition the left and right portions that were already explored. Essentially the pivot selection would be preceded by a sampling of the left and right of the array.
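Something like this for the sampling step (just a sketch of that part, not the partitioning logic that would follow):

---
import std.functional : binaryFun;

// How far is the range already non-decreasing from the left / from the right?
size_t sortedPrefix(alias less = "a < b", T)(T[] r)
{
    if (r.length < 2) return r.length;
    size_t i = 1;
    while (i < r.length && !binaryFun!less(r[i], r[i - 1])) ++i;
    return i;                      // r[0 .. i] is already sorted
}

size_t sortedSuffix(alias less = "a < b", T)(T[] r)
{
    if (r.length < 2) return 0;
    size_t j = r.length - 1;
    while (j > 0 && !binaryFun!less(r[j], r[j - 1])) --j;
    return j;                      // r[j .. $] is already sorted
}

unittest
{
    assert(sortedPrefix([1, 2, 3, 2]) == 3);
    assert(sortedSuffix([3, 1, 2, 3]) == 1);
}

// If sortedPrefix(r) == r.length, the whole range is sorted and we're done;
// otherwise pick the pivot from the unexplored middle and skip re-partitioning
// the two sampled ends.
---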
 By that point,
 Timsort is already done and drinking the victory champagne. And the
 quicksort still has to recur on the subsequences to sort them as well!
Yah, already sorted data is ideal for Timsort. Andrei
Nov 16 2013
prev sibling next sibling parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:
 On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:
 I think it's more complicated than that. Let's assume for a 
 moment that you've proven that such an unstable sort must 
 exist that is faster (I'm not convinced that it necessarily 
 must take extra work to maintain stability). You have not 
 shown how much faster it might be (it could be only 1% faster) 
 nor how much work it would take to discover (even an ideal 
 pivot choice for quicksort actually cannot be as fast as 
 Timsort on an already sorted sequence, and quicksort is not an 
 appropriate algorithm for accepting presorted subsequences). 
 If it takes 2 years to come up with an unstable sort faster 
 than Timsort, then it seems like a long time to be using 
 something that isn't the best that we currently have. 
 Especially if D is to be used in benchmarks against other 
 languages.
 Regarding an ideal pivot choice for quicksort, I'd like to 
 emphasize that it is simply non-existent.  Here's why.
 
 (snip)
 
 The above is just my retelling of a great short article "A 
 Killer Adversary for Quicksort" by M. D. McIlroy here:
 http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
 
 Ivan Kazmenko.
Woah, that sounds complicated. I'm not 100% sure I entirely understand, but it seems to rely on changing the elements while you sort. How would that work in real life? For example, how would this defeat the following pivoting method?

1. Find the median value in the array. This can be done deterministically in linear time, so from a theoretical point of view it is just as fast as picking a random element since you use linear time.

2. Use that as the pivot.
Nov 16 2013
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh 
wrote:
 1. Find the median value in the array. This can be done
 deterministically in linear time,
My understanding is that for unordered data, there is no algorithm that runs in worst-case O(n):

http://en.wikipedia.org/wiki/Selection_algorithm
http://en.wikipedia.org/wiki/Quickselect
 so from a theoretical point of
 view is just as fast as picking a random element since you use
 linear time.
Picking a random element is constant time, though. You mean that O(1) and O(n) are equivalent here because they're both smaller than O(n log n), QuickSort's complexity?
Nov 16 2013
parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Sunday, 17 November 2013 at 02:43:05 UTC, Vladimir Panteleev
wrote:
 On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh 
 wrote:
 1. Find the median value in the array. This can be done
 deterministically in linear time,
My understanding that for unordered data, there is no algorithm that runs in worst-case O(n): http://en.wikipedia.org/wiki/Selection_algorithm http://en.wikipedia.org/wiki/Quickselect
Unless I misunderstand something, I am pretty sure there is a deterministic algorithm for finding the median in unsorted data:

http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s07/www/lecture_notes/lect0125.pdf
 so from a theoretical point of
 view is just as fast as picking a random element since you use
 linear time.
Picking a random element is constant time, though. You mean that O(1) and O(n) are equivalent here because they're both smaller than O(n log n), QuickSort's complexity?
Yeah, since you have to do O(n) work to partition the array, whether you do O(1) or O(n) work to find your pivot really doesn't matter. From a theoretical perspective, anyway :o)
Nov 17 2013
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 17 November 2013 at 08:33:11 UTC, Craig Dillabaugh 
wrote:
 http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s07/www/lecture_notes/lect0125.pdf
Nice! Didn't know about this - although still seems to be a lot of work for each 'n'.
Nov 17 2013
prev sibling parent "Ivan Kazmenko" <gassa mail.ru> writes:
On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh 
wrote:
 On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko 
 wrote:
 Regarding an ideal pivot choice for quicksort, I'd like to 
 emphasize that it is simply non-existent.  Here's why.

 Let us measure quicksort performance as the number of 
 comparisons it makes.

 Let us make some assumptions:
 quicksort goes as
 --(1) choose a pivot p in O(1) comparisons;
 --(2) partition the segment into two;
 --(3) run quicksort recursively on the two resulting segments.

 Now, (2) is effectively done by comparing every element with 
 p, thus in Theta(n) (on the order of n) comparisons where n is 
 the length of the current segment.

 Under these assumptions, we can construct the killer array for 
 an arbitrary quicksort of such kind, even if we don't know how 
 exactly the pivot is chosen (say, closed source or 
 obfuscation), but we have the following interface to it:

 Instead of accessing the array a[], quicksort calls the 
 function less(i,j) which tells it whether a[i]<a[j], and we 
 control that function.

 (snip)
 Woah, that sounds complicated. Not 100% sure I entirely 
 understand but it seems to rely on changing the elements while 
 you sort. How would that work in real life? For example, how 
 would this defeat the following pivoting method.
 
 1. Find the median value in the array. This can be done 
 deterministically in linear time, so from a theoretical point 
 of view is just as fast as picking a random element since you 
 use linear time.
 
 2. Use that as the pivot.
Right, this method (median of medians is an example implementation) will not be defeated by the above algorithm. There's nothing wrong with it, since it takes Theta(n) comparisons to choose the pivot and so violates our attack's working conditions. In practice, however, it means that, in the average case, the constant factor is very likely to be higher than for a quicksort vulnerable to our attack.

Ivan Kazmenko.
Nov 17 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/13 5:07 PM, Ivan Kazmenko wrote:
 The above is just my retelling of a great short article "A Killer
 Adversary for Quicksort" by M. D. McIlroy here:
 http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
Nice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse... Andrei
Nov 16 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Sunday, 17 November 2013 at 03:58:58 UTC, Andrei Alexandrescu 
wrote:
 On 11/16/13 5:07 PM, Ivan Kazmenko wrote:
 The above is just my retelling of a great short article "A 
 Killer
 Adversary for Quicksort" by M. D. McIlroy here:
 http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
Nice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse...
Actually, I was thinking about a two phase attack, and it is not at all unlikely.

0. Say, the Sorter presents a library quicksort solution. It may be closed source, but can take comparison function as argument.

1. On the preparation phase, the Attacker uses the tricky comparison function described previously. What's important is that, besides observing Theta(n^2) behavior once, he now gets a real array a[] such that this behavior can be reproduced.

2. On the attack phase, the Attacker does not need access to the comparison function. He just feeds the array obtained on the previous step as plain data.

Ivan Kazmenko.
Nov 17 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/17/13 2:20 AM, Ivan Kazmenko wrote:
 On Sunday, 17 November 2013 at 03:58:58 UTC, Andrei Alexandrescu wrote:
 On 11/16/13 5:07 PM, Ivan Kazmenko wrote:
 The above is just my retelling of a great short article "A Killer
 Adversary for Quicksort" by M. D. McIlroy here:
 http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
Nice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse...
 Actually, I was thinking about a two phase attack, and it is 
 not at all unlikely.
 
 0. Say, the Sorter presents a library quicksort solution. It 
 may be closed source, but can take comparison function as 
 argument.
 
 1. On the preparation phase, the Attacker uses the tricky 
 comparison function described previously. What's important is 
 that, besides observing Theta(n^2) behavior once, he now gets a 
 real array a[] such that this behavior can be reproduced.
 
 2. On the attack phase, the Attacker does not need access to 
 the comparison function. He just feeds the array obtained on 
 the previous step as plain data.
 
 Ivan Kazmenko.
That won't do with randomized pivot selection. Andrei
Nov 17 2013
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:
 The random pick fails in the following sense: if we seed the RNG,
 construct a killer case, and then start with the same seed, we get
 Theta(n^2) behavior reproduced.
Hence, in no sense. This does not perform independent uniform random picks.
Nov 17 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Sunday, 17 November 2013 at 13:07:40 UTC, Timon Gehr wrote:
 On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:
 The random pick fails in the following sense: if we seed the 
 RNG,
 construct a killer case, and then start with the same seed, we 
 get
 Theta(n^2) behavior reproduced.
Hence, in no sense. This does not perform independent uniform random picks.
Not at all. There are a number of situations where you want your program to use an RNG, but it is also important for the result to be reproducible. In such cases, you typically store the RNG seed for re-use.

Of course, there are also many cases where you don't need reproducibility guarantees, and there, the attack is useless.
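For example (a trivial sketch using std.random; storing and re-using the seed is the whole point):

---
import std.random : rndGen, uniform, unpredictableSeed;
import std.stdio : writeln;

void main()
{
    auto seed = unpredictableSeed;   // store/log this to reproduce the run later
    writeln("seed: ", seed);

    rndGen.seed(seed);
    auto firstRun = uniform(0, 100);

    rndGen.seed(seed);               // same seed => same sequence of "random" picks
    auto secondRun = uniform(0, 100);

    assert(firstRun == secondRun);
}
---

Ivan Kazmenko.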
Nov 17 2013
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/17/2013 02:14 PM, Ivan Kazmenko wrote:
 On Sunday, 17 November 2013 at 13:07:40 UTC, Timon Gehr wrote:
 On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:
 The random pick fails in the following sense: if we seed the RNG,
 construct a killer case, and then start with the same seed, we get
 Theta(n^2) behavior reproduced.
Hence, in no sense. This does not perform independent uniform random picks.
Not at all. There is a number of situations where you want your program to use RNG, but it is also important for the result to be reproducible. In such cases, you typically store the RNG seed for re-use. Of course there are also many cases where you don't need reproducibility guarantees, and there, the attack is useless. Ivan Kazmenko.
One can't say random picking is bad because one can use some other (deterministic) pivot selection algorithm instead which is bad. If you need deterministic reproducibility guarantees, then random picking is useless.
Nov 17 2013
prev sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:
 And the results (last number is predicate calls):

 Current Unstable Sort  50ms  32783474
 New Unstable Sort      69ms  21503542
 Timsort                35ms  3905887
For the record, I tried both SwapStragegy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Nov 16 2013
parent reply "Xinok" <xinok live.com> writes:
On Sunday, 17 November 2013 at 02:44:45 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:
 And the results (last number is predicate calls):

 Current Unstable Sort  50ms  32783474
 New Unstable Sort      69ms  21503542
 Timsort                35ms  3905887
For the record, I tried both SwapStragegy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Could you try running a benchmark on the same data using this instead?

https://github.com/Xinok/XSort/blob/master/unstablesort.d

My implementation of quick sort has some odd tweaks which make it slower in some cases, but I've never invoked worst-case behavior by accident.
Nov 16 2013
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 17 November 2013 at 03:14:41 UTC, Xinok wrote:
 On Sunday, 17 November 2013 at 02:44:45 UTC, Vladimir Panteleev 
 wrote:
 On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:
 And the results (last number is predicate calls):

 Current Unstable Sort  50ms  32783474
 New Unstable Sort      69ms  21503542
 Timsort                35ms  3905887
For the record, I tried both SwapStragegy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Could you try running a benchmark on the same data using this instead? https://github.com/Xinok/XSort/blob/master/unstablesort.d My implementation of quick sort has some odd tweaks which makes it slower in some cases, but I've never invoked worst-case behavior by accident.
Sorry for the late reply - my access to my home PC was spotty in the past week.

I wasn't entirely correct in the parent post - I didn't test things in the same way as the original problem, since I originally encountered it while using multiSort. multiSort can't use stable sort, because "stable $(D partition3) has not been implemented yet". If I hack multiSort to use TimSort instead of QuickSort, the program finishes quickly.

I reran the test in the same way as in the parent post, incl. with unstableSort. Here are the results:

SwapStrategy.unstable: 560576749 comparisons, 4796076 ticks
SwapStrategy.stable  : 340617464 comparisons, 5757974 ticks
unstableSort         : 573916901 comparisons, 5255061 ticks
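For reference, the shape of the two approaches involved (field names are made up, not my actual data):

---
import std.algorithm : multiSort, sort, SwapStrategy;

struct Record { int primary, secondary; }

// One pass with multiSort, which currently has to sort unstably internally.
void viaMultiSort(Record[] data)
{
    data.multiSort!("a.primary < b.primary", "a.secondary < b.secondary");
}

// The same lexicographic order via two stable (Timsort) passes:
// sort by the secondary key first, then stably by the primary key.
void viaTwoStableSorts(Record[] data)
{
    data.sort!("a.secondary < b.secondary", SwapStrategy.stable);
    data.sort!("a.primary < b.primary", SwapStrategy.stable);
}
---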
Dec 04 2013
prev sibling parent reply "Xinok" <xinok live.com> writes:
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev 
wrote:
 ...
I've made a pull request to fix this issue: https://github.com/D-Programming-Language/phobos/pull/1735
Nov 28 2013
parent "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 29 November 2013 at 01:47:49 UTC, Xinok wrote:
 I've made a pull request to fix this issue:

 https://github.com/D-Programming-Language/phobos/pull/1735
I've left some comments on GitHub. The greatest concern IMO is that the new implementation relies on assignments rather than swaps: it would be a regression to disallow sorting of entities with disabled assignment - the kind of type sketched below.
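Just to illustrate the kind of element I mean (a made-up struct, not from the pull request): a swap-based algorithm can still reorder these, while plain element assignment no longer compiles.

---
import std.algorithm : swap;

struct NoAssign
{
    int value;
    @disable void opAssign(NoAssign);          // assignment disabled
}

void main()
{
    auto a = NoAssign(1), b = NoAssign(2);
    swap(a, b);                                // fine: swap does not need opAssign
    static assert(!__traits(compiles, a = b)); // direct assignment does not compile
    assert(a.value == 2 && b.value == 1);
}
---

Ivan Kazmenko.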
Nov 28 2013