digitalmars.D - Worst-case performance of quickSort / getPivot
- Vladimir Panteleev (62/62) Nov 15 2013 I've been investigating an instance that I ran into while
- Craig Dillabaugh (8/35) Nov 15 2013 On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev
- Vladimir Panteleev (10/11) Nov 15 2013 - Inconsistent execution time - someone trying to benchmark
- Xinok (10/11) Nov 16 2013 (1) Sort is capable of being marked pure, depending on the type
- Chris Cain (15/79) Nov 15 2013 Improving quicksort is probably a good idea, but IIRC, the stable
- Ivan Kazmenko (8/13) Nov 16 2013 I've been hit by this case too, half a year ago. See the
- Andrei Alexandrescu (3/14) Nov 16 2013 I think a median of five pivot selection is in order.
- Andrei Alexandrescu (3/14) Nov 16 2013 Yah, I thought one is in place already.
- Jean Christophe (34/53) Nov 16 2013 One possible implementation suggests to swap Left and Right
- Vladimir Panteleev (12/55) Nov 16 2013 getPivot sorts the first, middle and last element in the range
- Jean Christophe (6/30) Nov 16 2013 Yes.
- Andrei Alexandrescu (4/8) Nov 16 2013 Maybe makeIndex may be of help.
- Vladimir Panteleev (10/16) Nov 17 2013 If the range elements are reference types, that's what will
- Andrei Alexandrescu (15/61) Nov 16 2013 That may help this particular situation, but does it do anything
- Jean Christophe (14/37) Nov 16 2013 Yes. This has the extra advantage that the smallest of the three
- Craig Dillabaugh (8/28) Nov 16 2013 On Saturday, 16 November 2013 at 14:20:32 UTC, Jean Christophe
- Xinok (29/30) Nov 16 2013 I'm a horrible procrastinator, I should have had this fixed over
- Andrei Alexandrescu (5/9) Nov 16 2013 There's something fishy about a more restricted algorithm doing better
- Xinok (53/65) Nov 16 2013 Timsort is an "adaptive" sort. It's able to achieve better
- Andrei Alexandrescu (11/27) Nov 16 2013 This is in response to what? Are you trying to talk me out of the
- Chris Cain (25/36) Nov 16 2013 I think it's more complicated than that. Let's assume for a
- Andrei Alexandrescu (15/34) Nov 16 2013 Very simple. Any sequence with a large number of equivalent elements
- Chris Cain (27/42) Nov 16 2013 Ah, very nicely described. So a stable sort can take more time
- Xinok (7/18) Nov 16 2013 If you tweak for one scenario, you're only losing in other
- Andrei Alexandrescu (11/26) Nov 16 2013 Yes, it is also my understanding that Timsort does well on almost sorted...
- Chris Cain (41/44) Nov 16 2013 I'll kick off the suggestions with this:
- Andrei Alexandrescu (5/8) Nov 16 2013 I am hoping for some more representative corpora, along the lines of
- Chris Cain (40/51) Nov 17 2013 I think I get what you're saying, but sortbenchmark.org uses
- Andrei Alexandrescu (25/33) Nov 17 2013 OK, since I see you have some interest...
- Chris Cain (91/94) Nov 17 2013 Sure.
- Andrei Alexandrescu (5/21) Nov 17 2013 I think that's a terrific start! (Not sure I understand where the 30
- Chris Cain (31/33) Nov 17 2013 Thanks :)
- David Nadlinger (8/11) Dec 04 2013 The better way to avoid this problem is usually to compile the
- bearophile (5/10) Nov 17 2013 See also:
- Chris Cain (33/37) Nov 17 2013 Nice one, bearophile. You called it.
- Ivan Kazmenko (59/72) Nov 16 2013 Regarding an ideal pivot choice for quicksort, I'd like to
- Ivan Kazmenko (8/15) Nov 16 2013 Correction: this does *not* cover the median-of-medians case [1].
- Chris Cain (14/16) Nov 16 2013 Oh, of course. I did that in an algorithms class taught by a
- Chris Cain (7/10) Nov 16 2013 Actually, not everything. It'd swap the pivot and the last
- Andrei Alexandrescu (10/18) Nov 16 2013 I keep on thinking of a sort with delayed pivot selection that starts
- Craig Dillabaugh (10/85) Nov 16 2013 Woah, that sounds complicated. Not 100% sure I entirely
- Vladimir Panteleev (9/14) Nov 16 2013 My understanding that for unordered data, there is no algorithm
- Craig Dillabaugh (8/22) Nov 17 2013 Unless I misunderstand something I am pretty sure there is a
- Vladimir Panteleev (4/5) Nov 17 2013 Nice! Didn't know about this - although still seems to be a lot
- Ivan Kazmenko (10/48) Nov 17 2013 Right, this method (median of medians is an example
- Andrei Alexandrescu (5/8) Nov 16 2013 Nice story, but the setup is a tad tenuous (albeit indeed theoretically
- Ivan Kazmenko (14/23) Nov 17 2013 Actually, I was thinking about a two phase attack, and it is not
- Andrei Alexandrescu (3/24) Nov 17 2013 That won't do with randomized pivot selection.
- Timon Gehr (2/5) Nov 17 2013 Hence, in no sense. This does not perform independent uniform random pic...
- Ivan Kazmenko (8/16) Nov 17 2013 Not at all. There is a number of situations where you want your
- Timon Gehr (5/19) Nov 17 2013 One can't say random picking is bad because one can use some other
- Vladimir Panteleev (4/8) Nov 16 2013 For the record, I tried both SwapStragegy options with my data
- Xinok (8/18) Nov 16 2013 Could you try running a benchmark on the same data using this
- Vladimir Panteleev (14/33) Dec 04 2013 Sorry for the late reply - my access to my home PC was spotty in
- Xinok (4/5) Nov 28 2013 I've made a pull request to fix this issue:
- Ivan Kazmenko (6/8) Nov 28 2013 I've left some comments on Github.
I've been investigating an instance that I ran into while processing some data, in which multiSort had apparently stalled whereas two successive sort calls processed the data quickly. After reducing the code and the 20 million data points, I've narrowed the problem down to this test case:

enum N = 1000000;
N.iota.retro.chain(N.only).array.sort();

Apparently, this simple case exhibits worst-case (quadratic) complexity in our sort implementation. The .retro isn't necessary - this also exhibits quadratic complexity (although the total number of predicate calls is halved), because the first quickSort cycle will flip the order of elements during partitioning:

N.iota.chain(0.only).array.sort();

Closer inspection shows that the problem stems from how the pivot is chosen. Currently, std.algorithm's getPivot uses the median-of-three method (the median value of the first, middle and last elements[1]). However, in this case, the median element will always be the first element - and this remains true in all quickSort iterations. Here is an illustrated example with N=9. Note that at the same time as choosing the pivot, getPivot also orders the candidates (the first, middle and last range elements) according to the sort predicate. After calling getPivot, quickSortImpl moves the pivot to the end of the range by swapping it with the last element.

getPivot(0..10)
8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition

getPivot(2..10)
6,5,4,3,2,1,0,7 <- getPivot - before swap
7,5,4,3,6,1,0,2 <- getPivot - after swap
7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
7,6,4,3,2,1,0,5 <- quickSortImpl - after partition

(...)

The algorithm gets stuck on "slopes" in sorted data, which are not uncommon in real-world data (my data resembled a 1D heightmap).
Although the median-of-three way of picking a pivot is recommended by some sources (notably Sedgewick), perhaps a different method would be better suited for Phobos:

* Many sources recommend using a random element as a pivot. According to [2], "Randomized quicksort, for any input, it requires only O(n log n) expected time (averaged over all choices of pivots)". Also, if it is not possible to predict the pivot choice, it is impossible to craft worst-case input, which is a plus from a security standpoint[3]. However, I'm not sure if making the behavior of std.algorithm's sort nondeterministic is desirable.
* It looks like many STL implementations use Introsort[4] - a hybrid sorting algorithm which starts with QuickSort, but switches to HeapSort if the recursion depth exceeds a threshold.
* I found a recent article[5] which describes an optimal algorithm for picking a pivot; however, it is behind a paywall.

[1]: Apparently the term is also used to refer to choosing three points *randomly*, instead of first/middle/last: http://stackoverflow.com/a/164183/21501
[2]: http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort
[3]: https://www.google.com/search?q=quicksort+worst+case+denial+of+service
[4]: http://en.wikipedia.org/wiki/Introsort
[5]: https://news.ycombinator.com/item?id=6629117
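The blow-up is easy to observe directly by counting predicate calls on the pathological input. Below is a small sketch of such a measurement (my own test harness, not part of the original report); on a Phobos affected by this issue, the call count roughly quadruples each time N doubles, instead of growing as N log N:

```d
import std.algorithm : sort;
import std.array : array;
import std.range : chain, iota, only;
import std.stdio : writeln;

void main()
{
    foreach (N; [1_000, 2_000, 4_000])
    {
        size_t calls;
        // The pathological "slope": 0,1,...,N-1 followed by a single 0.
        auto data = N.iota.chain(0.only).array;
        // Count comparisons by wrapping the predicate.
        data.sort!((a, b) { ++calls; return a < b; });
        writeln(N, ": ", calls, " predicate calls");
    }
}
```

Quadratic behavior shows up as the ratio between successive counts approaching 4 rather than a bit over 2.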
Nov 15 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> [...] The algorithm gets stuck on "slopes" in sorted data, which are not uncommon in real-world data (my data resembled a 1D heightmap). Although the median-of-three way of picking a pivot is recommended by some sources (notably Sedgewick), perhaps a better method would be better suitable for Phobos:

If you use median-of-five (or seven) then you would start getting reasonable pivots in such an instance, though I am sure someone could craft an input set that defeats that. Does the Sedgewick source mention why 3 is picked? Easy to implement, I assume.

> * Many sources recommend using a random element as a pivot. [...] However, I'm not sure if making the behavior of std.algorithm's sort nondeterministic is desirable.

What are the arguments against using a randomized algorithm?
Nov 15 2013
On Friday, 15 November 2013 at 22:56:58 UTC, Craig Dillabaugh wrote:
> What are the arguments against using a randomized algorithm?

- Inconsistent execution time - someone trying to benchmark Phobos sort functions might get really confused;
- QuickSort is an unstable sort. If there is more to the data than the items being compared, the data will end up in a different order on different program runs.
- std.algorithm's sort allows specifying arbitrary code as a predicate. If that code has side effects, the program will behave differently on different runs.
Nov 15 2013
On Friday, 15 November 2013 at 22:56:58 UTC, Craig Dillabaugh wrote:
> What are the arguments against using a randomized algorithm?

(1) Sort is capable of being marked pure, depending on the type being sorted and the predicate. But choosing random pivots means introducing side effects.
(2) Random pivots invoke average-case performance, whereas choosing the pivot from a median of N has the potential to achieve better performance on partially sorted lists (fewer reads and writes).
(3) I've tested random pivots and found that choosing from a median of three or five is often the better choice.
Nov 16 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> I've been investigating an instance that I ran into while processing some data, in which multiSort had apparently stalled whereas two successive sort calls processed the data quickly. [...]

Improving quicksort is probably a good idea, but IIRC, the stable sort has been changed to Timsort, which I find to be faster in most cases. Probably what should be done is to change the default to the stable sort and improve the unstable version to be faster in some way. Presumably you could use a "randomized version," but with a constant key. That way the algorithm would be deterministic but likely to have good characteristics on normal inputs. Unfortunately, that wouldn't solve the issue of an attacker getting to choose an input and causing worst-case performance (it would just make it harder to come up with such an input).
Ultimately, using the stable version (Timsort) is safer from a security standpoint (hence my suggestion that it be the default).
Nov 15 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> I've been investigating an instance that I ran into while processing some data, in which multiSort had apparently stalled whereas two successive sort calls processed the data quickly. After reducing the code and 20 million points, I've reduced the problem to this test case: [...]

I've been hit by this case too, half a year ago. See the relevant discussion on D.learn: http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org

The fact that the issue is raised once again perhaps means it is worth a bug report. What do you guys think?

Ivan Kazmenko.
Nov 16 2013
On 11/16/13 3:29 AM, Ivan Kazmenko wrote:
> I've been hit by this case too, half a year ago. See the relevant discussion on D.learn: http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org The fact that the issue is raised once again perhaps means it is worth a bug report. What do you guys think?

I think a median of five pivot selection is in order.

Andrei
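For reference, a median-of-five pick is cheap to implement. Here is a minimal sketch (illustrative only - `medianOfFive` and the choice of sample positions are made up for this example, not the Phobos implementation):

```d
import std.algorithm : swap;

// Return the index (among the five sampled positions) holding the median
// element. A tiny insertion sort over the five candidates is used here
// for clarity; a comparison-optimal network needs only 6 comparisons.
size_t medianOfFive(T)(T[] r, size_t a, size_t b, size_t c, size_t d, size_t e)
{
    size_t[5] ix = [a, b, c, d, e];
    foreach (i; 1 .. ix.length)
        for (size_t j = i; j > 0 && r[ix[j]] < r[ix[j - 1]]; --j)
            swap(ix[j], ix[j - 1]);
    return ix[2]; // middle of the five sorted candidates
}

unittest
{
    // The "slope" input from this thread.
    auto r = [8, 7, 6, 5, 4, 3, 2, 1, 0, 9];
    // Sample the first, quarter, middle, three-quarter and last positions.
    auto p = medianOfFive(r, 0, r.length / 4, r.length / 2,
                          3 * r.length / 4, r.length - 1);
    assert(r[p] == 6); // candidates are 8, 6, 3, 1, 9 -> median is 6
}
```

On the slope inputs discussed here, the two extra sample points land inside the monotone run, so the chosen pivot is no longer the smallest element of the subrange.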
Nov 16 2013
On 11/16/13 3:29 AM, Ivan Kazmenko wrote:
> I've been hit by this case too, half a year ago. See the relevant discussion on D.learn: http://forum.dlang.org/thread/drpbkvhlprkdjzhbcqjn forum.dlang.org The fact that the issue is raised once again perhaps means it is worth a bug report. What do you guys think?

Yah, I thought one is in place already.

Andrei
Nov 16 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> getPivot(0..10)
> 8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
> 9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
> 9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
> 9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
> (...)

One possible implementation suggests to swap Left and Right immediately after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?

As the efficiency of Quicksort is known to be bad when sorting a small number of elements (i.e. < 10), it might be nice to implement an option to automatically switch to a more appropriate algorithm when it's relevant to do so.

> * Many sources recommend using a random element as a pivot. According to [2], "Randomized quicksort, for any input, it requires only O(n log n) expected time (averaged over all choices of pivots)".

IMO it would be costly and not so relevant if the goal is to be fast.

> Also, if it is not possible to predict the pivot choice, it is impossible to craft worst-case input, which is a plus from a security point[3]. However, I'm not sure if making the behavior of std.algorithm's sort nondeterministic is desirable.

I think it's not desirable.

-- Quicksorting a collection of Objects?

BTW I'm very interested in finding a library which could Quicksort an array of pointers, where each pointer points to the address of a class object (or a structure). The library would make it possible, for example, to sort the class objects using one of their members as the key. Because the swaps are done on the actual pointers (and not the objects pointed to), the Quicksort should be very efficient.

However, the algorithm wouldn't be so trivial to implement, because each comparison must be done using the object member's value, e.g. Obj1.foo < Obj2.foo. Of course, the programmer can choose any relevant member property to sort the collection. And it should be as easy to use as:

class SomeClass { string foo; double bar; }
SomeClass[] a;
// put 100 000 objects into a
a.sort("foo");

Do we already have something like that available somewhere, or is it possible to make one eventually?

-- JC
Nov 16 2013
On Saturday, 16 November 2013 at 14:20:32 UTC, Jean Christophe wrote:
> One possible implementation suggests to swap Left and Right immediately after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?

getPivot sorts the first, middle and last element in the range right after it figures out their order. Is there any advantage to your suggestion over this?

> As the efficiency of Quicksort is known to be bad when sorting a small number of elements, it might be nice to implement an option to automatically switch to a more appropriate algorithm when it's relevant to do so.

It does, actually - it switches to optimisticInsertionSort for ranges of 25 elements or shorter. I disabled that behavior for the illustration.

> IMO it would be costly and not so relevant if the goal is to be fast.

The impact shouldn't be big if a simple RNG, like an LCG or XorShift, was used.

> BTW I'm very interested in finding a library which could Quicksort an array of pointers, where each pointer points to the address of a class object (or a structure). [...] Do we already have something like that available somewhere, or is it possible to make one eventually?

You mean, sort!`a.foo < b.foo` ?
Nov 16 2013
> You mean, sort!`a.foo < b.foo` ?

Yes. An indirect sorting, assuming a and b to be objects of class SomePotentialyLargeClass. Because the array to sort contains pointers only, all the data movement is essentially the same as if we were sorting integers.

-- JC
Nov 16 2013
On 11/16/13 9:30 PM, Jean Christophe wrote:
> An indirect sorting, assuming a and b to be objects of class SomePotentialyLargeClass. Because the array to sort contains pointers only, all the data movement is essentially the same as if we were sorting integers.

makeIndex may be of help. http://dlang.org/phobos/std_algorithm.html#makeIndex

Andrei
Nov 16 2013
On Sunday, 17 November 2013 at 05:30:24 UTC, Jean Christophe wrote:
> Yes. An indirect sorting, assuming a and b to be objects of class SomePotentialyLargeClass. Because the array to sort contains pointers only, all the data movement is essentially the same as if we were sorting integers.

If the range elements are reference types, that's what will happen (unless they overload opAssign or something like that). Otherwise, there's makeIndex (already mentioned by Andrei), or you could also do it by hand:

1. r.length.iota.array.sort!((a, b) => r[a] < r[b]);
2. r.length.iota.map!(a => &r[a]).array.sort!((a, b) => *a < *b);

r.map!`&a` doesn't work, though, because we don't get a reference to the range element in the predicate.
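A small sketch of the two approaches side by side (`SomeClass` and the sample data are made up for illustration):

```d
import std.algorithm : makeIndex, sort;
import std.array : array;
import std.range : iota;

class SomeClass
{
    string foo;
    this(string f) { foo = f; }
}

void main()
{
    auto objs = [new SomeClass("c"), new SomeClass("a"), new SomeClass("b")];

    // makeIndex: fill an index array with positions ordered by the
    // predicate; objs itself is left untouched.
    auto index = new size_t[objs.length];
    makeIndex!((a, b) => a.foo < b.foo)(objs, index);
    assert(index == [1, 2, 0]); // objs[1].foo == "a" sorts first

    // By hand: sort the positions 0 .. length with a keyed predicate.
    auto pos = objs.length.iota.array;
    pos.sort!((a, b) => objs[a].foo < objs[b].foo);
    assert(pos == [1, 2, 0]);

    // Since objs holds class references, sorting objs directly also only
    // moves references, never the objects themselves.
    objs.sort!((a, b) => a.foo < b.foo);
    assert(objs[0].foo == "a");
}
```

Either way, only word-sized indices or references move during the sort, which is the "indirect sorting" asked about above.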
Nov 17 2013
On 11/16/13 6:20 AM, Jean Christophe wrote:
> One possible implementation suggests to swap Left and Right immediately after choosing the Pivot (if Left > Right), then place the Pivot at Right-1. It seems that this option was not taken. Any reason?

That may help this particular situation, but does it do anything interesting in general?

> As the efficiency of Quicksort is known to be bad when sorting a small number of elements, it might be nice to implement an option to automatically switch to a more appropriate algorithm when it's relevant to do so.

That's done already.

> IMO it would be costly and not so relevant if the goal is to be fast.

Vladimir also enumerated a few disadvantages of random pivot selection. I think, those notwithstanding, random pivot is an option we should consider very seriously. It's easy to use a fast random number generator (the randomness doesn't need to be very good). The only reason for which I've hesitated to add random pivot selection is that it's less usual and might surprise some. It is _very_ solidly motivated.

> Do we already have something like that available somewhere, or is it possible to make one eventually?

I had a diff at one point that added overloads to sort allowing the caller to only specify the key. It turned out to be confusing, and all it did was save a tad of duplication - "expr" instead of "expr(a) < expr(b)".

Andrei
Nov 16 2013
On Saturday, 16 November 2013 at 16:10:56 UTC, Andrei Alexandrescu wrote:
> That may help this particular situation, but does it do anything interesting in general?

Yes. This has the extra advantage that the smallest of the three winds up in A[left], which is where the partitioning routine would put it anyway. The largest winds up at A[right], which is also the correct place, since it is larger than the Pivot. Therefore you can place the Pivot at A[right-1] and initialize i and j to left+1 and right-2 in the partition phase. Another benefit is that because A[left] is smaller than the Pivot, it will act as a sentinel for j: we do not need to worry about j running past the end. Same for i. Thus you can assert A[left] <= A[center] <= A[right] even before you hide the Pivot.

-- JC
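The scheme described above can be sketched as follows (`median3` and its identifiers are illustrative, in the style of Sedgewick/Weiss, not the Phobos code):

```d
import std.algorithm : swap;

// Order A[left], A[center], A[right], then hide the pivot at A[right-1].
// Afterwards A[left] <= pivot <= A[right], so A[left] is a sentinel for
// the right-to-left scan (j) and A[right] for the left-to-right scan (i).
T median3(T)(T[] A, size_t left, size_t right)
{
    auto center = left + (right - left) / 2;
    if (A[center] < A[left])   swap(A[center], A[left]);
    if (A[right]  < A[left])   swap(A[right],  A[left]);
    if (A[right]  < A[center]) swap(A[right],  A[center]);
    swap(A[center], A[right - 1]); // tuck the pivot out of the way
    return A[right - 1];
}

unittest
{
    auto A = [5, 9, 1, 7, 3];
    auto pivot = median3(A, 0, A.length - 1);
    assert(pivot == 3); // median of candidates 5, 1, 3
    assert(A[0] <= pivot && pivot <= A[$ - 1]); // sentinels in place
}
```

The partition loop would then start with i = left+1 and j = right-2, exactly as described, and neither scan needs a bounds check.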
Nov 16 2013
On Saturday, 16 November 2013 at 14:20:32 UTC, Jean Christophe wrote:
> BTW I'm very interested in finding a library which could Quicksort an array of pointers, where each pointer points to the address of a class object (or a structure). [...] Because the swaps are done on the actual pointers (and not the objects pointed to), the Quicksort should be very efficient.

This is sort of an interesting problem that you propose, because while sorting just the pointers means moving things around less, you would also get absolutely horrible locality of reference/cache-hit performance. Have you thought about ways to deal with that?
Nov 16 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> ...

I'm a horrible procrastinator; I should have had this fixed over a year ago, lol. Anyways, I posted a bug report about this problem long ago, and a working solution has been gathering dust waiting to be implemented in Phobos.

Bug report (poorly titled): http://d.puremagic.com/issues/show_bug.cgi?id=7767
Solution: https://github.com/Xinok/XSort/blob/master/unstablesort.d

* It's an introsort which falls back to a bottom-up binary heap sort after so many recursions, to guarantee O(n log n) performance in the worst case.
* It chooses the pivot from a median of five. This adds nearly zero overhead and results in a noticeable decrease in comparisons (predicate calls). I see no reason not to add this.
* My solution uses a binary insertion sort for sublists up to 32 elements long. This reduces comparisons as well but also adds a fair amount of overhead; it's about 15-20% slower when sorting an array of integers. My idea is to use this by default and specialize with a linear insertion sort when the element type and predicate are known.
* Regardless of these improvements, I think Timsort should be the default sorting algorithm. It's the better choice in many (most?) cases and, well, it's stable. Quick sort would still be available for those cases in which it's faster and stable sorting isn't needed.

Well, it's Saturday and I have no plans. Let's see if I can't get a pull request done by the end of the day...
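The introsort mechanics described above (quicksort with a recursion budget of roughly 2·log2 n, a heap sort fallback, and insertion sort for small subranges) can be sketched as a standalone implementation. This is an illustrative sketch, not the linked XSort code:

```d
import std.algorithm : swap;

// Quicksort with a recursion budget; falls back to heap sort when the
// budget runs out, guaranteeing O(n log n) in the worst case.
void introSort(T)(T[] r)
{
    size_t depth = 0;
    for (auto n = r.length; n > 1; n /= 2)
        depth += 2; // budget of roughly 2 * log2(n) levels
    introSortImpl(r, depth);
}

private void introSortImpl(T)(T[] r, size_t depth)
{
    while (r.length > 16)
    {
        if (depth == 0)
        {
            heapSort(r); // budget exhausted: bail out of quicksort
            return;
        }
        --depth;
        auto p = partition(r);
        introSortImpl(r[0 .. p], depth);
        r = r[p + 1 .. $]; // loop on the right part instead of recursing
    }
    insertionSort(r); // small subranges: insertion sort wins
}

// Median-of-three pivot moved to the last slot, then Lomuto partition.
private size_t partition(T)(T[] r)
{
    auto mid = r.length / 2, last = r.length - 1;
    if (r[mid] < r[0])    swap(r[mid],  r[0]);
    if (r[last] < r[0])   swap(r[last], r[0]);
    if (r[last] < r[mid]) swap(r[last], r[mid]);
    swap(r[mid], r[last]);
    size_t i = 0;
    foreach (j; 0 .. last)
        if (r[j] < r[last])
            swap(r[i++], r[j]);
    swap(r[i], r[last]);
    return i;
}

private void heapSort(T)(T[] r)
{
    // Standard max-heap sift-down.
    void sift(size_t i, size_t n)
    {
        while (2 * i + 1 < n)
        {
            auto c = 2 * i + 1;
            if (c + 1 < n && r[c] < r[c + 1]) ++c;
            if (!(r[i] < r[c])) return;
            swap(r[i], r[c]);
            i = c;
        }
    }
    foreach_reverse (i; 0 .. r.length / 2) sift(i, r.length);
    foreach_reverse (n; 1 .. r.length)
    {
        swap(r[0], r[n]); // move current max to its final place
        sift(0, n);
    }
}

private void insertionSort(T)(T[] r)
{
    foreach (i; 1 .. r.length)
        for (auto j = i; j > 0 && r[j] < r[j - 1]; --j)
            swap(r[j], r[j - 1]);
}

unittest
{
    import std.algorithm : isSorted;
    import std.array : array;
    import std.range : chain, iota, only;

    // The pathological "slope" input from this thread.
    auto data = 100_000.iota.chain(0.only).array;
    introSort(data);
    assert(data.isSorted);
}
```

Even when every pivot choice is unlucky, the depth limit caps the quicksort phase, so the whole thing stays O(n log n) at the cost of heap sort's worse constants on those subranges.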
Nov 16 2013
On 11/16/13 11:46 AM, Xinok wrote:
> Regardless of these improvements, I think Timsort should be the default sorting algorithm. It's the better choice in many (most?) cases and, well, it's stable. Quick sort would still be available for those cases in which it's faster and stable sorting isn't needed.

There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default.

Andrei
Nov 16 2013
On Saturday, 16 November 2013 at 20:35:21 UTC, Andrei Alexandrescu wrote:

On 11/16/13 11:46 AM, Xinok wrote:

* Regardless of these improvements, I think Timsort should be the default sorting algorithm. It's the better choice in many (most?) cases and, well, it's stable. Quick sort would still be available for those cases in which it's faster and stable sorting isn't needed.

There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default.

Andrei

Timsort is an "adaptive" sort. It's able to achieve better performance in many cases by exploiting patterns in the data. I decided to make a nice little test using my benchmark code here: https://github.com/Xinok/XSort/blob/master/benchsort.d

This is the permutation I designed for the test:

static uint[] base;
base.length = 1024 * 1024;
foreach(i, ref v; base) v = i;
foreach(i; 0..1024)
{
    auto rand()
    {
        rndGen.popFront();
        return rndGen.front % base.length;
    }

    switch(rand() % 3)
    {
        // Swap two ranges of elements
        case 0:
        {
            auto arr = [rand(), rand(), rand(), rand()];
            sort(arr);
            swapRanges(base[arr[0]..arr[1]], base[arr[2]..arr[3]]);
        }
        break;

        // Reverse a range of elements
        case 1:
        {
            auto a = rand(), b = rand();
            if(a > b) swap(a, b);
            reverse(base[a..b]);
        }
        break;

        // Shuffle a small range of elements
        case 2:
        {
            auto a = rand();
            while(a < 1024) a = rand();
            assert(a >= 1024);
            randomShuffle(base[a - 1024 .. a]);
        }
        break;

        default: assert(0);
    }
}

And the results (last number is predicate calls):

Current Unstable Sort   50ms   32783474
New Unstable Sort       69ms   21503542
Timsort                 35ms    3905887

That's why I suggest making Timsort the default sorting algorithm.
Nov 16 2013
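To make "adaptive" concrete: the core trick is detecting the ascending runs already present in the input and merging them, so largely ordered data costs far fewer comparisons than a blind partition-based sort pays. A toy natural merge sort in Python shows the mechanism (real Timsort adds a minimum run size, run-length invariants, and galloping on top of this):

```python
def find_runs(a):
    # split `a` into maximal ascending runs
    runs, start = [], 0
    for i in range(1, len(a) + 1):
        if i == len(a) or a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    return runs

def merge(x, y):
    # stable merge: ties are taken from the left run
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if y[j] < x[i]:
            out.append(y[j]); j += 1
        else:
            out.append(x[i]); i += 1
    return out + x[i:] + y[j:]

def natural_mergesort(a):
    runs = find_runs(a)
    while len(runs) > 1:
        # merge adjacent runs pairwise; fully sorted input never enters this loop
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

Xinok's permutation disturbs only scattered slices of a sorted million-element array, so most of it is still long runs; that is plausibly why the predicate count collapses to ~3.9 million there.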
On 11/16/13 2:11 PM, Xinok wrote:

On Saturday, 16 November 2013 at 20:35:21 UTC, Andrei Alexandrescu wrote:

On 11/16/13 11:46 AM, Xinok wrote:

* Regardless of these improvements, I think Timsort should be the default sorting algorithm. It's the better choice in many (most?) cases and, well, it's stable. Quick sort would still be available for those cases in which it's faster and stable sorting isn't needed.

There's something fishy about a more restricted algorithm doing better than one that's less restricted. We must improve unstable sort, not make stable sort the default.

Andrei

Timsort is an "adaptive" sort. It's able to achieve better performance in many cases by exploiting patterns in the data.

This is in response to what? Are you trying to talk me out of the pigeonhole principle?

I decided to make a nice little test using my benchmark code here: https://github.com/Xinok/XSort/blob/master/benchsort.d

I understand and appreciate that Timsort is a nicely optimized algorithm. This has nothing to do with it doing more work than an unstable sort that is just as nicely optimized. At the end of the day, whatever stable sorting algorithms are out there, their set is included in the set of all sorting algorithms. In order to preserve stability, stable sorting algorithms need to do nonnegative extra work.

Andrei
Nov 16 2013
On Saturday, 16 November 2013 at 23:40:39 UTC, Andrei Alexandrescu wrote:

This is in response to what? Are you trying to talk me out of the pigeonhole principle? ... I understand and appreciate that Timsort is a nicely optimized algorithm. This has nothing to do with it doing more work than an unstable sort that is just as nicely optimized. At the end of the day, whatever stable sorting algorithms are out there, their set is included in the set of all sorting algorithms. In order to preserve stability, stable sorting algorithms need to do nonnegative extra work.

Andrei

I think it's more complicated than that. Let's assume for a moment that you've proven that such an unstable sort must exist that is faster (I'm not convinced that it necessarily must take extra work to maintain stability). You have not shown how much faster it might be (it could be only 1% faster), nor how much work it would take to discover (even an ideal pivot choice for quicksort cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences). If it takes two years to come up with an unstable sort faster than Timsort, that seems like a long time to be using something that isn't the best we currently have, especially if D is to be used in benchmarks against other languages.

Timsort is implemented now and has no real cost to use as default. If, at some point, an unstable sort is discovered that is faster on average than Timsort AND (extremely important) it has no potential for an attacker to choose a sequence that causes poor performance, then unstable could be switched back as the default. Right now, this could, in fact, cause a security hole (a DoS-type attack) depending on how it is used. I think it's better to use a safer default in this case, even if we could make it faster than Timsort somehow.
Nov 16 2013
On 11/16/13 4:18 PM, Chris Cain wrote:

I think it's more complicated than that. Let's assume for a moment that you've proven that such an unstable sort must exist that is faster (I'm not convinced that it necessarily must take extra work to maintain stability).

Very simple. Any sequence with a large number of equivalent elements will cause Timsort to work more than an unstable algorithm, because it would need to shift around those elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion then shifts to how many equivalent elements are in the typical range etc.

You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences).

There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.

If it takes 2 years to come up with an unstable sort faster than Timsort, then it seems like a long time to be using something that isn't the best that we currently have. Especially if D is to be used in benchmarks against other languages. Timsort is implemented now and has no real cost to use as default. If, at some point, an unstable sort is discovered that is faster on average than Timsort AND (extremely important) it has no potential for an attacker to be able to choose a sequence to sort that causes poor performance, then unstable could be switched back as the default at some point.

We could do that, but that would be masking a bug instead of fixing it. We should instead focus on fixing unstable sort. One additional issue with making stable sort the default for a while is that people will grow to assume sort() is stable, and will have broken code when we switch back. Let's fix unstable sort. If it's slower on any other data than Timsort's happy case, we have a bug.

Andrei
Nov 16 2013
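The 1, 0, 0, ..., 0 example is exactly the case a three-way ("fat pivot") partition disposes of in one pass, precisely because it is free to reorder the equal keys, while a stable sort must keep them in their original order. A sketch of the Dutch-national-flag partition (illustrative Python, not Phobos code):

```python
def partition3(a, pivot):
    """Rearrange `a` in one pass into [< pivot | == pivot | > pivot].

    Returns the bounds of the equal block. Equal elements are NOT kept
    in their original relative order -- the freedom an unstable sort
    has and a stable one lacks.
    """
    lt, i, gt = 0, 0, len(a)
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    return lt, gt

a = [1] + [0] * 9          # the example from the post above
lo, hi = partition3(a, 0)
assert a == [0] * 9 + [1]  # n - 1 equal keys settled in one linear pass
assert (lo, hi) == (0, 9)
```

A quicksort built on this never recurses into the equal block at all, so arrays dominated by duplicates cost close to a single pass per distinct value.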
On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu wrote:

Very simple. Any sequence with a large number of equivalent elements will cause Timsort to work more than an unstable algorithm, because it would need to shift around those elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion then shifts to how many equivalent elements are in the typical range etc.

Ah, very nicely described. So a stable sort can take more time than a theoretical unstable sort when there are equal elements.

There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.

I'd have to see them to comment on them. I doubt they use only quicksort if they take advantage of large parts of the array being sorted; I'd expect some sort of merging to occur in that case.

We could do that, but that would be masking a bug instead of fixing it. We should instead focus on fixing unstable sort. One additional issue with making stable sort the default for a while is that people will grow to assume sort() is stable, and will have broken code when we switch back. Let's fix unstable sort. If it's slower on any other data than Timsort's happy case, we have a bug.

Sure, but could we at least agree on a timescale? If we can't come up with something significantly better than Timsort in the next year, it seems pointless to hold out for something that may never come. Hidden bug or not, deliberately exposing a bug with no effective plan to fix it isn't a good tactic either, especially if D is expected to be competitive with other native languages in speed. Seeing D falter in the "sort" column of a shootout could be disconcerting to some, and we can't count on everyone knowing to specify stable sort in those head-to-heads until we work out improving unstable sort. That said, your point that people might grow comfortable with sort being stable is a good one. Unfortunately, I don't have an answer for that.
Documenting that sort does not guarantee any particular stability is about the best that could be done. Perhaps a variant of Timsort could be prepared that deliberately changes the order of equal elements; designed instability could suffice as a stopgap (at least until the unstable sort is worked out).
Nov 16 2013
On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu wrote:

You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences).

There are well known tweaks to quicksort that make it operate well on sorted or almost sorted sequences.

If you tweak for one scenario, you're only losing in other scenarios. Timsort already performs ideally in these cases, as I've demonstrated. Rather than optimizing quicksort for cases in which Timsort will still be the victor, we should optimize for those cases in which quicksort is already faster.
Nov 16 2013
On 11/16/13 6:42 PM, Xinok wrote:

If you tweak for one scenario, you're only losing in other scenarios.

I am not so sure about that.

Timsort already performs ideally in these cases, as I've demonstrated.

Yes, it is also my understanding that Timsort does well on almost sorted data. The issue is one does not know a priori what the distribution of the data is.

Rather than optimizing quicksort for cases in which Timsort will still be the victor, we should optimize for those cases in which quicksort is already faster.

We should have a "good unstable sort" in addition to the "good stable sort" we already have. Seeing this as a competition between quicksort and Timsort would be the wrong frame. Probably a good step forward would be to hook a sort benchmarking corpus to our unittests. What are the consecrated corpora?

Andrei
Nov 16 2013
On Sunday, 17 November 2013 at 04:14:22 UTC, Andrei Alexandrescu wrote:

Probably a good step forward would be to hook a sort benchmarking corpus to our unittests. What are the consecrated corpora?

I'll kick off the suggestions with this. File to provide "real" data: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.fna (from http://www.genome.jp/kegg-bin/show_organism?org=eco -> RefSeq -> NC_000913.fna). Despite this being real data, you don't typically sort genomes like this ... it's simply "more interesting" than a pseudo-randomized dataset.

Code:

---
import std.range, std.algorithm, std.stdio, std.array, std.conv, std.datetime;

enum strategy = SwapStrategy.stable;

void main()
{
    auto arr = File("NC_000913.fna").byLine
        .drop(1) // Trim the first line because it's just metadata
        /*.take(50)*/
        .join
        .chunks(50)
        .map!text
        .array;

    StopWatch sw = StopWatch(AutoStart.yes);
    arr.sort!("a < b", strategy)();
    sw.stop();

    writeln(text(arr.length, " elements sorted in ", sw.peek.msecs, " msecs"));
}
---

On my system, this shows unstable sort to be ~2x faster than stable sort (which is to be expected of data with no discernible patterns). The "key" thing this is testing is the effectiveness of the sort on data that often takes longer to compare (with such a small alphabet, many strings have a common prefix). That said, it might also be reproduced "well enough" using a random generator to create similar strings to sort, but the basic idea is there. I just like using real genomes for performance testing things :)
Nov 16 2013
On 11/16/13 9:21 PM, Chris Cain wrote:

That said, it might also be reproduced "well enough" using a random generator to create similar strings to sort, but the basic idea is there. I just like using real genomes for performance testing things :)

I am hoping for some more representative corpora, along the lines of http://sortbenchmark.org/. Some data that we can use as good proxies for typical application usage.

Andrei
Nov 16 2013
On Sunday, 17 November 2013 at 07:19:26 UTC, Andrei Alexandrescu wrote:

On 11/16/13 9:21 PM, Chris Cain wrote:

That said, it might also be reproduced "well enough" using a random generator to create similar strings to sort, but the basic idea is there. I just like using real genomes for performance testing things :)

I am hoping for some more representative corpora, along the lines of http://sortbenchmark.org/. Some data that we can use as good proxies for typical application usage.

Andrei

I think I get what you're saying, but sortbenchmark.org uses completely pseudorandom (but reproducible) entries that I don't think are representative of real data either (using gensort -a, minus the verification columns):

---
AsfAGHM5om
~sHd0jDv6X
uI^EYm8s=|
Q)JN)R9z-L
o4FoBkqERn
*}-Wz1;TD-
0fssx}~[oB
...
---

Most places use very fake data as proxies for real data. It's better to have something somewhat structured and choose data that, despite not being real data, stresses the benchmark in a unique way. I'm not suggesting my benchmark be the only one; if we're going to use pseudorandom data (I'm not certain we could actually get "realistic" data that would serve us much better), we might as well have different test cases that stress the sort routine in different ways. Obviously, using the real genome would be preferable to generating some (since it's actually truly "real" data, just used in an unorthodox way), but there's a disadvantage to attaching a 4.6 MB file to a benchmarking setup, especially if more might come.

Anyway, it's a reasonable representation of "data with no discernible order that can occasionally take some time to compare". Think something like sorting a list of customer records by name: if they're ordered by ID, then the names would not likely have a discernible pattern, and the comparison between names might be "more expensive" because some names are common. We can do "more realistic" for that type of scenario, if you'd like. I could look at a distribution for last names/first names and generate fake names to sort in a reasonable approximation of a distribution of real names. I'm not certain the outcome would change that much.
Nov 17 2013
On 11/17/13 6:20 AM, Chris Cain wrote:

I'm not suggesting my benchmark be the only one; if we're going to use pseudorandom data (I'm not certain we could actually get "realistic" data that would serve us much better), we might as well have different test cases that stress the sort routine in different ways. Obviously, using the real genome would be preferable to generating some (since it's actually truly "real" data, just used in an unorthodox way), but there's a disadvantage to attaching a 4.6 MB file to a benchmarking setup. Especially if more might come.

OK, since I see you have some interest... You said nobody would care to actually sort genome data. I'm aiming for data that's likely to be a good proxy for tasks people are interested in doing. For example, people may be interested in sorting floating-point numbers resulting from sales, measurements, frequencies, probabilities, and whatnot. Since most of those have a Gaussian distribution, a corpus with Gaussian-distributed measurements would be nice.

Then, people may want to sort things by date/time. Depending on the scale, the distribution is different: diurnal cycle, weekly cycle, seasonal cycle, secular ebbs and flows, etc. I'm unclear on what would be a good set of data. For sub-day time ranges a uniform distribution may be appropriate.

Then, people may want to sort records by e.g. Lastname, Firstname, or index a text by words. For names we'd need some census data or a phonebook. For general text sorting we can use classic texts such as Alice in Wonderland or the King James Bible (see http://corpus.canterbury.ac.nz/descriptions/). Sorting by word length is a possibility (word lengths are probably Gaussian-distributed).

Uniform random data is also a baseline: not terribly representative, but worth keeping an eye on. In fact, uniform data is unfairly rigged in quicksort's favor: any pivot is likely to be pretty good, and there are none of the sorted runs that often occur in real data.

Andrei
Nov 17 2013
On Sunday, 17 November 2013 at 16:57:19 UTC, Andrei Alexandrescu wrote:

Then, people may want to sort records by e.g. Lastname, Firstname, or index a text by words. For names we'd need some census data or phonebook.

Sure. As a source of data, app_c.csv from http://www.census.gov/genealogy/www/data/2000surnames/ -> File B: Surnames...

Test case:

---
import std.stdio, std.algorithm, std.random, std.range, std.conv, std.datetime;

// Name generator constants
enum NumberOfPossibleNames = 1_000_000;
enum NumberOfRandomNames = 1_000_000;
enum SortStrategy = SwapStrategy.unstable;

void main()
{
    auto data = File("app_c.csv").byLine
        .drop(1) // First line is just column names
        .take(NumberOfPossibleNames)
        .map!(e => e.text.split(","))
        .array;
    auto names = data.map!(e => e[0]).array;
    auto proportions = data.map!(e => e[2].to!size_t).array;

    auto rnd = Random(50);
    auto randomNames = fastDice(rnd, proportions)
        .take(NumberOfRandomNames)
        .map!(i => names[i])
        .array;

    StopWatch sw = AutoStart.yes;
    randomNames.sort!("a < b", SortStrategy)();
    sw.stop();

    writeln(randomNames.length, " names sorted in ", sw.peek.msecs,
        " msecs using ", SortStrategy, " sort");
}

struct FastDice(Rng, SearchPolicy pol = SearchPolicy.gallop)
{
    SortedRange!(size_t[]) _propCumSumSorted;
    size_t _sum;
    size_t _front;
    Rng* _rng;

    this(ref Rng rng, size_t[] proportions)
    {
        size_t[] _propCumSum = proportions.save.array;
        _rng = &rng;

        size_t mass = 0;
        foreach(ref e; _propCumSum)
        {
            mass += e;
            e = mass;
        }
        _sum = _propCumSum.back;
        _propCumSumSorted = assumeSorted(_propCumSum);

        popFront();
    }

    void popFront()
    {
        immutable point = uniform(0, _sum, *_rng);
        assert(point < _sum);
        _front = _propCumSumSorted.lowerBound!pol(point).length;
    }

    enum empty = false;

    @property auto front() { return _front; }
}

auto fastDice(Rng)(ref Rng rng, size_t[] proportions)
{
    return FastDice!(Rng)(rng, proportions);
}

auto fastDice(size_t[] proportions)
{
    return fastDice(rndGen, proportions);
}
---

Results (using `-O -inline -release -noboundscheck` on
my computer):

unstable sort: 738 msecs
stable sort: 1001 msecs

Also, to do this (in reasonable time) I had to create a new range which I called "FastDice" ... it does the same as std.random.dice, but is intended for cases where you'll be throwing dice often on the same data, so it does a bit of precomputation (creating a cumulative sum array) and allows for binary search/gallop to reduce the time it takes to come up with each throw. I opted for gallop here since the data is sorted in such a way that the most common names come first. It needs a bit of work to make it actually generic, but it's a good start, and I'll just say it's WTFPL code, so use it for whatever. It'll be especially good for generating test cases like I have done above.

FWIW, FastDice takes ~400 ms to generate all those dice throws. I did a back-of-the-envelope calculation on using dice, and just the time saved caching the sum saved maybe 30 minutes (no, I didn't wait that long; I stopped after 5 and wrote FastDice) of time each run. And the time saved using a gallop search instead of a linear search is pretty significant as well (the difference between n and log n time).
Nov 17 2013
On 11/17/13 1:23 PM, Chris Cain wrote:

On Sunday, 17 November 2013 at 16:57:19 UTC, Andrei Alexandrescu wrote:

Then, people may want to sort records by e.g. Lastname, Firstname, or index a text by words. For names we'd need some census data or phonebook.

Sure. As a source of data, app_c.csv from http://www.census.gov/genealogy/www/data/2000surnames/ -> File B: Surnames...

[snip]

It needs a bit of work to make it actually generic, but it's a good start and I'll just say it's WTFPL code, so use it for whatever. It'll be especially good for generating test cases like I have done above. FWIW, FastDice takes ~400ms to generate all those dice throws. I did a back-of-the-envelope calculation on using dice, and just the time saved caching the sum saved maybe 30 minutes (No, I didn't wait that long, I stopped after 5 and wrote FastDice) of time each run. And the time saved using a gallop search instead of linear search is pretty significant as well (difference between n and log n time).

I think that's a terrific start! (Not sure I understand where the 30 minutes come from...)

Andrei
Nov 17 2013
On Monday, 18 November 2013 at 02:36:40 UTC, Andrei Alexandrescu wrote:

I think that's a terrific start!

Thanks :)

(Not sure I understand where the 30 minutes come from...)

On my computer (`-release -inline -noboundscheck` ... `-O` is omitted because it removes the summation altogether since it isn't used anywhere):

---
auto benchmarkSumming()
{
    auto proportions = iota(150_000);
    // copied from std.random.dice:
    auto sum = reduce!("(assert(b >= 0), a + b)")(0.0, proportions.save);
}

auto bench = benchmark!benchmarkSumming(1_000);
writeln(bench.front.msecs, " ms");
---

Gives me 1618 ms. ~150_000 is the number of entries in the probability table (the number of surnames), so generating the sum for 1_000 names takes about that long. I really wanted 1 million names to sort, and scaling that up would mean 1618 seconds, which is nearly 27 minutes. That summation doesn't change, hence the importance of caching it for repeated runs. That's about 27 minutes of completely wasted work.

That all isn't to say that std.random.dice is fundamentally flawed (it's fine and perfectly suited for one-shot dice rolls and is, in fact, a bit better than a DiceRange for that task), but a DiceRange is really needed to do repeated rolls like this efficiently. Jeez, we're talking about maximizing sorting efficiency, and now I've gone off talking about making dice rolls faster too! :)
Nov 17 2013
On Monday, 18 November 2013 at 05:26:30 UTC, Chris Cain wrote:

On my computer (`-release -inline -noboundscheck` ... `-O` is omitted because it removes the summation altogether since it isn't used anywhere):

The better way to avoid this problem is usually to compile the operation to be benchmarked separately, in such a way that the result can't be ignored (e.g. by making it read the parameters from function arguments and return the result as the return value). Comparing non-optimized (i.e. without `-O`) builds for execution speed is of questionable relevance in most cases.

David
Dec 04 2013
Chris Cain:

Also, to do this (in reasonable time) I had to create a new range which I called "FastDice" ... it does the same as std.random.dice, but is intended for cases where you'll be throwing dice often on the same data, so it does a bit of precomputation (creating a cumulative sum array)

See also: http://d.puremagic.com/issues/show_bug.cgi?id=5849

Bye,
bearophile
Nov 17 2013
On Monday, 18 November 2013 at 03:10:22 UTC, bearophile wrote:

See also: http://d.puremagic.com/issues/show_bug.cgi?id=5849

Nice one, bearophile. You called it. I've done a bit of testing with the solution you've attached to that bug report. It's 10-50% faster than my solution, depending on how many proportions are provided (I've tested as many as 500_000). Despite mine being O(log n) to find the next dice roll and your solution being O(1), there really isn't that much of a difference. Strangely (to me), your solution actually seems to grow faster than mine as the number of proportions grows, despite the proof claiming it should grow slower (although testing for > 500_000 seems pointless until actual data demands it).

As far as integer proportions go, I think my solution might be better overall because it's effectively equivalent to the current dice implementation (meaning it could be used as a near drop-in replacement and get precisely the same results, as long as the proportions are floating-point values) and the code is simpler and avoids hiding any real bugs. (That said, the original implementation has a bug that was easy to spot on my second comb-through.) In popFront:

---
// Return the number of elements that are not strictly greater than point
_front = _propCumSumSorted.length - _propCumSumSorted.upperBound!pol(point).length;
---

On the other hand, a cumulative sum array approach (like mine and the std.random.dice implementation) will likely suffer major precision issues with floating-point numbers, especially if the first proportions are very large and the later proportions are very small. I haven't closely looked through the algorithm you're using to see whether it fares better, but I can't imagine it doing much worse.
Nov 17 2013
On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:

I think it's more complicated than that. Let's assume for a moment that you've proven that such an unstable sort must exist that is faster (I'm not convinced that it necessarily must take extra work to maintain stability). You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences). If it takes 2 years to come up with an unstable sort faster than Timsort, then it seems like a long time to be using something that isn't the best that we currently have. Especially if D is to be used in benchmarks against other languages.

Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why.

Let us measure quicksort performance as the number of comparisons it makes. Let us make some assumptions: quicksort goes as

--(1) choose a pivot p in O(1) comparisons;
--(2) partition the segment into two;
--(3) run quicksort recursively on the two resulting segments.

Now, (2) is effectively done by comparing every element with p, thus in Theta(n) (on the order of n) comparisons, where n is the length of the current segment.

Under these assumptions, we can construct the killer array for an arbitrary quicksort of such kind, even if we don't know how exactly the pivot is chosen (say, closed source or obfuscation), but we have the following interface to it: instead of accessing the array a[], quicksort calls the function less(i,j), which tells it whether a[i]<a[j], and we control that function.

Now, we start with all values in array a[] undefined. With each a[i], we associate a counter c[i]: how many times it took part in a comparison.
As we can see from the above, during a single call to quicksort, in steps (1) and (2) the pivot will eventually get Theta(n) comparisons, while other elements will all get only O(1) comparisons each. So here's what we'll do.

1. When a comparison asks to relate two undefined values, we pick the one with the larger c[i] and make it the lowest number possible. That is, the first fixed value will be 0, the next 1, and so on. (In the end, a[] will be a permutation of 0, 1, ..., n-1.)

2. When we are asked to relate a defined value to an undefined one, the defined value will always be lower (because, when we eventually define the undefined one, it will be higher than all the values defined before it).

3. When we are asked about two known values, just tell the truth.

This simple algorithm ensures that, on each call to quicksort, we quickly fix the pivot as one of the O(1) lowest values on the segment. So, one of the next two segments will have length n-O(1), and the recursion gives us Theta(n) segments of linearly decreasing lengths, and thus Theta(n^2) total running time.

Now, the assumption of picking a pivot in O(1) comparisons covers a broad variety of pivot choices, including first/last/middle/random element, median of three or five, or any combination of these. The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced. Double-pivot quicksort can be forced to go quadratic in a similar fashion.

Now, introsort escapes this fate by switching to a guaranteed-n-log-n algorithm instead of (3) when it goes too deep. This comes at a (small) cost of checking the current depth on every call to quicksort.

The above is just my retelling of a great short article, "A Killer Adversary for Quicksort" by M. D. McIlroy: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Ivan Kazmenko.
Nov 16 2013
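Ivan's retelling is short enough to turn directly into running code. A sketch of McIlroy's adversary in Python (with a deliberately naive first-element-pivot quicksort as the victim; any pivot rule satisfying his assumptions would fall the same way): less() fixes values lazily, pinning whichever index looks like the pivot to the lowest unused value.

```python
def killer_adversary(n, quicksort):
    # Build a worst-case input for `quicksort` by deciding a[] lazily
    # while answering its comparisons (McIlroy's scheme).
    UNSET = -1
    val = [UNSET] * n      # values fixed so far
    cnt = [0] * n          # comparisons each index took part in
    state = {"next": 0}    # next (lowest) value to hand out

    def freeze(i):
        val[i] = state["next"]
        state["next"] += 1

    def less(i, j):
        if val[i] == UNSET and val[j] == UNSET:
            # rule 1: fix the more-compared index (the likely pivot) low
            freeze(i if cnt[i] > cnt[j] else j)
        cnt[i] += 1
        cnt[j] += 1
        if val[i] == UNSET:
            return False   # rule 2: defined values compare lower
        if val[j] == UNSET:
            return True
        return val[i] < val[j]  # rule 3: tell the truth

    quicksort(list(range(n)), less)  # run the victim on indices 0..n-1
    for i in range(n):               # indices never pinned stay largest
        if val[i] == UNSET:
            freeze(i)
    return val

def toy_quicksort(items, less):
    # naive first-element-pivot quicksort, the kind the adversary punishes
    if len(items) <= 1:
        return items
    p, rest = items[0], items[1:]
    return (toy_quicksort([x for x in rest if less(x, p)], less)
            + [p]
            + toy_quicksort([x for x in rest if not less(x, p)], less))

n = 64
killer = killer_adversary(n, toy_quicksort)
assert sorted(killer) == list(range(n))  # a genuine permutation

calls = [0]
def counting_less(x, y):
    calls[0] += 1
    return x < y

assert toy_quicksort(killer, counting_less) == list(range(n))
assert calls[0] > n * n // 2  # quadratic, far above ~n log n
```

For this particular victim the adversary ends up producing the already-sorted permutation, which is indeed the classic killer for first-element pivots; against a cleverer pivot rule the same driver emits a correspondingly nastier array.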
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:

Now, the assumption of picking a pivot in O(1) comparisons covers a broad variety of pivot choices, including first/last/middle/random element, median of three or five, median of medians, or any combination of these. The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced.

Correction: this does *not* cover the median-of-medians case [1]. Median of medians does not fall under our assumption that selecting a pivot takes O(1) comparisons. It also actually guarantees O(n log n) worst-case performance, but of course at the cost of being slower than typical pivot selection in the average case.

[1] http://en.wikipedia.org/wiki/Median_of_medians

Ivan Kazmenko.
Nov 16 2013
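The distinction Ivan draws is visible in the structure of the algorithm itself: median of medians spends Theta(n) comparisons choosing the pivot, recursing on the group medians, which is exactly what puts it outside the O(1)-pivot assumption. A compact sketch (illustrative Python; it copies sublists for clarity, where real implementations partition in place):

```python
def median_of_medians(a, k):
    """Return the k-th smallest element of `a` (0-based), worst-case linear time."""
    a = list(a)
    if len(a) <= 10:
        return sorted(a)[k]
    # medians of groups of five
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    meds = [sorted(g)[len(g) // 2] for g in groups]
    # recursively pick the median of the medians as the pivot --
    # this step alone costs Theta(n), so pivot selection is NOT O(1)
    pivot = median_of_medians(meds, len(meds) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    if k < len(lo):
        return median_of_medians(lo, k)
    if k >= len(a) - len(hi):
        return median_of_medians(hi, k - (len(a) - len(hi)))
    return pivot  # k falls inside the block of elements equal to the pivot

assert median_of_medians(range(100), 37) == 37
assert median_of_medians([3] * 40 + [1, 2], 20) == 3
```

The pivot so chosen is guaranteed to land in the middle 30-70% of the data, which bounds the recursion and gives a quicksort built on it an O(n log n) worst case, at the price of that linear pivot-selection pass on every segment.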
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:

Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why.

Oh, of course. I did that in an algorithms class taught by a professor who is into cryptography. I agree that there is essentially no way to pick pivots that avoids worst-case performance when an attacker is involved. When I mentioned an "ideal pivot" I meant an actual good one on a sorted sequence. Given [1,2,3,4,5,6,7], picking the best pivot, 4, would result in scanning the entire array to assure that it is partitioned appropriately around the 4 (if a quicksort were designed wise to this sort of trick; most, including D's quicksort, would actually shuffle everything around). By that point, Timsort is already done and drinking the victory champagne. And the quicksort still has to recur on the subsequences to sort them as well!
Nov 16 2013
On Sunday, 17 November 2013 at 01:39:43 UTC, Chris Cain wrote:

(if a quicksort were designed wise to this sort of trick, but most, including D's quicksort, would actually shuffle everything around)

Actually, not everything. It'd swap the pivot and the last element (4 and 7 for the first iteration) and then swap them back after it figures out the two partitions are good. There was a case where it ended up swapping everything around unnecessarily, but it wasn't this one... But I digress; the point remains that it would be slower than Timsort despite having a good pivot.
Nov 16 2013
On 11/16/13 5:39 PM, Chris Cain wrote:Given [1,2,3,4,5,6,7] Picking the best pivot, 4 would result in scanning the entire array to assure that it is partitioned appropriately around the 4 (if a quicksort were designed wise to this sort of trick, but most, including D's quicksort, would actually shuffle everything around). I keep on thinking of a sort with delayed pivot selection that starts scanning the array from left and right and only stops when it figures it's unpartitioned. At that point it proceeds with pivot selection, but if the array is already sorted there's no need to partition the left and right portions that were already explored. Essentially the pivot selection would be preceded by a sampling of the left and right of the array. By that point, Timsort is already done and drinking the victory champagne. And the quicksort still has to recur on the subsequences to sort them as well! Yah, already sorted data is ideal for Timsort. Andrei
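If I read the idea right, the degenerate already-sorted case might look like this (a hedged Python sketch of an assumed shape, not a real proposal; `sort_with_presample` and the fallback to the built-in sort are placeholders):

```python
# Hedged sketch of "sampling before pivot selection": scan from the
# left while the data stays sorted, so fully sorted input is recognized
# in n - 1 comparisons and never partitioned at all.
def sort_with_presample(a):
    i = 0
    while i + 1 < len(a) and a[i] <= a[i + 1]:
        i += 1                      # sortedness sampling from the left
    if i + 1 >= len(a):
        return a                    # already sorted: no pivot needed
    a.sort()                        # stand-in for pivot selection + recursion
    return a

assert sort_with_presample([1, 2, 3]) == [1, 2, 3]
assert sort_with_presample([3, 1, 2]) == [1, 2, 3]
```

The interesting part of the idea, keeping the already-explored sorted prefix and suffix out of the subsequent partitioning, is exactly what the `a.sort()` placeholder glosses over.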
Nov 16 2013
On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:Woah, that sounds complicated. Not 100% sure I entirely understand but it seems to rely on changing the elements while you sort. How would that work in real life? For example, how would this defeat the following pivoting method. 1. Find the median value in the array. This can be done deterministically in linear time, so from a theoretical point of view is just as fast as picking a random element since you use linear time. 2. Use that as the pivot. I think it's more complicated than that. Let's assume for a moment that you've proven that a faster unstable sort must exist (I'm not convinced that it necessarily must take extra work to maintain stability). You have not shown how much faster it might be (it could be only 1% faster) nor how much work it would take to discover (even an ideal pivot choice for quicksort actually cannot be as fast as Timsort on an already sorted sequence, and quicksort is not an appropriate algorithm for accepting presorted subsequences). If it takes 2 years to come up with an unstable sort faster than Timsort, that's a long time to be using something that isn't the best we currently have. Especially if D is to be used in benchmarks against other languages. Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why. Let us measure quicksort performance as the number of comparisons it makes. Let us make some assumptions: quicksort goes as --(1) choose a pivot p in O(1) comparisons; --(2) partition the segment into two; --(3) run quicksort recursively on the two resulting segments. Now, (2) is effectively done by comparing every element with p, thus in Theta(n) (on the order of n) comparisons where n is the length of the current segment. 
Under these assumptions, we can construct the killer array for an arbitrary quicksort of such kind, even if we don't know how exactly the pivot is chosen (say, closed source or obfuscation), but we have the following interface to it: instead of accessing the array a[], quicksort calls the function less(i,j) which tells it whether a[i]<a[j], and we control that function. Now, we start with all values in array a[] undefined. With each a[i], we associate a counter c[i]: how many times it took part in a comparison. As we can see from the above, during a single call to quicksort, in steps (1) and (2) the pivot will eventually get Theta(n) comparisons, while other elements will all get only O(1) comparisons each. So here's what we'll do. 1. When a comparison asks to relate two undefined values, we pick the one with the larger c[i] and make it the lowest number possible. That is, the first fixed value will be 0, the next 1, and so on. (In the end, a[] will be a permutation of 0, 1, ..., n-1.) 2. When we are asked to relate a defined value to an undefined one, the defined value will always be lower (because, when we eventually define the undefined one, it will be higher than all the values defined before it). 3. When we are asked about two known values, just tell the truth. This simple algorithm ensures that, on each call to quicksort, we quickly fix the pivot as one of the O(1) lowest values on the segment. So, one of the next two segments will have length n-O(1), and the recursion gives us Theta(n) segments of linearly decreasing lengths, and thus Theta(n^2) total running time. 
Double-pivot quicksort can be forced to go quadratic in a similar fashion. Now, introsort escapes this fate by switching to a guaranteed-n-log-n algorithm instead of (3) when it goes too deep. This comes at a (small) cost of checking the current depth on every call to quicksort. The above is just my retelling of a great short article "A Killer Adversary for Quicksort" by M. D. McIlroy here: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf Ivan Kazmenko.
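For concreteness, the adversary fits in a few lines of Python (a toy illustration; the naive first-element-pivot quicksort and all the names are mine, not McIlroy's code):

```python
# Toy sketch of the killer adversary: values start undefined, and the
# index compared most often (the likely pivot) is frozen as the lowest
# value still unused, so every pivot ends up near the bottom.

def quicksort(ids, less):
    # Naive first-element-pivot quicksort over indices, standing in for
    # the (possibly closed-source) sort under attack.
    if len(ids) <= 1:
        return ids
    pivot, rest = ids[0], ids[1:]
    lo = [i for i in rest if less(i, pivot)]
    hi = [i for i in rest if not less(i, pivot)]
    return quicksort(lo, less) + [pivot] + quicksort(hi, less)

def build_killer(n, sort):
    """Run `sort` against the adversarial comparator; return a
    permutation of 0..n-1 reproducing the comparison pattern."""
    value = [None] * n              # a[i], fixed lazily
    count = [0] * n                 # c[i], comparisons a[i] took part in
    next_value = 0

    def freeze(i):                  # fix a[i] as the lowest unused value
        nonlocal next_value
        value[i] = next_value
        next_value += 1

    def less(i, j):
        count[i] += 1
        count[j] += 1
        if value[i] is None and value[j] is None:
            # rule 1: freeze the more-compared index (likely pivot) low
            freeze(i if count[i] >= count[j] else j)
        if value[i] is None:
            return False            # rule 2: defined values compare lower
        if value[j] is None:
            return True
        return value[i] < value[j]  # rule 3: tell the truth

    sort(list(range(n)), less)
    for i in range(n):              # never-compared values: fix arbitrarily
        if value[i] is None:
            freeze(i)
    return value

n = 64
killer = build_killer(n, quicksort)

# Replay on plain data: the same quicksort now does Theta(n^2) work.
calls = [0]
def val_less(i, j):
    calls[0] += 1
    return killer[i] < killer[j]

order = quicksort(list(range(n)), val_less)
assert [killer[i] for i in order] == list(range(n))  # still sorts correctly
assert calls[0] >= n * n // 8                        # quadratic, not n log n
```

Note that the replay phase needs no hook into the comparator at all, which is the two-phase attack discussed later in the thread.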
Nov 16 2013
On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh wrote:1. Find the median value in the array. This can be done deterministically in linear time, My understanding is that for unordered data, there is no algorithm that runs in worst-case O(n): http://en.wikipedia.org/wiki/Selection_algorithm http://en.wikipedia.org/wiki/Quickselect so from a theoretical point of view is just as fast as picking a random element since you use linear time. Picking a random element is constant time, though. You mean that O(1) and O(n) are equivalent here because they're both smaller than O(n log n), QuickSort's complexity?
Nov 16 2013
On Sunday, 17 November 2013 at 02:43:05 UTC, Vladimir Panteleev wrote:On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh wrote:Unless I misunderstand something, I am pretty sure there is a deterministic algorithm for finding the median in unsorted data: http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s07/www/lecture_notes/lect0125.pdf 1. Find the median value in the array. This can be done deterministically in linear time, My understanding is that for unordered data, there is no algorithm that runs in worst-case O(n): http://en.wikipedia.org/wiki/Selection_algorithm http://en.wikipedia.org/wiki/Quickselect Yeah, since you have to do O(n) work to partition the array, whether you do O(1) work, or O(n) work to find your pivot really doesn't matter. From a theoretical perspective anyway :o) so from a theoretical point of view is just as fast as picking a random element since you use linear time. Picking a random element is constant time, though. You mean that O(1) and O(n) are equivalent here because they're both smaller than O(n log n), QuickSort's complexity?
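The algorithm in those notes, median of medians with groups of five, can be sketched like so (a simplified Python version that allocates sublists rather than working in place):

```python
# Deterministic worst-case-linear selection (median of medians sketch).
def select(a, k):
    """k-th smallest (0-based) element of list a."""
    if len(a) <= 5:
        return sorted(a)[k]
    # median of each group of five, then recurse to pick the pivot
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    eq = len(a) - len(lo) - len(hi)   # elements equal to the pivot
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + eq:
        return pivot
    return select(hi, k - len(lo) - eq)

data = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0, 11, 10]
assert select(data, len(data) // 2) == sorted(data)[len(data) // 2]
```

A quicksort that calls `select(a, len(a) // 2)` for its pivot gets a guaranteed O(n log n) worst case, at the average-case constant-factor cost discussed elsewhere in the thread.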
Nov 17 2013
On Sunday, 17 November 2013 at 08:33:11 UTC, Craig Dillabaugh wrote:http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s07/www/lecture_notes/lect0125.pdf Nice! Didn't know about this - although it still seems to be a lot of work for each 'n'.
Nov 17 2013
On Sunday, 17 November 2013 at 01:48:14 UTC, Craig Dillabaugh wrote:On Sunday, 17 November 2013 at 01:07:20 UTC, Ivan Kazmenko wrote:Right, this method (median of medians is an example implementation) will not be defeated by the above algorithm. There's nothing wrong with it, since it takes Theta(n) comparisons to choose the pivot and so violates our attack's working conditions. In practice however, it means that, in the average case, the constant factor is very likely to be higher than for a quicksort vulnerable to our attack. Ivan Kazmenko. Regarding an ideal pivot choice for quicksort, I'd like to emphasize that it is simply non-existent. Here's why. Let us measure quicksort performance as the number of comparisons it makes. Let us make some assumptions: quicksort goes as --(1) choose a pivot p in O(1) comparisons; --(2) partition the segment into two; --(3) run quicksort recursively on the two resulting segments. Now, (2) is effectively done by comparing every element with p, thus in Theta(n) (on the order of n) comparisons where n is the length of the current segment. Under these assumptions, we can construct the killer array for an arbitrary quicksort of such kind, even if we don't know how exactly the pivot is chosen (say, closed source or obfuscation), but we have the following interface to it: Instead of accessing the array a[], quicksort calls the function less(i,j) which tells it whether a[i]<a[j], and we control that function. (snip) Woah, that sounds complicated. Not 100% sure I entirely understand but it seems to rely on changing the elements while you sort. How would that work in real life? For example, how would this defeat the following pivoting method. 1. Find the median value in the array. This can be done deterministically in linear time, so from a theoretical point of view is just as fast as picking a random element since you use linear time. 2. Use that as the pivot.
Nov 17 2013
On 11/16/13 5:07 PM, Ivan Kazmenko wrote:The above is just my retelling of a great short article "A Killer Adversary for Quicksort" by M. D. McIlroy here: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf Nice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse... Andrei
Nov 16 2013
On Sunday, 17 November 2013 at 03:58:58 UTC, Andrei Alexandrescu wrote:On 11/16/13 5:07 PM, Ivan Kazmenko wrote:Actually, I was thinking about a two-phase attack, and it is not at all unlikely. 0. Say, the Sorter presents a library quicksort solution. It may be closed source, but can take a comparison function as an argument. 1. In the preparation phase, the Attacker uses the tricky comparison function described previously. What's important is that, besides observing Theta(n^2) behavior once, he now gets a real array a[] such that this behavior can be reproduced. 2. In the attack phase, the Attacker does not need access to the comparison function. He just feeds the array obtained in the previous step as plain data. Ivan Kazmenko. The above is just my retelling of a great short article "A Killer Adversary for Quicksort" by M. D. McIlroy here: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf Nice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse...
Nov 17 2013
On 11/17/13 2:20 AM, Ivan Kazmenko wrote:On Sunday, 17 November 2013 at 03:58:58 UTC, Andrei Alexandrescu wrote:That won't do with randomized pivot selection. AndreiOn 11/16/13 5:07 PM, Ivan Kazmenko wrote:Actually, I was thinking about a two phase attack, and it is not at all unlikely. 0. Say, the Sorter presents a library quicksort solution. It may be closed source, but can take comparison function as argument. 1. On the preparation phase, the Attacker uses the tricky comparison function described previously. What's important is that, besides observing Theta(n^2) behavior once, he now gets a real array a[] such that this behavior can be reproduced. 2. On the attack phase, the Attacker does not need access to the comparison function. He just feeds the array obtained on the previous step as plain data. Ivan Kazmenko.The above is just my retelling of a great short article "A Killer Adversary for Quicksort" by M. D. McIlroy here: http://www.cs.dartmouth.edu/~doug/mdmspe.pdfNice story, but the setup is a tad tenuous (albeit indeed theoretically interesting). For starters, if the attacker has a hook into the comparison function, they could trivially do a lot worse...
Nov 17 2013
On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced. Hence, in no sense. This does not perform independent uniform random picks.
Nov 17 2013
On Sunday, 17 November 2013 at 13:07:40 UTC, Timon Gehr wrote:On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:Not at all. There are a number of situations where you want your program to use an RNG, but it is also important for the result to be reproducible. In such cases, you typically store the RNG seed for re-use. Of course there are also many cases where you don't need reproducibility guarantees, and there, the attack is useless. Ivan Kazmenko. The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced. Hence, in no sense. This does not perform independent uniform random picks.
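In Python terms, the point is simply that a stored seed makes the "random" pivot sequence a deterministic function of the input (`pivot_sequence` here is a made-up stand-in for whatever the sort's RNG actually drives):

```python
import random

# Stored seed => identical "random" pivot sequence on every run, which
# is what lets a killer input built once keep working on later runs.
def pivot_sequence(seed, n, rounds):
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(rounds)]

assert pivot_sequence(42, 1000, 8) == pivot_sequence(42, 1000, 8)
```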
Nov 17 2013
On 11/17/2013 02:14 PM, Ivan Kazmenko wrote:On Sunday, 17 November 2013 at 13:07:40 UTC, Timon Gehr wrote:One can't say random picking is bad because one can use some other (deterministic) pivot selection algorithm instead which is bad. If you need deterministic reproducibility guarantees, then random picking is useless.On 11/17/2013 02:07 AM, Ivan Kazmenko wrote:Not at all. There is a number of situations where you want your program to use RNG, but it is also important for the result to be reproducible. In such cases, you typically store the RNG seed for re-use. Of course there are also many cases where you don't need reproducibility guarantees, and there, the attack is useless. Ivan Kazmenko.The random pick fails in the following sense: if we seed the RNG, construct a killer case, and then start with the same seed, we get Theta(n^2) behavior reproduced.Hence, in no sense. This does not perform independent uniform random picks.
Nov 17 2013
On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:And the results (last number is predicate calls): Current Unstable Sort 50ms 32783474 New Unstable Sort 69ms 21503542 Timsort 35ms 3905887 For the record, I tried both SwapStrategy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Nov 16 2013
On Sunday, 17 November 2013 at 02:44:45 UTC, Vladimir Panteleev wrote:On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:Could you try running a benchmark on the same data using this instead? https://github.com/Xinok/XSort/blob/master/unstablesort.d My implementation of quicksort has some odd tweaks which make it slower in some cases, but I've never invoked worst-case behavior by accident. And the results (last number is predicate calls): Current Unstable Sort 50ms 32783474 New Unstable Sort 69ms 21503542 Timsort 35ms 3905887 For the record, I tried both SwapStrategy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Nov 16 2013
On Sunday, 17 November 2013 at 03:14:41 UTC, Xinok wrote:On Sunday, 17 November 2013 at 02:44:45 UTC, Vladimir Panteleev wrote:Sorry for the late reply - my access to my home PC was spotty in the past week. I wasn't entirely correct in the parent post - I didn't test things in the same way as the original problem, since I originally encountered it while using multiSort. multiSort can't use stable sort, because "stable $(D partition3) has not been implemented yet". If I hack multiSort to use TimSort instead of QuickSort, the program finishes quickly. I reran the test in the same way as in the parent post, incl. with unstableSort. Here are the results: SwapStrategy.unstable: 560576749 comparisons, 4796076 ticks SwapStrategy.stable : 340617464 comparisons, 5757974 ticks unstableSort : 573916901 comparisons, 5255061 ticksOn Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:Could you try running a benchmark on the same data using this instead? https://github.com/Xinok/XSort/blob/master/unstablesort.d My implementation of quick sort has some odd tweaks which makes it slower in some cases, but I've never invoked worst-case behavior by accident.And the results (last number is predicate calls): Current Unstable Sort 50ms 32783474 New Unstable Sort 69ms 21503542 Timsort 35ms 3905887For the record, I tried both SwapStragegy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
Dec 04 2013
On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:... I've made a pull request to fix this issue: https://github.com/D-Programming-Language/phobos/pull/1735
Nov 28 2013
On Friday, 29 November 2013 at 01:47:49 UTC, Xinok wrote:I've made a pull request to fix this issue: https://github.com/D-Programming-Language/phobos/pull/1735 I've left some comments on GitHub. The greatest concern IMO is that the new implementation relies on assignments rather than swaps. It would be a regression to disallow sorting of entities with disabled assignment. Ivan Kazmenko.
Nov 28 2013