
digitalmars.D.learn - Quadratic time to sort in a simple case?

reply "Ivan Kazmenko" <gassa mail.ru> writes:
Hi!

Consider a sorted array.  Append an element that is less than all 
the previous elements.  Then sort the array again by the sort 
function from std.algorithm.

An example follows:

-----
import std.algorithm;
import std.datetime;
import std.range;
import std.stdio;

void main ()
{
	int n = 30_000;
	auto a = array (iota (n)) ~ [0];  // 0 .. n-1 already sorted, plus one smaller element
	auto st = Clock.currTime ();
	sort (a);
	auto et = Clock.currTime ();
	writeln (et - st);
}
-----

With n = 30_000 as in the example, this takes time of the order 
of a second on a modern computer, which is clearly O(n^2).  I am 
using DMD 2.062.

Well, now I have a point to make and a question.

My point is that I think this is bad because such a pattern 
(sorted array ~ small element, then sort) is fairly likely to 
occur - well, it did for me, hence the post.  Sure, every fast 
and simple method for choosing the pivot in quicksort will have 
its O(n^2) corner cases.  But there's a difference between a 
corner case that never occurs in "real" data (but still could be 
constructed artificially) and a corner case describable by such a 
simple pattern.

The question is rather predictable, really: what is the 
guaranteed-n-log-n replacement for sort in the standard library?  
I've found the following way...
	sort !("a < b", SwapStrategy.stable) (a);
...but it forces me to specify the redundant "a < b" and looks a bit 
clumsy for an everyday sort() call.

Tried to keep it short(er) this time,

Ivan Kazmenko.
Apr 19 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Ivan Kazmenko:

 Consider a sorted array.  Append an element that is less than 
 all the previous elements.  Then sort the array again by the 
 sort function from std.algorithm.
If you know that, then don't do a sort. Make some free space in the array, shift the items, sort just the first part with a slice.
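For illustration, a minimal sketch of that idea (not bearophile's code): keep one free slot at the end, binary-search the insertion point, and shift the tail, instead of re-sorting the whole array.

-----
import std.range;
import std.stdio;

// a[0 .. filled] is sorted; a[filled] must be a free slot.
void insertIntoSorted (int [] a, size_t filled, int x)
{
	auto pos = a[0 .. filled].assumeSorted.lowerBound (x).length;
	foreach_reverse (i; pos .. filled)  // shift the tail one slot to the right
		a[i + 1] = a[i];
	a[pos] = x;
}

void main ()
{
	auto a = [1, 3, 5, 7, 0];  // sorted prefix plus one free slot at the end
	insertIntoSorted (a, 4, 0);
	writeln (a);  // [0, 1, 3, 5, 7]
}
-----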
 The question is rather predictable, really: what is the 
 guaranteed-n-log-n replacement for sort in the standard 
 library?  I've found the following way...
 	sort !("a < b", SwapStrategy.stable) (a);
That uses the TimSort that contains the optimization you look for.
 ...but it forces to specify the redundant "a < b" and looks a 
 bit clumsy for an everyday sort() call.
Then try to define an alias (untested):

alias mySort = sort!("a < b", SwapStrategy.stable);

Bye,
bearophile
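It is not certain the partial-instantiation alias above compiles as written (it is marked untested); if it gives trouble on a given compiler, a thin wrapper is a sketch that gets the same effect (the name mySort is taken from the post above):

-----
import std.algorithm;

auto mySort (Range) (Range r)
{
	return sort !("a < b", SwapStrategy.stable) (r);
}

void main ()
{
	auto a = [3, 1, 2, 1];
	mySort (a);  // stable, TimSort-based path
	assert (a == [1, 1, 2, 3]);
}
-----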
Apr 19 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
 That [SwapStrategy.stable] uses the TimSort that contains
 the optimization you look for.

 alias mySort = sort!("a < b", SwapStrategy.stable);
Thank you, these are good for now. A seemingly easy way to provide the choice globally would be to have an assignable SwapStrategy.default in std.algorithm... or would that be a bad design decision to have such a global state?
 Consider a sorted array.  Append an element that is less than 
 all the previous elements.  Then sort the array again by the 
 sort function from std.algorithm.
If you know that, then don't do a sort. Make some free space in the array, shift the items, sort just the first part with a slice.
That behavior isn't always easy to predict.

Still, I think this is a problem in Phobos which should be fixed. In most implementations of quicksort (even middle-element, the easiest to come up with), one expects it to perform in O(n log n) with probability close to one, except on some artificially constructed cases.

On a side note, I'm surprised that median-of-three (getPivot in std.algorithm) fails at such a simple test, so I had something new to learn there, too. If I just comment out the switch in getPivot and "return mid" directly, it becomes fast enough in this case.

Another view at this is the following. As far as I know by now, one of the points behind D (and Phobos) design is to have everything safe by default, but faster if you want it explicitly and you know what you are doing. In this particular case, that would mean having a worst-case O(n log n) sort, and/or a stable sort, by default - but with the opportunity to switch to the (time-wise and stability-wise) unsafe quicksort if you really need the extra speedup and know what you are doing.

So, why isn't TimSort the default? The one reason I can imagine is that then D would suffer in language comparisons which just take the most easy-to-use "default sort" and don't care about the extra features it got. Are there any other?

Ivan Kazmenko.
Apr 19 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Ivan Kazmenko:

 A seemingly easy way to provide the choice globally would be to 
 have an assignable SwapStrategy.default in std.algorithm... or 
 would that be a bad design decision to have such a global state?
With that I think there is no way to write a pure sort. Hopefully someday the sort() will be pure.

Generally, global mutable state makes it hard to write unittests and to reason about code. So I don't think Andrei will do that. It's not kosher.
 Still, I think this is a problem in Phobos which should be 
 fixed.
  In most implementations of quicksort (even middle-element, the 
 easiest to come up with), one expects it to perform in O(n log 
 n) with probability close to one, except on some artificially 
 constructed cases.

 On a side note, I'm surprised that median-of-three (getPivot in 
 std.algorithm) fails at such a simple test, so I had something 
 new to learn there, too.  If I just comment out the switch in 
 getPivot and "return mid" directly, it becomes fast enough in 
 this case.
Your case is a common one, so I think this problem should be studied a little better. I suggest writing a slightly better benchmark that shows the problem and putting it in Bugzilla. At worst it will be closed with wontfix.
 Another view at this is the following.  As far as I know by 
 now, one of the points beyond D (and Phobos) design is to have 
 everything safe by default, but faster if you want it 
 explicitly and you know what you are doing.
Right, but I think that D Zen rule is mostly about things like memory safety, etc.
 So, why isn't TimSort the default?
Probably because someone thinks that "on average" the other sort is faster. In theory the other should be faster, because if you relax the constraints of the sort being stable I think in theory you should be able to write a little faster sorting algorithm (I don't know if this is true).

Bye,
bearophile
Apr 19 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Friday, 19 April 2013 at 22:56:00 UTC, bearophile wrote:
 So, why isn't TimSort the default?
Probably because someone thinks that "on average" the other sort is faster. In theory the other should be faster, because if you relax the constraints of the sort being stable I think in theory you should be able to write a little faster sorting algorithm (I don't know if this is true).
I'm just throwing my 2c in, but I think TimSort ought to be the default. It's simply the safer option. Since worst-case performance can be designed (and it can be designed unless the pivots are, at least, pseudo-randomly chosen), there is a very real risk of the default sort being used in a way that can be compromised by an attacker. From this perspective, it seems like TimSort being the default would support "make doing things the right way easier than the wrong way."

Plus, TimSort seems to be faster in most cases in my attempts. The only cases I could find where it wasn't faster were when I could guarantee that the data I was passing in had no order to it. In cases where you suspect (but can't guarantee) data is sorted when passed in (think fetching from a web API that gives you sorted data, but the docs don't say it should), calling TimSort is nearly as fast as calling an "isSorted" ... So, my recommendation is to just call TimSort and don't worry about the extra branching code ([checking sortedness, if not sorted, call sort] vs [call sort]). This makes it so the "fast" code is also very convenient to write.

TBH, though, I think the reason that TimSort is not the default is because it was added only semi-recently. The old stable sort was not nearly as fast as the unstable sort (and, in fact, IIRC, it didn't work properly when I tried it). I doubt that someone intentionally said that quicksort was faster than TimSort on average ... it's just that no one thought to change the default when TimSort was added. Maybe an enhancement request could be made on this?
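A sketch of the two call patterns being compared (the function names here are made up): an explicit sortedness check in front of the default sort, versus simply calling the stable sort and letting it deal with already-sorted input.

-----
import std.algorithm;

// Pattern 1: branch first, then call the default (unstable) sort.
void sortWithCheck (int [] a)
{
	if (!a.isSorted)
		sort (a);
}

// Pattern 2: just call the stable sort; already-sorted input is cheap for it.
void sortStable (int [] a)
{
	sort !("a < b", SwapStrategy.stable) (a);
}

void main ()
{
	auto a = [1, 2, 3, 4, 5];
	auto b = a.dup;
	sortWithCheck (a);
	sortStable (b);
	assert (a == b);
}
-----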
Apr 20 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Apr-2013 16:22, Chris Cain writes:
 On Friday, 19 April 2013 at 22:56:00 UTC, bearophile wrote:
 So, why isn't TimSort the default?
Probably because someone thinks that "on average" the other sort is faster. In theory the other should be faster, because if you relax the constraints of the sort being stable I think in theory you should be able to write a little faster sorting algorithm (I don't know if this is true).
 I'm just throwing my 2c in, but I think TimSort ought to be the default. It's simply the safer option. Since worst-case performance can be designed (and it can be designed unless the pivots are, at least, pseudo-randomly chosen), there is a very real risk of the default sort being used in a way that can be compromised by an attacker. From this perspective, it seems like TimSort being the default would support "make doing things the right way easier than the wrong way."
And this all is good, but TimSort allocates O(N) memory. The constant in front of N is smallish, less than 1.0, but it could cause some grief.

-- Dmitry Olshansky
Apr 20 2013
parent reply "Xinok" <xinok live.com> writes:
On Saturday, 20 April 2013 at 16:35:25 UTC, Dmitry Olshansky 
wrote:
 And this all is good but TimSort allocates O(N) memory. The 
 constant in front of N is smallish less then 1.0 but it could 
 cause some grief.
Worst case is O(n/2), but it starts small and doubles in size as needed. On a partially sorted array, it may only use a small amount of memory.
Apr 22 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Apr-2013 05:17, Xinok writes:
 On Saturday, 20 April 2013 at 16:35:25 UTC, Dmitry Olshansky wrote:
 And this all is good but TimSort allocates O(N) memory. The constant
 in front of N is smallish less then 1.0 but it could cause some grief.
Worst case is O(n/2), but it starts small and doubles in size as needed. On a partially sorted array, it may only use a small amount of memory.
Good to know, still it's something to keep in mind. Especially for allocation-free minded folks.

-- Dmitry Olshansky
Apr 24 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/19/13 6:37 PM, Ivan Kazmenko wrote:
 That [SwapStrategy.stable] uses the TimSort that contains
 the optimization you look for.

 alias mySort = sort!("a < b", SwapStrategy.stable);
Thank you, these are good for now. A seemingly easy way to provide the choice globally would be to have an assignable SwapStrategy.default in std.algorithm... or would that be a bad design decision to have such a global state?
Yah, that would mean one can't reason about what a piece of code does without knowing what the context was.
 Consider a sorted array. Append an element that is less than all the
 previous elements. Then sort the array again by the sort function
 from std.algorithm.
If you know that, then don't do a sort. Make some free space in the array, shift the items, sort just the first part with a slice.
 That behavior isn't always easy to predict. Still, I think this is a
 problem in Phobos which should be fixed. In most implementations of
 quicksort (even middle-element, the easiest to come up with), one
 expects it to perform in O(n log n) with probability close to one,
 except on some artificially constructed cases.

 On a side note, I'm surprised that median-of-three (getPivot in
 std.algorithm) fails at such a simple test, so I had something new to
 learn there, too. If I just comment out the switch in getPivot and
 "return mid" directly, it becomes fast enough in this case.
I could have sworn I made getPivot choose at random at some point. I agree this should be fixed; there are a number of possibilities:

a) choose median of five;

b) if the array is large, sort a stride of it first (e.g. 1%) then choose the pivot as the median point of that stride;

c) add introspection to the algorithm, i.e. if an attempt to partition hits the worst case or near worst case, just try another pivot instead of moving forward with the sorting stage.
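For illustration, a minimal sketch of option (a), median of five; this is not Phobos' getPivot, just the idea:

-----
import std.algorithm;
import std.stdio;

// Return the index of the median of five evenly spaced sample elements.
size_t medianOfFivePivot (T) (const(T) [] a)
{
	assert (a.length >= 5);
	size_t [] idx = [0, a.length / 4, a.length / 2, 3 * a.length / 4, a.length - 1];
	sort !((i, j) => a[i] < a[j]) (idx);  // order the samples by element value
	return idx[2];                        // the middle sample is the median of five
}

void main ()
{
	auto a = [9, 1, 8, 2, 7, 3, 6, 4, 5];
	auto p = medianOfFivePivot (a);
	writeln ("pivot index ", p, ", value ", a[p]);
}
-----

On the first pass over the thread's input (sorted data plus a trailing 0), the five samples above would make the pivot roughly the n/4-th element rather than the appended minimum; as Ivan notes below, that alone does not rule out other bad cases.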
 Another view at this is the following. As far as I know by now, one of
 the points beyond D (and Phobos) design is to have everything safe by
 default, but faster if you want it explicitly and you know what you are
 doing.
That view of safety is different (memory safety).
 In this particular case, that would mean having a worst-case O(n
 log n) sort, and/or a stable sort, by default - but with the opportunity
 to switch to the (time-wise and stability-wise) unsafe quicksort if you
 really need the extra speedup and know what you are doing. So, why isn't
 TimSort the default?
TimSort is slower on average.

Andrei
Apr 22 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 c) add introspection to the algorithm, i.e. if an attempt to 
 partition hits the worst case or near worst case, just try 
 another pivot instead of moving forward with the sorting stage.
Or switch to a sort that is guaranteed to have a pseudo-linear complexity. I am not sure, but I think the C++ STL sort does this.
 TimSort is slower on average.
Here it's not easy to define what "average" is. Python devs think that a common case is an array with mostly sorted data with unsorted data at the end.

Bye,
bearophile
Apr 22 2013
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
And by the way, I still suggest a dual-pivot quicksort:

http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628

Bye,
bearophile
Apr 22 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/22/13 5:52 PM, bearophile wrote:
 Andrei Alexandrescu:

 c) add introspection to the algorithm, i.e. if an attempt to partition
 hits the worst case or near worst case, just try another pivot instead
 of moving forward with the sorting stage.
Or switch to a sort that is guaranteed to have a pseudo-linear complexity. I am not sure, but I think the C++ STL sort does this.
There's no "the C++ STL sort". Any implementation is fine as long as it has O(n log n) expected run time.
 TimSort is slower on average.
Here it's not easy to define what "average" is. Python devs think that a common case is an array with mostly sorted data with unsorted data at the end.
Note that in this case it's sorted data followed by its smallest element. (Changing the value does improve speed.)

This is hitting a corner case: median of 3 will pick the smallest element, which will lead to an inefficient partition. I haven't looked at the code, but it seems the smallest element again makes it to the beginning and the end of the array, so it's again picked, etc.

One simple solution to this (which I forgot to mention) is picking a pivot at random.

Andrei
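A sketch of that last idea (not Phobos' code): swap a uniformly chosen element into the pivot slot before partitioning, so that no fixed input pattern deterministically hits the worst case.

-----
import std.algorithm;
import std.random;
import std.stdio;

// Lomuto partition around a randomly chosen pivot; returns the pivot's
// final index, so both sides can then be recursed into.
size_t randomPivotPartition (T) (T [] a)
{
	assert (a.length > 0);
	swap (a[uniform (0, a.length)], a[$ - 1]);  // a random element becomes the pivot
	auto pivot = a[$ - 1];
	size_t i = 0;
	foreach (j; 0 .. a.length - 1)
		if (a[j] < pivot)
		{
			swap (a[i], a[j]);
			i++;
		}
	swap (a[i], a[$ - 1]);
	return i;
}

void main ()
{
	auto a = [5, 3, 8, 1, 9, 2];
	auto p = randomPivotPartition (a);
	writeln (a, " with pivot at index ", p);
}
-----

Note that this leans on std.random's thread-global generator, which is exactly the dependence on global state (and predictability, if the seed is known) that Ivan questions further down the thread.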
Apr 23 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 There's not "the C++ STL sort". Any implementation is fine as 
 long as it has O(n log n) expected run time.
This seems to use the Introsort:
http://www.sgi.com/tech/stl/sort.html

I don't know if Xinok has tested a 2-pivot quicksort (called by the usual Introsort setting).

Bye,
bearophile
Apr 23 2013
parent reply "Xinok" <xinok live.com> writes:
On Tuesday, 23 April 2013 at 14:12:01 UTC, bearophile wrote:
 Andrei Alexandrescu:

 There's not "the C++ STL sort". Any implementation is fine as 
 long as it has O(n log n) expected run time.
This seems to use the Introsort: http://www.sgi.com/tech/stl/sort.html I don't know if Xinok has tested a 2-pivot quicksort (called by the usual Introsort setting). Bye, bearophile
On an average case, two-pivot quick sort will increase the number of comparisons by about 50%, the reason being that about half the elements will be greater than the first pivot and have to be compared to the second pivot. There's also a greater complexity given there are three partitions rather than two, increasing the amount of I/O necessary.

Introsort is simply a quick sort which falls back to a heap sort after so many recursions to guarantee O(n log n) performance. I've implemented this using a median of five and it works excellently. This is what I plan to contribute to Phobos.

Choosing a random pivot will always invoke "average" performance, whereas finding the median of N elements usually achieves better performance on sorted lists. You also have to be careful that equal elements are distributed equally among both partitions, else a list with many elements equal to the pivot will still achieve poor performance (equal elements would all fall onto one side of the pivot).
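Since introsort keeps coming up, here is a minimal sketch of the idea (neither Phobos' sort nor Xinok's implementation): a plain quicksort, here with a deliberately naive last-element pivot, that gives up and switches to heap sort once a recursion budget of roughly 2 * log2(n) is spent.

-----
import std.algorithm;
import std.array;
import std.range;

void introSort (T) (T [] a)
{
	size_t limit = 0;
	for (auto n = a.length; n > 1; n >>= 1)
		limit += 2;  // budget of roughly 2 * log2 (length)
	introSortImpl (a, limit);
}

void introSortImpl (T) (T [] a, size_t depth)
{
	while (a.length > 1)
	{
		if (depth == 0)
		{
			heapSort (a);  // pivots have gone badly; take the O(n log n) fallback
			return;
		}
		depth--;
		auto p = lomutoPartition (a);
		// Recurse into the smaller side, loop on the larger one to bound stack use.
		if (p < a.length - p - 1)
		{
			introSortImpl (a[0 .. p], depth);
			a = a[p + 1 .. $];
		}
		else
		{
			introSortImpl (a[p + 1 .. $], depth);
			a = a[0 .. p];
		}
	}
}

// Lomuto partition around the last element; returns the pivot's final index.
size_t lomutoPartition (T) (T [] a)
{
	auto pivot = a[$ - 1];
	size_t i = 0;
	foreach (j; 0 .. a.length - 1)
		if (a[j] < pivot)
		{
			swap (a[i], a[j]);
			i++;
		}
	swap (a[i], a[$ - 1]);
	return i;
}

// Textbook in-place heap sort used as the guaranteed O(n log n) fallback.
void heapSort (T) (T [] a)
{
	void siftDown (size_t root, size_t end)
	{
		while (2 * root + 1 < end)
		{
			auto child = 2 * root + 1;
			if (child + 1 < end && a[child] < a[child + 1])
				child++;
			if (!(a[root] < a[child]))
				return;
			swap (a[root], a[child]);
			root = child;
		}
	}
	if (a.length < 2)
		return;
	foreach_reverse (i; 0 .. a.length / 2)
		siftDown (i, a.length);
	foreach_reverse (i; 1 .. a.length)
	{
		swap (a[0], a[i]);
		siftDown (0, i);
	}
}

void main ()
{
	auto a = array (iota (30_000)) ~ [0];  // the thread's worst-case input
	introSort (a);
	assert (isSorted (a));
}
-----

With the naive pivot above, sorted input exhausts the depth budget quickly and lands in heap sort, which is precisely the guarantee introsort is after; a version with a median-of-five pivot, as described above, would hit the fallback far less often.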
Apr 23 2013
parent "Ivan Kazmenko" <gassa mail.ru> writes:
On Tuesday, 23 April 2013 at 17:42:08 UTC, Xinok wrote:

 Choosing a random pivot will always invoke "average" 
 performance where as finding the median of N elements usually 
 achieves better performance on sorted lists. You also have to 
 be careful that equal elements are distributed equally among 
 both partitions, else a list with many elements equal to the 
 pivot will still achieve poor performance. (equal elements 
 would all fall onto one side of the pivot)

 Introsort is simply a quick sort which falls back to a heap 
 sort after so many recursions to guarantee O(n log n) 
 performance. I've implemented this using a median of five and 
 it works excellent. This is what I plan to contribute to Phobos.
I like the sound of that!

Regarding my previous post, it's perhaps worth mentioning Go in the comparison: it uses QuickSort with median-of-three for small sizes and median of medians-of-three for larger sizes [1]. And it actually sorts the three medians in median-of-three, which sounds like a good thing to do. They cite "Engineering a Sort Function" (1993) by Bentley and McIlroy as inspiration [2].

Ivan Kazmenko.

-----
[1] http://golang.org/src/pkg/sort/sort.go#L104
[2] http://www.enseignement.polytechnique.fr/informatique/profs/Luc.Maranget/421/09/bentley93engineering.pdf
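For reference, a sketch of the Bentley-McIlroy "ninther" (median of medians-of-three) mentioned above; this is a transcription of the idea only, not Go's or Phobos' code:

-----
import std.array;
import std.range;
import std.stdio;

// Index of the median element among three positions of a.
size_t medianOfThree (T) (const(T) [] a, size_t i, size_t j, size_t k)
{
	if (a[i] < a[j])
		return a[j] < a[k] ? j : (a[i] < a[k] ? k : i);
	else
		return a[i] < a[k] ? i : (a[j] < a[k] ? k : j);
}

// "Ninther": median of three medians-of-three taken near the start, middle
// and end of the array; only worth the extra comparisons on large inputs.
size_t ninther (T) (const(T) [] a)
{
	immutable n = a.length;
	immutable s = n / 8;
	immutable lo  = medianOfThree (a, 0, s, 2 * s);
	immutable mid = medianOfThree (a, n / 2 - s, n / 2, n / 2 + s);
	immutable hi  = medianOfThree (a, n - 1 - 2 * s, n - 1 - s, n - 1);
	return medianOfThree (a, lo, mid, hi);
}

void main ()
{
	auto a = array (iota (1_000)) ~ [0];  // the thread's problem input
	writeln ("pivot value: ", a[ninther (a)]);  // lands near the middle, not at 0
}
-----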
Apr 23 2013
prev sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Monday, 22 April 2013 at 21:34:47 UTC, Andrei Alexandrescu 
wrote:

 a) choose median of five
Sounds like it could decrease speed on average, and still perhaps fail on a simple case like [0, 1, 2, 3, ..., 999, 0, 0] or something? Didn't test it though.
 b) if the array is large, sort a stride of it first (e.g. 1%) 
 then choose the pivot as the median point of that stride
A bad case would be, e.g., the array containing almost sorted parts, and the stride getting taken from the part where lowest or highest elements reside.
 c) add introspection to the algorithm, i.e. if an attempt to 
 partition hits the worst case or near worst case, just try 
 another pivot instead of moving forward with the sorting stage.
Now, trying another pivot again and again could give the same O(n^2) in a bad case.
 One simple solution to this (which I forgot to mention) is
 picking a pivot at random.
But it would again mean that the sort function depends on global state (RNG state). And if it doesn't (a local RNG is initialized somehow each time), an adversary would just have to hardcode it once to get the same O(n^2) worst case anytime he wants.

And on Tuesday, 23 April 2013 at 01:10:26 UTC, Xinok wrote:
 I filed a bug report for this issue a year ago:
 http://d.puremagic.com/issues/show_bug.cgi?id=7767

 I've been meaning to fix this issue myself. Time allowing, I'll 
 do it soon.
What I wonder now is: what would be the goal of such a fix?

One possible goal would be to have an O(n log n) worst case sort. And that would perhaps cost some speed and/or memory on average. Besides, such a sort function is already implemented (TimSort), so it's just a matter of setting the default then.

Another goal would be to have O(n^2) only for cases which don't have a reasonable chance to occur in the real world, but could still be constructed artificially by an adversary. And that could perhaps be done by just changing the getPivot implementation.

Other languages' experience may be useful here:

1. GNU C++ 4.x std::sort implementation is IntroSort [1].

2. Microsoft .NET used Quicksort up to version 4.0 [2]. Now in .NET 4.5 they use Introsort too [3].

3. For primitive types, Java 6 used a tuned QuickSort [4]. The current Java 7 uses Dual-Pivot QuickSort [5]. Java uses TimSort for Objects [6].

4. Python uses TimSort since version 2.3 [7].

In any case, there are techniques to construct a bad case given a QuickSort implementation. One of them is described by M. Douglas McIlroy in "A Killer Adversary for Quicksort" [8]. We run QuickSort on a would-be-array "a" controlling the "a[i] < a[j]" predicate, and, using only certain assumptions (not looking at the code!), we build a killer-case array on the fly. Given some thought, the technique could perhaps be extended to Dual-Pivot QuickSort too. As long as some flavor of QuickSort (and topN) is in Phobos, such a unittest would fit the sort implementation nicely, if only to show just how "bad" it can get.

Ivan Kazmenko.

-----
References:
[1] http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01027.html#g152148508b4a39e15ffbfbc987ab653a
[2] http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.100%29.aspx
[3] http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.110%29.aspx
[4] http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html#sort%28int[]%29
[5] http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28int[]%29
[6] http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29
[7] http://en.wikipedia.org/wiki/Timsort
[8] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
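For the curious, here is a rough D transliteration of the adversary scheme from [8]; the code and names are a sketch, not the paper's C code, and whether it actually drives the current Phobos quicksort quadratic has not been verified here, only that it follows the paper's construction:

-----
import std.algorithm;
import std.array;
import std.range;
import std.stdio;

void main ()
{
	enum n = 10_000;
	enum gas = int.max;  // marker for "value not decided yet"
	auto val = new int [n];
	val[] = gas;
	int nsolid = 0;     // next concrete value to hand out
	int candidate = 0;  // index currently suspected of being the pivot
	size_t ncmp = 0;

	// Comparison predicate that invents values lazily, freezing the element
	// that looks like the current pivot to a small value (McIlroy's scheme).
	bool less (int x, int y)
	{
		ncmp++;
		if (val[x] == gas && val[y] == gas)
		{
			if (x == candidate)
				val[x] = nsolid++;
			else
				val[y] = nsolid++;
		}
		if (val[x] == gas)
			candidate = x;
		else if (val[y] == gas)
			candidate = y;
		return val[x] < val[y];
	}

	auto idx = array (iota (n));  // sort indices; their values are decided on demand
	sort !(less) (idx);
	writefln ("%s comparisons for n = %s", ncmp, n);
	// val now describes an input tailored against this particular sort.
}
-----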
Apr 23 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-Apr-2013 01:09, Ivan Kazmenko writes:
 And on Tuesday, 23 April 2013 at 01:10:26 UTC, Xinok wrote:
 I filed a bug report for this issue a year ago:
 http://d.puremagic.com/issues/show_bug.cgi?id=7767

 I've been meaning to fix this issue myself. Time allowing, I'll do it
 soon.
What I wonder now, what would be the goal of such a fix? One possible goal would be to have an O(n log n) worst case sort. And that would perhaps cost some speed and/or memory on average. Besides, such a sort function is already implemented (TimSort), so it's just a matter of setting the default then.
A good unstable sort can do the job faster (at the very least not slower) than a good stable sort. I'm looking forward to a version of Introsort that Xinok has in mind as a "Q-sort fix".

-- Dmitry Olshansky
Apr 24 2013
prev sibling parent "Xinok" <xinok live.com> writes:
On Friday, 19 April 2013 at 22:37:45 UTC, Ivan Kazmenko wrote:
 So, why isn't TimSort the default?
I would actually argue for this, not for safety (introsort is an adequate solution) but for different reasons. Timsort is stable and it's faster on data with low entropy, being nearly instantaneous on already sorted lists. I would guess it's the better choice for most cases. Then for those cases where stable sorting isn't necessary and unstable sorting would be faster, the user could choose the second option.
Apr 24 2013
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Apr-2013 01:03, Ivan Kazmenko writes:
 With n = 30_000 as in the example, this takes time of the order of a
 second on a modern computer, which is clearly O(n^2).  I am using DMD
 2.062.
Optimization flags if any?

-- Dmitry Olshansky
Apr 19 2013
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
 With n = 30_000 as in the example, this takes time of the 
 order of a
 second on a modern computer, which is clearly O(n^2).  I am 
 using DMD
 2.062.
Optimization flags if any?
Both "-O" and no-flags give quadratic behavior. Well, if that does not convince you the time grows faster than n-log-n, maybe this comparison shall (dmd -O, Core i7-2600K 3400 MHz): n T, seconds 15_000 0.312 30_000 1.076 60_000 3.775 120_000 11.247 With n growing x2, the time grows x3 to x4 at each step.
Apr 19 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
20-Apr-2013 02:12, Ivan Kazmenko writes:
 With n = 30_000 as in the example, this takes time of the order of a
 second on a modern computer, which is clearly O(n^2).  I am using DMD
 2.062.
Optimization flags if any?
Both "-O" and no-flags give quadratic behavior.
I was after -O -inline -release. In non-release mode it tests that the result is sorted on exit, etc. Anyway, it's 8x as fast with -release -inline for me, but the progression is similarly bad.
 Well, if that does not convince you the time grows faster than n-log-n,
 maybe this comparison shall (dmd -O, Core i7-2600K  3400 MHz):

 n       T, seconds
   15_000   0.312
   30_000   1.076
   60_000   3.775
 120_000  11.247

 With n growing x2, the time grows x3 to x4 at each step.
-- Dmitry Olshansky
Apr 19 2013
prev sibling parent reply "Xinok" <xinok live.com> writes:
On Friday, 19 April 2013 at 21:03:23 UTC, Ivan Kazmenko wrote:
 Hi!

 Consider a sorted array.  Append an element that is less than 
 all the previous elements.  Then sort the array again by the 
 sort function from std.algorithm.

 ....

 With n = 30_000 as in the example, this takes time of the order 
 of a second on a modern computer, which is clearly O(n^2).  I 
 am using DMD 2.062.
I filed a bug report for this issue a year ago:
http://d.puremagic.com/issues/show_bug.cgi?id=7767

I've been meaning to fix this issue myself. Time allowing, I'll do it soon.
Apr 22 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Xinok:

 I've been meaning to fix this issue myself. Time allowing, I'll 
 do it soon.
Good. And if you are willing, please also take a look at the parallel sort problem I've raised:
http://forum.dlang.org/thread/iyunhhsbmurqyouyrhob forum.dlang.org

Bye,
bearophile
Apr 22 2013
parent "Xinok" <xinok live.com> writes:
On Tuesday, 23 April 2013 at 01:28:56 UTC, bearophile wrote:
 Xinok:

 I've been meaning to fix this issue myself. Time allowing, 
 I'll do it soon.
Good. And if you are willing, please also take a look at the parallel sort problem I've raised: http://forum.dlang.org/thread/iyunhhsbmurqyouyrhob forum.dlang.org Bye, bearophile
I could, but I'm not sure what the general consensus is on std.parallel_algorithm as it's been brought up before in a similar context.
Apr 22 2013