digitalmars.D - Randomness in built-in .sort
- Bill Baxter (12/12) Jan 04 2009 It seems to me that the built-in .sort uses a randomized algorithm
- dsimcha (16/28) Jan 04 2009 IDK why sorting is builtin anyhow. I did some benchmarks a while back, ...
- Bill Baxter (15/43) Jan 04 2009 I agree. Builtin sort should probably just go away in D2.
- Andrei Alexandrescu (5/9) Jan 04 2009 I agree. I didn't have the time to implement a very good stable sort,
- dsimcha (15/24) Jan 05 2009 You could try the merge sort implementation from my dstats library at
- Don (2/28) Jan 05 2009 I want TempAlloc in dcore! Eventually, anyway.
- dsimcha (12/13) Jan 05 2009 The only problem is that, in its current form, TempAlloc is kind of dang...
- Bill Baxter (9/33) Jan 05 2009 Thanks for the pointer. Actually when I said I "found" Oskar Linde's
- dsimcha (7/11) Jan 05 2009 Am I (and possibly you) the only one(s) who think that sorting multiple ...
- Bill Baxter (14/27) Jan 05 2009 I use this all the time in NumPy in the form of "argsort" which
- bearophile (34/35) Jan 05 2009 order = npy.argsort(keys)
- bearophile (4/5) Jan 05 2009 'reordered' is better still, I've changed the name.
- Bill Baxter (10/43) Jan 05 2009 Making a permutation array like that is simple and nice, but another
- mw (8/26) Mar 23 2021 I'm now looking for np.argsort, and found this post.
- jmh530 (13/21) Mar 23 2021 import std.algorithm.sorting: makeIndex;
- Benji Smith (6/18) Jan 05 2009 I've written my own parallel-array quicksort implementation (several
- Andrei Alexandrescu (5/31) Jan 05 2009 std.algorithm's parameterized swap primitive is motivated in part by
- Walter Bright (2/5) Jan 05 2009 And may the schwartz be with your sorts!
- Andrei Alexandrescu (5/11) Jan 05 2009 A Schwartz joke has been present in
- Stewart Gordon (17/24) Jan 05 2009 A stable sort can be slower than an unstable sort. There are many cases...
- Bill Baxter (18/42) Jan 05 2009 Actually yeh, I don't really care so much if .sort is stable or not
- Walter Bright (3/6) Jan 06 2009 No, it doesn't use random sorts. The source code is included, so this
- Sean Kelly (5/11) Jan 06 2009 I don't know if it helps, but the built-in sort uses a similar algorithm
- Ary Borenszweig (5/19) Jan 06 2009 I think Bill speaks about a stable sort. You can have an unstable sort
- Walter Bright (4/8) Jan 06 2009 And I believe that the sort implementation is completely deterministic.
- Bill Baxter (7/16) Jan 06 2009 Really? Hmm, I wonder what the reason was, then. It did go away when
It seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys. This seems like a bad idea to me for the built-in algorithm to have non-deterministic behavior because it means you can have bugs that aren't consistently repeatable. I think the built-in sort should be some kind of stable sort. Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting. One shouldn't have to guess whether a given implementation of D will use a stable sort or not. --bb
Jan 04 2009
== Quote from Bill Baxter (wbaxter gmail.com)'s articleIt seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys. This seems like a bad idea to me for the built-in algorithm to have non-deterministic behavior because it means you can have bugs that aren't consistently repeatable. I think the built-in sort should be some kind of stable sort. Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting. One shouldn't have to guess whether a given implementation of D will use a stable sort or not. --bbIDK why sorting is builtin anyhow. I did some benchmarks a while back, and the builtin sort is slower than std.algorithm. IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time. I know this was mentioned before, but it didn't really get fully discussed. Why is sorting builtin? With property syntax, there's not even any syntactic sugar advantage. Furthermore, the builtin sort is substantially less flexible than a library sort. On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort. On the other hand, if builtin sort is kept, then I agree with you: It should follow the D convention of doing the safe thing by default (stable sort with no O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster sort that may or may not have some O(N^2) corner cases) in libraries.
Jan 04 2009
On Mon, Jan 5, 2009 at 12:05 PM, dsimcha <dsimcha yahoo.com> wrote:== Quote from Bill Baxter (wbaxter gmail.com)'s articleI agree. Builtin sort should probably just go away in D2. But it's there in D1 and can't be removed now. The best we can hope for there is for the specification to be tightened up to specify that .sort must be a stable algorithm.It seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys. This seems like a bad idea to me for the built-in algorithm to have non-deterministic behavior because it means you can have bugs that aren't consistently repeatable. I think the built-in sort should be some kind of stable sort. Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting. One shouldn't have to guess whether a given implementation of D will use a stable sort or not. --bbIDK why sorting is builtin anyhow. I did some benchmarks a while back, and the builtin sort is slower than std.algorithm. IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time. I know this was mentioned before, but it didn't really get fully discussed. Why is sorting builtin? With property syntax, there's not even any syntactic sugar advantage. Furthermore, the builtin sort is substantially less flexible than a library sort.On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort. On the other hand, if builtin sort is kept, then I agree with you: It should follow the D convention of doing the safe thing by default (stable sort with no O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster sort that may or may not have some O(N^2) corner cases) in libraries.For my purposes, I just found Oskar Linde's stable sort implementation (based on merge sort), and am using that now. It supports a delegate as a predicate, too, which made my code a lot simpler (got rid of a dummy struct who's only purpose was to provide the necessary opCmp I wanted). So yeh, there's not much point for built-in .sort (and .reverse)... except the ever-present "it it's easier to get CTFE working if it's built-in", because the compiler can have a special case CTFE implementation of those operations. --bb
Jan 04 2009
dsimcha wrote:On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
Jan 04 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articledsimcha wrote:You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias. One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation. Also, it uses the TempAlloc region allocator for temp space. You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
Jan 05 2009
dsimcha wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleI want TempAlloc in dcore! Eventually, anyway.dsimcha wrote:You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias. One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation. Also, it uses the TempAlloc region allocator for temp space. You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
Jan 05 2009
== Quote from Don (nospam nospam.com)'s articleI want TempAlloc in dcore! Eventually, anyway.The only problem is that, in its current form, TempAlloc is kind of dangerous because it puts the onus on the programmer to not store the only reference to a reference type in a TempAlloc-allocated block. If TempAlloc were more tightly integrated with the GC, this could be solved easily with negligible overhead. One would simply have to keep a thread-local variable that tracks how many blocks are currently in use, kind of like the stack pointer in the call stack, and a bit array that tracks whether each block that is currently in use should be scanned. However, without GC integration, just making everything scanned would cause so many false pointer issues and slowdowns when GC is run that I decided that, for now, given the anticipated use cases, making it a little dangerous and scanning nothing was the lesser of two evils.
Jan 05 2009
On Tue, Jan 6, 2009 at 12:16 AM, dsimcha <dsimcha yahoo.com> wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleThanks for the pointer. Actually when I said I "found" Oskar Linde's sorting code, that really meant I found it sitting in my source tree where I had incorporated it long ago after Oskar made it available. So I just used that.dsimcha wrote:You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias.On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. AndreiOne thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation. Also, it uses the TempAlloc region allocator for temp space. You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.Actually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bb
Jan 05 2009
== Quote from Bill Baxter (wbaxter gmail.com)'s articleActually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bbAm I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
Jan 05 2009
On Tue, Jan 6, 2009 at 6:15 AM, dsimcha <dsimcha yahoo.com> wrote:== Quote from Bill Baxter (wbaxter gmail.com)'s articleI use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order). order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]Actually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bbAm I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality?The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better.Yeh, that's just a silly argument. Sometimes a column-oriented organization is more suitable than a row-oriented one. Sort can be made to work on column-oriented data, so there's no reason not to make it so.Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.Yeh, this works pretty nicely. --bb
Jan 05 2009
Bill Baxter:I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order).order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2] auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;}); // Now order2 == [3U, 1, 2, 0, 4] int[2][] c = [[3, 4], [1,2], [1, 1]]; auto order3 = c.sortedIndexes(); // now order3 == [2U, 1, 0] auto a = [-5, 2, 3, 1, -11].dup; a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3] // a isn't changed here a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11] auto b = ["oranges"[], "for", "apples", "is", "right"].dup; b.remapOrder([3, 1, 2, 0, 4]) ==> ["is"[], "for", "apples", "oranges", "right"].dup); (Maybe I have to rename remapOrder as remappedOrder). -------------------------------- While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup; While the compiler accepts this, I don't know why: int[2][] c0 = [[3, 4], [1, 2], [1, 1]]; int[2][] c2 = c0.dup; Bye, bearophile
Jan 05 2009
bearophile:(Maybe I have to rename remapOrder as remappedOrder).'reordered' is better still, I've changed the name. Bye, bearophile
Jan 05 2009
On Tue, Jan 6, 2009 at 10:18 AM, bearophile <bearophileHUGS lycos.com> wrote:Bill Baxter:Making a permutation array like that is simple and nice, but another alternative which may be more efficient is to override the "swap(i,j)" routine used by the sort implementation, and make it swap the i'th and jth elements of all the arrays that are being sorted. That can be done with O(1) extra storage instead of the O(n) required by making to make a permutation array. I think it's also hard to avoid O(n) extra memory for permuting an array. At least I can't think of how to do it with out using an extra allocation. --bbI use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order).order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2] auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;}); // Now order2 == [3U, 1, 2, 0, 4] int[2][] c = [[3, 4], [1,2], [1, 1]]; auto order3 = c.sortedIndexes(); // now order3 == [2U, 1, 0] auto a = [-5, 2, 3, 1, -11].dup; a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3] // a isn't changed here a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11] auto b = ["oranges"[], "for", "apples", "is", "right"].dup; b.remapOrder([3, 1, 2, 0, 4]) ==> ["is"[], "for", "apples", "oranges", "right"].dup); (Maybe I have to rename remapOrder as remappedOrder). -------------------------------- While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup; While the compiler accepts this, I don't know why: int[2][] c0 = [[3, 4], [1, 2], [1, 1]]; int[2][] c2 = c0.dup;
Jan 05 2009
On Tuesday, 6 January 2009 at 01:18:21 UTC, bearophile wrote:Bill Baxter:I'm now looking for np.argsort, and found this post. bearophile, the doc page above is gone, although the zip file is still there (~10 years old), do you want to create a github repo, and contribute the code to dub (https://code.dlang.org/)? BTW, does anyone know if there is a np.argsort function in some other d libraries? Thanks.I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order).order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2]
Mar 23 2021
On Tuesday, 23 March 2021 at 21:22:25 UTC, mw wrote:[snip] I'm now looking for np.argsort, and found this post. bearophile, the doc page above is gone, although the zip file is still there (~10 years old), do you want to create a github repo, and contribute the code to dub (https://code.dlang.org/)? BTW, does anyone know if there is a np.argsort function in some other d libraries? Thanks.import std.algorithm.sorting: makeIndex; import std.algorithm.comparison: equal; import std.range: indexed; void main() { int[] arr0 = [ 2, 3, 1, 5, 0 ]; int[] arr1 = [ 1, 5, 4, 2, -1 ]; auto index = new int[arr0.length]; makeIndex!("a < b")(arr0, index); assert(index == [4, 2, 0, 1, 3]); auto view = arr1.indexed(index); assert(view.equal([-1, 4, 1, 5, 2])); }
Mar 23 2021
dsimcha wrote:== Quote from Bill Baxter (wbaxter gmail.com)'s articleI've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benjiActually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bbAm I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
Jan 05 2009
Benji Smith wrote:dsimcha wrote:std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples. Andrei== Quote from Bill Baxter (wbaxter gmail.com)'s articleI've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benjiActually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bbAm I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
Jan 05 2009
Andrei Alexandrescu wrote:std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples.And may the schwartz be with your sorts!
Jan 05 2009
Walter Bright wrote:Andrei Alexandrescu wrote:A Schwartz joke has been present in http://digitalmars.com/d/2.0/phobos/std_algorithm.html since its early days. I am appalled nobody has noticed it. Andreistd.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples.And may the schwartz be with your sorts!
Jan 05 2009
Bill Baxter wrote: <snip>I think the built-in sort should be some kind of stable sort. Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting. One shouldn't have to guess whether a given implementation of D will use a stable sort or not. --bbA stable sort can be slower than an unstable sort. There are many cases where the order of equally-ranking keys does not matter or it has no effect whatsoever (e.g. sorting an array of integers). Then why take the extra overhead of making the sort stable? We could have an extra property .stableSort, for those cases where you need stability. Of course, it would be valid to implement .sort and .stableSort with the same algorithm, though to do so would be naive unless someone comes up with a stable sorting algorithm with O(n log n) or better time complexity and O(log n) or better extra memory requirement. But in the absence of such an algorithm, the compiler could still replace .stableSort with .sort in those cases where it can guarantee it will make no difference to the end result. Or you could argue that requiring sort to be stable is sufficiently uncommon that it might as well be left to a library function. Stewart.
Jan 05 2009
On Mon, Jan 5, 2009 at 8:40 PM, Stewart Gordon <smjg_1998 yahoo.com> wrote:Bill Baxter wrote: <snip>I think the built-in sort should be some kind of stable sort. Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting. One shouldn't have to guess whether a given implementation of D will use a stable sort or not. --bbA stable sort can be slower than an unstable sort. There are many cases where the order of equally-ranking keys does not matter or it has no effect whatsoever (e.g. sorting an array of integers). Then why take the extra overhead of making the sort stable?We could have an extra property .stableSort, for those cases where you need stability. Of course, it would be valid to implement .sort and .stableSort with the same algorithm, though to do so would be naive unless someone comes up with a stable sorting algorithm with O(n log n) or better time complexity and O(log n) or better extra memory requirement. But in the absence of such an algorithm, the compiler could still replace .stableSort with .sort in those cases where it can guarantee it will make no difference to the end result.Actually yeh, I don't really care so much if .sort is stable or not (as long as it's clear in the spec what to expect). But I do care that it is deterministic. The current implementation seems to use some randomness (random partitions?) so if I run my program 5 times I get 5 different sort orders when there are duplicated keys. This makes it really hard to debug things that only happen with certain orders.Or you could argue that requiring sort to be stable is sufficiently uncommon that it might as well be left to a library function.There is no one sort algorithm that is fastest for all inputs. So it would make more sense to me to argue that if you want the fastest possible sort for your problem then you should go with a library solution. Built-ins should be efficient, yes, but to argue that efficiency is more important than generality in a built-in seems wrong to me. A least for .sort. There's no performance benefit to having a built-in sort, so it's really only there for convenience. Using a stable sort would make .sort applicable to more problems, thus increasing the integral of convenience. --bb
Jan 05 2009
Bill Baxter wrote:It seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys.No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
Jan 06 2009
== Quote from Walter Bright (newshound1 digitalmars.com)'s articleBill Baxter wrote:I don't know if it helps, but the built-in sort uses a similar algorithm to the sort routine in tango.core.Array. It picks a "median of 3" value for the pivot. As Walter said, I don't believe any randomness is involved. SeanIt seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys.No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
Jan 06 2009
Sean Kelly wrote:== Quote from Walter Bright (newshound1 digitalmars.com)'s articleI think Bill speaks about a stable sort. You can have an unstable sort algorithm without having explicity a random invocation. Note that he's saying "leads to results that differ from run-to-run when there are elements with identical keys".Bill Baxter wrote:I don't know if it helps, but the built-in sort uses a similar algorithm to the sort routine in tango.core.Array. It picks a "median of 3" value for the pivot. As Walter said, I don't believe any randomness is involved. SeanIt seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys.No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
Jan 06 2009
Ary Borenszweig wrote:I think Bill speaks about a stable sort. You can have an unstable sort algorithm without having explicity a random invocation. Note that he's saying "leads to results that differ from run-to-run when there are elements with identical keys".And I believe that the sort implementation is completely deterministic. If Bill is getting different results from different runs, something else is going on.
Jan 06 2009
On Wed, Jan 7, 2009 at 4:09 AM, Walter Bright <newshound1 digitalmars.com> wrote:Ary Borenszweig wrote:Really? Hmm, I wonder what the reason was, then. It did go away when I switched to a different sort routine. Don't have time to investigate right now, unfortunately. But thank you very much for the info that it should be deterministic. --bbI think Bill speaks about a stable sort. You can have an unstable sort algorithm without having explicity a random invocation. Note that he's saying "leads to results that differ from run-to-run when there are elements with identical keys".And I believe that the sort implementation is completely deterministic. If Bill is getting different results from different runs, something else is going on.
Jan 06 2009