digitalmars.D - Another algo for faster sorting
- Dmitry Olshansky (4/4) Apr 07 2016 Coincidentally with another NG thread I'm curious if we can
- Andrea Fontana (9/13) Apr 07 2016 Radix sort is a very fast way to sort strings and integers . But
- tsbockman (25/27) Apr 07 2016 I was planning to respond to your original question, I just
- Andrea Fontana (15/23) Apr 08 2016 Ditto
- deadalnix (6/16) Apr 08 2016 You can make it work with float as well with some horrible hacks.
- Andrea Fontana (7/26) Apr 08 2016 Using your functions to radixify float this seems to run faster
- Andrei Alexandrescu (4/31) Apr 08 2016 Guess we need a pull request. In order to distinguish the "less than"
- Andrea Fontana (4/8) Apr 08 2016 Why not a sort with SortAscending.yes/no or an enum as
- Jonathan Villa (12/16) Apr 07 2016 The other day I had problems using Quicksort (trying to sorting
- tsbockman (8/12) Apr 07 2016 It would be cool to have a shell sort implementation available
Coincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724
Apr 07 2016
On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:Coincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724Radix sort is a very fast way to sort strings and integers . But it does not work with a custom " less" function . It just sorts date to lexical way . If phobos had a lexicalSort ( T [ ] , Ascending.YES ) ; probably a lot of optimizations could be done for many data types. Anyway my topic about sort optimization seems not to have a good luck if not for benchmark inaccuracy :) Andrea
Apr 07 2016
On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:Anyway my topic about sort optimization seems not to have a good luck if not for benchmark inaccuracy :)I was planning to respond to your original question, I just didn't have time last night... I don't think that the simple "counting sort" you demonstrated in the other thread is particularly useful to have in Phobos, by itself: - It doesn't scale well at all to higher-entropy element types, like `int` or `string`. - It only works on value types, not reference types. - It's so trivial to implement, that anyone who really needs it can write it themselves. However, what might be really useful is to have a good generic implementation of "bucket sort". Using a user-supplied fast hash function whose output follows the same ordering as the original elements: 1. Move/copy each element into a bucket list chosen by the hash function. 2. Sort each individual bucket list using another algorithm. 3. Iterate through all the buckets lists, and move/copy each element to the output. With good design, it should be possible to express counting sort for small value types like `bool` and `byte` simply as a parameterization of bucket sort. (This would require that the bucket list type be parameterized, in addition to the hash function.)
Apr 07 2016
On Thursday, 7 April 2016 at 18:15:32 UTC, tsbockman wrote:I don't think that the simple "counting sort" you demonstrated in the other thread is particularly useful to have in Phobos, by itself: - It doesn't scale well at all to higher-entropy element types, like `int` or `string`.I mean to replace sort() on phobos if T == byte || T == ...- It only works on value types, not reference types.Ditto- It's so trivial to implement, that anyone who really needs it can write it themselves.Ok, but it's not a good reason to keep an inefficient sort() on phobos. I would use algorithms in phobos without thinking if I have to reimplement them or not. If not we should implement ... sort(T)(T[]) if (T == byte) assert(0, "hey this is a bad implementation but is trivial to implement the right one"); :) Moreover: many algorithms are generic in phobos and in D ecosystem, and it could happen that byte's sort (or even bool) isn't used directly, but instead inside other algorithms. Let's say we need to write a burrows-wheeler transform for bzip2 and we need to sort byte... Won't you use the default sort function for it? Andrea
Apr 08 2016
On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:I mean to replace sort() on phobos if T == byte || T == ...I know that's what you meant.Ok, but it's not a good reason to keep an inefficient sort() on phobos. I would use algorithms in phobos without thinking if I have to reimplement them or not.There are tons of algorithms in Phobos that *could* be made faster for some combinations of inputs, but it doesn't make sense to add every possible template specialization because of the long-term maintenance costs. My point is that I doubt specializing `sort()` for `bool` and `byte` is actually worth the costs. That's just my opinion though, and I could be swayed easily enough given some evidence of a specific real-world need for the optimization. In any case, you are free to submit a Phobos pull request if you think it's worth it.
Apr 08 2016
On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:Ok, but it's not a good reason to keep an inefficient sort() on phobos...Also, your response makes it sound like I was advocating for just keeping everything like it is now. But, my main point was actually that counting sort is just a variant of bucket sort, and it would (in my opinion) make more sense to write a good generic bucket sort implementation. This would be applicable to a much wider range of use cases, and could still be used to speed up `sort()` for `bool`, `byte`, and `ubyte` if needed.
Apr 08 2016
On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:You can make it work with float as well with some horrible hacks. I tried to implement it, but performances were not that great at the end. Maybe I missed something, or maybe this is just not CPU friendly enough for the sample I had to throw at it. See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.dCoincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724Radix sort is a very fast way to sort strings and integers . But it does not work with a custom " less" function . It just sorts date to lexical way .
Apr 08 2016
On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:Using your functions to radixify float this seems to run faster for me than phobos sort (for large arrays 2x, 3x it seems... not so many benchmarks) http://dpaste.dzfl.pl/1b4752ff37b1 I don't optimize it that much. Just did a try. AndreaOn Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:You can make it work with float as well with some horrible hacks. I tried to implement it, but performances were not that great at the end. Maybe I missed something, or maybe this is just not CPU friendly enough for the sample I had to throw at it. See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.dCoincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724Radix sort is a very fast way to sort strings and integers . But it does not work with a custom " less" function . It just sorts date to lexical way .
Apr 08 2016
On 4/8/16 6:06 AM, Andrea Fontana wrote:On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:Guess we need a pull request. In order to distinguish the "less than" (default) predicate from arbitrary predicates, just define two overloads of sort, one without the predicate parameter. -- AndreiOn Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:Using your functions to radixify float this seems to run faster for me than phobos sort (for large arrays 2x, 3x it seems... not so many benchmarks) http://dpaste.dzfl.pl/1b4752ff37b1 I don't optimize it that much. Just did a try. AndreaOn Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:You can make it work with float as well with some horrible hacks. I tried to implement it, but performances were not that great at the end. Maybe I missed something, or maybe this is just not CPU friendly enough for the sample I had to throw at it. See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.dCoincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724Radix sort is a very fast way to sort strings and integers . But it does not work with a custom " less" function . It just sorts date to lexical way .
Apr 08 2016
On Friday, 8 April 2016 at 13:35:53 UTC, Andrei Alexandrescu wrote:Guess we need a pull request. In order to distinguish the "less than" (default) predicate from arbitrary predicates, just define two overloads of sort, one without the predicate parameter. -- AndreiWhy not a sort with SortAscending.yes/no or an enum as (template?) param?
Apr 08 2016
On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:Coincidentally with another NG thread I'm curious if we can special-case our sort for strings to Three-Way Radix QuickSort which is more efficient: http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724The other day I had problems using Quicksort (trying to sorting 100_000 doubles) causing stackoverflow for so many recursive calls (using .NET), then searching for other sorting algorithms I found SHELL-SORT, it's not recursive and it ended being even faster than Quicksort (what the heck? xd, well probably the JIT compiler). Now you want to sort strings, I don't know in that case, but maybe that can be useful, who knows? c: Here's the link where I got the algorithm (is in visual basic D:) https://support.microsoft.com/en-us/kb/169617 JV
Apr 07 2016
On Thursday, 7 April 2016 at 16:23:31 UTC, Jonathan Villa wrote:...then searching for other sorting algorithms I found SHELL-SORT, it's not recursive and it ended being even faster than Quicksort (what the heck? xd, well probably the JIT compiler).It would be cool to have a shell sort implementation available for small data sets. It's a bad choice in the general case though, because its asymptotic scaling is worse than the standard O(N log(N)) achieved by stuff like Timsort. Quicksort is actually bad too, because its worst case asymptotic complexity is an embarrassing O(N^2), which makes it highly vulnerable to denial-of-service attacks.
Apr 07 2016