digitalmars.D - Another algo for faster sorting

Dmitry Olshansky (4/4) Apr 07 2016 Coincidentally with another NG thread I'm curious if we can

Andrea Fontana (9/13) Apr 07 2016 Radix sort is a very fast way to sort strings and integers . But

tsbockman (25/27) Apr 07 2016 I was planning to respond to your original question, I just

Andrea Fontana (15/23) Apr 08 2016 Ditto

tsbockman (12/16) Apr 08 2016 There are tons of algorithms in Phobos that *could* be made
tsbockman (9/11) Apr 08 2016 Also, your response makes it sound like I was advocating for just

deadalnix (6/16) Apr 08 2016 You can make it work with float as well with some horrible hacks.

Andrea Fontana (7/26) Apr 08 2016 Using your functions to radixify float this seems to run faster

Andrei Alexandrescu (4/31) Apr 08 2016 Guess we need a pull request. In order to distinguish the "less than"

Andrea Fontana (4/8) Apr 08 2016 Why not a sort with SortAscending.yes/no or an enum as

Jonathan Villa (12/16) Apr 07 2016 The other day I had problems using Quicksort (trying to sorting

tsbockman (8/12) Apr 07 2016 It would be cool to have a shell sort implementation available

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

Coincidentally with another NG thread I'm curious if we can 
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

Apr 07 2016

Andrea Fontana <nospam example.com> writes:

On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
 Coincidentally with another NG thread I'm curious if we can 
 special-case our sort for
 strings to Three-Way Radix QuickSort which is more efficient:

 http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

Radix sort is a very fast way to sort strings and integers . But 
it does not work with a custom " less" function . It just sorts 
date to lexical way .

If phobos had a lexicalSort ( T [ ] , Ascending.YES ) ; probably 
a lot of optimizations could be done for many data types.

Anyway my topic about sort optimization seems not to have a good 
luck if not for benchmark inaccuracy :)


Andrea

Apr 07 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
 Anyway my topic about sort optimization seems not to have a 
 good luck if not for benchmark inaccuracy :)

I was planning to respond to your original question, I just 
didn't have time last night...

I don't think that the simple "counting sort" you demonstrated in 
the other thread is particularly useful to have in Phobos, by 
itself:
     - It doesn't scale well at all to higher-entropy element 
types, like `int` or `string`.
     - It only works on value types, not reference types.
     - It's so trivial to implement, that anyone who really needs 
it can write it themselves.

However, what might be really useful is to have a good generic 
implementation of "bucket sort". Using a user-supplied fast hash 
function whose output follows the same ordering as the original 
elements:
     1. Move/copy each element into a bucket list chosen by the 
hash function.
     2. Sort each individual bucket list using another algorithm.
     3. Iterate through all the buckets lists, and move/copy each 
element to the output.

With good design, it should be possible to express counting sort 
for small value types like `bool` and `byte` simply as a 
parameterization of bucket sort. (This would require that the 
bucket list type be parameterized, in addition to the hash 
function.)

Apr 07 2016

Andrea Fontana <nospam example.com> writes:

On Thursday, 7 April 2016 at 18:15:32 UTC, tsbockman wrote:
 I don't think that the simple "counting sort" you demonstrated 
 in the other thread is particularly useful to have in Phobos, 
 by itself:
     - It doesn't scale well at all to higher-entropy element 
 types, like `int` or `string`.

I mean to replace sort() on phobos if T == byte || T == ...

     - It only works on value types, not reference types.

Ditto

     - It's so trivial to implement, that anyone who really 
 needs it can write it themselves.

Ok, but it's not a good reason to keep an inefficient sort() on 
phobos. I would use algorithms in phobos without thinking if I 
have to reimplement them or not. If not we should implement

... sort(T)(T[]) if (T == byte) assert(0, "hey this is a bad 
implementation but is trivial to implement the right one"); :)

Moreover: many algorithms are generic in phobos and in D 
ecosystem, and it could happen that byte's sort (or even bool) 
isn't used directly, but instead inside other algorithms. Let's 
say we need to write a burrows-wheeler transform for bzip2 and we 
need to sort byte... Won't you use the default sort function for 
it?

Andrea

Apr 08 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:
 I mean to replace sort() on phobos if T == byte || T == ...

I know that's what you meant.

 Ok, but it's not a good reason to keep an inefficient sort() on 
 phobos. I would use algorithms in phobos without thinking if I 
 have to reimplement them or not.

There are tons of algorithms in Phobos that *could* be made 
faster for some combinations of inputs, but it doesn't make sense 
to add every possible template specialization because of the 
long-term maintenance costs.

My point is that I doubt specializing `sort()` for `bool` and 
`byte` is actually worth the costs. That's just my opinion 
though, and I could be swayed easily enough given some evidence 
of a specific real-world need for the optimization.

In any case, you are free to submit a Phobos pull request if you 
think it's worth it.

Apr 08 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:
 Ok, but it's not a good reason to keep an inefficient sort() on 
 phobos...

Also, your response makes it sound like I was advocating for just 
keeping everything like it is now.

But, my main point was actually that counting sort is just a 
variant of bucket sort, and it would (in my opinion) make more 
sense to write a good generic bucket sort implementation. This 
would be applicable to a much wider range of use cases, and could 
still be used to speed up `sort()` for `bool`, `byte`, and 
`ubyte` if needed.

Apr 08 2016

deadalnix <deadalnix gmail.com> writes:

On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
 On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky 
 wrote:
 Coincidentally with another NG thread I'm curious if we can 
 special-case our sort for
 strings to Three-Way Radix QuickSort which is more efficient:

 http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

 Radix sort is a very fast way to sort strings and integers . 
 But it does not work with a custom " less" function . It just 
 sorts date to lexical way .

You can make it work with float as well with some horrible hacks. 
I tried to implement it, but performances were not that great at 
the end. Maybe I missed something, or maybe this is just not CPU 
friendly enough for the sample I had to throw at it.

See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.d

Apr 08 2016

Andrea Fontana <nospam example.com> writes:

On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:
 On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
 On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky 
 wrote:
 Coincidentally with another NG thread I'm curious if we can 
 special-case our sort for
 strings to Three-Way Radix QuickSort which is more efficient:

 http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

 Radix sort is a very fast way to sort strings and integers . 
 But it does not work with a custom " less" function . It just 
 sorts date to lexical way .

 You can make it work with float as well with some horrible 
 hacks. I tried to implement it, but performances were not that 
 great at the end. Maybe I missed something, or maybe this is 
 just not CPU friendly enough for the sample I had to throw at 
 it.

 See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.d

Using your functions to radixify float this seems to run faster 
for me than phobos sort (for large arrays 2x, 3x it seems... not 
so many benchmarks)

http://dpaste.dzfl.pl/1b4752ff37b1

I don't optimize it that much. Just did a try.

Andrea

Apr 08 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/16 6:06 AM, Andrea Fontana wrote:
 On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:
 On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
 On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
 Coincidentally with another NG thread I'm curious if we can
 special-case our sort for
 strings to Three-Way Radix QuickSort which is more efficient:

 http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

 Radix sort is a very fast way to sort strings and integers . But it
 does not work with a custom " less" function . It just sorts date to
 lexical way .

 You can make it work with float as well with some horrible hacks. I
 tried to implement it, but performances were not that great at the
 end. Maybe I missed something, or maybe this is just not CPU friendly
 enough for the sample I had to throw at it.

 See: https://github.com/deadalnix/Dsort/blob/master/sort/radix.d

 Using your functions to radixify float this seems to run faster for me
 than phobos sort (for large arrays 2x, 3x it seems... not so many
 benchmarks)

 http://dpaste.dzfl.pl/1b4752ff37b1

 I don't optimize it that much. Just did a try.

 Andrea

Guess we need a pull request. In order to distinguish the "less than" 
(default) predicate from arbitrary predicates, just define two overloads 
of sort, one without the predicate parameter. -- Andrei

Apr 08 2016

Andrea Fontana <nospam example.com> writes:

On Friday, 8 April 2016 at 13:35:53 UTC, Andrei Alexandrescu 
wrote:
 Guess we need a pull request. In order to distinguish the "less 
 than" (default) predicate from arbitrary predicates, just 
 define two overloads of sort, one without the predicate 
 parameter. -- Andrei

Why not a sort with SortAscending.yes/no or an enum as 
(template?) param?

Apr 08 2016

Jonathan Villa <jv_vortex msn.com> writes:

On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
 Coincidentally with another NG thread I'm curious if we can 
 special-case our sort for
 strings to Three-Way Radix QuickSort which is more efficient:

 http://www.drdobbs.com/database/sorting-strings-with-three-way-radix-qui/184410724

The other day I had problems using Quicksort (trying to sorting 
100_000 doubles) causing stackoverflow for so many recursive 
calls (using .NET), then searching for other sorting algorithms I 
found SHELL-SORT, it's not recursive and it ended being even 
faster than Quicksort (what the heck? xd, well probably the JIT 
compiler).

Now you want to sort strings, I don't know in that case, but 
maybe that can be useful, who knows? c:

Here's the link where I got the algorithm (is in visual basic D:)
https://support.microsoft.com/en-us/kb/169617

JV

Apr 07 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 7 April 2016 at 16:23:31 UTC, Jonathan Villa wrote:
 ...then searching for other sorting algorithms I found 
 SHELL-SORT, it's not recursive and it ended being even faster 
 than Quicksort (what the heck? xd, well probably the JIT 
 compiler).

It would be cool to have a shell sort implementation available 
for small data sets. It's a bad choice in the general case 
though, because its asymptotic scaling is worse than the standard 
O(N log(N)) achieved by stuff like Timsort.

Quicksort is actually bad too, because its worst case asymptotic 
complexity is an embarrassing O(N^2), which makes it highly 
vulnerable to denial-of-service attacks.

Apr 07 2016

D Programming

C/C++ Programming

Other

digitalmars.D - Another algo for faster sorting