## digitalmars.D - qsort

- Mike (2/2) Jul 13 2005 has anyone written a qsort() for D? We need one that supports delegates.
- Andrew Fedoniouk (91/93) Jul 13 2005 harmonia/tools.d
- Uwe Salomon (5/8) Jul 13 2005 Did you ever test it with input data like this:
- Andrew Fedoniouk (46/53) Jul 13 2005 I did.
- Uwe Salomon (13/15) Jul 13 2005 You can also look into Indigo. List has a very good QuickSort, but it is...
- Jan-Eric Duden (6/121) Jul 14 2005 Your Qsort has one draw-back its O(N^2) (worst-case). You can fix this,
- Ben Hinkle (7/14) Jul 14 2005 Have you analyzed why it is faster? I'd be curious. My first guess from
- Uwe Salomon (7/12) Jul 14 2005 That should be the reason. Function calls are quite expensive. I have
- Andrew Fedoniouk (29/47) Jul 14 2005 It is all about function calls:

has anyone written a qsort() for D? We need one that supports delegates. If not I will probably have to write one myself.

Jul 13 2005

"Mike" <Mike_member pathlink.com> wrote in message news:db34a9$9hs$1 digitaldaemon.com...has anyone written a qsort() for D? We need one that supports delegates. If not I will probably have to write one myself.harmonia/tools.d // "smart" sort implementation, works 1.3 times faster than D builtin sort template sort(T) { void sort(T[] arr, int function(in T l, in T r) cmp) { const int quick_sort_threshold = 9; if(arr.length < 2) return; void swap_if_less( inout T t1, inout T t2 ) { if( cmp(t1, t2) < 0 ) { T t = t1; t1 = t2; t2 = t; } } void swap( inout T t1, inout T t2 ) { T t = t1; t1 = t2; t2 = t; } int stack[80]; int* top = stack; int limit = arr.length; int base = 0; for(;;) { int len = limit - base; int i; int j; int pivot; if(len > quick_sort_threshold) { // we use base + len/2 as the pivot pivot = base + len / 2; swap(arr[base], arr[pivot]); i = base + 1; j = limit - 1; // now ensure that *i <= *base <= *j swap_if_less(arr[j], arr[i]); swap_if_less(arr[base],arr[i]); swap_if_less(arr[j],arr[base]); for(;;) { do i++; while( cmp(arr[i],arr[base]) < 0 ); do j--; while( cmp(arr[base],arr[j]) < 0 ); if( i > j ) { break; } swap(arr[i], arr[j]); } swap(arr[base], arr[j]); // now, push the largest sub-array if(j - base > limit - i) { top[0] = base; top[1] = j; base = i; } else { top[0] = i; top[1] = limit; limit = j; } top += 2; } else { // the sub-array is small, perform insertion sort j = base; i = j + 1; for(; i < limit; j = i, i++) { for(; cmp(arr[j + 1],arr[j]) < 0; j--) { swap(arr[j + 1],arr[j]); if(j == base) break; } } if(top > stack) { top -= 2; base = top[0]; limit = top[1]; } else break; } } } }

Jul 13 2005

harmonia/tools.d // "smart" sort implementation, works 1.3 times faster than D builtin sortDid you ever test it with input data like this: 1, 2, 3, 4, 5, ..., 99998, 99999, 100000, 13, 12, 9, -1, 7, 6 That means a lot of sorted elements, and a bunch of unsorted afterwards. Ciao uwe

Jul 13 2005

"Uwe Salomon" <post uwesalomon.de> wrote in message news:op.stvdh7qm6yjbe6 sandmann.maerchenwald.net...I did. 'sort!' is mine, 'sort' is D's builtin. less number - the better. sequence from 1 to 1_000_000 sort! - 203 ms sort - 265 ms takes 76% of D time sequence from 1_000_000 to 1 sort! - 125 ms sort - 171 ms takes 71% of D time ------------------------------------------------------------- Theory is here: http://www.seeingwithc.org/topic2html.html ------------------------------------------------------------- ATTN!!!: As algorithms like this belong to the whole humanity then my implementation is completely free :-P ------------------------------------------------------------ //1, 2, 3, 4, 5, ..., 99998, 99999, 100000, 13, 12, 9, -1, 7, 6 int icmp(in int l, in int r) { return l - r; } int main(char[][] args) { ProcessTimesCounter counter = new ProcessTimesCounter(); const uint size = 1000000; int[] data = new int[size]; for(uint i = 0; i < size; ++i) data[i] = i; data[size - 6] = 13; data[size - 5] = 12; data[size - 4] = 9; data[size - 3] = -1; data[size - 2] = 7; data[size - 1] = 6; counter.start(); sort!(int)(data,&icmp); //data.sort; counter.stop(); ProcessTimesCounter.interval_type ms1 = counter.milliseconds(); writef("ms=%d\n",ms1); return 0; } Andrew. http://terrainformatica.comharmonia/tools.d // "smart" sort implementation, works 1.3 times faster than D builtin sortDid you ever test it with input data like this: 1, 2, 3, 4, 5, ..., 99998, 99999, 100000, 13, 12, 9, -1, 7, 6 That means a lot of sorted elements, and a bunch of unsorted afterwards.

Jul 13 2005

// "smart" sort implementation, works 1.3 times faster than D builtin sortYou can also look into Indigo. List has a very good QuickSort, but it is a little bit difficult to read because of the specialties of the List implementation, and it does not support special sorting functions. But i have included such a function for Vector, and after changing it to receive the comparison function as a template argument instead of a regular parameter, it needs 70% of the time of the D builtin sort. And it has the great advantage that it is linear for radix sort cases (i.e. a lot of elements, but only few different values: 3 3 4 5 2 2 2 2 3 4 7 5 3 4 5 2 2 2 etc.). But keep in mind: The library is LGPL'd, and as it is a derivative work, your program has to be conformant to the GPL. Ciao uwe

Jul 13 2005

Andrew Fedoniouk wrote:"Mike" <Mike_member pathlink.com> wrote in message news:db34a9$9hs$1 digitaldaemon.com...Your Qsort has one draw-back its O(N^2) (worst-case). You can fix this, by switching to heap sort if your stack gets too large. :) Regards, Jan-Erichas anyone written a qsort() for D? We need one that supports delegates. If not I will probably have to write one myself.harmonia/tools.d // "smart" sort implementation, works 1.3 times faster than D builtin sort template sort(T) { void sort(T[] arr, int function(in T l, in T r) cmp) { const int quick_sort_threshold = 9; if(arr.length < 2) return; void swap_if_less( inout T t1, inout T t2 ) { if( cmp(t1, t2) < 0 ) { T t = t1; t1 = t2; t2 = t; } } void swap( inout T t1, inout T t2 ) { T t = t1; t1 = t2; t2 = t; } int stack[80]; int* top = stack; int limit = arr.length; int base = 0; for(;;) { int len = limit - base; int i; int j; int pivot; if(len > quick_sort_threshold) { // we use base + len/2 as the pivot pivot = base + len / 2; swap(arr[base], arr[pivot]); i = base + 1; j = limit - 1; // now ensure that *i <= *base <= *j swap_if_less(arr[j], arr[i]); swap_if_less(arr[base],arr[i]); swap_if_less(arr[j],arr[base]); for(;;) { do i++; while( cmp(arr[i],arr[base]) < 0 ); do j--; while( cmp(arr[base],arr[j]) < 0 ); if( i > j ) { break; } swap(arr[i], arr[j]); } swap(arr[base], arr[j]); // now, push the largest sub-array if(j - base > limit - i) { top[0] = base; top[1] = j; base = i; } else { top[0] = i; top[1] = limit; limit = j; } top += 2; } else { // the sub-array is small, perform insertion sort j = base; i = j + 1; for(; i < limit; j = i, i++) { for(; cmp(arr[j + 1],arr[j]) < 0; j--) { swap(arr[j + 1],arr[j]); if(j == base) break; } } if(top > stack) { top -= 2; base = top[0]; limit = top[1]; } else break; } } } }

Jul 14 2005

"Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:db3h2l$k2s$1 digitaldaemon.com..."Mike" <Mike_member pathlink.com> wrote in message news:db34a9$9hs$1 digitaldaemon.com...Have you analyzed why it is faster? I'd be curious. My first guess from looking at phobos/internal/qsort.d is that the templated version has more inlinable code. The algorithm looks the same to me at first glance. Two differences I notice are a quicker switch to insertion sort (9 vs 7) and inlined swap (dmd's calls the typeinfo's swap).

Jul 14 2005

Have you analyzed why it is faster? I'd be curious. My first guess from looking at phobos/internal/qsort.d is that the templated version has more inlinable code. The algorithm looks the same to me at first glance. Two differences I notice are a quicker switch to insertion sort (9 vs 7) and inlined swap (dmd's calls the typeinfo's swap).That should be the reason. Function calls are quite expensive. I have switched my quicksort from receiving the comparison function as a normal parameter to receiving it as a template parameter, and that has increased its speed from 1.2 times Phobos to 0.7 times Phobos (the algorithm is quite different, too, but that is not relevant for this comparison). Ciao uwe

Jul 14 2005

"Ben Hinkle" <ben.hinkle gmail.com> wrote in message news:db5mhn$2lfo$1 digitaldaemon.com..."Andrew Fedoniouk" <news terrainformatica.com> wrote in message news:db3h2l$k2s$1 digitaldaemon.com...It is all about function calls: I've removed cmp comparator function and result is even more spectacular: 'sort!' is mine, 'sort!-inline', is mine but with inlined cmp function, 'sort' is D's builtin. less number - the better. sequence from 1 to 1_000_000 sort!-inline - 78 ms sort! - 203 ms sort - 265 ms 'sort!' takes 76% of D time 'sort!-inline' takes 29% of D's builtin sort time sequence from 1_000_000 to 1 sort!-inline - 46 ms sort! - 125 ms sort - 171 ms 'sort!' takes 71% of D time 'sort!-inline' 26% of D time -------------- template sort(T) { void sort(T[] arr) { int cmp(in T l, in T r) { return l - r; } ..................... } }"Mike" <Mike_member pathlink.com> wrote in message news:db34a9$9hs$1 digitaldaemon.com...Have you analyzed why it is faster? I'd be curious. My first guess from looking at phobos/internal/qsort.d is that the templated version has more inlinable code. The algorithm looks the same to me at first glance. Two differences I notice are a quicker switch to insertion sort (9 vs 7) and inlined swap (dmd's calls the typeinfo's swap).

Jul 14 2005