www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Sorting according to a primary and secondary criterion

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hi all :-)

Suppose that I have two different arrays of the same length, arr1 and arr2, and
an array of index values idx = [0 .. arr1.length].

Now, suppose that I want to sort the index values according to the corresponding
values of arr1.  This is easy with schwartzSort:

    idx.schwartzSort!(a => arr1[a]);

But suppose now that I want to sort _primarily_ according to arr1, but
secondarily according to arr2?  That is, that the index i should come before j
if arr1[i] < arr1[j], but also if arr1[i] == arr1[j] && arr2[i] < arr2[j] ... ?

One option might be first to sort with respect to arr2 and then to sort with
respect to arr1 using SwapStrategy.stable, but that seems overkill and also
uncertain.

I guess I could do something like,

    .sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b]"));

... but I'm not sure that would be an optimal strategy.

Is there a standard, accepted approach for this kind of sort with
primary/secondary criterion?

Thanks & best wishes,

    -- Joe
Jul 17 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 Is there a standard, accepted approach for this kind of sort 
 with primary/secondary criterion?
There are various ways to do it. One way is to use a stable sort and sort the data two or more times. Another way is to use something like this, but this needs some memory: idx.schwartzSort!(i => tuple(arr1[i], arr2[i])); But often the most efficient way is to use sort() with a comparison function that takes in account all your sorting criteria. Bye, bearophile
Jul 17 2013
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/17/2013 02:07 PM, bearophile wrote:
 Another way is to use something like this, but this needs some memory:
 
 idx.schwartzSort!(i => tuple(arr1[i], arr2[i]));
Oh, nice thought! :-)
 But often the most efficient way is to use sort() with a comparison function
 that takes in account all your sorting criteria.
That's what I assumed, but I don't understand how to provide that criterion to sort: if I do e.g. idx.sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b])"); I (unsurprisingly) get a load of errors about std.functional not having access to arr1 or arr2. Contrast with the example in std.algorithm docs: words.sort!("toUpper(a) < toUpper(b)", SwapStrategy.stable); ... but of course toUpper is a publicly available function.
Jul 17 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

     idx.sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && 
 arr2[a] < arr2[b])");

 I (unsurprisingly) get a load of errors about std.functional 
 not having access to arr1 or arr2.
You need a lambda delegate for that. But I forgot about multisort algorithm... It's probably the right tool. Bye, bearophile
Jul 17 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/17/2013 02:42 PM, bearophile wrote:
 You need a lambda delegate for that. But I forgot about multisort algorithm...
 It's probably the right tool.
So, in the end I tried out 3 different alternatives: schwartzSort(a => tuple(arr1[a], arr2[a]), "a < b") sort((a, b) => arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b])) multiSort((a, b) => arr1[a] < arr1[b], (a, b) => arr2[a] < arr2[b]) sort() seems faster than schwartzSort for smaller-scale stuff, but takes longer for larger-scale sorts. multiSort wins out over both of them. I guess in principle it might be possible to have a multiSchwartzSort which allows for multiple transformations: multiSchwartzSort(a => arr1[a], a => arr2[a], "a < b", "a < b") ... which might gain some speed by caching the results of the transformations, as schwartzSort is supposed to do? But anyway, I think this solution works at limited extra cost speed-wise. :-) Thanks very much to both of you!
Jul 17 2013
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 17 July 2013 at 13:12:18 UTC, Joseph Rushton 
Wakeling wrote:
 On 07/17/2013 02:42 PM, bearophile wrote:
 You need a lambda delegate for that. But I forgot about 
 multisort algorithm...
 It's probably the right tool.
So, in the end I tried out 3 different alternatives: schwartzSort(a => tuple(arr1[a], arr2[a]), "a < b") sort((a, b) => arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b])) multiSort((a, b) => arr1[a] < arr1[b], (a, b) => arr2[a] < arr2[b]) sort() seems faster than schwartzSort for smaller-scale stuff, but takes longer for larger-scale sorts. multiSort wins out over both of them. I guess in principle it might be possible to have a multiSchwartzSort which allows for multiple transformations: multiSchwartzSort(a => arr1[a], a => arr2[a], "a < b", "a < b") ... which might gain some speed by caching the results of the transformations, as schwartzSort is supposed to do? But anyway, I think this solution works at limited extra cost speed-wise. :-) Thanks very much to both of you!
schwartzSort should really only be used if the transformation function is expensive. For example, if you want to sort 3D dots according to their norm. In your case, your transformation function (a => arr1[a]) isn't expensive. So a straight up (multi)sort should be preferred over schwartzSort.
Jul 17 2013
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/17/2013 02:07 PM, bearophile wrote:
 Another way is to use something like this, but this needs some memory:
 
 idx.schwartzSort!(i => tuple(arr1[i], arr2[i]));
Actually, I don't find it needs any more memory than regular schwartzSort (which I was using anyway), but it does cost _speed_ -- quite a lot. :-(
Jul 17 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 Actually, I don't find it needs any more memory than regular 
 schwartzSort (which I was using anyway),
A and array of tuples should take more memory. Try with a much larger input array.
 but it does cost _speed_ -- quite a lot. :-(
Right, schwartzSort is quite slow: http://d.puremagic.com/issues/show_bug.cgi?id=5077 But thankfully there are simple means to speed up schwartzSort... (like using alloca for small input arrays, improving its code, using minimallyInitializedArray, etc). Bye, bearophile
Jul 17 2013
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 17 July 2013 at 11:43:37 UTC, Joseph Rushton 
Wakeling wrote:
     .sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] 
 < arr2[b]"));

 ... but I'm not sure that would be an optimal strategy.
Is std.algorithm.multisort what you'd be looking for?
Jul 17 2013
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/17/2013 02:35 PM, John Colvin wrote:
 Is std.algorithm.multisort what you'd be looking for?
Good thought. Thanks to pointing me here I also noticed the following example in the schwartzSort docs which might be relevant: sort!((a, b) => hashFun(a) < hashFun(b))(array); I'm going to try out multisort and try out this second way of writing the condition.
Jul 17 2013