digitalmars.D - medianOfMedians

Andrei Alexandrescu (3/3) Jan 19 2016 I've seldom have code write itself so beautifully. Which, of course,

Jack Stouffer (4/8) Jan 19 2016 I left some notes about some minor tweaks that I think should be
Timon Gehr (20/23) Jan 19 2016 This is not an implementation of the median of medians algorithm.

Ilya (3/8) Jan 19 2016 The approximate medianOfMedians algorithm can be used for topN.
Andrei Alexandrescu (4/27) Jan 19 2016 Thanks. Urgh. So the approximate median part cannot be separated from
Andrei Alexandrescu (21/26) Jan 20 2016 [snip]

Andrei Alexandrescu (9/9) Jan 20 2016 On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I've seldom have code write itself so beautifully. Which, of course, 
means it needs to be destroyed. 
https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

Jan 19 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 20 January 2016 at 01:20:11 UTC, Andrei 
Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of 
 course, means it needs to be destroyed. 
 https://github.com/D-Programming-Language/phobos/pull/3938 -- 
 Andrei

I left some notes about some minor tweaks that I think should be 
made, but overall it looks good!

Jan 19 2016

Timon Gehr <timon.gehr gmx.ch> writes:

On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

This is not an implementation of the median of medians algorithm.

int[] bad(int n){
     if(n<=5){ return iota(n).array; }
     auto next=bad(n/5);
     return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
}

void main(){
     import std.stdio;
     auto a=bad(5^^10);
     auto idx=medianOfMedians(a);
     double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
     writeln(notLarger); // 0.0100766
     assert(notLarger>=0.3); // fail
}

The real algorithm works by computing an /exact/ median of the medians 
and uses it as the pivot value for quickselect in order to compute the 
precise median of the full array. In the given implementation, the 
approximations accumulate with each recursive invocation and no constant 
guarantees can be given on the percentile of the result.

Jan 19 2016

Ilya <ilyayaroshenko gmail.com> writes:

On Wednesday, 20 January 2016 at 02:26:35 UTC, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 [...]

 This is not an implementation of the median of medians 
 algorithm.

 [...]

The approximate medianOfMedians algorithm can be used for topN. 
--Ilya

Jan 19 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/19/2016 09:26 PM, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

 This is not an implementation of the median of medians algorithm.

 int[] bad(int n){
      if(n<=5){ return iota(n).array; }
      auto next=bad(n/5);
      return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
 }

 void main(){
      import std.stdio;
      auto a=bad(5^^10);
      auto idx=medianOfMedians(a);
      double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
      writeln(notLarger); // 0.0100766
      assert(notLarger>=0.3); // fail
 }

 The real algorithm works by computing an /exact/ median of the medians
 and uses it as the pivot value for quickselect in order to compute the
 precise median of the full array. In the given implementation, the
 approximations accumulate with each recursive invocation and no constant
 guarantees can be given on the percentile of the result.

Thanks. Urgh. So the approximate median part cannot be separated from 
partitioning. It was suspiciously cute :o). Back to the drawing board. 
-- Andrei

Jan 19 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/19/2016 09:26 PM, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

 This is not an implementation of the median of medians algorithm.

[snip]

Thanks again for sharing your insight. FWIW there's a bit of variation 
floating on the Net regarding MoM. The Wikipedia article at 
https://en.wikipedia.org/wiki/Median_of_medians moves the medians of 
five to the beginning of the array (my implementation uses stride(), 
thus trading computation for data movement). I'm unclear on which 
approach is generally better.

http://austinrochford.com/posts/2013-10-28-median-of-medians.html does 
not mention the mutual recursion, suggesting (at least in a cursory 
reading) my wishy-washy previous implementation that doesn't use 
quickselect.

https://www.ics.uci.edu/~eppstein/161/960130.html only uses one 
recursive function, not two.

The original PICK algorithm at 
https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf only uses one 
recursive function.

Anyhow, I've implemented the two-functions version at 
https://github.com/D-Programming-Language/phobos/pull/3938. I'll next 
try whether the one-function version is just as good or better. Destroy?


Andrei

Jan 20 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:
[snip]

And btw I now understand better why medianOfMedians is not so fast in 
practice. In fact my wishy-washy version at 
https://github.com/andralex/phobos/commit/9e004c35b824aac108e0
615183065e73384e9f9 
seems to be practically attractive for choosing a good pivot even though 
it doesn't offer theoretical guarantees. I wonder how some jitter can be 
injected into it so as to improve its worst-case performance.

Andrei

Jan 20 2016

D Programming

C/C++ Programming

Other

digitalmars.D - medianOfMedians