www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - medianOfMedians

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I've seldom have code write itself so beautifully. Which, of course, 
means it needs to be destroyed. 
https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
Jan 19 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 20 January 2016 at 01:20:11 UTC, Andrei 
Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of 
 course, means it needs to be destroyed. 
 https://github.com/D-Programming-Language/phobos/pull/3938 -- 
 Andrei
I left some notes about some minor tweaks that I think should be made, but overall it looks good!
Jan 19 2016
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
This is not an implementation of the median of medians algorithm. int[] bad(int n){ if(n<=5){ return iota(n).array; } auto next=bad(n/5); return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join; } void main(){ import std.stdio; auto a=bad(5^^10); auto idx=medianOfMedians(a); double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length); writeln(notLarger); // 0.0100766 assert(notLarger>=0.3); // fail } The real algorithm works by computing an /exact/ median of the medians and uses it as the pivot value for quickselect in order to compute the precise median of the full array. In the given implementation, the approximations accumulate with each recursive invocation and no constant guarantees can be given on the percentile of the result.
Jan 19 2016
next sibling parent Ilya <ilyayaroshenko gmail.com> writes:
On Wednesday, 20 January 2016 at 02:26:35 UTC, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 [...]
This is not an implementation of the median of medians algorithm. [...]
The approximate medianOfMedians algorithm can be used for topN. --Ilya
Jan 19 2016
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/19/2016 09:26 PM, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
This is not an implementation of the median of medians algorithm. int[] bad(int n){ if(n<=5){ return iota(n).array; } auto next=bad(n/5); return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join; } void main(){ import std.stdio; auto a=bad(5^^10); auto idx=medianOfMedians(a); double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length); writeln(notLarger); // 0.0100766 assert(notLarger>=0.3); // fail } The real algorithm works by computing an /exact/ median of the medians and uses it as the pivot value for quickselect in order to compute the precise median of the full array. In the given implementation, the approximations accumulate with each recursive invocation and no constant guarantees can be given on the percentile of the result.
Thanks. Urgh. So the approximate median part cannot be separated from partitioning. It was suspiciously cute :o). Back to the drawing board. -- Andrei
Jan 19 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/19/2016 09:26 PM, Timon Gehr wrote:
 On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
 I've seldom have code write itself so beautifully. Which, of course,
 means it needs to be destroyed.
 https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
This is not an implementation of the median of medians algorithm.
[snip] Thanks again for sharing your insight. FWIW there's a bit of variation floating on the Net regarding MoM. The Wikipedia article at https://en.wikipedia.org/wiki/Median_of_medians moves the medians of five to the beginning of the array (my implementation uses stride(), thus trading computation for data movement). I'm unclear on which approach is generally better. http://austinrochford.com/posts/2013-10-28-median-of-medians.html does not mention the mutual recursion, suggesting (at least in a cursory reading) my wishy-washy previous implementation that doesn't use quickselect. https://www.ics.uci.edu/~eppstein/161/960130.html only uses one recursive function, not two. The original PICK algorithm at https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf only uses one recursive function. Anyhow, I've implemented the two-functions version at https://github.com/D-Programming-Language/phobos/pull/3938. I'll next try whether the one-function version is just as good or better. Destroy? Andrei
Jan 20 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:
[snip]

And btw I now understand better why medianOfMedians is not so fast in 
practice. In fact my wishy-washy version at 
https://github.com/andralex/phobos/commit/9e004c35b824aac108e0
615183065e73384e9f9 
seems to be practically attractive for choosing a good pivot even though 
it doesn't offer theoretical guarantees. I wonder how some jitter can be 
injected into it so as to improve its worst-case performance.

Andrei
Jan 20 2016