digitalmars.D - medianOfMedians
- Andrei Alexandrescu (3/3) Jan 19 2016 I've seldom have code write itself so beautifully. Which, of course,
- Jack Stouffer (4/8) Jan 19 2016 I left some notes about some minor tweaks that I think should be
- Timon Gehr (20/23) Jan 19 2016 This is not an implementation of the median of medians algorithm.
- Ilya (3/8) Jan 19 2016 The approximate medianOfMedians algorithm can be used for topN.
- Andrei Alexandrescu (4/27) Jan 19 2016 Thanks. Urgh. So the approximate median part cannot be separated from
- Andrei Alexandrescu (21/26) Jan 20 2016 [snip]
- Andrei Alexandrescu (9/9) Jan 20 2016 On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:
I've seldom had code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
Jan 19 2016
On Wednesday, 20 January 2016 at 01:20:11 UTC, Andrei Alexandrescu wrote:
> I've seldom had code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

I left some notes about some minor tweaks that I think should be made, but overall it looks good!
Jan 19 2016
On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
> I've seldom had code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

This is not an implementation of the median of medians algorithm.

    import std.algorithm, std.array, std.range;

    // Builds an adversarial input for the posted implementation.
    int[] bad(int n){
        if(n<=5){ return iota(n).array; }
        auto next=bad(n/5);
        return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
    }
    void main(){
        import std.stdio;
        auto a=bad(5^^10);
        auto idx=medianOfMedians(a); // medianOfMedians as posted in the pull request
        // Fraction of elements that are not larger than the returned "median".
        double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
        writeln(notLarger); // 0.0100766
        assert(notLarger>=0.3); // fail
    }

The real algorithm works by computing an /exact/ median of the medians and uses it as the pivot value for quickselect in order to compute the precise median of the full array. In the given implementation, the approximations accumulate with each recursive invocation, so no constant guarantee can be given on the percentile of the result.
Jan 19 2016
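To make the distinction in the post above concrete, here is a minimal sketch of the exact variant: the medians of five are collected, their exact median is found by a recursive call, and that value is the pivot for a quickselect step. The name `select` and the three-way partition are assumptions made for illustration; this is not the code from the pull request.

```d
import std.algorithm.mutation : swap;
import std.algorithm.sorting : sort;

/// Hypothetical helper (not the code from the pull request): rearranges `a`
/// so that a[k] holds the element of 0-based rank k.
void select(T)(T[] a, size_t k)
{
    assert(k < a.length);
    while (a.length > 5)
    {
        // 1. Sort each full group of five and swap its median to the front.
        //    Leftover elements (a.length % 5) are simply not sampled.
        size_t groups = 0;
        for (size_t g = 0; g + 5 <= a.length; g += 5)
        {
            sort(a[g .. g + 5]);
            swap(a[groups++], a[g + 2]);
        }

        // 2. Find the exact median of the collected medians recursively.
        select(a[0 .. groups], groups / 2);
        const pivot = a[groups / 2];

        // 3. Three-way partition around the pivot value:
        //    a[0 .. lt] < pivot, a[lt .. gt] == pivot, a[gt .. $] > pivot.
        size_t lt = 0, i = 0, gt = a.length;
        while (i < gt)
        {
            if (a[i] < pivot) swap(a[lt++], a[i++]);
            else if (pivot < a[i]) swap(a[i], a[--gt]);
            else ++i;
        }

        // 4. Continue only in the part that contains rank k.
        if (k < lt) a = a[0 .. lt];
        else if (k >= gt) { a = a[gt .. $]; k -= gt; }
        else return; // rank k falls inside the run equal to the pivot
    }
    sort(a); // at most five elements remain
}
```

Calling `select(a, a.length / 2)` leaves a median of `a` at index `a.length / 2`. Because the pivot is an exact median of the group medians, roughly 30% of the elements are guaranteed to lie on each side of it, which is the property the `assert(notLarger>=0.3)` check above tests for.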
On Wednesday, 20 January 2016 at 02:26:35 UTC, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> [...]
> This is not an implementation of the median of medians algorithm. [...]

The approximate medianOfMedians algorithm can be used for topN. --Ilya
Jan 19 2016
On 01/19/2016 09:26 PM, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> I've seldom had code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
> This is not an implementation of the median of medians algorithm. [...]

Thanks. Urgh. So the approximate median part cannot be separated from partitioning. It was suspiciously cute :o). Back to the drawing board. -- Andrei
Jan 19 2016
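A short worked bound shows why the approximate step cannot stand alone. It assumes an input of n = 5^d elements and an approximate routine that takes the exact median of each group of five and then recurses on those medians without partitioning; g_d is a symbol introduced here for the guaranteed fraction of elements not larger than the result.

```latex
% Base case: the median of five elements is >= 3 of the 5.
g_1 = \tfrac{3}{5}
% Recursive case: a fraction g_{d-1} of the n/5 group medians is <= the
% result, and each such median is >= 3 of the 5 elements in its group.
g_d = \frac{3 \cdot g_{d-1} \cdot n/5}{n} = \tfrac{3}{5}\, g_{d-1}
\quad\Longrightarrow\quad
g_d = \left(\tfrac{3}{5}\right)^{d}, \qquad g_{10} \approx 0.006
% With an exact median of the medians the guarantee does not decay:
g_{\text{exact}} = \tfrac{1}{2} \cdot \tfrac{3}{5} = \tfrac{3}{10}
```

Under these assumptions the guarantee shrinks geometrically with the recursion depth, which is of the same order as the 0.0100766 measured in the counterexample for n = 5^10, whereas the exact variant keeps a constant 3/10 (up to rounding).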
On 01/19/2016 09:26 PM, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> I've seldom had code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
> This is not an implementation of the median of medians algorithm.
> [snip]

Thanks again for sharing your insight. FWIW there's a bit of variation floating around the Net regarding MoM.

The Wikipedia article at https://en.wikipedia.org/wiki/Median_of_medians moves the medians of five to the beginning of the array (my implementation uses stride(), thus trading computation for data movement). I'm unclear on which approach is generally better.

http://austinrochford.com/posts/2013-10-28-median-of-medians.html does not mention the mutual recursion, suggesting (at least in a cursory reading) my wishy-washy previous implementation that doesn't use quickselect.

https://www.ics.uci.edu/~eppstein/161/960130.html only uses one recursive function, not two.

The original PICK algorithm at https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf only uses one recursive function.

Anyhow, I've implemented the two-functions version at https://github.com/D-Programming-Language/phobos/pull/3938. Next I'll check whether the one-function version is just as good or better.

Destroy?

Andrei
Jan 20 2016
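The two layouts mentioned above (moving the medians of five to the front versus viewing them in place through `stride()`) can be sketched as follows. Both function names are hypothetical and neither is taken from the pull request; the sketch only illustrates the trade-off between data movement and strided access.

```d
import std.algorithm.mutation : swap;
import std.algorithm.sorting : sort;
import std.range : stride;

/// Variant A (as in the Wikipedia article): sort each full group of five
/// and swap its median to the front. Returns the number of medians, which
/// then occupy a[0 .. result] contiguously.
size_t mediansToFront(T)(T[] a)
{
    size_t n = 0;
    for (size_t g = 0; g + 5 <= a.length; g += 5)
    {
        sort(a[g .. g + 5]);
        swap(a[n++], a[g + 2]);
    }
    return n;
}

/// Variant B: sort each full group of five in place and return a strided
/// view of the medians (indices 2, 7, 12, ...). Nothing is moved between
/// groups, but every later access to a median pays for the stride.
auto mediansByStride(T)(T[] a)
{
    const full = a.length / 5 * 5; // length covered by full groups
    for (size_t g = 0; g < full; g += 5)
        sort(a[g .. g + 5]);
    return full == 0 ? a[0 .. 0].stride(5) : a[2 .. full].stride(5);
}
```

Variant A gives the recursive step a dense slice to work on at the cost of one swap per group; variant B leaves each element in its group and defers the cost to each strided access. Which one is faster in practice would have to be measured.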
On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:
> [snip]

And btw I now understand better why medianOfMedians is not so fast in practice. In fact my wishy-washy version at https://github.com/andralex/phobos/commit/9e004c35b824aac108e0 615183065e73384e9f9 seems to be practically attractive for choosing a good pivot even though it doesn't offer theoretical guarantees. I wonder how some jitter can be injected into it so as to improve its worst-case performance.

Andrei
Jan 20 2016