www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 6787] New: Lazy sort in Phobos?

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6787

           Summary: Lazy sort in Phobos?
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



Sometimes in my code I have to find the first few smallest (or biggest) items
of an array, I don't know how many I will need of them, but I know in general I
will need only few of them, much less than the whole array.

Turning the array into an heap is a slow operation if I will only need few
items, and I can't use std.algorithm.partialSort because I don't know the
number of items I will need.

So I have created this very simple LazySort range, based on partialSort (works
with DMD 2.056head):


import std.stdio, std.algorithm, std.random, std.array, std.range, std.traits;

// Missing still: less and SwapStrategy template arguments.
struct LazySort(Range) if (isRandomAccessRange!Range) {
    Range data;
    private size_t idx, idxSorted;

    bool empty() { return idx >= data.length; }

    ForeachType!Range front() {
        if (idx >= idxSorted) {
            immutable oldIdxSorted = idxSorted;
            idxSorted = min(data.length, idxSorted ? (idxSorted * 2) : 1);
            partialSort(data[oldIdxSorted .. $], idxSorted - oldIdxSorted);
        }
        return data[idx];
    }

    void popFront() { idx++; }
}

void main() {
    auto A = array(iota(25));
    randomShuffle(A);
    writeln(A);

    foreach (x; LazySort!(int[])(A))
        write(x, " ");
}



I have not done benchmarks on it yet. This code seems to work, but it is not
efficient, it's just to show the semantics of the idea. A better implementation
is welcome.

I think a lazy sort will be useful in Phobos. Timon Gehr seems to appreciate
the idea.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 07 2011
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6787


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com



15:58:21 PDT ---
The canonical solution uses a heap. Creating a heap is cheap and quickly
amortized over only a few pops. An input range that creates a heap and then
yields one element at a time would be a better idea.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 07 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6787





 The canonical solution uses a heap. Creating a heap is cheap and quickly
 amortized over only a few pops. An input range that creates a heap and then
 yields one element at a time would be a better idea.
If benchmarks show that a range that heapifies the input array is about as efficient as a tailored lazy sorting solution for about 4 to 10 requested max/min items, then I am OK with this idea :-) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 07 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6787




17:17:20 PDT ---


 The canonical solution uses a heap. Creating a heap is cheap and quickly
 amortized over only a few pops. An input range that creates a heap and then
 yields one element at a time would be a better idea.
If benchmarks show that a range that heapifies the input array is about as efficient as a tailored lazy sorting solution for about 4 to 10 requested max/min items, then I am OK with this idea :-)
This is odd at a few levels. First, you opened this report. So it's not you who's supposed to be OK, it's the rest of us. You have the burden of proof. Second, there's no need for benchmarking anything - simple complexity analysis shows that partialSort does O(n) log(n-k) at each step, which pretty much makes it bankrupt compared to the heap approach. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 07 2011