digitalmars.D - Imprecise running time for topN?

Magnus Lie Hetland (17/17) Feb 01 2011 I was reading the docs for std.algorithm, when I came across topN. This

Andrei Alexandrescu (5/19) Feb 01 2011 You're right (and randomization should be there, too). Could you please

Magnus Lie Hetland (6/15) Feb 01 2011 Will do. :)

Magnus Lie Hetland <magnus hetland.org> writes:

I was reading the docs for std.algorithm, when I came across topN. This 
is, of course, a highly useful problem, with several solutions; I was a 
bit surprised to see the claim that it runs in linear time. As far as I 
know, the only ways of achieving that would be (1) using the 
super-elegant, but highly inefficient, algorithm of Blum, Floyd, Pratt, 
Rivest and Tarjan, often known as Select, or (2) using soft heaps. (The 
latter, I know less about.)

Checking the source, I found that -- as I suspected -- it uses the more 
common Randomized-Select (without actual randomization here, though), 
which only has an *expected* (or average-case) linear running time. It 
suffers the same worst-case problems as Quicksort.

I'm not objecting to the use of algorithm -- it's a good choice in 
practice -- but the docs should probably specify that the linear 
guarantee does not hold in the worst case?

-- 
Magnus Lie Hetland
http://hetland.org

Feb 01 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/1/11 8:12 AM, Magnus Lie Hetland wrote:
 I was reading the docs for std.algorithm, when I came across topN. This
 is, of course, a highly useful problem, with several solutions; I was a
 bit surprised to see the claim that it runs in linear time. As far as I
 know, the only ways of achieving that would be (1) using the
 super-elegant, but highly inefficient, algorithm of Blum, Floyd, Pratt,
 Rivest and Tarjan, often known as Select, or (2) using soft heaps. (The
 latter, I know less about.)

 Checking the source, I found that -- as I suspected -- it uses the more
 common Randomized-Select (without actual randomization here, though),
 which only has an *expected* (or average-case) linear running time. It
 suffers the same worst-case problems as Quicksort.

 I'm not objecting to the use of algorithm -- it's a good choice in
 practice -- but the docs should probably specify that the linear
 guarantee does not hold in the worst case?

You're right (and randomization should be there, too). Could you please 
add a bugzilla entry so we don't forget about this? 
http://d.puremagic.com/issues. Thanks!

Andrei

Feb 01 2011

Magnus Lie Hetland <magnus hetland.org> writes:

On 2011-02-01 16:29:56 +0100, Andrei Alexandrescu said:

 On 2/1/11 8:12 AM, Magnus Lie Hetland wrote:

[snip]
 I'm not objecting to the use of algorithm -- it's a good choice in
 practice -- but the docs should probably specify that the linear
 guarantee does not hold in the worst case?

 
 You're right (and randomization should be there, too). Could you please 
 add a bugzilla entry so we don't forget about this? 
 http://d.puremagic.com/issues. Thanks!

Will do. :)

 Andrei


-- 
Magnus Lie Hetland
http://hetland.org

Feb 01 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Imprecise running time for topN?