digitalmars.D - std.partition is fucked
- superdan (24/24) May 13 2009 dis must be related to bug 2966 sayin' std.sort is fucked. problem must ...
- Andrei Alexandrescu (37/42) May 13 2009 That is correct. Where were you yesterday when I was looking for this?
- Sean Kelly (4/8) May 13 2009 So all the elements that satisfy the predicate end up at the end of the
- Andrei Alexandrescu (5/14) May 13 2009 The elements satisfying the predicate are at the beginning, see e.g. the
- Sean Kelly (5/18) May 13 2009 Weird. I'd always thought that the standard behavior was the opposite.
- Andrei Alexandrescu (7/24) May 13 2009 Nono, lessThan is a binary predicate whereas partition takes a unary
- Sean Kelly (9/32) May 13 2009 Okay, I think we're talking at cross-purposes :-) It seems we both agre...
- Andrei Alexandrescu (20/52) May 13 2009 Oh, now I understand. Sorry.
- Sean Kelly (8/25) May 13 2009 It looks like there may be another bug in partition then. The static el...
- Steven Schveighoffer (14/19) May 13 2009 It depends on if the sentinel is static.
- Andrei Alexandrescu (6/30) May 13 2009 Well it's not quite an oversight because I was well aware of the issue.
- Sean Kelly (6/10) May 13 2009 The built-in sort uses an internal stack rather than recursion, which
- Andrei Alexandrescu (6/17) May 13 2009 Yeah, I saw that fixed-size stack. I'm not sure whether that's
- Lutger (4/23) May 13 2009 Some time ago I reinvented these wheels for study purpose. A custom stac...
- Andrei Alexandrescu (4/27) May 13 2009 Would be interesting if it's not too much trouble.
- Sean Kelly (5/6) May 13 2009 It only works for arrays, but what I ended up doing to get swap inlined
- Andrei Alexandrescu (11/18) May 13 2009 Well the least we can do is to say
- Lutger (5/5) May 13 2009 fwiw, I found the code and attached it with a benchmark found in the bug...
- Denis Koroskin (2/3) May 13 2009 Can you elaborate on that one? I'm not so sure that it can't be inlined.
- Sean Kelly (9/12) May 13 2009 degenerate cases.
- Matti Niemenmaa (5/7) May 13 2009 *Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571
- Sean Kelly (2/7) May 13 2009 Darnit, I knew if I didn't look it up I'd get it wrong. Sorry about tha...
- Lutger (2/16) May 13 2009 It wasn't me, I used the Tango code as one the sources to study this alg...
- bearophile (10/11) May 15 2009 The built-in sort is snail slow, and its stack overflows if your array i...
- Andrei Alexandrescu (4/16) May 15 2009 That pretty much settles it. Seeing sort's fixed stack, I had this
dis must be related to bug 2966 sayin' std.sort is fucked. problem must be with std.partition. i tested and unstable partition is 10 times slower than stable. should be faster akshully. so looked into tat and found in da loop for std.partition unstable and found da range r1 is fucked. for (;;) { for (;;) { if (r.empty) return r; if (!pred(r.front)) break; r.popFront; } // found the left bound assert(!r.empty); auto r1 = r; for (;;) { if (pred(r1.back)) break; r1.popBack; if (r1.empty) return r; } // found the right bound, swap & make progress swap(r.front, r1.back); r.popFront; r1.popBack; } r1 is popbacked a few times but then all tat is forgotten the next time around da loop coz r1 is local to da loop. so da loop is quadratic methinks. what u need iz save r outside loop & then popfront & popback from da same range 'n' shit.
May 13 2009
superdan wrote:dis must be related to bug 2966 sayin' std.sort is fucked. problem must be with std.partition. i tested and unstable partition is 10 times slower than stable. should be faster akshully. so looked into tat and found in da loop for std.partition unstable and found da range r1 is fucked.That is correct. Where were you yesterday when I was looking for this? :o) The fixed loop is: // Inspired from www.stepanovpapers.com/PAM3-partition_notes.pdf, // section "Bidirectional Partition Algorithm (Hoare)" auto result = r; for (;;) { for (;;) { if (r.empty) return result; if (!pred(r.front)) break; r.popFront; result.popFront; } // found the left bound assert(!r.empty); for (;;) { if (pred(r.back)) break; r.popBack; if (r.empty) return result; } // found the right bound, swap & make progress swap(r.front, r.back); r.popFront; result.popFront; r.popBack; } Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion. The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage. Thanks David and superdan. Andrei
May 13 2009
Andrei Alexandrescu wrote:Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
Sean Kelly wrote:Andrei Alexandrescu wrote:The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate. AndreiNote how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleSean Kelly wrote:Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.Andrei Alexandrescu wrote:The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
Sean Kelly wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleNono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot. AndreiSean Kelly wrote:Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.Andrei Alexandrescu wrote:The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleSean Kelly wrote:Okay, I think we're talking at cross-purposes :-) It seems we both agree that partition should place the elements satisfying pred before those that do not. And I should have been more clear about pred being a unary function. In any case, what I'm wondering is why partition returns the range representing the elements that do not satisfy pred. When I call partition, wouldn't I be interested in the result containing the elements that have satisfied the supplied predicate? Or does this cause problems for the sort routine.== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleNono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot.Sean Kelly wrote:Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.Andrei Alexandrescu wrote:The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
Sean Kelly wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleOh, now I understand. Sorry. As a general rule, when they could return either the left or the right subrange of a range, functions in std.algorithm return the right range. This is because sentinel-terminated ranges cannot express the left-hand-side range. Partition is in fact the perfect example because it works with forward ranges. If you want to partition a singly-linked list, you'd have to return the right-hand sublist. There's nothing else you could possibly return! If you wanted to return the left-hand sublist, you'd have to return a different type of range (something like "list up to this node"). So for generality's sake, whenever you have a choice, return the right-hand part of a range. There is growing interest in defining additional ranges that express (at a cost) things like "the portion of a singly-linked list starting at node X and ending at node Y". For example, people want to do a find() and then deal with the portion _before_ the found element. I'd love to discuss that further, but I guess I'll have to wait for the d.next newsgroup. :oD AndreiSean Kelly wrote:Okay, I think we're talking at cross-purposes :-) It seems we both agree that partition should place the elements satisfying pred before those that do not. And I should have been more clear about pred being a unary function. In any case, what I'm wondering is why partition returns the range representing the elements that do not satisfy pred. When I call partition, wouldn't I be interested in the result containing the elements that have satisfied the supplied predicate? Or does this cause problems for the sort routine.== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleNono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot.Sean Kelly wrote:Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.Andrei Alexandrescu wrote:The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion.So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleAs a general rule, when they could return either the left or the right subrange of a range, functions in std.algorithm return the right range. This is because sentinel-terminated ranges cannot express the left-hand-side range.Gotcha. That's unfortunate, but makes perfect sense.Partition is in fact the perfect example because it works with forward ranges. If you want to partition a singly-linked list, you'd have to return the right-hand sublist. There's nothing else you could possibly return! If you wanted to return the left-hand sublist, you'd have to return a different type of range (something like "list up to this node"). So for generality's sake, whenever you have a choice, return the right-hand part of a range.It looks like there may be another bug in partition then. The static else clause (ss == SwapStrategy.unstable) is supposed to work for forward ranges but it calls popBack(). It looks like only the semistable option is actually available to forward ranges. Is this correct?There is growing interest in defining additional ranges that express (at a cost) things like "the portion of a singly-linked list starting at node X and ending at node Y". For example, people want to do a find() and then deal with the portion _before_ the found element. I'd love to discuss that further, but I guess I'll have to wait for the d.next newsgroup. :oDYeah, it would be great if it were possible to slice and dice a range in any manner of different ways.
May 13 2009
On Wed, 13 May 2009 17:17:33 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Partition is in fact the perfect example because it works with forward ranges. If you want to partition a singly-linked list, you'd have to return the right-hand sublist. There's nothing else you could possibly return! If you wanted to return the left-hand sublist, you'd have to return a different type of range (something like "list up to this node").It depends on if the sentinel is static. For example, it would be perfectly legal to specify a linked list as the nodes between node x and y, where y is null at runtime in the case of a full list. I'm not even sure the performance would suffer significantly. dcollections' linked list is a doubly linked list, but I use a sentinel dummy element to denote both the beginning and the end. It makes insertion/deletion code completely uniform because there is *always* a valid previous and next node for each node. No checking for "if this is the head node, update the original list pointer" Not being able to return a subrange of a container as fundamental as a linked list is a pretty significant oversight in my opinion. -Steve
May 13 2009
Steven Schveighoffer wrote:On Wed, 13 May 2009 17:17:33 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Well it's not quite an oversight because I was well aware of the issue. The only thing is, I started with the narrowest interface I could to figure how far I can get. Primitives can be added to select range categories as needed. AndreiPartition is in fact the perfect example because it works with forward ranges. If you want to partition a singly-linked list, you'd have to return the right-hand sublist. There's nothing else you could possibly return! If you wanted to return the left-hand sublist, you'd have to return a different type of range (something like "list up to this node").It depends on if the sentinel is static. For example, it would be perfectly legal to specify a linked list as the nodes between node x and y, where y is null at runtime in the case of a full list. I'm not even sure the performance would suffer significantly. dcollections' linked list is a doubly linked list, but I use a sentinel dummy element to denote both the beginning and the end. It makes insertion/deletion code completely uniform because there is *always* a valid previous and next node for each node. No checking for "if this is the head node, update the original list pointer" Not being able to return a subrange of a container as fundamental as a linked list is a pretty significant oversight in my opinion.
May 13 2009
Andrei Alexandrescu wrote:The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13 2009
Sean Kelly wrote:Andrei Alexandrescu wrote:Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. AndreiThe performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13 2009
Andrei Alexandrescu wrote:Sean Kelly wrote:Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases. I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want.Andrei Alexandrescu wrote:Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. AndreiThe performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13 2009
Lutger wrote:Andrei Alexandrescu wrote:Would be interesting if it's not too much trouble. If swap is not inlined that would be serious. Sort does a lot of swapping. AndreiSean Kelly wrote:Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases. I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want.Andrei Alexandrescu wrote:Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. AndreiThe performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage.The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleIf swap is not inlined that would be serious. Sort does a lot of swapping.It only works for arrays, but what I ended up doing to get swap inlined was to pass it two array indices instead of two references. There must be some way to address this for ranges if ref parameters prevent inlining with DMD.
May 13 2009
Sean Kelly wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleWell the least we can do is to say static if (is(Range R == T[], T)) { ... use array-specific swap ... } else { ... use regular swap ... } AndreiIf swap is not inlined that would be serious. Sort does a lot of swapping.It only works for arrays, but what I ended up doing to get swap inlined was to pass it two array indices instead of two references. There must be some way to address this for ranges if ref parameters prevent inlining with DMD.
May 13 2009
fwiw, I found the code and attached it with a benchmark found in the bugreport (2966). It works only on arrays, might be buggy too. But it does sort ;) Timings based on 1000_000 elements: std.algorithm: 624 // modified with the fix posted in this thread Builtin sort: 540 iterative introsort: 244
May 13 2009
On Wed, 13 May 2009 22:24:51 +0400, Lutger <lutger.blijdestijn gmail.com> wrote:std.swap can't be inlined because it uses ref paramsCan you elaborate on that one? I'm not so sure that it can't be inlined.
May 13 2009
== Quote from Lutger (lutger.blijdestijn gmail.com)'s articleSome time ago I reinvented these wheels for study purpose. A custom stack was a little faster but notthat much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's calledmany times. Switching to another sort if a certain recursion depth is reached helped, but mostly indegenerate cases.I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I coulddig it up and provide some timings if you want. The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago. I've been meaning to compare it against the sort routine in std.algorithm to see whether the Phobos routine could benefit from any of the same tweaks.
May 13 2009
Sean Kelly wrote:The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago.*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571 was one of mine ;-) -- E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
May 13 2009
== Quote from Matti Niemenmaa (see_signature for.real.address)'s articleSean Kelly wrote:Darnit, I knew if I didn't look it up I'd get it wrong. Sorry about that :-)The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago.*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571 was one of mine ;-)
May 13 2009
Sean Kelly wrote:== Quote from Lutger (lutger.blijdestijn gmail.com)'s articleIt wasn't me, I used the Tango code as one the sources to study this algorithm :)Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but notthat much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's calledmany times. Switching to another sort if a certain recursion depth is reached helped, but mostly indegenerate cases.I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I coulddig it up and provide some timings if you want. The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago. I've been meaning to compare it against the sort routine in std.algorithm to see whether the Phobos routine could benefit from any of the same tweaks.
May 13 2009
Sean Kelly:The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.<The built-in sort is snail slow, and its stack overflows if your array isn't small: void main() { auto a = new uint[0x8F_FFFF]; a.sort; } So the current built-in sort is bad, and it must be fixed or removed. Bye, bearophile (Sorry for the answering delay, I was busy for about a week. Tons of posts to catch up!)
May 15 2009
bearophile wrote:Sean Kelly:That pretty much settles it. Seeing sort's fixed stack, I had this feeling for a while, but only now I saw the proof :o). AndreiThe built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat.<The built-in sort is snail slow, and its stack overflows if your array isn't small: void main() { auto a = new uint[0x8F_FFFF]; a.sort; } So the current built-in sort is bad, and it must be fixed or removed.
May 15 2009