digitalmars.D - Stable topN?
- Xinok (21/21) Jun 23 2014 I've managed to build a small laundry list the past couple days,
- Peter Alexander (4/7) Jun 23 2014 I agree with your definition, and can't think of a better
- Xinok (4/11) Jun 24 2014 A stable sort would lose the original ordering of the elements
- matovitch (5/17) Jun 24 2014 Yes you can't use stable sort. However you can use a priority
- Peter Alexander (5/17) Jun 24 2014 Yeah ignore me. Just woke up realising I'd made that mistake, but
- Andrei Alexandrescu (6/24) Jun 24 2014 There's a notion of "semiStable" in std.algorithm that you may use for
- Xinok (6/49) Jun 24 2014 From what I know of partitioning algorithms, they're either
- Andrei Alexandrescu (3/7) Jun 24 2014 I'd say start spinning some code and code will lead you to good insights...
- Andrei Alexandrescu (3/3) Jun 24 2014 On 6/24/14, 11:50 AM, Xinok wrote:
- Xinok (6/10) Jun 24 2014 Given that's a breaking change, I'd rather leave it open to
- Andrei Alexandrescu (3/13) Jun 24 2014 I think the breakage brought about from changing return types from void
- David Nadlinger (4/22) Jun 24 2014 Agreed. Let's add that.
- Xinok (4/22) Jun 24 2014 Ahhh, I missed that minor detail. :-P I had mistakenly assumed it
I've managed to build a small laundry list the past couple days, so why not add another item? I've been tasked with implementing a "deterministic topN", but I also noticed that topN is lacking a stable implementation, so this seemed like an ideal opportunity to kill two birds with one stone. However, for me, it begged the question: What exactly does stable topN imply? A stable sort only implies that equal elements retain their original order. However, topN is a partitioning function whose pivot is unknown beforehand. To me, it would make more sense that not just equal elements but ALL elements retain their original order in their respective partitions. However, to accomplish the above, the Nth element would need to be discovered before any other elements are reordered. This suggests making a full copy of the range to search for the Nth element. The unstable topN could be applied to the copy to discover the Nth element, then the stable partition function could finish the job with the correct pivot. What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?
Jun 23 2014
On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.
Jun 23 2014
On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.
Jun 24 2014
On Tuesday, 24 June 2014 at 11:50:37 UTC, Xinok wrote:On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:Yes you can't use stable sort. However you can use a priority queue, which would made the algorithm worst case O(M log N) in time and O(N) in space. (M is the number of elements in the original array and N the number of top elements)On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.
Jun 24 2014
On Tuesday, 24 June 2014 at 11:50:37 UTC, Xinok wrote:On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:Yeah ignore me. Just woke up realising I'd made that mistake, but you've already replied :-) However, is your version faster? The last stable partition step is O(n lg n), same as stable sort.On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.
Jun 24 2014
On 6/23/14, 3:47 PM, Xinok wrote:I've managed to build a small laundry list the past couple days, so why not add another item? I've been tasked with implementing a "deterministic topN", but I also noticed that topN is lacking a stable implementation, so this seemed like an ideal opportunity to kill two birds with one stone. However, for me, it begged the question: What exactly does stable topN imply? A stable sort only implies that equal elements retain their original order. However, topN is a partitioning function whose pivot is unknown beforehand. To me, it would make more sense that not just equal elements but ALL elements retain their original order in their respective partitions. However, to accomplish the above, the Nth element would need to be discovered before any other elements are reordered. This suggests making a full copy of the range to search for the Nth element. The unstable topN could be applied to the copy to discover the Nth element, then the stable partition function could finish the job with the correct pivot. What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?There's a notion of "semiStable" in std.algorithm that you may use for further refining topN guarantees. I'd say use semiStable for stability among the top N items (seeing as those are the most interesting), and stable for stability of all items. Andrei
Jun 24 2014
On Tuesday, 24 June 2014 at 16:03:26 UTC, Andrei Alexandrescu wrote:On 6/23/14, 3:47 PM, Xinok wrote:From what I know of partitioning algorithms, they're either intrinsically stable or intrinsically unstable. Implementing an algorithm with the attribute you described would likely come with little benefit. I could be wrong, but this is my intuition.I've managed to build a small laundry list the past couple days, so why not add another item? I've been tasked with implementing a "deterministic topN", but I also noticed that topN is lacking a stable implementation, so this seemed like an ideal opportunity to kill two birds with one stone. However, for me, it begged the question: What exactly does stable topN imply? A stable sort only implies that equal elements retain their original order. However, topN is a partitioning function whose pivot is unknown beforehand. To me, it would make more sense that not just equal elements but ALL elements retain their original order in their respective partitions. However, to accomplish the above, the Nth element would need to be discovered before any other elements are reordered. This suggests making a full copy of the range to search for the Nth element. The unstable topN could be applied to the copy to discover the Nth element, then the stable partition function could finish the job with the correct pivot. What do you all think? Do you agree with my definition of stable topN? Do you know of a better algorithm/approach for implementing this function?There's a notion of "semiStable" in std.algorithm that you may use for further refining topN guarantees. I'd say use semiStable for stability among the top N items (seeing as those are the most interesting), and stable for stability of all items. Andrei
Jun 24 2014
On 6/24/14, 11:50 AM, Xinok wrote:From what I know of partitioning algorithms, they're either intrinsically stable or intrinsically unstable. Implementing an algorithm with the attribute you described would likely come with little benefit. I could be wrong, but this is my intuition.I'd say start spinning some code and code will lead you to good insights regarding the relative costs of different stability guarantees. -- Andrei
Jun 24 2014
On 6/24/14, 11:50 AM, Xinok wrote: [snip] See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- Andrei
Jun 24 2014
On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu wrote:On 6/24/14, 11:50 AM, Xinok wrote: [snip] See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- AndreiGiven that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.
Jun 24 2014
On 6/24/14, 4:41 PM, Xinok wrote:On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu wrote:I think the breakage brought about from changing return types from void to something else is acceptable. -- AndreiOn 6/24/14, 11:50 AM, Xinok wrote: [snip] See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- AndreiGiven that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.
Jun 24 2014
On Wednesday, 25 June 2014 at 00:50:48 UTC, Andrei Alexandrescu wrote:On 6/24/14, 4:41 PM, Xinok wrote:Agreed. Let's add that. DavidOn Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu wrote:I think the breakage brought about from changing return types from void to something else is acceptable. -- AndreiOn 6/24/14, 11:50 AM, Xinok wrote: [snip] See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- AndreiGiven that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.
Jun 24 2014
On Wednesday, 25 June 2014 at 00:50:48 UTC, Andrei Alexandrescu wrote:On 6/24/14, 4:41 PM, Xinok wrote:Ahhh, I missed that minor detail. :-P I had mistakenly assumed it returns the full range.On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu wrote:I think the breakage brought about from changing return types from void to something else is acceptable. -- AndreiOn 6/24/14, 11:50 AM, Xinok wrote: [snip] See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- AndreiGiven that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.
Jun 24 2014