digitalmars.D - sortUniq
- Andrei Alexandrescu (8/8) Jan 22 2015 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and...
- H. S. Teoh via Digitalmars-d (9/17) Jan 22 2015 [...]
- Vladimir Panteleev (7/12) Jan 22 2015 I think the only way you can go lower than O(n log n) is
- bearophile (5/7) Jan 22 2015 In Bugzilla I've asked for a hashGroup, it returns a built-in
- Justin Whear (3/16) Jan 22 2015 Efficiency improvement-wise, perhaps a generalization of a counting sort
- Andrei Alexandrescu (10/26) Jan 22 2015 Thanks. I'm thinking of something based on general comparisons.
- H. S. Teoh via Digitalmars-d (11/25) Jan 22 2015 [...]
- Vladimir Panteleev (3/4) Jan 22 2015 This should be always true, no? R1 might be empty, though.
- Andrei Alexandrescu (19/24) Jan 22 2015 Right, forgot to mention the pivot.
- H. S. Teoh via Digitalmars-d (21/51) Jan 22 2015 [...]
- Andrei Alexandrescu (5/18) Jan 22 2015 The glass-half-full view of this is you combine the advantages of
- H. S. Teoh via Digitalmars-d (30/50) Jan 22 2015 Well, I've been in the field long enough to realize that putting things
- H. S. Teoh via Digitalmars-d (65/88) Jan 22 2015 [...]
- Vladimir Panteleev (8/23) Jan 22 2015 I'm pretty sure that when Andrei said:
- H. S. Teoh via Digitalmars-d (19/41) Jan 22 2015 That's what I thought. At each step, you have to coalesce the elements
- Andrei Alexandrescu (6/23) Jan 22 2015 Yah, the key here is to exploit the combination to get better
- bearophile (5/8) Jan 22 2015 What I need most is a "groping by hashing", since years. A unique
- Andrei Alexandrescu (2/8) Jan 22 2015 I agree a hash uniq would be nice. -- Andrei
- Paul O'Neil (5/16) Jan 22 2015 Is that different than groupBy?
- bearophile (6/7) Jan 22 2015 It works with hashing, and doesn't require a sorting, so it's
- Xinok (20/28) Jan 22 2015 My thought is that uniq on a sorted list is only an O(n)
- H. S. Teoh via Digitalmars-d (15/30) Jan 22 2015 The problem with RedBlackTree is that it's cache-unfriendly due to the
- Laeeth Isharc (4/23) Jan 24 2015 In an interview Stepanov said he would have used b* trees instead
- zeljkog (10/12) Jan 23 2015 Loosely-related, compiling
- bearophile (4/13) Jan 23 2015 Please show a minimal nonworking example in the D.learn newsgroup.
- Andrei Alexandrescu (3/16) Jan 23 2015 Must be you put that code at top level and the compiler tried to
- H. S. Teoh via Digitalmars-d (8/31) Jan 23 2015 I think what he's trying to do is to call a function that returns a
- zeljkog (3/9) Jan 23 2015 Yes, I tried to use filter for unique, need closure.
- Andrei Alexandrescu (2/11) Jan 23 2015 Please post a complete snippet then. Thanks! -- Andrei
- zeljkog (17/29) Jan 23 2015 import std.stdio, std.algorithm;
- ixid (12/28) Jan 24 2015 Just add static to the closure.
- ixid (2/2) Jan 24 2015 Ignore me, sorry! Obviously that will not work for reuse of
- Peter Alexander (5/39) Jan 24 2015 auto f = unique();
- zeljkog (19/23) Jan 24 2015 Yes, I see :)
- Andrei Alexandrescu (3/26) Jan 24 2015 I can't make sense of this - where is Uniq supposed to be instantiated?
- zeljkog (30/62) Jan 24 2015 Require following changes in std.algorithm.filter.
- zeljkog (4/5) Jan 24 2015 Should be:
- Andrei Alexandrescu (4/11) Jan 24 2015 The problem is here - the creation of an empty object is unlikely to be
- zeljkog (6/19) Jan 24 2015 I think empty object could be useful, but perhaps there is better :)
- zeljkog (2/4) Jan 24 2015 Should be input range.
- Andrei Alexandrescu (2/7) Jan 24 2015 For filter it's a forward range if input is forward or better. -- Andrei
- zeljkog (2/9) Jan 24 2015 But here there is potential problem with shared state after save?
- Andrei Alexandrescu (3/14) Jan 24 2015 It's not a problem - save saves the state of the iteration, not the data...
- zeljkog (27/33) Jan 24 2015 If you pass stateful object with opCall or closure it becomes part of "s...
- H. S. Teoh via Digitalmars-d (8/17) Jan 24 2015 Hmm. Couldn't filter be made bidirectional too, if the underlying range
- Andrei Alexandrescu (3/17) Jan 24 2015 History: http://dlang.org/phobos/std_algorithm.html#.filterBidirectional
- H. S. Teoh via Digitalmars-d (30/41) Jan 24 2015 [...]
- Andrei Alexandrescu (4/42) Jan 24 2015 Hmmm... we do allow rvalues sometimes (e.g. for strings). I think we
- Peter Alexander (7/55) Jan 24 2015 I was wrong. R-values work, as long as they are compile time
- zeljkog (6/17) Jan 24 2015 It comes down to a question:
There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence. A few google searches didn't yield much. Ideas? Thanks, Andrei
Jan 22 2015
On Thu, Jan 22, 2015 at 01:40:56PM -0800, Andrei Alexandrescu via Digitalmars-d wrote:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence. A few google searches didn't yield much. Ideas?[...] Off the top of my head: a sorting algorithm that swallows identical elements? I've never actually seen such an algorithm before, though. Generally, sorting algorithms try not to output less elements than they were given. :-P T -- Marketing: the art of convincing people to pay for what they didn't need before which you fail to deliver after.
Jan 22 2015
On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.I think the only way you can go lower than O(n log n) is pigeonholing (counting sort without the counting), but for generic keys you need AAs which aren't O(1) (and slow). Maybe heapsort (you could drop duplicates while building the heap), but that needs an extra allocation.
Jan 22 2015
Andrei Alexandrescu:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.In Bugzilla I've asked for a hashGroup, it returns a built-in associative array. Bye, bearophile
Jan 22 2015
On Thu, 22 Jan 2015 13:40:56 -0800, Andrei Alexandrescu wrote:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence. A few google searches didn't yield much. Ideas? Thanks, AndreiEfficiency improvement-wise, perhaps a generalization of a counting sort (http://en.wikipedia.org/wiki/Counting_sort), see "Variant algorithms."
Jan 22 2015
On 1/22/15 2:06 PM, Justin Whear wrote:On Thu, 22 Jan 2015 13:40:56 -0800, Andrei Alexandrescu wrote:Thanks. I'm thinking of something based on general comparisons. Something like a modified quicksort with fat pivot: A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot B. Recurse on R1 obtaining R11, a subrange of R1 containing only unique elements. C. Sort-move-unique R2 into the portion after R11, obtaining the result. A and B seem fine, I'm unclear about C. It would be an algorithm that sorts and simultaneously moves data in a range that overlaps the input. AndreiThere's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence. A few google searches didn't yield much. Ideas? Thanks, AndreiEfficiency improvement-wise, perhaps a generalization of a counting sort (http://en.wikipedia.org/wiki/Counting_sort), see "Variant algorithms."
Jan 22 2015
On Thu, Jan 22, 2015 at 02:29:58PM -0800, Andrei Alexandrescu via Digitalmars-d wrote: [...]Thanks. I'm thinking of something based on general comparisons. Something like a modified quicksort with fat pivot: A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot B. Recurse on R1 obtaining R11, a subrange of R1 containing only unique elements. C. Sort-move-unique R2 into the portion after R11, obtaining the result. A and B seem fine, I'm unclear about C. It would be an algorithm that sorts and simultaneously moves data in a range that overlaps the input.[...] Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0]. T -- Famous last words: I wonder what will happen if I do *this*...
Jan 22 2015
On Thursday, 22 January 2015 at 22:53:59 UTC, H. S. Teoh via Digitalmars-d wrote:So if R1[$-1] < pivot,This should be always true, no? R1 might be empty, though.
Jan 22 2015
On 1/22/15 2:51 PM, H. S. Teoh via Digitalmars-d wrote:Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].Right, forgot to mention the pivot. You don't recurse on the right, only the left; that way you massage the unique stuff in R1 toward the left in the possibly smaller range R11. Then the challenge is to sort, uinique-fy, and move in one fell swoop from R3 (padded with the pivot to the left) into the portion adjacent to R11. Example: say the range has 100 elements. Then we divide like this: r[0 .. 30] is <pivot r[30 .. 34] is =pivot r[34 .. 100] is >pivot First, recurse on r[0 .. 30] getting a smaller range that's sorted and unique. Say r[0 .. 24]. The step needed now is to take r[33 .. 100], which is ONE copy of the pivot plus everything greater than pivot, and in one shot deposit it starting at r[24], sorted and uniquefied. I'm thinking of a modified heapsort that progressively creates the heap at r[24], or something like that. Andrei
Jan 22 2015
On Thu, Jan 22, 2015 at 04:10:45PM -0800, Andrei Alexandrescu via Digitalmars-d wrote:On 1/22/15 2:51 PM, H. S. Teoh via Digitalmars-d wrote:[...] Well, that would work. Recurse on left to get a uniqSorted left segment, then modified heapsort on right (instead of heapsorting in place, deposit extracted heap elements into the target range). The question, though, is whether this hybrid recursion actually wins us anything, since heapsort is known to be somewhat slower than quicksort. In the worst case, you choose a poor pivot and get *both* the worst case of quicksort and the inferior performance of heapsort compounded together. Unless there's a way to modify heapsort to be more efficient as well? Actually, another idea occurred to me: suppose we modify partition() instead, so that instead of partitioning in-place, it deposits partitioned elements into a target (possibly overlapping) range. Then we modify the right-recursion, so that its partitioning step actually simultaneously moves the range over to the target area for us. Not sure if this will eliminate the copying cost, but it might, since partition has to traverse the entire subrange anyway. T -- Старый друг лучше новых двух.Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].Right, forgot to mention the pivot. You don't recurse on the right, only the left; that way you massage the unique stuff in R1 toward the left in the possibly smaller range R11. Then the challenge is to sort, uinique-fy, and move in one fell swoop from R3 (padded with the pivot to the left) into the portion adjacent to R11. Example: say the range has 100 elements. Then we divide like this: r[0 .. 30] is <pivot r[30 .. 34] is =pivot r[34 .. 100] is >pivot First, recurse on r[0 .. 30] getting a smaller range that's sorted and unique. Say r[0 .. 24]. The step needed now is to take r[33 .. 100], which is ONE copy of the pivot plus everything greater than pivot, and in one shot deposit it starting at r[24], sorted and uniquefied. I'm thinking of a modified heapsort that progressively creates the heap at r[24], or something like that.
Jan 22 2015
On 1/22/15 4:26 PM, H. S. Teoh via Digitalmars-d wrote:The question, though, is whether this hybrid recursion actually wins us anything, since heapsort is known to be somewhat slower than quicksort. In the worst case, you choose a poor pivot and get*both* the worst case of quicksort and the inferior performance of heapsort compounded together.The glass-half-full view of this is you combine the advantages of quicksort and heapsort :o).Unless there's a way to modify heapsort to be more efficient as well? Actually, another idea occurred to me: suppose we modify partition() instead, so that instead of partitioning in-place, it deposits partitioned elements into a target (possibly overlapping) range. Then we modify the right-recursion, so that its partitioning step actually simultaneously moves the range over to the target area for us. Not sure if this will eliminate the copying cost, but it might, since partition has to traverse the entire subrange anyway.That's interesting! Andrei
Jan 22 2015
On Thu, Jan 22, 2015 at 04:33:28PM -0800, Andrei Alexandrescu via Digitalmars-d wrote:On 1/22/15 4:26 PM, H. S. Teoh via Digitalmars-d wrote:Well, I've been in the field long enough to realize that putting things together more often than not causes increase in complexity (usually accompanied by poorer performance) rather than the other way round. It's part of why algorithms research is so hard... but also so rewarding, when you do hit upon a winning combination that actually reduces complexity and/or improves performance.The question, though, is whether this hybrid recursion actually wins us anything, since heapsort is known to be somewhat slower than quicksort. In the worst case, you choose a poor pivot and get*both* the worst case of quicksort and the inferior performance of heapsort compounded together.The glass-half-full view of this is you combine the advantages of quicksort and heapsort :o).[...] Here's a crude first-impressions sketch of how this modified partition might work: - As we move up the range, if the current element is less than the pivot, we deposit it into the growing left partition which is in the target location rather than the original location. - If the current element is equal to the pivot, we ignore it (this eliminates the pivot and its duplicates altogether from the left/right partitions). - The tricky case is if it's greater than the pivot. In the normal quicksort partition, we do nothing -- it will get moved later if it's occupying the space that belongs to the left partition. However, in our case, we need to move it *somewhere*, otherwise it will get lost if it doesn't lie in the overlapping area of the target range and the range being partitioned. One idea is to move it to the end of the target range, so that the two partitions grow from opposite ends of the target range, and they should meet in the middle (otherwise the target range is of the wrong size). However, the details of how this can be done is still murky. I'll have to think more about this. T -- Talk is cheap. Whining is actually free. -- Lars WirzeniusUnless there's a way to modify heapsort to be more efficient as well? Actually, another idea occurred to me: suppose we modify partition() instead, so that instead of partitioning in-place, it deposits partitioned elements into a target (possibly overlapping) range. Then we modify the right-recursion, so that its partitioning step actually simultaneously moves the range over to the target area for us. Not sure if this will eliminate the copying cost, but it might, since partition has to traverse the entire subrange anyway.That's interesting!
Jan 22 2015
On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:On Thu, Jan 22, 2015 at 02:29:58PM -0800, Andrei Alexandrescu via Digitalmars-d wrote: [...][...] Working proof of concept: import std.algorithm; import std.array; import std.stdio; // Returns: final index of pivot size_t partition(R)(R r) { auto pivot = r[0]; swap(r[0], r[$-1]); // move pivot to end auto leftIdx = 0; foreach (i; 0 .. r.length) { if (r[i] < pivot) swap(r[i], r[leftIdx++]); } swap(r[leftIdx], r[$-1]); return leftIdx; } R uniqueSort(R)(R r) { if (r.empty) return r; // Partition auto pivotIdx = partition(r); auto pivot = r[pivotIdx]; // Recurse auto left = uniqueSort(r[0 .. pivotIdx]); auto right = uniqueSort(r[pivotIdx + 1 .. $]); // Coalesce left ~ pivot ~ right. if (!left.empty && left[$-1] == pivot) left = left[0 .. $-1]; if (!right.empty && right[0] == pivot) right = right[1 .. $]; r[left.length] = pivot; auto vacant = copy(right, r[left.length + 1 .. $]); return r[0 .. $ - vacant.length]; } unittest { auto data = [4, 0, 9, 7, 2, 5, 1, 1, 3, 4, 2, 0, 9, 5, 3, 6, 1, 7, 4, 8]; auto sorted = data.uniqueSort(); assert(sorted == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]); } I'm not sure about the performance of this algorithm, though, because the call to copy() might become prohibitive once you add up all the recursive calls to uniqueSort(). Also, I only got one test case working; no idea if there are subtle bugs that are lurking in there somewhere. Please test with more test cases. :-) T -- "No, John. I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-devThanks. I'm thinking of something based on general comparisons. Something like a modified quicksort with fat pivot: A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot B. Recurse on R1 obtaining R11, a subrange of R1 containing only unique elements. C. Sort-move-unique R2 into the portion after R11, obtaining the result. A and B seem fine, I'm unclear about C. It would be an algorithm that sorts and simultaneously moves data in a range that overlaps the input.[...] Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].
Jan 22 2015
On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via Digitalmars-d wrote:On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:I'm pretty sure that when Andrei said:Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].He meant R3, not R2. R2 is obviously collapsed to one element (the pivot).C. Sort-move-unique R2 into the portion after R11, obtaining the result.Working proof of concept:I think this will actually have worse performance than sort+uniq, since you're now shuffling half of the elements at each recursion depth - an additional O(n log n).
Jan 22 2015
On Thu, Jan 22, 2015 at 11:52:18PM +0000, Vladimir Panteleev via Digitalmars-d wrote:On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via Digitalmars-d wrote:Ah, that's why I misunderstood him.On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:I'm pretty sure that when Andrei said:Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].He meant R3, not R2. R2 is obviously collapsed to one element (the pivot).C. Sort-move-unique R2 into the portion after R11, obtaining > the result.That's what I thought. At each step, you have to coalesce the elements by copying `right` over. So that adds up to quite a cost. Actually, I just realized that the code has an unnecessary check for left[$-1] being equal to pivot: this is actually impossible, since partition() makes sure that the left half of the range is strictly less than the pivot, so it cannot possibly contain the pivot. So we can eliminate that check. But it doesn't solve the problem of excessive copying. Hmm... actually, I think it might be possible to modify partition so that it eliminates duplicated pivots for you. However, this still doesn't solve the recursive case where `left` might have shrunk, thus necessitating `right` be shifted over afterwards. A modified heapsort might be a better candidate for uniqueSort() than quicksort, it looks like. T -- Music critic: "That's an imitation fugue!"Working proof of concept:I think this will actually have worse performance than sort+uniq, since you're now shuffling half of the elements at each recursion depth - an additional O(n log n).
Jan 22 2015
On 1/22/15 3:52 PM, Vladimir Panteleev wrote:On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via Digitalmars-d wrote:Yes, thanks and sorry for the distraction.On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:I'm pretty sure that when Andrei said:Wait, if R1 and R3, after the recursion, are sorted and contain only unique elements, doesn't that mean that we can just collapse R2 into a single element along with the endpoints of R1 and R3? So if R1[$-1] < pivot, then we're done, otherwise coalesce it with the pivot, and likewise with R3[0].He meant R3, not R2. R2 is obviously collapsed to one element (the pivot).C. Sort-move-unique R2 into the portion after R11, obtaining > the result.Yah, the key here is to exploit the combination to get better performance than composing the two. So you must do surgery on the core algorithms involved. AndreiWorking proof of concept:I think this will actually have worse performance than sort+uniq, since you're now shuffling half of the elements at each recursion depth - an additional O(n log n).
Jan 22 2015
Andrei Alexandrescu:What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.What I need most is a "groping by hashing", since years. A unique sort can be handy, but it's less useful that the first. Bye, bearophile
Jan 22 2015
On 1/22/15 4:06 PM, bearophile wrote:Andrei Alexandrescu:I agree a hash uniq would be nice. -- AndreiWhat would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.What I need most is a "groping by hashing", since years. A unique sort can be handy, but it's less useful that the first.
Jan 22 2015
On 01/22/2015 07:06 PM, bearophile wrote:Andrei Alexandrescu:Is that different than groupBy? -- Paul O'Neil Github / IRC: todaymanWhat would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.What I need most is a "groping by hashing", since years. A unique sort can be handy, but it's less useful that the first. Bye, bearophile
Jan 22 2015
Paul O'Neil:Is that different than groupBy?It works with hashing, and doesn't require a sorting, so it's different. On the other hand it's just few lines of code long, and it's quite easy to write... Bye, bearophile
Jan 22 2015
On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence. A few google searches didn't yield much. Ideas? Thanks, AndreiMy thought is that uniq on a sorted list is only an O(n) operation, so it's not an expensive operation by any means. If there's to be any benefit to a sortUniq, it has to be capable of reducing work incurred during the sorting process; else you're just going to end up with something less efficient. One solution may be RedBlackTree, which has an option to disallow duplicate elements. This container has three useful properties: (1) The tree grows one element at a time. This is in opposition to other algorithms like quicksort or heapsort in which you must operate on the entire set of elements at once. (2) Removing duplicates only requires a single comparison per element, thus retaining the worst-case of |sort|uniq. (3) Duplicate elements are removed immediately. Inserting an element into a balanced tree with N elements is an O(lg n) operation, so the smaller n is, the less work that is required. The shortcoming of RedBlackTree is that it's not very fast. However, I don't know any other O(n lg n) sorting algorithm which has these properties.
Jan 22 2015
On Fri, Jan 23, 2015 at 12:59:51AM +0000, Xinok via Digitalmars-d wrote:On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:[...]There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements. What would be a better integrated version - one that does sorting and uniq in one shot? I suspect the combination could be quite a bit better than doing the two in sequence.My thought is that uniq on a sorted list is only an O(n) operation, so it's not an expensive operation by any means. If there's to be any benefit to a sortUniq, it has to be capable of reducing work incurred during the sorting process; else you're just going to end up with something less efficient. One solution may be RedBlackTree, which has an option to disallow duplicate elements. This container has three useful properties:The problem with RedBlackTree is that it's cache-unfriendly due to the memory allocation and per-node pointers. One reason quicksort performs so well is because it works in-place, and the partition operation accesses elements bilinearly (i.e., two linear traversals over the same stretch of memory interleaved with each other), so it's very likely that most of the accesses will hit the cache (possibly even all, if hardware prefetching is present). Whereas with RedBlackTree, it's jumping all over memory, so there will be a higher rate of cache misses and the resulting slow RAM fetches. Plus, memory allocation is slooow, and is best avoided where possible. T -- What's a "hot crossed bun"? An angry rabbit.
Jan 22 2015
The problem with RedBlackTree is that it's cache-unfriendly due to the memory allocation and per-node pointers. One reason quicksort performs so well is because it works in-place, and the partition operation accesses elements bilinearly (i.e., two linear traversals over the same stretch of memory interleaved with each other), so it's very likely that most of the accesses will hit the cache (possibly even all, if hardware prefetching is present). Whereas with RedBlackTree, it's jumping all over memory, so there will be a higher rate of cache misses and the resulting slow RAM fetches. Plus, memory allocation is slooow, and is best avoided where possible.In an interview Stepanov said he would have used b* trees instead for this reason if he were writing STL today. Here is a (dated) benchmark comparison of hash libraries: https://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/
Jan 24 2015
On 22.01.15 22:40, Andrei Alexandrescu wrote:There's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.Loosely-related, compiling ... auto ret = arr.filter!(myFilter()); ... I got: Error: closures are not yet supported in CTFE Using struct with opCall directly also does not pass. Did I miss something? Is it near?
Jan 23 2015
zeljkog:Loosely-related, compiling ... auto ret = arr.filter!(myFilter()); ... I got: Error: closures are not yet supported in CTFE Using struct with opCall directly also does not pass. Did I miss something? Is it near?Please show a minimal nonworking example in the D.learn newsgroup. Bye, bearophile
Jan 23 2015
On 1/23/15 7:04 AM, zeljkog wrote:On 22.01.15 22:40, Andrei Alexandrescu wrote:Must be you put that code at top level and the compiler tried to interpret it during compilation. -- AndreiThere's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.Loosely-related, compiling ... auto ret = arr.filter!(myFilter()); ... I got: Error: closures are not yet supported in CTFE Using struct with opCall directly also does not pass. Did I miss something? Is it near?
Jan 23 2015
On Fri, Jan 23, 2015 at 09:40:42AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:On 1/23/15 7:04 AM, zeljkog wrote:I think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this. T -- I am not young enough to know everything. -- Oscar WildeOn 22.01.15 22:40, Andrei Alexandrescu wrote:Must be you put that code at top level and the compiler tried to interpret it during compilation. -- AndreiThere's this classic patter on Unix: |sort|uniq, i.e. sort some data and only display the unique elements.Loosely-related, compiling ... auto ret = arr.filter!(myFilter()); ... I got: Error: closures are not yet supported in CTFE Using struct with opCall directly also does not pass. Did I miss something? Is it near?
Jan 23 2015
On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:I think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 23 2015
On 1/23/15 10:05 AM, zeljkog wrote:On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:Please post a complete snippet then. Thanks! -- AndreiI think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 23 2015
On 23.01.15 19:13, Andrei Alexandrescu wrote:On 1/23/15 10:05 AM, zeljkog wrote:import std.stdio, std.algorithm; auto unique(){ bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln; }On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:Please post a complete snippet then. Thanks! -- AndreiI think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 23 2015
import std.stdio, std.algorithm; auto unique(){ bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln; }Just add static to the closure. auto unique(){ static bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; }
Jan 24 2015
Ignore me, sorry! Obviously that will not work for reuse of unique.
Jan 24 2015
On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:On 23.01.15 19:13, Andrei Alexandrescu wrote:auto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).On 1/23/15 10:05 AM, zeljkog wrote:import std.stdio, std.algorithm; auto unique(){ bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln; }On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:Please post a complete snippet then. Thanks! -- AndreiI think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 24 2015
On 24.01.15 12:50, Peter Alexander wrote:auto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).Yes, I see :) But I think this should be supported by std.algorithm: import std.stdio, std.algorithm; struct Uniq{ bool[int] c; bool opCall(int a){ if (a in c) return false; else{ c[a] = true; return true; } } } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln; }
Jan 24 2015
On 1/24/15 4:13 AM, zeljkog wrote:On 24.01.15 12:50, Peter Alexander wrote:I can't make sense of this - where is Uniq supposed to be instantiated? -- Andreiauto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).Yes, I see :) But I think this should be supported by std.algorithm: import std.stdio, std.algorithm; struct Uniq{ bool[int] c; bool opCall(int a){ if (a in c) return false; else{ c[a] = true; return true; } } } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln; }
Jan 24 2015
On 24.01.15 15:53, Andrei Alexandrescu wrote:On 1/24/15 4:13 AM, zeljkog wrote:Require following changes in std.algorithm.filter. original: template filter(alias predicate) if (is(typeof(unaryFun!predicate))) { auto filter(Range)(Range range) if (isInputRange!(Unqual!Range)) { return FilterResult!(unaryFun!predicate, Range)(range); } } private struct FilterResult(alias pred, Range) ... changed (to run this case): template filter(alias predicate) if (is(typeof(unaryFun!predicate))) { auto filter(Range)(Range range) if (isInputRange!(Unqual!Range)) { static if (is(predicate == struct)) return FilterResult!(predicate, Range)(range); else return FilterResult!(unaryFun!predicate, Range)(range); } } private struct FilterResult(alias predicate, Range) { static if (is(predicate == struct)) predicate pred; else alias pred = predicate; ...On 24.01.15 12:50, Peter Alexander wrote:I can't make sense of this - where is Uniq supposed to be instantiated? -- Andreiauto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).Yes, I see :) But I think this should be supported by std.algorithm: import std.stdio, std.algorithm; struct Uniq{ bool[int] c; bool opCall(int a){ if (a in c) return false; else{ c[a] = true; return true; } } } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln; }
Jan 24 2015
On 24.01.15 16:17, zeljkog wrote:template filter(alias predicate) if (is(typeof(unaryFun!predicate)))Should be: template filter(alias predicate) if (is(typeof(unaryFun!predicate)) || is(predicate == struct))
Jan 24 2015
On 1/24/15 7:17 AM, zeljkog wrote:private struct FilterResult(alias predicate, Range) { static if (is(predicate == struct)) predicate pred; else alias pred = predicate; ...The problem is here - the creation of an empty object is unlikely to be of use to the caller. We should, however, support the case when the caller passes a stateful predicate into the call. -- Andrei
Jan 24 2015
On 24.01.15 18:02, Andrei Alexandrescu wrote:On 1/24/15 7:17 AM, zeljkog wrote:I think empty object could be useful, but perhaps there is better :) Until now I concluded: 1. Resulting range must be output range. Right? I don't know is it considerable problem? 2. unaryFun can pass down struct (with opCall check?)private struct FilterResult(alias predicate, Range) { static if (is(predicate == struct)) predicate pred; else alias pred = predicate; ...The problem is here - the creation of an empty object is unlikely to be of use to the caller. We should, however, support the case when the caller passes a stateful predicate into the call. -- Andrei
Jan 24 2015
On 24.01.15 18:26, zeljkog wrote:Until now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On 1/24/15 9:34 AM, zeljkog wrote:On 24.01.15 18:26, zeljkog wrote:For filter it's a forward range if input is forward or better. -- AndreiUntil now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On 24.01.15 19:35, Andrei Alexandrescu wrote:On 1/24/15 9:34 AM, zeljkog wrote:But here there is potential problem with shared state after save?On 24.01.15 18:26, zeljkog wrote:For filter it's a forward range if input is forward or better. -- AndreiUntil now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On 1/24/15 10:41 AM, zeljkog wrote:On 24.01.15 19:35, Andrei Alexandrescu wrote:It's not a problem - save saves the state of the iteration, not the data being iterated. -- AndreiOn 1/24/15 9:34 AM, zeljkog wrote:But here there is potential problem with shared state after save?On 24.01.15 18:26, zeljkog wrote:For filter it's a forward range if input is forward or better. -- AndreiUntil now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On 24.01.15 19:47, Andrei Alexandrescu wrote:If you pass stateful object with opCall or closure it becomes part of "state of the iteration"? Consider: import std.stdio, std.algorithm; struct Uniq{ bool[int] c; bool opCall(int a){ if (a in c) return false; else{ c[a] = true; return true; } } } void main() { Uniq a; auto arr = [1, 5, 5, 2, 1, 5, 6, 6]; auto r = arr.filter!a; auto r1 = r.save(); r.writeln; r1.writeln; } output: [1, 2, 6] [5]But here there is potential problem with shared state after save?It's not a problem - save saves the state of the iteration, not the data being iterated. -- Andrei
Jan 24 2015
On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:On 1/24/15 9:34 AM, zeljkog wrote:Hmm. Couldn't filter be made bidirectional too, if the underlying range is also bidirectional? After all, filtering doesn't depend on the surrounding elements, so in theory you should be able to popBack as well. T -- "How are you doing?" "Doing what?"On 24.01.15 18:26, zeljkog wrote:For filter it's a forward range if input is forward or better. -- AndreiUntil now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On 1/24/15 11:00 AM, H. S. Teoh via Digitalmars-d wrote:On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:AndreiOn 1/24/15 9:34 AM, zeljkog wrote:Hmm. Couldn't filter be made bidirectional too, if the underlying range is also bidirectional? After all, filtering doesn't depend on the surrounding elements, so in theory you should be able to popBack as well.On 24.01.15 18:26, zeljkog wrote:For filter it's a forward range if input is forward or better. -- AndreiUntil now I concluded: 1. Resulting range must be output range. Right?Should be input range.
Jan 24 2015
On Sat, Jan 24, 2015 at 11:20:55AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:On 1/24/15 11:00 AM, H. S. Teoh via Digitalmars-d wrote:[...]On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:[...] Ah, I should have known, since I do remember seeing this when splitting std.algorithm. I guess it didn't really register beyond "this function belongs in std.algorithm.iteration". :-P It's kinda sad that we can't merge the two functions and make it so that the initial scanning of .back only happens lazily. I mean, in theory, this should be easy: struct Filter { R range; ... property auto back() { while (!range.empty && !pred(range.back)) range.popBack(); return range.back; } void popBack() { range.popBack(); } } But I remember there was a huge discussion over whether .front/.back/.empty should do any non-trivial work. The current implementation appears to be the consequence of shoehorning all work into popFront/popBack. But in my mind, these methods ought to be abstract, and the user shouldn't be able to see any difference in behaviour regardless of where the real work is actually done. T -- People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. KnuthFor filter it's a forward range if input is forward or better. -- AndreiHmm. Couldn't filter be made bidirectional too, if the underlying range is also bidirectional? After all, filtering doesn't depend on the surrounding elements, so in theory you should be able to popBack as well.
Jan 24 2015
On 1/24/15 3:50 AM, Peter Alexander wrote:On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:Hmmm... we do allow rvalues sometimes (e.g. for strings). I think we could and should relax the rule to allow rvalues in this case, too. -- AndreiOn 23.01.15 19:13, Andrei Alexandrescu wrote:auto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).On 1/23/15 10:05 AM, zeljkog wrote:import std.stdio, std.algorithm; auto unique(){ bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln; }On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:Please post a complete snippet then. Thanks! -- AndreiI think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 24 2015
On Saturday, 24 January 2015 at 14:53:02 UTC, Andrei Alexandrescu wrote:On 1/24/15 3:50 AM, Peter Alexander wrote:I was wrong. R-values work, as long as they are compile time constants. The problem here is that closures don't yet work in CTFE (as the error says). Pulling it out as a local variable works because the alias then binds to the symbol rather than the value.On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:Hmmm... we do allow rvalues sometimes (e.g. for strings). I think we could and should relax the rule to allow rvalues in this case, too. -- AndreiOn 23.01.15 19:13, Andrei Alexandrescu wrote:auto f = unique(); [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln; // [1, 5, 2, 6] Filter needs an alias, and you cannot alias an R-value (it has no symbol).On 1/23/15 10:05 AM, zeljkog wrote:import std.stdio, std.algorithm; auto unique(){ bool[int] c; return (int a){ if (a in c) return false; else{ c[a] = true; return true; } }; } void main() { [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln; }On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:Please post a complete snippet then. Thanks! -- AndreiI think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 24 2015
On 23.01.15 19:05, zeljkog wrote:On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:It comes down to a question: Why structs with opCall are not supported as template parameters in std.algorithm together with functions? Like this: filter!SomeStructWithOpCall It is simple addition. Or is it with reason?I think what he's trying to do is to call a function that returns a delegate, and use that delegate to instantiate the filter template. AFAIK I've never seen code like this before, and it looks like the compiler isn't prepared to handle this.Yes, I tried to use filter for unique, need closure. I think there are many applications for this pattern.
Jan 24 2015