digitalmars.D - sortUniq

Andrei Alexandrescu (8/8) Jan 22 2015 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and...

H. S. Teoh via Digitalmars-d (9/17) Jan 22 2015 [...]
Vladimir Panteleev (7/12) Jan 22 2015 I think the only way you can go lower than O(n log n) is
bearophile (5/7) Jan 22 2015 In Bugzilla I've asked for a hashGroup, it returns a built-in
Justin Whear (3/16) Jan 22 2015 Efficiency improvement-wise, perhaps a generalization of a counting sort

Andrei Alexandrescu (10/26) Jan 22 2015 Thanks. I'm thinking of something based on general comparisons.

H. S. Teoh via Digitalmars-d (11/25) Jan 22 2015 [...]

Vladimir Panteleev (3/4) Jan 22 2015 This should be always true, no? R1 might be empty, though.
Andrei Alexandrescu (19/24) Jan 22 2015 Right, forgot to mention the pivot.

H. S. Teoh via Digitalmars-d (21/51) Jan 22 2015 [...]

Andrei Alexandrescu (5/18) Jan 22 2015 The glass-half-full view of this is you combine the advantages of

H. S. Teoh via Digitalmars-d (30/50) Jan 22 2015 Well, I've been in the field long enough to realize that putting things

H. S. Teoh via Digitalmars-d (65/88) Jan 22 2015 [...]

Vladimir Panteleev (8/23) Jan 22 2015 I'm pretty sure that when Andrei said:

H. S. Teoh via Digitalmars-d (19/41) Jan 22 2015 That's what I thought. At each step, you have to coalesce the elements
Andrei Alexandrescu (6/23) Jan 22 2015 Yah, the key here is to exploit the combination to get better

bearophile (5/8) Jan 22 2015 What I need most is a "groping by hashing", since years. A unique

Andrei Alexandrescu (2/8) Jan 22 2015 I agree a hash uniq would be nice. -- Andrei
Paul O'Neil (5/16) Jan 22 2015 Is that different than groupBy?

bearophile (6/7) Jan 22 2015 It works with hashing, and doesn't require a sorting, so it's

Xinok (20/28) Jan 22 2015 My thought is that uniq on a sorted list is only an O(n)

H. S. Teoh via Digitalmars-d (15/30) Jan 22 2015 The problem with RedBlackTree is that it's cache-unfriendly due to the

Laeeth Isharc (4/23) Jan 24 2015 In an interview Stepanov said he would have used b* trees instead

zeljkog (10/12) Jan 23 2015 Loosely-related, compiling

bearophile (4/13) Jan 23 2015 Please show a minimal nonworking example in the D.learn newsgroup.
Andrei Alexandrescu (3/16) Jan 23 2015 Must be you put that code at top level and the compiler tried to

H. S. Teoh via Digitalmars-d (8/31) Jan 23 2015 I think what he's trying to do is to call a function that returns a

zeljkog (3/9) Jan 23 2015 Yes, I tried to use filter for unique, need closure.

Andrei Alexandrescu (2/11) Jan 23 2015 Please post a complete snippet then. Thanks! -- Andrei

zeljkog (17/29) Jan 23 2015 import std.stdio, std.algorithm;

ixid (12/28) Jan 24 2015 Just add static to the closure.

ixid (2/2) Jan 24 2015 Ignore me, sorry! Obviously that will not work for reuse of

Peter Alexander (5/39) Jan 24 2015 auto f = unique();

zeljkog (19/23) Jan 24 2015 Yes, I see :)

Andrei Alexandrescu (3/26) Jan 24 2015 I can't make sense of this - where is Uniq supposed to be instantiated?

zeljkog (30/62) Jan 24 2015 Require following changes in std.algorithm.filter.

zeljkog (4/5) Jan 24 2015 Should be:
Andrei Alexandrescu (4/11) Jan 24 2015 The problem is here - the creation of an empty object is unlikely to be

zeljkog (6/19) Jan 24 2015 I think empty object could be useful, but perhaps there is better :)

zeljkog (2/4) Jan 24 2015 Should be input range.

Andrei Alexandrescu (2/7) Jan 24 2015 For filter it's a forward range if input is forward or better. -- Andrei

zeljkog (2/9) Jan 24 2015 But here there is potential problem with shared state after save?

Andrei Alexandrescu (3/14) Jan 24 2015 It's not a problem - save saves the state of the iteration, not the data...

zeljkog (27/33) Jan 24 2015 If you pass stateful object with opCall or closure it becomes part of "s...

H. S. Teoh via Digitalmars-d (8/17) Jan 24 2015 Hmm. Couldn't filter be made bidirectional too, if the underlying range

Andrei Alexandrescu (3/17) Jan 24 2015 History: http://dlang.org/phobos/std_algorithm.html#.filterBidirectional

H. S. Teoh via Digitalmars-d (30/41) Jan 24 2015 [...]

Andrei Alexandrescu (4/42) Jan 24 2015 Hmmm... we do allow rvalues sometimes (e.g. for strings). I think we

Peter Alexander (7/55) Jan 24 2015 I was wrong. R-values work, as long as they are compile time

zeljkog (6/17) Jan 24 2015 It comes down to a question:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

There's this classic patter on Unix: |sort|uniq, i.e. sort some data and 
only display the unique elements.

What would be a better integrated version - one that does sorting and 
uniq in one shot? I suspect the combination could be quite a bit better 
than doing the two in sequence.

A few google searches didn't yield much. Ideas?


Thanks,

Andrei

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 01:40:56PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 There's this classic patter on Unix: |sort|uniq, i.e. sort some data
 and only display the unique elements.
 
 What would be a better integrated version - one that does sorting and
 uniq in one shot? I suspect the combination could be quite a bit
 better than doing the two in sequence.
 
 A few google searches didn't yield much. Ideas?

[...]

Off the top of my head: a sorting algorithm that swallows identical
elements? I've never actually seen such an algorithm before, though.
Generally, sorting algorithms try not to output less elements than they
were given. :-P


T

-- 
Marketing: the art of convincing people to pay for what they didn't need before
which you fail to deliver after.

Jan 22 2015

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu 
wrote:
 There's this classic patter on Unix: |sort|uniq, i.e. sort some 
 data and only display the unique elements.

 What would be a better integrated version - one that does 
 sorting and uniq in one shot? I suspect the combination could 
 be quite a bit better than doing the two in sequence.

I think the only way you can go lower than O(n log n) is 
pigeonholing (counting sort without the counting), but for 
generic keys you need AAs which aren't O(1) (and slow).

Maybe heapsort (you could drop duplicates while building the 
heap), but that needs an extra allocation.

Jan 22 2015

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 There's this classic patter on Unix: |sort|uniq, i.e. sort some 
 data and only display the unique elements.

In Bugzilla I've asked for a hashGroup, it returns a built-in 
associative array.

Bye,
bearophile

Jan 22 2015

Justin Whear <justin economicmodeling.com> writes:

On Thu, 22 Jan 2015 13:40:56 -0800, Andrei Alexandrescu wrote:

 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and
 only display the unique elements.
 
 What would be a better integrated version - one that does sorting and
 uniq in one shot? I suspect the combination could be quite a bit better
 than doing the two in sequence.
 
 A few google searches didn't yield much. Ideas?
 
 
 Thanks,
 
 Andrei

Efficiency improvement-wise, perhaps a generalization of a counting sort
(http://en.wikipedia.org/wiki/Counting_sort), see "Variant algorithms."

Jan 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/22/15 2:06 PM, Justin Whear wrote:
 On Thu, 22 Jan 2015 13:40:56 -0800, Andrei Alexandrescu wrote:

 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and
 only display the unique elements.

 What would be a better integrated version - one that does sorting and
 uniq in one shot? I suspect the combination could be quite a bit better
 than doing the two in sequence.

 A few google searches didn't yield much. Ideas?


 Thanks,

 Andrei

 Efficiency improvement-wise, perhaps a generalization of a counting sort
 (http://en.wikipedia.org/wiki/Counting_sort), see "Variant algorithms."

Thanks. I'm thinking of something based on general comparisons. 
Something like a modified quicksort with fat pivot:

A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot

B. Recurse on R1 obtaining R11, a subrange of R1 containing only unique 
elements.

C. Sort-move-unique R2 into the portion after R11, obtaining the result.

A and B seem fine, I'm unclear about C. It would be an algorithm that 
sorts and simultaneously moves data in a range that overlaps the input.


Andrei

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 02:29:58PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 Thanks. I'm thinking of something based on general comparisons.
 Something like a modified quicksort with fat pivot:
 
 A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot
 
 B. Recurse on R1 obtaining R11, a subrange of R1 containing only
 unique elements.
 
 C. Sort-move-unique R2 into the portion after R11, obtaining the
 result.
 
 A and B seem fine, I'm unclear about C. It would be an algorithm that
 sorts and simultaneously moves data in a range that overlaps the
 input.

[...]

Wait, if R1 and R3, after the recursion, are sorted and contain only
unique elements, doesn't that mean that we can just collapse R2 into a
single element along with the endpoints of R1 and R3? So if R1[$-1] <
pivot, then we're done, otherwise coalesce it with the pivot, and
likewise with R3[0].


T

-- 
Famous last words: I wonder what will happen if I do *this*...

Jan 22 2015

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thursday, 22 January 2015 at 22:53:59 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 So if R1[$-1] < pivot,

This should be always true, no? R1 might be empty, though.

Jan 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/22/15 2:51 PM, H. S. Teoh via Digitalmars-d wrote:
 Wait, if R1 and R3, after the recursion, are sorted and contain only
 unique elements, doesn't that mean that we can just collapse R2 into a
 single element along with the endpoints of R1 and R3? So if R1[$-1] <
 pivot, then we're done, otherwise coalesce it with the pivot, and
 likewise with R3[0].

Right, forgot to mention the pivot.

You don't recurse on the right, only the left; that way you massage the 
unique stuff in R1 toward the left in the possibly smaller range R11. 
Then the challenge is to sort, uinique-fy, and move in one fell swoop 
from R3 (padded with the pivot to the left) into the portion adjacent to 
R11.

Example: say the range has 100 elements. Then we divide like this:

r[0 .. 30] is <pivot
r[30 .. 34] is =pivot
r[34 .. 100] is >pivot

First, recurse on r[0 .. 30] getting a smaller range that's sorted and 
unique. Say r[0 .. 24].

The step needed now is to take r[33 .. 100], which is ONE copy of the 
pivot plus everything greater than pivot, and in one shot deposit it 
starting at r[24], sorted and uniquefied.

I'm thinking of a modified heapsort that progressively creates the heap 
at r[24], or something like that.



Andrei

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 04:10:45PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 1/22/15 2:51 PM, H. S. Teoh via Digitalmars-d wrote:
Wait, if R1 and R3, after the recursion, are sorted and contain only
unique elements, doesn't that mean that we can just collapse R2 into
a single element along with the endpoints of R1 and R3? So if R1[$-1]
< pivot, then we're done, otherwise coalesce it with the pivot, and
likewise with R3[0].

 
 Right, forgot to mention the pivot.
 
 You don't recurse on the right, only the left; that way you massage
 the unique stuff in R1 toward the left in the possibly smaller range
 R11. Then the challenge is to sort, uinique-fy, and move in one fell
 swoop from R3 (padded with the pivot to the left) into the portion
 adjacent to R11.
 
 Example: say the range has 100 elements. Then we divide like this:
 
 r[0 .. 30] is <pivot
 r[30 .. 34] is =pivot
 r[34 .. 100] is >pivot
 
 First, recurse on r[0 .. 30] getting a smaller range that's sorted and
 unique. Say r[0 .. 24].
 
 The step needed now is to take r[33 .. 100], which is ONE copy of the
 pivot plus everything greater than pivot, and in one shot deposit it
 starting at r[24], sorted and uniquefied.
 
 I'm thinking of a modified heapsort that progressively creates the
 heap at r[24], or something like that.

[...]

Well, that would work. Recurse on left to get a uniqSorted left segment,
then modified heapsort on right (instead of heapsorting in place,
deposit extracted heap elements into the target range).

The question, though, is whether this hybrid recursion actually wins us
anything, since heapsort is known to be somewhat slower than quicksort.
In the worst case, you choose a poor pivot and get *both* the worst case
of quicksort and the inferior performance of heapsort compounded
together.

Unless there's a way to modify heapsort to be more efficient as well?

Actually, another idea occurred to me: suppose we modify partition()
instead, so that instead of partitioning in-place, it deposits
partitioned elements into a target (possibly overlapping) range. Then we
modify the right-recursion, so that its partitioning step actually
simultaneously moves the range over to the target area for us. Not sure
if this will eliminate the copying cost, but it might, since partition
has to traverse the entire subrange anyway.


T

-- 
Старый друг лучше новых двух.

Jan 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/22/15 4:26 PM, H. S. Teoh via Digitalmars-d wrote:
 The question, though, is whether this hybrid recursion actually wins us
 anything, since heapsort is known to be somewhat slower than quicksort.
 In the worst case, you choose a poor pivot and get*both*  the worst case
 of quicksort and the inferior performance of heapsort compounded
 together.

The glass-half-full view of this is you combine the advantages of 
quicksort and heapsort :o).

 Unless there's a way to modify heapsort to be more efficient as well?

 Actually, another idea occurred to me: suppose we modify partition()
 instead, so that instead of partitioning in-place, it deposits
 partitioned elements into a target (possibly overlapping) range. Then we
 modify the right-recursion, so that its partitioning step actually
 simultaneously moves the range over to the target area for us. Not sure
 if this will eliminate the copying cost, but it might, since partition
 has to traverse the entire subrange anyway.

That's interesting!


Andrei

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 04:33:28PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 1/22/15 4:26 PM, H. S. Teoh via Digitalmars-d wrote:
The question, though, is whether this hybrid recursion actually wins
us anything, since heapsort is known to be somewhat slower than
quicksort.  In the worst case, you choose a poor pivot and get*both*
the worst case of quicksort and the inferior performance of heapsort
compounded together.

 
 The glass-half-full view of this is you combine the advantages of
 quicksort and heapsort :o).

Well, I've been in the field long enough to realize that putting things
together more often than not causes increase in complexity (usually
accompanied by poorer performance) rather than the other way round. It's
part of why algorithms research is so hard... but also so rewarding,
when you do hit upon a winning combination that actually reduces
complexity and/or improves performance.


Unless there's a way to modify heapsort to be more efficient as well?

Actually, another idea occurred to me: suppose we modify partition()
instead, so that instead of partitioning in-place, it deposits
partitioned elements into a target (possibly overlapping) range. Then
we modify the right-recursion, so that its partitioning step actually
simultaneously moves the range over to the target area for us. Not
sure if this will eliminate the copying cost, but it might, since
partition has to traverse the entire subrange anyway.

 
 That's interesting!

[...]

Here's a crude first-impressions sketch of how this modified partition
might work:

- As we move up the range, if the current element is less than the
  pivot, we deposit it into the growing left partition which is in the
  target location rather than the original location.

- If the current element is equal to the pivot, we ignore it (this
  eliminates the pivot and its duplicates altogether from the left/right
  partitions).

- The tricky case is if it's greater than the pivot. In the normal
  quicksort partition, we do nothing -- it will get moved later if it's
  occupying the space that belongs to the left partition. However, in
  our case, we need to move it *somewhere*, otherwise it will get lost
  if it doesn't lie in the overlapping area of the target range and the
  range being partitioned.

  One idea is to move it to the end of the target range, so that the
  two partitions grow from opposite ends of the target range, and they
  should meet in the middle (otherwise the target range is of the wrong
  size). However, the details of how this can be done is still murky.
  I'll have to think more about this.


T

-- 
Talk is cheap. Whining is actually free. -- Lars Wirzenius

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d wrote:
 On Thu, Jan 22, 2015 at 02:29:58PM -0800, Andrei Alexandrescu via
Digitalmars-d wrote:
 [...]
 Thanks. I'm thinking of something based on general comparisons.
 Something like a modified quicksort with fat pivot:
 
 A. Partition into three subranges: R1<pivot, R2=pivot, R3>pivot
 
 B. Recurse on R1 obtaining R11, a subrange of R1 containing only
 unique elements.
 
 C. Sort-move-unique R2 into the portion after R11, obtaining the
 result.
 
 A and B seem fine, I'm unclear about C. It would be an algorithm that
 sorts and simultaneously moves data in a range that overlaps the
 input.

 [...]
 
 Wait, if R1 and R3, after the recursion, are sorted and contain only
 unique elements, doesn't that mean that we can just collapse R2 into a
 single element along with the endpoints of R1 and R3? So if R1[$-1] <
 pivot, then we're done, otherwise coalesce it with the pivot, and
 likewise with R3[0].

[...]

Working proof of concept:

	import std.algorithm;
	import std.array;
	import std.stdio;
	
	// Returns: final index of pivot
	size_t partition(R)(R r)
	{
	    auto pivot = r[0];
	    swap(r[0], r[$-1]); // move pivot to end
	
	    auto leftIdx = 0;
	    foreach (i; 0 .. r.length)
	    {
	        if (r[i] < pivot)
	            swap(r[i], r[leftIdx++]);
	    }
	    swap(r[leftIdx], r[$-1]);
	
	    return leftIdx;
	}
	
	R uniqueSort(R)(R r)
	{
	    if (r.empty) return r;
	
	    // Partition
	    auto pivotIdx = partition(r);
	    auto pivot = r[pivotIdx];
	
	    // Recurse
	    auto left = uniqueSort(r[0 .. pivotIdx]);
	    auto right = uniqueSort(r[pivotIdx + 1 .. $]);
	
	    // Coalesce left ~ pivot ~ right.
	    if (!left.empty && left[$-1] == pivot)
	        left = left[0 .. $-1];
	
	    if (!right.empty && right[0] == pivot)
	        right = right[1 .. $];
	
	    r[left.length] = pivot;
	    auto vacant = copy(right, r[left.length + 1 .. $]);
	
	    return r[0 .. $ - vacant.length];
	}
	
	unittest
	{
	    auto data = [4, 0, 9, 7, 2, 5, 1, 1, 3, 4, 2, 0, 9, 5, 3, 6, 1, 7, 4, 8];
	    auto sorted = data.uniqueSort();
	
	    assert(sorted == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
	}


I'm not sure about the performance of this algorithm, though, because
the call to copy() might become prohibitive once you add up all the
recursive calls to uniqueSort().

Also, I only got one test case working; no idea if there are subtle bugs
that are lurking in there somewhere. Please test with more test cases.
:-)


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured
megaliths that address all questions by piling on ridiculous internal links in
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev

Jan 22 2015

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via 
 Digitalmars-d wrote:
 Wait, if R1 and R3, after the recursion, are sorted and 
 contain only
 unique elements, doesn't that mean that we can just collapse 
 R2 into a
 single element along with the endpoints of R1 and R3? So if 
 R1[$-1] <
 pivot, then we're done, otherwise coalesce it with the pivot, 
 and
 likewise with R3[0].


I'm pretty sure that when Andrei said:

 C. Sort-move-unique R2 into the portion after R11, obtaining 
 the
 result.



He meant R3, not R2. R2 is obviously collapsed to one element 
(the pivot).

 Working proof of concept:

I think this will actually have worse performance than sort+uniq, 
since you're now shuffling half of the elements at each recursion 
depth - an additional O(n log n).

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jan 22, 2015 at 11:52:18PM +0000, Vladimir Panteleev via Digitalmars-d
wrote:
 On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via Digitalmars-d
 wrote:
On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d
wrote:
Wait, if R1 and R3, after the recursion, are sorted and contain only
unique elements, doesn't that mean that we can just collapse R2 into
a single element along with the endpoints of R1 and R3? So if
R1[$-1] < pivot, then we're done, otherwise coalesce it with the
pivot, and likewise with R3[0].


 
 I'm pretty sure that when Andrei said:
 
 C. Sort-move-unique R2 into the portion after R11, obtaining > the
 result.



 
 He meant R3, not R2. R2 is obviously collapsed to one element (the
 pivot).

Ah, that's why I misunderstood him.


Working proof of concept:

 
 I think this will actually have worse performance than sort+uniq,
 since you're now shuffling half of the elements at each recursion
 depth - an additional O(n log n).

That's what I thought. At each step, you have to coalesce the elements
by copying `right` over. So that adds up to quite a cost.

Actually, I just realized that the code has an unnecessary check for
left[$-1] being equal to pivot: this is actually impossible, since
partition() makes sure that the left half of the range is strictly less
than the pivot, so it cannot possibly contain the pivot. So we can
eliminate that check. But it doesn't solve the problem of excessive
copying.

Hmm... actually, I think it might be possible to modify partition so
that it eliminates duplicated pivots for you. However, this still
doesn't solve the recursive case where `left` might have shrunk, thus
necessitating `right` be shifted over afterwards.

A modified heapsort might be a better candidate for uniqueSort() than
quicksort, it looks like.


T

-- 
Music critic: "That's an imitation fugue!"

Jan 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/22/15 3:52 PM, Vladimir Panteleev wrote:
 On Thursday, 22 January 2015 at 23:35:32 UTC, H. S. Teoh via
 Digitalmars-d wrote:
 On Thu, Jan 22, 2015 at 02:51:38PM -0800, H. S. Teoh via Digitalmars-d
 wrote:
 Wait, if R1 and R3, after the recursion, are sorted and contain only
 unique elements, doesn't that mean that we can just collapse R2 into a
 single element along with the endpoints of R1 and R3? So if R1[$-1] <
 pivot, then we're done, otherwise coalesce it with the pivot, and
 likewise with R3[0].


 I'm pretty sure that when Andrei said:

 C. Sort-move-unique R2 into the portion after R11, obtaining > the
 result.



 He meant R3, not R2. R2 is obviously collapsed to one element (the pivot).

Yes, thanks and sorry for the distraction.

 Working proof of concept:

 I think this will actually have worse performance than sort+uniq, since
 you're now shuffling half of the elements at each recursion depth - an
 additional O(n log n).

Yah, the key here is to exploit the combination to get better 
performance than composing the two. So you must do surgery on the core 
algorithms involved.


Andrei

Jan 22 2015

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 What would be a better integrated version - one that does 
 sorting and uniq in one shot? I suspect the combination could 
 be quite a bit better than doing the two in sequence.

What I need most is a "groping by hashing", since years. A unique 
sort can be handy, but it's less useful that the first.

Bye,
bearophile

Jan 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/22/15 4:06 PM, bearophile wrote:
 Andrei Alexandrescu:

 What would be a better integrated version - one that does sorting and
 uniq in one shot? I suspect the combination could be quite a bit
 better than doing the two in sequence.

 What I need most is a "groping by hashing", since years. A unique sort
 can be handy, but it's less useful that the first.

I agree a hash uniq would be nice. -- Andrei

Jan 22 2015

Paul O'Neil <redballoon36 gmail.com> writes:

On 01/22/2015 07:06 PM, bearophile wrote:
 Andrei Alexandrescu:
 
 What would be a better integrated version - one that does sorting and
 uniq in one shot? I suspect the combination could be quite a bit
 better than doing the two in sequence.

 
 What I need most is a "groping by hashing", since years. A unique sort
 can be handy, but it's less useful that the first.
 
 Bye,
 bearophile

Is that different than groupBy?

-- 
Paul O'Neil
Github / IRC: todayman

Jan 22 2015

"bearophile" <bearophileHUGS lycos.com> writes:

Paul O'Neil:

 Is that different than groupBy?

It works with hashing, and doesn't require a sorting, so it's 
different. On the other hand it's just few lines of code long, 
and it's quite easy to write...

Bye,
bearophile

Jan 22 2015

"Xinok" <xinok live.com> writes:

On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu 
wrote:
 There's this classic patter on Unix: |sort|uniq, i.e. sort some 
 data and only display the unique elements.

 What would be a better integrated version - one that does 
 sorting and uniq in one shot? I suspect the combination could 
 be quite a bit better than doing the two in sequence.

 A few google searches didn't yield much. Ideas?


 Thanks,

 Andrei

My thought is that uniq on a sorted list is only an O(n) 
operation, so it's not an expensive operation by any means. If 
there's to be any benefit to a sortUniq, it has to be capable of 
reducing work incurred during the sorting process; else you're 
just going to end up with something less efficient.

One solution may be RedBlackTree, which has an option to disallow 
duplicate elements. This container has three useful properties:

(1) The tree grows one element at a time. This is in opposition 
to other algorithms like quicksort or heapsort in which you must 
operate on the entire set of elements at once.

(2) Removing duplicates only requires a single comparison per 
element, thus retaining the worst-case of |sort|uniq.

(3) Duplicate elements are removed immediately. Inserting an 
element into a balanced tree with N elements is an O(lg n) 
operation, so the smaller n is, the less work that is required.

The shortcoming of RedBlackTree is that it's not very fast. 
However, I don't know any other O(n lg n) sorting algorithm which 
has these properties.

Jan 22 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Jan 23, 2015 at 12:59:51AM +0000, Xinok via Digitalmars-d wrote:
 On Thursday, 22 January 2015 at 21:40:57 UTC, Andrei Alexandrescu wrote:
There's this classic patter on Unix: |sort|uniq, i.e. sort some data
and only display the unique elements.

What would be a better integrated version - one that does sorting and
uniq in one shot? I suspect the combination could be quite a bit
better than doing the two in sequence.


[...]
 My thought is that uniq on a sorted list is only an O(n) operation, so
 it's not an expensive operation by any means. If there's to be any
 benefit to a sortUniq, it has to be capable of reducing work incurred
 during the sorting process; else you're just going to end up with
 something less efficient.
 
 One solution may be RedBlackTree, which has an option to disallow
 duplicate elements. This container has three useful properties:

The problem with RedBlackTree is that it's cache-unfriendly due to the
memory allocation and per-node pointers. One reason quicksort performs
so well is because it works in-place, and the partition operation
accesses elements bilinearly (i.e., two linear traversals over the same
stretch of memory interleaved with each other), so it's very likely that
most of the accesses will hit the cache (possibly even all, if hardware
prefetching is present).  Whereas with RedBlackTree, it's jumping all
over memory, so there will be a higher rate of cache misses and the
resulting slow RAM fetches. Plus, memory allocation is slooow, and is
best avoided where possible.


T

-- 
What's a "hot crossed bun"? An angry rabbit.

Jan 22 2015

"Laeeth Isharc" <Laeeth.nospam nospam-laeeth.com> writes:

 The problem with RedBlackTree is that it's cache-unfriendly due 
 to the
 memory allocation and per-node pointers. One reason quicksort 
 performs
 so well is because it works in-place, and the partition 
 operation
 accesses elements bilinearly (i.e., two linear traversals over 
 the same
 stretch of memory interleaved with each other), so it's very 
 likely that
 most of the accesses will hit the cache (possibly even all, if 
 hardware
 prefetching is present).  Whereas with RedBlackTree, it's 
 jumping all
 over memory, so there will be a higher rate of cache misses and 
 the
 resulting slow RAM fetches. Plus, memory allocation is slooow, 
 and is
 best avoided where possible.

In an interview Stepanov said he would have used b* trees instead 
for this reason if he were writing STL today.

Here is a (dated) benchmark comparison of hash libraries:
https://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 22.01.15 22:40, Andrei Alexandrescu wrote:
 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and
 only display the unique elements.

Loosely-related, compiling

...
auto ret = arr.filter!(myFilter());
...

I got:

Error: closures are not yet supported in CTFE

Using struct with opCall directly also does not pass.

Did I miss something?
Is it near?

Jan 23 2015

"bearophile" <bearophileHUGS lycos.com> writes:

zeljkog:

 Loosely-related, compiling

 ...
 auto ret = arr.filter!(myFilter());
 ...

 I got:

 Error: closures are not yet supported in CTFE

 Using struct with opCall directly also does not pass.

 Did I miss something?
 Is it near?

Please show a minimal nonworking example in the D.learn newsgroup.

Bye,
bearophile

Jan 23 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/23/15 7:04 AM, zeljkog wrote:
 On 22.01.15 22:40, Andrei Alexandrescu wrote:
 There's this classic patter on Unix: |sort|uniq, i.e. sort some data and
 only display the unique elements.

 Loosely-related, compiling

 ...
 auto ret = arr.filter!(myFilter());
 ...

 I got:

 Error: closures are not yet supported in CTFE

 Using struct with opCall directly also does not pass.

 Did I miss something?
 Is it near?

Must be you put that code at top level and the compiler tried to 
interpret it during compilation. -- Andrei

Jan 23 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Jan 23, 2015 at 09:40:42AM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 1/23/15 7:04 AM, zeljkog wrote:
On 22.01.15 22:40, Andrei Alexandrescu wrote:
There's this classic patter on Unix: |sort|uniq, i.e. sort some data
and only display the unique elements.

Loosely-related, compiling

...
auto ret = arr.filter!(myFilter());
...

I got:

Error: closures are not yet supported in CTFE

Using struct with opCall directly also does not pass.

Did I miss something?
Is it near?

 
 Must be you put that code at top level and the compiler tried to
 interpret it during compilation. -- Andrei

I think what he's trying to do is to call a function that returns a
delegate, and use that delegate to instantiate the filter template.
AFAIK I've never seen code like this before, and it looks like the
compiler isn't prepared to handle this.


T

-- 
I am not young enough to know everything. -- Oscar Wilde

Jan 23 2015

zeljkog <zeljkog home.com> writes:

On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 
 I think what he's trying to do is to call a function that returns a
 delegate, and use that delegate to instantiate the filter template.
 AFAIK I've never seen code like this before, and it looks like the
 compiler isn't prepared to handle this.
 

Yes, I tried to use filter for unique, need closure.
I think there are many applications for this pattern.

Jan 23 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/23/15 10:05 AM, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that returns a
 delegate, and use that delegate to instantiate the filter template.
 AFAIK I've never seen code like this before, and it looks like the
 compiler isn't prepared to handle this.

 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.

Please post a complete snippet then. Thanks! -- Andrei

Jan 23 2015

zeljkog <zeljkog home.com> writes:

On 23.01.15 19:13, Andrei Alexandrescu wrote:
 On 1/23/15 10:05 AM, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that returns a
 delegate, and use that delegate to instantiate the filter template.
 AFAIK I've never seen code like this before, and it looks like the
 compiler isn't prepared to handle this.

 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.

 Please post a complete snippet then. Thanks! -- Andrei

import std.stdio, std.algorithm;

auto unique(){
     bool[int] c;
     return (int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     };
}

void main()
{
     [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln;
}

Jan 23 2015

"ixid" <adamsibson hotmail.com> writes:

 import std.stdio, std.algorithm;

 auto unique(){
     bool[int] c;
     return (int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     };
 }

 void main()
 {
     [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln;
 }

Just add static to the closure.

auto unique(){
     static bool[int] c;
     return (int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     };
}

Jan 24 2015

"ixid" <adamsibson hotmail.com> writes:

Ignore me, sorry! Obviously that will not work for reuse of 
unique.

Jan 24 2015

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:
 On 23.01.15 19:13, Andrei Alexandrescu wrote:
 On 1/23/15 10:05 AM, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that 
 returns a
 delegate, and use that delegate to instantiate the filter 
 template.
 AFAIK I've never seen code like this before, and it looks 
 like the
 compiler isn't prepared to handle this.

 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.

 Please post a complete snippet then. Thanks! -- Andrei

 import std.stdio, std.algorithm;

 auto unique(){
     bool[int] c;
     return (int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     };
 }

 void main()
 {
     [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln;
 }

auto f = unique();
[1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]

Filter needs an alias, and you cannot alias an R-value (it has no 
symbol).

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 12:50, Peter Alexander wrote:
 auto f = unique();
 [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]
 
 Filter needs an alias, and you cannot alias an R-value (it has no symbol).

Yes, I see :)

But I think this should be supported by std.algorithm:

import std.stdio, std.algorithm;

struct Uniq{
    bool[int] c;
    bool opCall(int a){
        if (a in c)
            return false;
        else{
            c[a] = true;
            return true;
        }
    }
}

void main()
{
    [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln;
}

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 4:13 AM, zeljkog wrote:
 On 24.01.15 12:50, Peter Alexander wrote:
 auto f = unique();
 [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]

 Filter needs an alias, and you cannot alias an R-value (it has no symbol).

 Yes, I see :)

 But I think this should be supported by std.algorithm:

 import std.stdio, std.algorithm;

 struct Uniq{
      bool[int] c;
      bool opCall(int a){
          if (a in c)
              return false;
          else{
              c[a] = true;
              return true;
          }
      }
 }

 void main()
 {
      [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln;
 }

I can't make sense of this - where is Uniq supposed to be instantiated? 
-- Andrei

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 15:53, Andrei Alexandrescu wrote:
 On 1/24/15 4:13 AM, zeljkog wrote:
 On 24.01.15 12:50, Peter Alexander wrote:
 auto f = unique();
 [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]

 Filter needs an alias, and you cannot alias an R-value (it has no
 symbol).

 Yes, I see :)

 But I think this should be supported by std.algorithm:

 import std.stdio, std.algorithm;

 struct Uniq{
      bool[int] c;
      bool opCall(int a){
          if (a in c)
              return false;
          else{
              c[a] = true;
              return true;
          }
      }
 }

 void main()
 {
      [1, 5, 5, 2, 1, 5, 6, 6].filter!Uniq.writeln;
 }

 I can't make sense of this - where is Uniq supposed to be instantiated?
 -- Andrei

Require following changes in std.algorithm.filter.

original:

template filter(alias predicate) if (is(typeof(unaryFun!predicate)))
{
     auto filter(Range)(Range range) if (isInputRange!(Unqual!Range))
     {
         return FilterResult!(unaryFun!predicate, Range)(range);
     }
}

private struct FilterResult(alias pred, Range)
...

changed (to run this case):

template filter(alias predicate) if (is(typeof(unaryFun!predicate)))
{
     auto filter(Range)(Range range) if (isInputRange!(Unqual!Range))
     {
         static if (is(predicate == struct))
             return FilterResult!(predicate, Range)(range);
         else
             return FilterResult!(unaryFun!predicate, Range)(range);
     }
}

private struct FilterResult(alias predicate, Range)
{
     static if (is(predicate == struct))
         predicate pred;
     else
         alias pred = predicate;
...

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 16:17, zeljkog wrote:
 template filter(alias predicate) if (is(typeof(unaryFun!predicate)))

Should be:

template filter(alias predicate) 
if (is(typeof(unaryFun!predicate)) || is(predicate == struct))

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 7:17 AM, zeljkog wrote:
 private struct FilterResult(alias predicate, Range)
 {
      static if (is(predicate == struct))
          predicate pred;
      else
          alias pred = predicate;
 ...

The problem is here - the creation of an empty object is unlikely to be 
of use to the caller. We should, however, support the case when the 
caller passes a stateful predicate into the call. -- Andrei

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 18:02, Andrei Alexandrescu wrote:
 On 1/24/15 7:17 AM, zeljkog wrote:
 private struct FilterResult(alias predicate, Range)
 {
      static if (is(predicate == struct))
          predicate pred;
      else
          alias pred = predicate;
 ...

 
 The problem is here - the creation of an empty object is unlikely to be 
 of use to the caller. We should, however, support the case when the 
 caller passes a stateful predicate into the call. -- Andrei

I think empty object could be useful, but perhaps there is better :)

Until now I concluded:
1. Resulting range must be output range. Right?
   I don't know is it considerable problem?
2. unaryFun can pass down struct (with opCall check?)

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 18:26, zeljkog wrote:
 Until now I concluded:
 1. Resulting range must be output range. Right?

Should be input range.

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 9:34 AM, zeljkog wrote:
 On 24.01.15 18:26, zeljkog wrote:
 Until now I concluded:
 1. Resulting range must be output range. Right?

 Should be input range.

For filter it's a forward range if input is forward or better. -- Andrei

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 19:35, Andrei Alexandrescu wrote:
 On 1/24/15 9:34 AM, zeljkog wrote:
 On 24.01.15 18:26, zeljkog wrote:
 Until now I concluded:
 1. Resulting range must be output range. Right?

 Should be input range.

 For filter it's a forward range if input is forward or better. -- Andrei

But here there is potential problem with shared state after save?

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 10:41 AM, zeljkog wrote:
 On 24.01.15 19:35, Andrei Alexandrescu wrote:
 On 1/24/15 9:34 AM, zeljkog wrote:
 On 24.01.15 18:26, zeljkog wrote:
 Until now I concluded:
 1. Resulting range must be output range. Right?

 Should be input range.

 For filter it's a forward range if input is forward or better. -- Andrei

 But here there is potential problem with shared state after save?

It's not a problem - save saves the state of the iteration, not the data 
being iterated. -- Andrei

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 24.01.15 19:47, Andrei Alexandrescu wrote:
 But here there is potential problem with shared state after save?

 
 It's not a problem - save saves the state of the iteration, not the data 
 being iterated. -- Andrei
 

If you pass stateful object with opCall or closure it becomes part of "state of
the iteration"?

Consider:

import std.stdio, std.algorithm;

struct Uniq{
     bool[int] c;
     bool opCall(int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     }
} 

void main()
{
    Uniq a;
    auto arr = [1, 5, 5, 2, 1, 5, 6, 6];
    auto r = arr.filter!a;
    auto r1 = r.save();
    r.writeln;
    r1.writeln;
}

output:

[1, 2, 6]
[5]

Jan 24 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 1/24/15 9:34 AM, zeljkog wrote:
On 24.01.15 18:26, zeljkog wrote:
Until now I concluded:
1. Resulting range must be output range. Right?

Should be input range.

 
 For filter it's a forward range if input is forward or better. --
 Andrei

Hmm. Couldn't filter be made bidirectional too, if the underlying range
is also bidirectional? After all, filtering doesn't depend on the
surrounding elements, so in theory you should be able to popBack as
well.


T

-- 
"How are you doing?" "Doing what?"

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 11:00 AM, H. S. Teoh via Digitalmars-d wrote:
 On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via
Digitalmars-d wrote:
 On 1/24/15 9:34 AM, zeljkog wrote:
 On 24.01.15 18:26, zeljkog wrote:
 Until now I concluded:
 1. Resulting range must be output range. Right?

 Should be input range.

 For filter it's a forward range if input is forward or better. --
 Andrei

 Hmm. Couldn't filter be made bidirectional too, if the underlying range
 is also bidirectional? After all, filtering doesn't depend on the
 surrounding elements, so in theory you should be able to popBack as
 well.



Andrei

Jan 24 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Sat, Jan 24, 2015 at 11:20:55AM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 1/24/15 11:00 AM, H. S. Teoh via Digitalmars-d wrote:
On Sat, Jan 24, 2015 at 10:35:13AM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:


[...]
For filter it's a forward range if input is forward or better. --
Andrei

Hmm. Couldn't filter be made bidirectional too, if the underlying
range is also bidirectional? After all, filtering doesn't depend on
the surrounding elements, so in theory you should be able to popBack
as well.

 


[...]

Ah, I should have known, since I do remember seeing this when splitting
std.algorithm. I guess it didn't really register beyond "this function
belongs in std.algorithm.iteration". :-P

It's kinda sad that we can't merge the two functions and make it so that
the initial scanning of .back only happens lazily. I mean, in theory,
this should be easy:

	struct Filter {
		R range;
		...
		 property auto back() {
			while (!range.empty && !pred(range.back))
				range.popBack();
			return range.back;
		}
		void popBack() {
			range.popBack();
		}
	}

But I remember there was a huge discussion over whether
.front/.back/.empty should do any non-trivial work. The current
implementation appears to be the consequence of shoehorning all work
into popFront/popBack. But in my mind, these methods ought to be
abstract, and the user shouldn't be able to see any difference in
behaviour regardless of where the real work is actually done.


T

-- 
People who are more than casually interested in computers should have at least
some idea of what the underlying hardware is like. Otherwise the programs they
write will be pretty weird. -- D. Knuth

Jan 24 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 1/24/15 3:50 AM, Peter Alexander wrote:
 On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:
 On 23.01.15 19:13, Andrei Alexandrescu wrote:
 On 1/23/15 10:05 AM, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that returns a
 delegate, and use that delegate to instantiate the filter template.
 AFAIK I've never seen code like this before, and it looks like the
 compiler isn't prepared to handle this.

 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.

 Please post a complete snippet then. Thanks! -- Andrei

 import std.stdio, std.algorithm;

 auto unique(){
     bool[int] c;
     return (int a){
         if (a in c)
             return false;
         else{
             c[a] = true;
             return true;
         }
     };
 }

 void main()
 {
     [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln;
 }

 auto f = unique();
 [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]

 Filter needs an alias, and you cannot alias an R-value (it has no symbol).

Hmmm... we do allow rvalues sometimes (e.g. for strings). I think we 
could and should relax the rule to allow rvalues in this case, too. -- 
Andrei

Jan 24 2015

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Saturday, 24 January 2015 at 14:53:02 UTC, Andrei Alexandrescu
wrote:
 On 1/24/15 3:50 AM, Peter Alexander wrote:
 On Friday, 23 January 2015 at 18:22:08 UTC, zeljkog wrote:
 On 23.01.15 19:13, Andrei Alexandrescu wrote:
 On 1/23/15 10:05 AM, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that 
 returns a
 delegate, and use that delegate to instantiate the filter 
 template.
 AFAIK I've never seen code like this before, and it looks 
 like the
 compiler isn't prepared to handle this.

 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.

 Please post a complete snippet then. Thanks! -- Andrei

 import std.stdio, std.algorithm;

 auto unique(){
    bool[int] c;
    return (int a){
        if (a in c)
            return false;
        else{
            c[a] = true;
            return true;
        }
    };
 }

 void main()
 {
    [1, 5, 5, 2, 1, 5, 6, 6].filter!(unique()).writeln;
 }

 auto f = unique();
 [1, 5, 5, 2, 1, 5, 6, 6].filter!(f).writeln;  // [1, 5, 2, 6]

 Filter needs an alias, and you cannot alias an R-value (it has 
 no symbol).

 Hmmm... we do allow rvalues sometimes (e.g. for strings). I 
 think we could and should relax the rule to allow rvalues in 
 this case, too. -- Andrei

I was wrong. R-values work, as long as they are compile time
constants. The problem here is that closures don't yet work in
CTFE (as the error says). Pulling it out as a local variable
works because the alias then binds to the symbol rather than the
value.

Jan 24 2015

zeljkog <zeljkog home.com> writes:

On 23.01.15 19:05, zeljkog wrote:
 On 23.01.15 18:48, H. S. Teoh via Digitalmars-d wrote:
 I think what he's trying to do is to call a function that returns a
 delegate, and use that delegate to instantiate the filter template.
 AFAIK I've never seen code like this before, and it looks like the
 compiler isn't prepared to handle this.

 
 Yes, I tried to use filter for unique, need closure.
 I think there are many applications for this pattern.
 

It comes down to a question: 
Why structs with opCall are not supported as template parameters in
std.algorithm together with functions?

Like this:
filter!SomeStructWithOpCall

It is simple addition. Or is it with reason?

Jan 24 2015

D Programming

C/C++ Programming

Other

digitalmars.D - sortUniq