digitalmars.D.learn - uniq and array of enum members (That are all strings)

bauss (18/18) Jan 16 2019 Is there a way to achieve the following:

H. S. Teoh (15/33) Jan 16 2019 .uniq only works on adjacent identical elements. You should sort your

bauss (6/39) Jan 16 2019 Sorting will not work in my case though because it's an enum of

H. S. Teoh (17/37) Jan 16 2019 [...]

bauss (6/35) Jan 16 2019 That's not necessarily true.

H. S. Teoh (18/28) Jan 16 2019 Sorting is the simplest approach because then the problem is reduced to
H. S. Teoh (19/19) Jan 16 2019 De-duplicating a range that's not necessarily sorted seems to be a

bauss (7/8) Jan 16 2019 I'm aware of how to do it manually as I already stated I went

H. S. Teoh (25/32) Jan 16 2019 Feel free to submit a PR to Phobos for this. The templatized function I

Alex (22/66) Jan 16 2019 yeah... searching by hand is somewhat inefficient.

bauss (18/88) Jan 16 2019 The problem with sorting is that the following:

Alex (3/20) Jan 16 2019 Ah... I see. You want something like a special fold by

bauss (4/38) Jan 16 2019 Yep.

bauss <jj_1337 live.dk> writes:

Is there a way to achieve the following:

import std.stdio;

import std.algorithm : uniq;
import std.array : array;

enum Foo : string
{
     a = "aa",
     b = "bb",
     c = "cc"
}

void main()
{
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];

     auto b = a.uniq;

     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
}

Jan 16 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via Digitalmars-d-learn wrote:
 Is there a way to achieve the following:

[...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }

.uniq only works on adjacent identical elements.  You should sort your
array first.

If you need to preserve the original order but eliminate duplicates,
then you could use an AA to keep track of what has been seen. E.g.:

	bool[string] seen;
	auto b = a.filter!((e) {
			if (e in seen) return false;
			seen[e] = true;
			return true;
		});


T

-- 
People walk. Computers run.

Jan 16 2019

bauss <jj_1337 live.dk> writes:

On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:

 [...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }

 .uniq only works on adjacent identical elements.  You should 
 sort your array first.

 If you need to preserve the original order but eliminate 
 duplicates, then you could use an AA to keep track of what has 
 been seen. E.g.:

 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});


 T

Sorting will not work in my case though because it's an enum of 
strings that are not sorted alphabetically.

Right now I'm doing it manually by a foreach in similar way 
you're using filter.

I just feel like that's an overkill for something so trivial.

Jan 16 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 16, 2019 at 04:21:12PM +0000, bauss via Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:

[...]
 .uniq only works on adjacent identical elements.  You should sort
 your array first.
 
 If you need to preserve the original order but eliminate duplicates,
 then you could use an AA to keep track of what has been seen. E.g.:
 
 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});


[...]
 Sorting will not work in my case though because it's an enum of
 strings that are not sorted alphabetically.
 
 Right now I'm doing it manually by a foreach in similar way you're
 using filter.
 
 I just feel like that's an overkill for something so trivial.

It's not trivial. In order for the computer to know whether or not the
i'th element should be excluded, it needs to know what has come before
it. In the case of .uniq, this is easy because the assumption is that
identical elements are adjacent, so we only need to know what the
previous element was.  But in your case, we need to know elements that
were seen an arbitrary distance in the past.  So there must be some way
of answering the question "has the i'th element been seen x iterations
ago?".  So either you search backward until you find the element (very
inefficient: it will turn your algorithm into O(n^2)), or you need some
kind of lookup structure like an AA to remember what has been seen
before.


T

-- 
It is the quality rather than the quantity that matters. -- Lucius Annaeus
Seneca

Jan 16 2019

bauss <jj_1337 live.dk> writes:

On Wednesday, 16 January 2019 at 16:35:04 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 04:21:12PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh 
 wrote:

 [...]
 .uniq only works on adjacent identical elements.  You should 
 sort your array first.
 
 If you need to preserve the original order but eliminate 
 duplicates, then you could use an AA to keep track of what 
 has been seen. E.g.:
 
 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});


 [...]
 Sorting will not work in my case though because it's an enum 
 of strings that are not sorted alphabetically.
 
 Right now I'm doing it manually by a foreach in similar way 
 you're using filter.
 
 I just feel like that's an overkill for something so trivial.

 It's not trivial. In order for the computer to know whether or 
 not the i'th element should be excluded, it needs to know what 
 has come before it.

That's not necessarily true.

It just has to know whether the element matches an earlier 
element.

Your filter example demonstrates exactly why sorting isn't 
necessary.

Jan 16 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 16, 2019 at 04:37:21PM +0000, bauss via Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:35:04 UTC, H. S. Teoh wrote:

[...]
 It's not trivial. In order for the computer to know whether or not
 the i'th element should be excluded, it needs to know what has come
 before it.

 
 That's not necessarily true.
 
 It just has to know whether the element matches an earlier element.
 
 Your filter example demonstrates exactly why sorting isn't necessary.

Sorting is the simplest approach because then the problem is reduced to
using .uniq.  It's not necessarily the most *efficient* approach.

But regardless, the point is that in order to know whether some given
element e matches an earlier element, the computer must somehow keep
track of previously-seen elements. Either you do this implicitly by
sorting it so that the relevant previously-seen element occurs
immediately before the current one, or you have to store it explicitly
somewhere, like in an AA, or you have to compute it on the fly.  There
is no way to get around this; you cannot filter an element based on
information you don't have.  So either you recompute that information
(search backward until you find a match), or you store it in a structure
like an AA where you can do the lookup easily.  The information has to
come from *somewhere*.


T

-- 
Nearly all men can stand adversity, but if you want to test a man's character,
give him power. -- Abraham Lincoln

Jan 16 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

De-duplicating a range that's not necessarily sorted seems to be a
pretty common task, so here's a generic function for whoever else might
want to do this:

	import std.range.primitives;

	auto deduplicate(R)(R range)
		if (isInputRange!R)
	{
		import std.algorithm : filter;
		alias E = ElementType!R;
		bool[E] seen;
		return range.filter!((e) {
			if (e in seen) return false;
			seen[e] = true;
			return true;
		});
	}


T

-- 
Why have vacation when you can work?? -- EC

Jan 16 2019

bauss <jj_1337 live.dk> writes:

On Wednesday, 16 January 2019 at 18:20:57 UTC, H. S. Teoh wrote:
 T

I'm aware of how to do it manually as I already stated I went 
with a similar approach.

There should just be something standard for it and uniq should 
have an overload or something that allows for another behavior 
that doesn't rely on sorting.

Even if it's "slower".

Jan 16 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jan 16, 2019 at 06:25:48PM +0000, bauss via Digitalmars-d-learn wrote:
[...]
 I'm aware of how to do it manually as I already stated I went with a
 similar approach.
 
 There should just be something standard for it and uniq should have an
 overload or something that allows for another behavior that doesn't
 rely on sorting.

Feel free to submit a PR to Phobos for this. The templatized function I
posted is probably already pretty close to Phobos standards, you just
need to add a few unittests and a nice ddoc header.

I wouldn't call it `uniq`, though.  It doesn't quite jive with the
"examine nearby elements only" philosophy of .uniq (which inherits from
the Unix `uniq` that does the same thing), and it's not  nogc unlike
.uniq, etc., and IME, trying to shoehorn something like this into an
existing symbol with already preconceived semantics will make the PR
review needlessly controversial.  I'd just submit it under a different
name.


 Even if it's "slower".

Actually, filtering with an AA is not slower, since the AA amortizes the
overall cost to O(n), which is even better than the O(n log n) of the
pre-sorting approach.

Of course, this comes at the cost of not being  nogc, which might be a
turnoff for some folks.  It's probably possible to write a  nogc version
of this function, but it will be more complicated (much more complicated
if you're working with a range of unknown length). This is one of the
places where IMO the hassle of manually managing the memory of a lookup
table far outweighs the "cost" (perceived or real) of just embracing the
GC.


T

-- 
Elegant or ugly code as well as fine or rude sentences have something in
common: they don't depend on the language. -- Luca De Vitis

Jan 16 2019

Alex <sascha.orlov gmail.com> writes:

On Wednesday, 16 January 2019 at 16:21:12 UTC, bauss wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:

 [...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }

 .uniq only works on adjacent identical elements.  You should 
 sort your array first.

 If you need to preserve the original order but eliminate 
 duplicates, then you could use an AA to keep track of what has 
 been seen. E.g.:

 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});


 T

 Sorting will not work in my case though because it's an enum of 
 strings that are not sorted alphabetically.

 Right now I'm doing it manually by a foreach in similar way 
 you're using filter.

 I just feel like that's an overkill for something so trivial.

yeah... searching by hand is somewhat inefficient.
but this would work also with an enum, wouldn't it?

´´´
import std.stdio;

import std.algorithm : uniq;
import std.array : array;
import std.algorithm.sorting : sort;

enum Foo : string
{
     a = "aa",
     b = "bb",
     c = "cc"
}

void main()
{
     enum a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];

     auto b = a.sort.uniq;

     writeln(b);
}
´´´

And if you have something like immutable, dup would help, maybe?

Jan 16 2019

bauss <jj_1337 live.dk> writes:

On Wednesday, 16 January 2019 at 16:40:34 UTC, Alex wrote:
 On Wednesday, 16 January 2019 at 16:21:12 UTC, bauss wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh 
 wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:

 [...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }

 .uniq only works on adjacent identical elements.  You should 
 sort your array first.

 If you need to preserve the original order but eliminate 
 duplicates, then you could use an AA to keep track of what 
 has been seen. E.g.:

 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});


 T

 Sorting will not work in my case though because it's an enum 
 of strings that are not sorted alphabetically.

 Right now I'm doing it manually by a foreach in similar way 
 you're using filter.

 I just feel like that's an overkill for something so trivial.

 yeah... searching by hand is somewhat inefficient.
 but this would work also with an enum, wouldn't it?

 ´´´
 import std.stdio;

 import std.algorithm : uniq;
 import std.array : array;
 import std.algorithm.sorting : sort;

 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }

 void main()
 {
     enum a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];

     auto b = a.sort.uniq;

     writeln(b);
 }
 ´´´

 And if you have something like immutable, dup would help, maybe?

The problem with sorting is that the following:

[3,5,6,6,2,1,2,5,3]

will then become

[1,2,3,5,6]

or

[6,5,3,2,1]

and not:

[3,5,6,2,1]

which would be what you'd wanna use in some situations.

The important thing to know here is that the numbers are not a 
sequence, they're a set of numbers.

It's important the set doesn't have a change of order.

The filter example works and that's what I already did.

But something that is a bit better would be appreciated.

Sorting really makes no sense to make something unique.

It makes sense for the most "trivial" implementation, but not the 
most trivial usages.

Jan 16 2019

Alex <sascha.orlov gmail.com> writes:

On Wednesday, 16 January 2019 at 16:52:50 UTC, bauss wrote:
 The problem with sorting is that the following:

 [3,5,6,6,2,1,2,5,3]

 will then become

 [1,2,3,5,6]

 or

 [6,5,3,2,1]

 and not:

 [3,5,6,2,1]

 which would be what you'd wanna use in some situations.

 The important thing to know here is that the numbers are not a 
 sequence, they're a set of numbers.

 It's important the set doesn't have a change of order.

 The filter example works and that's what I already did.

 But something that is a bit better would be appreciated.

 Sorting really makes no sense to make something unique.

 It makes sense for the most "trivial" implementation, but not 
 the most trivial usages.

Ah... I see. You want something like a special fold by 
filtering... hmm ;)

Jan 16 2019

bauss <jj_1337 live.dk> writes:

On Wednesday, 16 January 2019 at 17:28:14 UTC, Alex wrote:
 On Wednesday, 16 January 2019 at 16:52:50 UTC, bauss wrote:
 The problem with sorting is that the following:

 [3,5,6,6,2,1,2,5,3]

 will then become

 [1,2,3,5,6]

 or

 [6,5,3,2,1]

 and not:

 [3,5,6,2,1]

 which would be what you'd wanna use in some situations.

 The important thing to know here is that the numbers are not a 
 sequence, they're a set of numbers.

 It's important the set doesn't have a change of order.

 The filter example works and that's what I already did.

 But something that is a bit better would be appreciated.

 Sorting really makes no sense to make something unique.

 It makes sense for the most "trivial" implementation, but not 
 the most trivial usages.

 Ah... I see. You want something like a special fold by 
 filtering... hmm ;)

Yep.

I just want to make sure there are no duplicates in the array 
while maintaining the order of the items.

Jan 16 2019

D Programming

C/C++ Programming

Other

digitalmars.D.learn - uniq and array of enum members (That are all strings)