www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - uniq and array of enum members (That are all strings)

reply bauss <jj_1337 live.dk> writes:
Is there a way to achieve the following:

import std.stdio;

import std.algorithm : uniq;
import std.array : array;

enum Foo : string
{
     a = "aa",
     b = "bb",
     c = "cc"
}

void main()
{
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];

     auto b = a.uniq;

     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
}
Jan 16
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via Digitalmars-d-learn wrote:
 Is there a way to achieve the following:
[...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }
.uniq only works on adjacent identical elements. You should sort your array first. If you need to preserve the original order but eliminate duplicates, then you could use an AA to keep track of what has been seen. E.g.: bool[string] seen; auto b = a.filter!((e) { if (e in seen) return false; seen[e] = true; return true; }); T -- People walk. Computers run.
Jan 16
parent reply bauss <jj_1337 live.dk> writes:
On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:
[...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }
.uniq only works on adjacent identical elements. You should sort your array first. If you need to preserve the original order but eliminate duplicates, then you could use an AA to keep track of what has been seen. E.g.: bool[string] seen; auto b = a.filter!((e) { if (e in seen) return false; seen[e] = true; return true; }); T
Sorting will not work in my case though because it's an enum of strings that are not sorted alphabetically. Right now I'm doing it manually by a foreach in similar way you're using filter. I just feel like that's an overkill for something so trivial.
Jan 16
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 16, 2019 at 04:21:12PM +0000, bauss via Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:
[...]
 .uniq only works on adjacent identical elements.  You should sort
 your array first.
 
 If you need to preserve the original order but eliminate duplicates,
 then you could use an AA to keep track of what has been seen. E.g.:
 
 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});
[...]
 Sorting will not work in my case though because it's an enum of
 strings that are not sorted alphabetically.
 
 Right now I'm doing it manually by a foreach in similar way you're
 using filter.
 
 I just feel like that's an overkill for something so trivial.
It's not trivial. In order for the computer to know whether or not the i'th element should be excluded, it needs to know what has come before it. In the case of .uniq, this is easy because the assumption is that identical elements are adjacent, so we only need to know what the previous element was. But in your case, we need to know elements that were seen an arbitrary distance in the past. So there must be some way of answering the question "has the i'th element been seen x iterations ago?". So either you search backward until you find the element (very inefficient: it will turn your algorithm into O(n^2)), or you need some kind of lookup structure like an AA to remember what has been seen before. T -- It is the quality rather than the quantity that matters. -- Lucius Annaeus Seneca
Jan 16
parent reply bauss <jj_1337 live.dk> writes:
On Wednesday, 16 January 2019 at 16:35:04 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 04:21:12PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh 
 wrote:
[...]
 .uniq only works on adjacent identical elements.  You should 
 sort your array first.
 
 If you need to preserve the original order but eliminate 
 duplicates, then you could use an AA to keep track of what 
 has been seen. E.g.:
 
 	bool[string] seen;
 	auto b = a.filter!((e) {
 			if (e in seen) return false;
 			seen[e] = true;
 			return true;
 		});
[...]
 Sorting will not work in my case though because it's an enum 
 of strings that are not sorted alphabetically.
 
 Right now I'm doing it manually by a foreach in similar way 
 you're using filter.
 
 I just feel like that's an overkill for something so trivial.
It's not trivial. In order for the computer to know whether or not the i'th element should be excluded, it needs to know what has come before it.
That's not necessarily true. It just has to know whether the element matches an earlier element. Your filter example demonstrates exactly why sorting isn't necessary.
Jan 16
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 16, 2019 at 04:37:21PM +0000, bauss via Digitalmars-d-learn wrote:
 On Wednesday, 16 January 2019 at 16:35:04 UTC, H. S. Teoh wrote:
[...]
 It's not trivial. In order for the computer to know whether or not
 the i'th element should be excluded, it needs to know what has come
 before it.
That's not necessarily true. It just has to know whether the element matches an earlier element. Your filter example demonstrates exactly why sorting isn't necessary.
Sorting is the simplest approach because then the problem is reduced to using .uniq. It's not necessarily the most *efficient* approach. But regardless, the point is that in order to know whether some given element e matches an earlier element, the computer must somehow keep track of previously-seen elements. Either you do this implicitly by sorting it so that the relevant previously-seen element occurs immediately before the current one, or you have to store it explicitly somewhere, like in an AA, or you have to compute it on the fly. There is no way to get around this; you cannot filter an element based on information you don't have. So either you recompute that information (search backward until you find a match), or you store it in a structure like an AA where you can do the lookup easily. The information has to come from *somewhere*. T -- Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln
Jan 16
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
De-duplicating a range that's not necessarily sorted seems to be a
pretty common task, so here's a generic function for whoever else might
want to do this:

	import std.range.primitives;

	auto deduplicate(R)(R range)
		if (isInputRange!R)
	{
		import std.algorithm : filter;
		alias E = ElementType!R;
		bool[E] seen;
		return range.filter!((e) {
			if (e in seen) return false;
			seen[e] = true;
			return true;
		});
	}


T

-- 
Why have vacation when you can work?? -- EC
Jan 16
parent reply bauss <jj_1337 live.dk> writes:
On Wednesday, 16 January 2019 at 18:20:57 UTC, H. S. Teoh wrote:
 T
I'm aware of how to do it manually as I already stated I went with a similar approach. There should just be something standard for it and uniq should have an overload or something that allows for another behavior that doesn't rely on sorting. Even if it's "slower".
Jan 16
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 16, 2019 at 06:25:48PM +0000, bauss via Digitalmars-d-learn wrote:
[...]
 I'm aware of how to do it manually as I already stated I went with a
 similar approach.
 
 There should just be something standard for it and uniq should have an
 overload or something that allows for another behavior that doesn't
 rely on sorting.
Feel free to submit a PR to Phobos for this. The templatized function I posted is probably already pretty close to Phobos standards, you just need to add a few unittests and a nice ddoc header. I wouldn't call it `uniq`, though. It doesn't quite jive with the "examine nearby elements only" philosophy of .uniq (which inherits from the Unix `uniq` that does the same thing), and it's not nogc unlike .uniq, etc., and IME, trying to shoehorn something like this into an existing symbol with already preconceived semantics will make the PR review needlessly controversial. I'd just submit it under a different name.
 Even if it's "slower".
Actually, filtering with an AA is not slower, since the AA amortizes the overall cost to O(n), which is even better than the O(n log n) of the pre-sorting approach. Of course, this comes at the cost of not being nogc, which might be a turnoff for some folks. It's probably possible to write a nogc version of this function, but it will be more complicated (much more complicated if you're working with a range of unknown length). This is one of the places where IMO the hassle of manually managing the memory of a lookup table far outweighs the "cost" (perceived or real) of just embracing the GC. T -- Elegant or ugly code as well as fine or rude sentences have something in common: they don't depend on the language. -- Luca De Vitis
Jan 16
prev sibling parent reply Alex <sascha.orlov gmail.com> writes:
On Wednesday, 16 January 2019 at 16:21:12 UTC, bauss wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:
[...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }
.uniq only works on adjacent identical elements. You should sort your array first. If you need to preserve the original order but eliminate duplicates, then you could use an AA to keep track of what has been seen. E.g.: bool[string] seen; auto b = a.filter!((e) { if (e in seen) return false; seen[e] = true; return true; }); T
Sorting will not work in my case though because it's an enum of strings that are not sorted alphabetically. Right now I'm doing it manually by a foreach in similar way you're using filter. I just feel like that's an overkill for something so trivial.
yeah... searching by hand is somewhat inefficient. but this would work also with an enum, wouldn't it? ´´´ import std.stdio; import std.algorithm : uniq; import std.array : array; import std.algorithm.sorting : sort; enum Foo : string { a = "aa", b = "bb", c = "cc" } void main() { enum a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c]; auto b = a.sort.uniq; writeln(b); } ´´´ And if you have something like immutable, dup would help, maybe?
Jan 16
parent reply bauss <jj_1337 live.dk> writes:
On Wednesday, 16 January 2019 at 16:40:34 UTC, Alex wrote:
 On Wednesday, 16 January 2019 at 16:21:12 UTC, bauss wrote:
 On Wednesday, 16 January 2019 at 16:12:28 UTC, H. S. Teoh 
 wrote:
 On Wed, Jan 16, 2019 at 03:57:49PM +0000, bauss via 
 Digitalmars-d-learn wrote:
 Is there a way to achieve the following:
[...]
 enum Foo : string
 {
     a = "aa",
     b = "bb",
     c = "cc"
 }
 
 void main()
 {
     auto a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c];
 
     auto b = a.uniq;
 
     writeln(b);
     // Expected output: [a, b, c]
     // Outputs: [a, b, a, b, c]
 }
.uniq only works on adjacent identical elements. You should sort your array first. If you need to preserve the original order but eliminate duplicates, then you could use an AA to keep track of what has been seen. E.g.: bool[string] seen; auto b = a.filter!((e) { if (e in seen) return false; seen[e] = true; return true; }); T
Sorting will not work in my case though because it's an enum of strings that are not sorted alphabetically. Right now I'm doing it manually by a foreach in similar way you're using filter. I just feel like that's an overkill for something so trivial.
yeah... searching by hand is somewhat inefficient. but this would work also with an enum, wouldn't it? ´´´ import std.stdio; import std.algorithm : uniq; import std.array : array; import std.algorithm.sorting : sort; enum Foo : string { a = "aa", b = "bb", c = "cc" } void main() { enum a = [Foo.a, Foo.b, Foo.a, Foo.b, Foo.c]; auto b = a.sort.uniq; writeln(b); } ´´´ And if you have something like immutable, dup would help, maybe?
The problem with sorting is that the following: [3,5,6,6,2,1,2,5,3] will then become [1,2,3,5,6] or [6,5,3,2,1] and not: [3,5,6,2,1] which would be what you'd wanna use in some situations. The important thing to know here is that the numbers are not a sequence, they're a set of numbers. It's important the set doesn't have a change of order. The filter example works and that's what I already did. But something that is a bit better would be appreciated. Sorting really makes no sense to make something unique. It makes sense for the most "trivial" implementation, but not the most trivial usages.
Jan 16
parent reply Alex <sascha.orlov gmail.com> writes:
On Wednesday, 16 January 2019 at 16:52:50 UTC, bauss wrote:
 The problem with sorting is that the following:

 [3,5,6,6,2,1,2,5,3]

 will then become

 [1,2,3,5,6]

 or

 [6,5,3,2,1]

 and not:

 [3,5,6,2,1]

 which would be what you'd wanna use in some situations.

 The important thing to know here is that the numbers are not a 
 sequence, they're a set of numbers.

 It's important the set doesn't have a change of order.

 The filter example works and that's what I already did.

 But something that is a bit better would be appreciated.

 Sorting really makes no sense to make something unique.

 It makes sense for the most "trivial" implementation, but not 
 the most trivial usages.
Ah... I see. You want something like a special fold by filtering... hmm ;)
Jan 16
parent bauss <jj_1337 live.dk> writes:
On Wednesday, 16 January 2019 at 17:28:14 UTC, Alex wrote:
 On Wednesday, 16 January 2019 at 16:52:50 UTC, bauss wrote:
 The problem with sorting is that the following:

 [3,5,6,6,2,1,2,5,3]

 will then become

 [1,2,3,5,6]

 or

 [6,5,3,2,1]

 and not:

 [3,5,6,2,1]

 which would be what you'd wanna use in some situations.

 The important thing to know here is that the numbers are not a 
 sequence, they're a set of numbers.

 It's important the set doesn't have a change of order.

 The filter example works and that's what I already did.

 But something that is a bit better would be appreciated.

 Sorting really makes no sense to make something unique.

 It makes sense for the most "trivial" implementation, but not 
 the most trivial usages.
Ah... I see. You want something like a special fold by filtering... hmm ;)
Yep. I just want to make sure there are no duplicates in the array while maintaining the order of the items.
Jan 16