digitalmars.D.learn - how to get top N distinct elements from range?
- Andrea Fontana (5/5) Mar 08 2013 I wonder if exists a way to get top n distinct elements from a
- Ary Borenszweig (2/4) Mar 08 2013 Top from an infinite range? o_O
- bearophile (13/19) Mar 08 2013 What does it mean "top"?
- Andrea Fontana (3/23) Mar 08 2013 Yes I wonder if a lazy function like this function was in phobos.
- jerro (16/22) Mar 08 2013 You could do something like (untested code):
- Andrea Fontana (6/10) Mar 08 2013 Why?
- Ivan Kazmenko (8/15) Mar 08 2013 I don't quite get it, do you want the least n elements? What
- Andrea Fontana (9/27) Mar 08 2013 Infinite loop as:
- bearophile (16/17) Mar 08 2013 Take my code and put it inside a struct, add empty, popFront and
- bearophile (20/22) Mar 08 2013 It seems to work:
- Chris Cain (3/16) Mar 08 2013 I have to say, that's a great example of beautiful simplicity in
- Andrei Alexandrescu (3/17) Mar 09 2013 Yah, impressive. Nice work bearophile.
- bearophile (22/23) Mar 09 2013 There are 3 problems:
- bearophile (13/13) Mar 08 2013 It needs a constraint:
- bearophile (9/19) Mar 08 2013 I think the standard library of Ada2012 has bounded containers,
- Andrea Fontana (12/33) Mar 08 2013 Something goes wrong by the way. I removed "n" template params
- bearophile (27/30) Mar 08 2013 I think take() is not the cause.
- bearophile (21/22) Mar 08 2013 Better seen with this code:
- Andrea Fontana (6/42) Mar 08 2013 Here the answer:
- Andrea Fontana (26/72) Mar 08 2013 This is my "struct" version that helped me to findout that
- bearophile (16/17) Mar 08 2013 The caching of map front was recently removed.
- bearophile (8/8) Mar 08 2013 Simpler:
- Andrea Fontana (10/18) Mar 08 2013 Chech this:
- bearophile (26/26) Mar 08 2013 This is a shortened version of the filter(), it calls pred only n
- Andrea Fontana (4/30) Mar 08 2013 Not funny btw :) It's not so easy to find out. And it's not easy
- bearophile (15/18) Mar 08 2013 Not caching _input.front in filter() means that filter() assumes
- bearophile (7/12) Mar 08 2013 Sorry, I meant:
- bearophile (53/53) Mar 08 2013 With this modified filter (that caches _input.front) it seems to
- Andrea Fontana (13/17) Mar 09 2013 Maybe two versions (filter and cachedFilter) or a bool template
- Andrea Fontana (16/35) Mar 09 2013 I almost got it:
- Andrea Fontana (27/66) Mar 09 2013 auto distinct(alias fun = "a", Range)(Range r)
- bearophile (6/10) Mar 09 2013 I don't know if that's the best way, but I think it's OK. (I
- Timon Gehr (3/7) Mar 09 2013 It's ok, but the redundant parentheses around the element type are
- bearophile (6/8) Mar 09 2013 In the meantime I have filed the problem, I hope Andrei will take
- bearophile (5/8) Mar 08 2013 I think that's not the answer. I think the answer is a bug in
- Andrea Fontana (5/14) Mar 08 2013 My struct version never work if map doesn't cache. Every time I
- jerro (20/31) Mar 08 2013 I didn't understand you correctly then. I thought that top n
- Ivan Kazmenko (11/17) Mar 08 2013 For an arbitrary infinite range, it is impossible from the
- Chris Cain (6/13) Mar 08 2013 You can also use an array if you use a BinaryHeap. If you use a
- Ivan Kazmenko (3/17) Mar 08 2013 My first thought was about a heap, too, but it does not allow to
- Jacob Carlborg (6/11) Mar 08 2013 There's a function called "topN" in std.algorithm, I don't know if
I wonder if exists a way to get top n distinct elements from a range (infinite too!) A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?
Mar 08 2013
On 3/8/13 10:33 AM, Andrea Fontana wrote:I wonder if exists a way to get top n distinct elements from a range (infinite too!)Top from an infinite range? o_O
Mar 08 2013
Andrea Fontana:I wonder if exists a way to get top n distinct elements from a range (infinite too!) A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?What does it mean "top"? I think you are not missing functions. One solution (not tested): bool[ForeachType!(typeof(myRange))] mySet; foreach (item; myRange) { mySet[item] = true; if (mySet.length >= n) break; } auto top = mySet.byKey; Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 14:10:55 UTC, bearophile wrote:Andrea Fontana:Yes I wonder if a lazy function like this function was in phobos. It's quite common, and I didn't want to reinvent wheel :)I wonder if exists a way to get top n distinct elements from a range (infinite too!) A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?What does it mean "top"? I think you are not missing functions. One solution (not tested): bool[ForeachType!(typeof(myRange))] mySet; foreach (item; myRange) { mySet[item] = true; if (mySet.length >= n) break; } auto top = mySet.byKey; Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote:I wonder if exists a way to get top n distinct elements from a range (infinite too!)It's impossible to do that for infinite rangesA (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?You could do something like (untested code): for(size_t m = 0, mnext = n; ; n = mnext, mnext = min(2 * mnext, range.length)) { partialSort!condition(range[m .. $], mnext - m); if(range[0 .. mnext].uniq.count >= n) break; assert(mnext < range.length); } auto topN = range.uniq.take(n); You will need a random access range with slicing for that, and it will modify it. If I am not mistaken, this has time complexity O(log(ndup / n) * range.length), where ndup is the number of all elements (including duplicates) with values among the top n.
Mar 08 2013
On Friday, 8 March 2013 at 14:43:29 UTC, jerro wrote:On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote:Why? sequence!"n*2".myTopDistinct!"a==b"(3); will give [2,4,6] We just need to keep a list of "just-seen" elements But I don't know how to make it lazyI wonder if exists a way to get top n distinct elements from a range (infinite too!)It's impossible to do that for infinite ranges
Mar 08 2013
I don't quite get it, do you want the least n elements? What result would you want if the sequence was the following: sequence!"-n*2".myTopDistinct!"a==b"(3); The sequence goes like [-2, -4, -6, -8, -10, ...] ad infinitum. To clarify a doubt, do you mean "top" as in "front", or "top" as the "greatest" or "least" in terms of comparison? Or maybe "the first n distinct elements found"? If so, why "sort" in the original post? It could bring middle elements to front.Why? sequence!"n*2".myTopDistinct!"a==b"(3); will give [2,4,6]I wonder if exists a way to get top n distinct elements from a range (infinite too!)It's impossible to do that for infinite ranges
Mar 08 2013
On Friday, 8 March 2013 at 16:12:37 UTC, Ivan Kazmenko wrote:Infinite loop as: sequence!"n*2".filter!"a>1"; and a lot of other cases.I don't quite get it, do you want the least n elements? What result would you want if the sequence was the following: sequence!"-n*2".myTopDistinct!"a==b"(3);Why? sequence!"n*2".myTopDistinct!"a==b"(3); will give [2,4,6]I wonder if exists a way to get top n distinct elements from a range (infinite too!)It's impossible to do that for infinite rangesThe sequence goes like [-2, -4, -6, -8, -10, ...] ad infinitum. To clarify a doubt, do you mean "top" as in "front", or "top" as the "greatest" or "least" in terms of comparison? Or maybe "the first n distinct elements found"? If so, why "sort" in the original post? It could bring middle elements to front.I mean "the first n distinct elements found". Example was a mistake. Uniq works with sorted range (if you want to get distinct element) so I sort it to make it works. But I should not :) Whoops. Bearophile got it btw. So, is there any lazy way to do it?
Mar 08 2013
Andrea Fontana:So, is there any lazy way to do it?Take my code and put it inside a struct, add empty, popFront and front. Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested): auto firstDistinct(Range)(Range r, size_t n) { bool[ForeachType!(Range)] mySet; return f.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } Bye, bearophile
Mar 08 2013
Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested):It seems to work: import std.stdio, std.range, std.algorithm, std.traits; auto firstDistinct(Range)(Range r, in size_t n) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } void main() { immutable data = [1, 0, 0, 0, 0, 3, 3, 0, 3, 3, 0, 2, 3, 2, 3, 2, 3, 1, 1, 3, 2, 1, 0, 0, 0, 1, 0, 3, 0, 3]; foreach (i; 0 .. 6) data.firstDistinct(i).writeln; } Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 18:17:22 UTC, bearophile wrote:I have to say, that's a great example of beautiful simplicity in programming. Bravo.Otherwise return an impure take of a filter closure from a function that keeps the set. Something like (untested):It seems to work: import std.stdio, std.range, std.algorithm, std.traits; auto firstDistinct(Range)(Range r, in size_t n) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }
Mar 08 2013
On 3/8/13 1:34 PM, Chris Cain wrote:On Friday, 8 March 2013 at 18:17:22 UTC, bearophile wrote:Yah, impressive. Nice work bearophile. Andreiauto firstDistinct(Range)(Range r, in size_t n) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }I have to say, that's a great example of beautiful simplicity in programming. Bravo.
Mar 09 2013
Andrei Alexandrescu:Yah,There are 3 problems: - In a successive post I have also added a template constraint, so Range is an InputRange. - firstDistinct() packs two behaviours that are better kept separated, it's like a takeDistinct. It's better to remove the final take, and return all distinct items. If a user wants the first ones, he/she/shi will add a take after it. This is why later in this thread we have used distinct(). In very few cases I'd like some extra functions in Phobos like flatMap and amap, because they are very common operations. But in most cases it's better to keep only the simplest algorithms (like distinct and take), so you can combine them in many different ways. - Successive posts have shown that the code doesn't work. After several tests I have opened this: http://d.puremagic.com/issues/show_bug.cgi?id=9674 Please take a look at it. - I have recently written a little D program, about something else, that I think you will appreciate even more than that firstDistinct/distinct :-) Bye, bearophile
Mar 09 2013
It needs a constraint: auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); } Bye, bearophile
Mar 08 2013
auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?). Try this (sorted for better reading): iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln; iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln; output: [0, 0, 3, 3, 4, 4, 4, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. Bye, bearophile
Mar 08 2013
Andrea Fontana:Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?).I think take() is not the cause. See this program: import std.stdio, std.range, std.algorithm, std.traits, std.random; auto distinct(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 5.iota.map!((_) { auto x = uniform(0, 10); write("*"); return x; }).distinct.writeln; } It outputs something like: *[*2*, *0*, *2**, *1] You expect that output to contain only 5 stars. Maybe map() or filter() is the cause of this. Bye, bearophile
Mar 08 2013
You expect that output to contain only 5 stars.Better seen with this code: import std.stdio, std.range, std.algorithm, std.traits, std.random; auto distinct(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 5.iota.map!((_) { auto x = uniform(0, 10); write("*"); return x; }).distinct.array; } Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 22:36:35 UTC, Andrea Fontana wrote:On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:Here the answer: auto r=iota(100).map!((x) => uniform(0,10))(); writeln(r.front," ", r.front, " ", r.front, " ", r.front); But i think "front" was "cached", but it seems not...Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?). Try this (sorted for better reading): iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln; iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln; output: [0, 0, 3, 3, 4, 4, 4, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 23:37:20 UTC, Andrea Fontana wrote:On Friday, 8 March 2013 at 22:36:35 UTC, Andrea Fontana wrote:This is my "struct" version that helped me to findout that problem. auto distinct(Range)(Range rs) if (isInputRange!(Unqual!Range)) { return DistinctResult!Range(rs); } struct DistinctResult(Range) if (isInputRange!(Unqual!Range)) { alias Unqual!Range T; T range; bool[ElementType!T] justSeen; property front() { return range.front; } property empty() { return range.empty; } void popFront() { justSeen[range.front] = true; while(true) { range.popFront(); if (range.empty || range.front !in justSeen) break; } } }On Friday, 8 March 2013 at 19:36:04 UTC, bearophile wrote:Here the answer: auto r=iota(100).map!((x) => uniform(0,10))(); writeln(r.front," ", r.front, " ", r.front, " ", r.front); But i think "front" was "cached", but it seems not...Something goes wrong by the way. I removed "n" template params and "take(n)" (i can do after distinct() call, isn't it the same?). Try this (sorted for better reading): iota(100).map!((x) => uniform(0,10))().distinct().array.sort.writeln; iota(100).map!((x) => uniform(0,10))().array.distinct().array.sort.writeln; output: [0, 0, 3, 3, 4, 4, 4, 6, 7, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]auto firstDistinct(Range)(Range r, in size_t n) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }).take(n); }I think the standard library of Ada2012 has bounded containers, so you can use them to stack-allocate a hash set that can contain up to n items (plus some more free cells to keep its load factor low enough). In Rust there are fully safe stack-allocated closures. Combining the two things it's possible to create a function like that firstDistinct() with zero heap allocations, that is probably fast. Bye, bearophile
Mar 08 2013
Andrea Fontana:But i think "front" was "cached", but it seems not...The caching of map front was recently removed. More test code: import std.stdio, std.range, std.algorithm, std.traits, std.random; void main() { 5.iota.map!((_) { auto x = uniform(0, 10); write("*"); return x; }).filter!(_ => true).array; } Prints: ********** Bye, bearophile
Mar 08 2013
Simpler: import std.stdio, std.range, std.algorithm; void main() { iota(5).map!((x) { write("*"); return x; }).filter!(_ => true).array; }
Mar 08 2013
On Friday, 8 March 2013 at 23:44:40 UTC, bearophile wrote:Simpler: import std.stdio, std.range, std.algorithm; void main() { iota(5).map!((x) { write("*"); return x; }).filter!(_ => true).array; }Chech this: 5.iota.map!((_) { auto x = uniform(0, 1000); writeln(" MAP-1 = ", x); return x; }).filter!(_ => true).map!((x){ writeln(" MAP-2 = ", x); return x; })().array.writeln; Here some function calls front() internally (filter? map?) to check some conditions and triggers random creation again.
Mar 08 2013
This is a shortened version of the filter(), it calls pred only n times, but it calls _input.front n*2 times. If _input.front is not deterministic (that means it's not pure, as in the case of distinct filtering function) it gives wrong results: struct FilterResult(alias pred, Range) { Range _input; this(Range r) { _input = r; while (!_input.empty && !pred(_input.front)) { _input.popFront(); } } property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); } while (!_input.empty && !pred(_input.front)); } property auto ref front() { return _input.front; } } This isn't exactly a bug of filter(), it's a design decision of not caching _input.front. Bye, bearophile
Mar 08 2013
On Saturday, 9 March 2013 at 01:00:26 UTC, bearophile wrote:This is a shortened version of the filter(), it calls pred only n times, but it calls _input.front n*2 times. If _input.front is not deterministic (that means it's not pure, as in the case of distinct filtering function) it gives wrong results: struct FilterResult(alias pred, Range) { Range _input; this(Range r) { _input = r; while (!_input.empty && !pred(_input.front)) { _input.popFront(); } } property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); } while (!_input.empty && !pred(_input.front)); } property auto ref front() { return _input.front; } } This isn't exactly a bug of filter(), it's a design decision of not caching _input.front. Bye, bearophileNot funny btw :) It's not so easy to find out. And it's not easy to write a "cached map" function if you use classes or some pointers as ElementType...
Mar 08 2013
Andrea Fontana:Not funny btw :) It's not so easy to find out. And it's not easy to write a "cached map" function if you use classes or some pointers as ElementType...Not caching _input.front in filter() means that filter() assumes the front() of its argument range to act as a pure function (even if this isn't enforced by the type system). At least this should be documented in std.algorithm ddocs... Python filter() does not work that way: from sys import stdout def fun(x): stdout.write("*") return x list(filter(lambda x: True, map(fun, xrange(5)))) Prints: ***** Bye, bearophile
Mar 08 2013
from sys import stdout def fun(x): stdout.write("*") return x list(filter(lambda x: True, map(fun, xrange(5))))Sorry, I meant: from itertools import imap, ifilter from sys import stdout def fun(x): stdout.write("*") return x list(ifilter(lambda x: True, imap(fun, xrange(5))))
Mar 08 2013
With this modified filter (that caches _input.front) it seems to work: import std.stdio, std.range, std.algorithm, std.traits, std.random; struct FilterResult2(alias pred, Range) { import std.traits: ForeachType; Range _input; ForeachType!Range rFront; this(Range r) { _input = r; while (!_input.empty) { this.rFront = _input.front; if (pred(this.rFront)) break; _input.popFront(); } } property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); if (_input.empty) break; this.rFront = _input.front; if (pred(this.rFront)) break; } while (true); } property auto ref front() { return this.rFront; } } FilterResult2!(pred, Range) filter2(alias pred, Range)(Range rs) { return FilterResult2!(pred, Range)(rs); } auto distinct(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter2!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 100.iota.map!(_ => uniform(0, 10)).distinct.array.sort.writeln; } Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)? Bye, bearophile
Mar 08 2013
On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:Is it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)? Bye, bearophileMaybe two versions (filter and cachedFilter) or a bool template param? I was thinking about pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call. I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA. Or just a field like distinct!"a.id" (that should work with aa backend)
Mar 09 2013
On Saturday, 9 March 2013 at 09:13:26 UTC, Andrea Fontana wrote:On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:I almost got it: auto distinct(alias fun = "a", Range)(Range r) if (isInputRange!Range) { alias unaryFun!fun _fun; bool[ReturnType!_fun] mySet; return r.filter2!((k) { if (_fun(k) in mySet) return false; mySet[_fun(k)] = true; return true; }); } ReturnType!_fun doesn't work, if i set to the right type, function works as expectedIs it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)? Bye, bearophileMaybe two versions (filter and cachedFilter) or a bool template param? I was thinking about pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call. I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA. Or just a field like distinct!"a.id" (that should work with aa backend)
Mar 09 2013
On Saturday, 9 March 2013 at 09:32:43 UTC, Andrea Fontana wrote:On Saturday, 9 March 2013 at 09:13:26 UTC, Andrea Fontana wrote:auto distinct(alias fun = "a", Range)(Range r) if (isInputRange!Range) { alias unaryFun!fun _fun; bool[typeof(_fun((ElementType!Range).init))] mySet; return r.filter2!((k) { if (_fun(k) in mySet) return false; mySet[_fun(k)] = true; return true; }); } struct User { int id; string name; } void main() { User[] users = [ {1, "first"}, {2, "second"}, {3, "first"}]; users.distinct!("a.id")().writeln; users.distinct!("a.name")().writeln; } this works. Is this: bool[typeof(_fun((ElementType!Range).init))] mySet; the right way?On Saturday, 9 March 2013 at 01:36:53 UTC, bearophile wrote:I almost got it: auto distinct(alias fun = "a", Range)(Range r) if (isInputRange!Range) { alias unaryFun!fun _fun; bool[ReturnType!_fun] mySet; return r.filter2!((k) { if (_fun(k) in mySet) return false; mySet[_fun(k)] = true; return true; }); } ReturnType!_fun doesn't work, if i set to the right type, function works as expectedIs it a good idea to replace std.algorithm.filter with something like that filter2 (plus some missing methods)? Bye, bearophileMaybe two versions (filter and cachedFilter) or a bool template param? I was thinking about pure front() too: but I think it's a wrong assumption. The right assumption would be that front should return the same value until popFront is called again. It can read front value lazily from front() call. It can do a lot of impure things (lol) but it shouldn't change front "randomly" at each call. I would improve distinct to support an alias pred = "a < b" to build a bst instead of an AA. Or just a field like distinct!"a.id" (that should work with aa backend)
Mar 09 2013
Andrea Fontana:Is this: bool[typeof(_fun((ElementType!Range).init))] mySet; the right way?I don't know if that's the best way, but I think it's OK. (I think it should even go in the D wiki, so if it's not the best to do it, someone will improve it.) Bye, bearophile
Mar 09 2013
On 03/09/2013 10:35 AM, Andrea Fontana wrote:... this works. Is this: bool[typeof(_fun((ElementType!Range).init))] mySet; the right way?It's ok, but the redundant parentheses around the element type are confusing.
Mar 09 2013
Andrea Fontana:Maybe two versions (filter and cachedFilter) or a bool template param?In the meantime I have filed the problem, I hope Andrei will take a look: http://d.puremagic.com/issues/show_bug.cgi?id=9674 Bye, bearophile
Mar 09 2013
Andrea Fontana:Here the answer: auto r=iota(100).map!((x) => uniform(0,10))(); writeln(r.front," ", r.front, " ", r.front, " ", r.front);I think that's not the answer. I think the answer is a bug in filter(). Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 23:51:48 UTC, bearophile wrote:Andrea Fontana:My struct version never work if map doesn't cache. Every time I call (inside popFront() too!) front() gives different values. If i do array.distinct() it works fine: an array is filled with random values and they will never changes again.Here the answer: auto r=iota(100).map!((x) => uniform(0,10))(); writeln(r.front," ", r.front, " ", r.front, " ", r.front);I think that's not the answer. I think the answer is a bug in filter(). Bye, bearophile
Mar 08 2013
On Friday, 8 March 2013 at 15:53:56 UTC, Andrea Fontana wrote:On Friday, 8 March 2013 at 14:43:29 UTC, jerro wrote:I didn't understand you correctly then. I thought that top n distinct elements meant n largest distinct elements. Now, I'm assuming that you just need first n distinct elements.On Friday, 8 March 2013 at 13:33:24 UTC, Andrea Fontana wrote:Why? sequence!"n*2".myTopDistinct!"a==b"(3); will give [2,4,6]I wonder if exists a way to get top n distinct elements from a range (infinite too!)It's impossible to do that for infinite rangesWe just need to keep a list of "just-seen" elements But I don't know how to make it lazyYou could write your own range that uses an associative array to check for duplicates: struct Unique(R) { R r; bool[elementType!R] seen; property front(){ return r.front; } property empty(){ return r.empty; } void popFront() { seen[r] = bool.init; while(r.front in seen) r.popFront(); } } I don't think there's anything like this in Phobos.
Mar 08 2013
I wonder if exists a way to get top n distinct elements from a range (infinite too!)A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?For an arbitrary infinite range, it is impossible from the mathematical point of view. An impossible example would be doing that with the infinite range of natural numbers, stored in bignums. For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor.
Mar 08 2013
On Friday, 8 March 2013 at 15:04:32 UTC, Ivan Kazmenko wrote:For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor.You can also use an array if you use a BinaryHeap. If you use a heap then it should have a much lower constant factor than a red-black tree so, even though they would both have O(m*log(n)) (where n is the number of elements you want the top of and m is the size of the collection) running time, a heap would be faster.
Mar 08 2013
My first thought was about a heap, too, but it does not allow to search for a duplicate in O(log(n)) as it guarantees only partial ordering.For a finite range, you can iterate over the range while maintaining a collection of the top n elements and discarding duplicates as you find them. For example, using a RedBlackTree as the collection, you will get O(log(n) * range.length) total time and O(n) total memory used. If n is small (on the order of 10s), an array will suffice, resulting in O(n * range.length) total time but with lower constant factor.You can also use an array if you use a BinaryHeap. If you use a heap then it should have a much lower constant factor than a red-black tree so, even though they would both have O(m*log(n)) (where n is the number of elements you want the top of and m is the size of the collection) running time, a heap would be faster.
Mar 08 2013
On 2013-03-08 14:33, Andrea Fontana wrote:I wonder if exists a way to get top n distinct elements from a range (infinite too!) A (not efficient) way to to this is range.array.sort.uniq.take(n) but it's a bit overkill, it sorts elements, and of course doesn't work with infinite ranges. Am i missing any function?There's a function called "topN" in std.algorithm, I don't know if that's what you're looking for. http://dlang.org/phobos/std_algorithm.html#topN -- /Jacob Carlborg
Mar 08 2013