digitalmars.D.learn - Ranges/algorithms for aggregation
- =?UTF-8?B?Ikx1w61z?= Marques" (40/40) Mar 21 2014 Is there a neat way to do this transformation with ranges and
- bearophile (10/24) Mar 21 2014 What is the desired output data structure? An associative array
- =?UTF-8?B?Ikx1w61z?= Marques" (16/24) Mar 21 2014 I'm doing this for a D newbie, to teach him the range/algorithmic
- bearophile (11/18) Mar 21 2014 Yes, it happens, But it's a problem, because often if you know
- =?UTF-8?B?Ikx1w61z?= Marques" (4/9) Mar 21 2014 The number of keys is large and unbounded (not just three as in
- Justin Whear (16/59) Mar 21 2014 This pull request[1] for groupBy has been hanging around for a year now,...
- H. S. Teoh (20/38) Mar 21 2014 Be aware, though, that groupBy only compares *adjacent* elements for
- =?UTF-8?B?Ikx1w61z?= Marques" (4/11) Mar 21 2014 I think that's why Justin used sort. The hashGroupBy proposed by
- =?UTF-8?B?Ikx1w61z?= Marques" (4/7) Mar 21 2014 I was thinking, we don't even need the full power of sort. Is
- bearophile (5/8) Mar 21 2014 I don't know any, how is it supposed to know where to group
- =?UTF-8?B?Ikx1w61z?= Marques" (11/13) Mar 21 2014 It would swap elements, like sort, so it doesn't need to put them
- bearophile (5/10) Mar 22 2014 I think to swap items it must know where to swap them to. And you
Is there a neat way to do this transformation with ranges and std.algorithms? Input: ------- B foo B bar C ble B big A begga Output: (aggregated and sorted on length) ------- B -> [foo, bar, big] C -> [ble] A -> [begga] The most obvious way (to me) to do this without standard algorithms is with an AA to the aggregation. The most obvious way (to me) to do this with std.algorithms is: B foo B bar C ble B big A begga => [B, foo] [B, bar] [C, ble] [B, big] [A, begga] => [B, foo] [B, bar] [B, big] [C, ble] [A, begga] => B -> [foo, bar, big] C -> [ble] A -> [begga] But this seems wasteful on memory. Is there a better way to do this in a more algorithmic way?
Mar 21 2014
Luís Marques:Is there a neat way to do this transformation with ranges and std.algorithms? Input: ------- B foo B bar C ble B big A begga Output: (aggregated and sorted on length) ------- B -> [foo, bar, big] C -> [ble] A -> [begga]What is the desired output data structure? An associative array of dynamic arrays? Or is a dynamic arrays of dynamic arrays of 2-tuples enough? There are various ways to solve your problem. Related: https://d.puremagic.com/issues/show_bug.cgi?id=5968 https://d.puremagic.com/issues/show_bug.cgi?id=9842 Bye, bearophile
Mar 21 2014
On Friday, 21 March 2014 at 15:38:23 UTC, bearophile wrote:I'm doing this for a D newbie, to teach him the range/algorithmic approach. The function he wrote to output the result of that transformation takes as an input an array of arrays (the latter), but he builds that input iteratively using an AA of arrays (the former). I asked him about that mismatch and at the time he told me that "for now" he only needed the latter, suggesting he had other future plans where he might need the AA, but I'm not sure. So let's just say that the client is unclear on his requirements, which does happen in the real world anyway :-). In any case, I think the hashGroupBy is what I was asking about :-). Neat. (did anyone actually implement it?) I'm not sure how if "a dynamic arrays of dynamic arrays of 2-tuples" sufficed that would help with the intermediate step, if we wanted to avoid the sorting step. Did you have anything in particular in mind there?Output: (aggregated and sorted on length) ------- B -> [foo, bar, big] C -> [ble] A -> [begga]What is the desired output data structure? An associative array of dynamic arrays? Or is a dynamic arrays of dynamic arrays of 2-tuples enough?
Mar 21 2014
Luís Marques:So let's just say that the client is unclear on his requirements, which does happen in the real world anyway :-).Yes, it happens, But it's a problem, because often if you know what you need you can produce the results more efficiently :-)(did anyone actually implement it?)Not yet.I'm not sure how if "a dynamic arrays of dynamic arrays of 2-tuples" sufficed that would help with the intermediate step, if we wanted to avoid the sorting step. Did you have anything in particular in mind there?I think this problem needs a sorting or a hash. One possible solution, if you don't need an associative array as output, is to use a multiSort followed by a building of groups using slicing. It could be efficient enough. Later you search the keys with a some kind of binary search. Bye, bearophile
Mar 21 2014
On Friday, 21 March 2014 at 16:04:45 UTC, bearophile wrote:I think this problem needs a sorting or a hash. One possible solution, if you don't need an associative array as output, is to use a multiSort followed by a building of groups using slicing. It could be efficient enough. Later you search the keys with a some kind of binary search.The number of keys is large and unbounded (not just three as in my example), so I guess this multiSort approach would not be practical, right? I think we really need the hashGroupBy.
Mar 21 2014
On Fri, 21 Mar 2014 15:22:13 +0000, Luís Marques wrote:Is there a neat way to do this transformation with ranges and std.algorithms? Input: ------- B foo B bar C ble B big A begga Output: (aggregated and sorted on length) ------- B -> [foo, bar, big] C -> [ble] A -> [begga] The most obvious way (to me) to do this without standard algorithms is with an AA to the aggregation. The most obvious way (to me) to do this with std.algorithms is: B foo B bar C ble B big A begga => [B, foo] [B, bar] [C, ble] [B, big] [A, begga] => [B, foo] [B, bar] [B, big] [C, ble] [A, begga] => B -> [foo, bar, big] C -> [ble] A -> [begga] But this seems wasteful on memory. Is there a better way to do this in a more algorithmic way?This pull request[1] for groupBy has been hanging around for a year now, driving me to copy-and-paste the implementation into a couple of my projects. Using it, you could do this: auto tuples = ... // get your list of (B, foo), (B, bar), etc. auto output = tuples.sort!`a[0] < b[0]` .groupBy!`a[0] == b[0]`; // output is a range of: // [ // [(A, begga)], // [(B, foo), (B, bar), (B, big)], // [(C, ble)] // ] The advantage being that output isn't an array at all but a lazy range of lazy ranges. 1 https://github.com/D-Programming-Language/phobos/pull/1186
Mar 21 2014
On Fri, Mar 21, 2014 at 04:10:12PM +0000, Justin Whear wrote: [...]This pull request[1] for groupBy has been hanging around for a year now, driving me to copy-and-paste the implementation into a couple of my projects. Using it, you could do this: auto tuples = ... // get your list of (B, foo), (B, bar), etc. auto output = tuples.sort!`a[0] < b[0]` .groupBy!`a[0] == b[0]`; // output is a range of: // [ // [(A, begga)], // [(B, foo), (B, bar), (B, big)], // [(C, ble)] // ] The advantage being that output isn't an array at all but a lazy range of lazy ranges. 1 https://github.com/D-Programming-Language/phobos/pull/1186Be aware, though, that groupBy only compares *adjacent* elements for equivalence; it does not sort the input. So if your input has equivalent elements interspersed with non-equivalent elements, you will have the equivalent elements split into multiple runs in the output. Example: auto data = [ tuple(1, "a") tuple(1, "b") tuple(2, "c") tuple(1, "d") ]; writeln(data.groupBy!((a,b) => a[0] == b[0])); Will output: [[tuple(1, "a"), tuple(1, "b")], [tuple(2, "c")], [tuple(1, "d")]] Which may not be what the OP wants. T -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Mar 21 2014
On Friday, 21 March 2014 at 16:53:46 UTC, H. S. Teoh wrote:Be aware, though, that groupBy only compares *adjacent* elements for equivalence; it does not sort the input. So if your input has equivalent elements interspersed with non-equivalent elements, you will have the equivalent elements split into multiple runs in the output.I think that's why Justin used sort. The hashGroupBy proposed by bearophile would avoid the sort and the additional memory usage though, so that would be even better.
Mar 21 2014
On Friday, 21 March 2014 at 17:20:38 UTC, Luís Marques wrote:I think that's why Justin used sort. The hashGroupBy proposed by bearophile would avoid the sort and the additional memory usage though, so that would be even better.I was thinking, we don't even need the full power of sort. Is there a standard algorithm that makes elements with equal keys be in sequence, but that otherwise is less expensive than sort?
Mar 21 2014
Luís Marques:I was thinking, we don't even need the full power of sort. Is there a standard algorithm that makes elements with equal keys be in sequence, but that otherwise is less expensive than sort?I don't know any, how is it supposed to know where to group items? Usually you build an associative array for that. Bye, bearophile
Mar 21 2014
On Saturday, 22 March 2014 at 01:08:11 UTC, bearophile wrote:how is it supposed to know where to group items? Usually you build an associative array for that.It would swap elements, like sort, so it doesn't need to put them anywhere, just permute them. The advantage is this: Input: [7, 3, 7, 1, 1, 1, 1] Output sort: [1, 1, 1, 1, 3, 7, 7] Output groupSort: [3, 7, 7, 1, 1, 1, 1] groupSort (or whatever it would be called) only makes one swap, while sort makes a lot of them. So groupSort is a lot cheaper. I'm not sure what the asymptotic time complexity of groupSort is, at this moment's notice (I guess it would depend on what strategy it would use).
Mar 21 2014
Luís Marques:It would swap elements, like sort, so it doesn't need to put them anywhere, just permute them. The advantage is this: Input: [7, 3, 7, 1, 1, 1, 1] Output sort: [1, 1, 1, 1, 3, 7, 7] Output groupSort: [3, 7, 7, 1, 1, 1, 1]I think to swap items it must know where to swap them to. And you can't do that efficiently if the groups are unsorted. Bye, bearophile
Mar 22 2014