www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - hashGroup, pairwise

In Phobos we have std.array.assocArray that allows to build a 
built-in associative array from a range of key-value Phobos 
Tuples (it will become mostly legacy when we will have a good 
associative array in Phobos and built-in tuples in D, that is the 
opposite of the data structures we have today).

But there is another very common pattern of creation of 
associative arrays, where you want to "accumulate" inside the 
values.

The basic form of accumulation is counting:

//A
const a = [1, 2, 1, 3, 5, 1, 2];
uint[int] aa1;
foreach (x; a)
     aa1[x]++;
aa1.writeln; // [1:3, 5:1, 2:2, 3:1]

Another example is just emumeration in an array:

//B
const b = [-1, 2, 1, 3, 5, -1, -2];
int[][int] aa2;
foreach (x; b)
     aa2[x.abs] ~= x;
aa2.writeln; // [1:[-1, 1, -1], 5:[5], 2:[2, -2], 3:[3]]

Or even in a set (here implemented with another associative array 
of int-bools):

//C
bool[int][int] aa3;
foreach (x; b)
     aa3[x.abs][x] = true;
aa3.writeln; // [1:[1:true, -1:true], 5:[5:true], 2:[2:true, 
-2:true], 3:[3:true]]


There are ways to perform those operations with Phobos today, but 
the code is so bad looking that it's better to skip this.


So I suggest to add a std.array.hashGroup function to Phobos that 
as argument takes a range, and as template argument takes one or 
two functions.

The first function is a mapping key unary function.

The second function is optional and takes as argument the 
sub-range of grouped values.

The patterns above become:

//A
a.hashGroup!(id, count).writeln;

//B
a.hashGroup!abs.writeln;

//C
a.hashGroup!(abs, items => 
items.zip(true.repeat).assocArray)).writeln;


Where "id" is the identity function (not present in Phobos):

//A
a.hashGroup!(x => x, count).writeln;

Once Phobos has a Set data structure with a constructor helper 
function that accepts a range as argument you can also write just:

//C
a.hashGroup!(abs, set).writeln;

----------------------

Another very commonly useful range function is pairwise 
(https://issues.dlang.org/show_bug.cgi?id=6788 ). It's a range 
similar to cartesianProduct, but it generates a different and 
equally commonly useful sequence:


[1, 2, 3, 4].pairwise
Or:
[1, 2, 3, 4].pairwise!2

should yield:

(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)


[1, 2, 3, 4].pairwise!3

should yield:

(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)


So despite its name pairwise() can generate more than just pairs 
(so better names are welcome).


Like in this lower level code that generates those outputs:

void main() {
     import std.stdio, std.range, std.algorithm;
     auto a = [1, 2, 3, 4];

     foreach (immutable i; 0 .. a.length)
         foreach (immutable j; i + 1 .. a.length)
             writefln("(%d, %d)", a[i], a[j]);

     writeln;

     foreach (immutable i; 0 .. a.length)
         foreach (immutable j; i + 1 .. a.length)
             foreach (immutable k; j + 1 .. a.length)
                 writefln("(%d, %d, %d)", a[i], a[j], a[k]);
}


Essentially pairwise() encapsulates in a handy range the very 
common pattern of nested loops that iterate on distinct pairs, 
triples, etc.

----------------------


http://msdn.microsoft.com/en-us/library/ee353635.aspx

Bye,
bearophile
Jan 06 2015