digitalmars.D.learn - How to sum multidimensional arrays?
- p.shkadzko (59/59) Feb 27 2020 I'd like to sum 2D arrays. Let's create 2 random 2D arrays and
- p.shkadzko (6/9) Feb 27 2020 And yes, benchmarks show that summing 2D arrays like in the
- jmh530 (25/35) Feb 27 2020 What's the performance of mir like?
- bachmeier (10/14) Feb 27 2020 Is there a reason you can't create a struct around a double[]
- p.shkadzko (39/53) Feb 27 2020 Right! Ok, here is how I do it.
- 9il (93/98) Feb 27 2020 Yes, the output for the following benchmark shows that Mir is 43%
- 9il (18/19) Feb 27 2020 ndslice Poker Face
- p.shkadzko (5/24) Feb 27 2020 Yes, mir.ndslice is a straightforward choice for multidimensional
- Andrea Fontana (12/17) Feb 27 2020 I think you're trying to do this:
- AB (65/85) Feb 28 2020 Maybe this is already clear, but it is important to highlight
- bachmeier (4/7) Feb 28 2020 Here's an old version of some of the things I've been using:
- p.shkadzko (5/25) Feb 29 2020 Can you please explain what is the purpose of "alias data this"
- AB (9/41) Mar 01 2020 Without "alias data this" the call
I'd like to sum 2D arrays. Let's create two random 2D arrays and sum them.

```
import std.random : Xorshift, unpredictableSeed, uniform;
import std.range : generate, take, chunks;
import std.array : array;

static T[][] rndMatrix(T)(T max, in int rows, in int cols) {
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    const amount = rows * cols;
    return generate(() => uniform(0, max, rnd)).take(amount).array.chunks(cols).array;
}

void main() {
    int[][] m1 = rndMatrix(10, 2, 3);
    int[][] m2 = rndMatrix(10, 2, 3);
    auto c = m1[] + m2[];
}
```

This won't compile because the compiler throws "Error: array operation m1[] + m2[] without destination memory not allowed". Looking at https://forum.dlang.org/thread/wnjepbggivhutgbyjagm@forum.dlang.org, I modified the code to:

```
void main() {
    int[][] m1 = rndMatrix(10, 2, 3);
    int[][] m2 = rndMatrix(10, 2, 3);
    int[][] c;
    c.length = m1.length;
    c[0].length = m1[0].length;
    c[1].length = m1[1].length;
    c[] = m1[] + m2[];
}
```

Well, then I get "/dlang/dmd/linux/bin64/../../src/druntime/import/core/internal/array/operations.d(165): Error: static assert: "Binary + not supported for types int[] and int[]."". Right, then I tried the following:

```
void main() {
    int[][] m1 = rndMatrix(10, 2, 3);
    int[][] m2 = rndMatrix(10, 2, 3);
    auto c = zip(m1[], m2[]).map!((a, b) => a + b);
}
```

That doesn't work either, because "Error: template D main.__lambda1 cannot deduce function from argument types !()(Tuple!(int[], int[])), candidates are: onlineapp.d(21): __lambda1 (...)". So, I have to flatten first, then zip + sum, and then reshape back to the original:

```
auto c = zip(m1.joiner, m2.joiner).map!(t => t[0] + t[1]).array.chunks(3).array;
```

This works, but it does not look very efficient, considering we flatten and then call array twice. It will get even worse with 3D arrays. Is there a better way without relying on mir.ndslice?
Feb 27 2020
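(As a sketch of one answer to the question above: the zip + map attempt can be made to work without the joiner/chunks round trip. zip yields tuples, so the lambda has to take a single tuple argument, and the per-row addition needs destination memory. `sum2D` is a made-up name for illustration.)

```d
import std.algorithm : map;
import std.array : array;
import std.range : zip;

// Hypothetical helper: keeps the 2D shape, no flatten/reshape round trip.
int[][] sum2D(int[][] m1, int[][] m2)
{
    // zip yields Tuple!(int[], int[]), so map's lambda takes one tuple argument.
    return zip(m1, m2).map!((t) {
        auto row = new int[](t[0].length);
        row[] = t[0][] + t[1][]; // array op now has destination memory
        return row;
    }).array;
}

void main()
{
    auto c = sum2D([[1, 2, 3], [4, 5, 6]], [[10, 20, 30], [40, 50, 60]]);
    assert(c == [[11, 22, 33], [44, 55, 66]]);
}
```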
On Thursday, 27 February 2020 at 14:15:26 UTC, p.shkadzko wrote:
> This works but it does not look very efficient considering we flatten and then call array twice. It will get even worse with 3D arrays.

And yes, benchmarks show that summing 2D arrays as in the example above is significantly slower than in numpy. But that is to be expected... I guess.

D -- sum of two 5000 x 6000 2D arrays: 3.4 sec.
numpy -- sum of two 5000 x 6000 2D arrays: 0.0367800739913946 sec.
Feb 27 2020
On Thursday, 27 February 2020 at 15:28:01 UTC, p.shkadzko wrote:
> D -- sum of two 5000 x 6000 2D arrays: 3.4 sec.
> numpy -- sum of two 5000 x 6000 2D arrays: 0.0367800739913946 sec.

What's the performance of mir like? The code below seems to work without issue.

```
/+dub.sdl:
dependency "mir-algorithm" version="~>3.7.17"
dependency "mir-random" version="~>2.2.10"
+/
import std.stdio : writeln;
import mir.random : Random, unpredictableSeed;
import mir.random.variable : UniformVariable;
import mir.random.algorithm : randomSlice;

auto rndMatrix(T)(T max, in int rows, in int cols) {
    auto gen = Random(unpredictableSeed);
    auto rv = UniformVariable!T(0.0, max);
    return randomSlice(gen, rv, rows, cols);
}

void main() {
    auto m1 = rndMatrix(10.0, 2, 3);
    auto m2 = rndMatrix(10.0, 2, 3);
    auto m3 = m1 + m2;
    writeln(m1);
    writeln(m2);
    writeln(m3);
}
```
Feb 27 2020
On Thursday, 27 February 2020 at 16:31:49 UTC, jmh530 wrote:
> What's the performance of mir like? The code below seems to work without issue.
> [...]

The same as numpy for large matrices, because the cost is memory access. Mir+LDC will be faster for small matrices because it will flatten the inner loop and use SIMD instructions.

A few performance nitpicks for your example, to be fair when benchmarking against the test:

1. Random (the default) is slower than Xorshift.
2. double is twice as large as int and requires twice as much memory, so it would be twice as slow as int for large matrices.

Check the prev. post, we posted at almost the same time ;)
https://forum.dlang.org/post/izoflhyerkiladngyrov@forum.dlang.org
Feb 27 2020
On Thursday, 27 February 2020 at 16:39:15 UTC, 9il wrote:
> [snip]
> A few performance nitpicks for your example, to be fair when benchmarking against the test:
> 1. Random (the default) is slower than Xorshift.
> 2. double is twice as large as int and requires twice as much memory, so it would be twice as slow as int for large matrices.
> [...]

Those differences largely came from a lack of attention to detail. I didn't notice the Xorshift until after I posted. I used double because it's such a force of habit for me to use continuous distributions.

I came across this in the documentation:

    UniformVariable!T uniformVariable(T = double)(in T a, in T b)
        if (isIntegral!T)

and did a double-take until I read the note associated with it in the source.
Feb 27 2020
On Thursday, 27 February 2020 at 14:15:26 UTC, p.shkadzko wrote:
> [...] This works but it does not look very efficient considering we flatten and then call array twice. It will get even worse with 3D arrays. Is there a better way without relying on mir.ndslice?

Is there a reason you can't create a struct around a double[] like this?

```
struct Matrix {
    double[] data;
}
```

Then to add Matrix A to Matrix B, you use A.data[] + B.data[]. But since I'm not sure what exactly you're doing, maybe that won't work.
Feb 27 2020
On Thursday, 27 February 2020 at 15:48:53 UTC, bachmeier wrote:
> Is there a reason you can't create a struct around a double[] like this?
> [...]

Right! Ok, here is how I do it.

```
struct Matrix(T) {
    T[] elems;
    int cols;
    T[][] to2D() { return elems.chunks(cols).array; }
}
```

and Matrix summing and random array generator functions

```
auto matrixSum(Matrix!int m1, Matrix!int m2) {
    Matrix!int m3;
    m3.cols = m1.cols;
    m3.elems.length = m1.elems.length;
    m3.elems[] = m1.elems[] + m2.elems[];
    return m3.to2D;
}

static T[] rndArr(T)(in T max, in int elems) {
    Xorshift rnd;
    return generate(() => uniform(0, max, rnd)).take(elems).array;
}
```

Then we do the following

```
auto m1 = Matrix!int(rndArr!int(10, 5000 * 6000), 6000);
auto m2 = Matrix!int(rndArr!int(10, 5000 * 6000), 6000);
auto m3 = matrixSum(m1, m2);
```

And it works effortlessly! Summing two 5000 x 6000 int arrays takes just 0.105 sec! (on a Windows machine, though, with a weaker CPU). I bet using mir.ndslice instead of D arrays would be even faster.
Feb 27 2020
On Thursday, 27 February 2020 at 23:15:28 UTC, p.shkadzko wrote:
> And it works effortlessly! Summing two 5000 x 6000 int arrays takes just 0.105 sec! (on a Windows machine, though, with a weaker CPU). I bet using mir.ndslice instead of D arrays would be even faster.

Yes, the output of the following benchmark shows that Mir is 43% faster. However, when I checked the assembler output, both Mir and Std (really LDC in both cases) generate almost the same and best possible loops with AVX instructions for the summation. On the other hand, Mir is faster because it generates random matrices faster and uses uninitialized memory for the summation target.

Output:
```
std: 426 ms, 432 μs, and 1 hnsec |10
mir: 297 ms, 694 μs, and 3 hnsecs |10
```

Run command: `dub --build=release --single --compiler=ldc2 test.d`

Note that the -mcpu=native flag is passed to LDC.

Source:
```
/+dub.sdl:
dependency "mir-algorithm" version="~>3.7.17"
dependency "mir-random" version="~>2.2.10"
dflags "-mcpu=native" platform="ldc"
+/
int val;

void testStd() {
    pragma(inline, false);

    static struct Matrix(T) {
        import std.range;
        T[] elems;
        int cols;
        T[][] to2D() { return elems.chunks(cols).array; }
    }

    static auto matrixSum(Matrix!int m1, Matrix!int m2) {
        Matrix!int m3;
        m3.cols = m1.cols;
        m3.elems.length = m1.elems.length;
        m3.elems[] = m1.elems[] + m2.elems[];
        return m3.to2D;
    }

    static T[] rndArr(T)(in T max, in int elems) {
        import std.random;
        import std.range;
        Xorshift rnd;
        return generate(() => uniform(0, max, rnd)).take(elems).array;
    }

    auto m1 = Matrix!int(rndArr!int(10, 5000 * 6000), 6000);
    auto m2 = Matrix!int(rndArr!int(10, 5000 * 6000), 6000);
    auto m3 = matrixSum(m1, m2);
    val = m3[$-1][$-1];
}

void testMir() {
    pragma(inline, false);
    import mir.ndslice;
    import mir.random : threadLocal;
    import mir.random.variable : uniformVar;
    import mir.random.algorithm : randomSlice;
    import mir.random.engine.xorshift;

    auto m1 = threadLocal!Xorshift.randomSlice(uniformVar!int(0, 10), [5000, 6000]);
    auto m2 = threadLocal!Xorshift.randomSlice(uniformVar!int(0, 10), [5000, 6000]);
    auto m3 = slice(m1 + m2);
    val = m3[$-1][$-1];
}

void main() {
    import std.datetime.stopwatch;
    import std.stdio;
    import core.memory;
    GC.disable;

    StopWatch clock;
    clock.reset;
    clock.start;
    testStd;
    clock.stop;
    writeln("std: ", clock.peek, " |", val);

    clock.reset;
    clock.start;
    testMir;
    clock.stop;
    writeln("mir: ", clock.peek, " |", val);
}
```
Feb 27 2020
On Thursday, 27 February 2020 at 14:15:26 UTC, p.shkadzko wrote:
> Is there a better way without relying on mir.ndslice?

ndslice Poker Face

```
/+dub.sdl:
dependency "mir-algorithm" version="~>3.7.17"
dependency "mir-random" version="~>2.2.10"
+/
import mir.ndslice;
import mir.random : threadLocal;
import mir.random.variable : uniformVar;
import mir.random.algorithm : randomSlice;
import mir.random.engine.xorshift;

void main() {
    Slice!(int*, 2) m1 = threadLocal!Xorshift.randomSlice(uniformVar!int(0, 10), [2, 3]);
    Slice!(int*, 2) m2 = threadLocal!Xorshift.randomSlice(uniformVar!int(0, 10), [2, 3]);
    Slice!(int*, 2) c = slice(m1 + m2);
}
```
Feb 27 2020
On Thursday, 27 February 2020 at 16:31:07 UTC, 9il wrote:
> ndslice Poker Face
> [...]

Yes, mir.ndslice is a straightforward choice for multidimensional arrays. I shall do some benchmarks with it next. But first, I'll try to do it with standard D ops and see what the rough difference is against numpy's C.
Feb 27 2020
On Thursday, 27 February 2020 at 14:15:26 UTC, p.shkadzko wrote:
> void main() {
>     int[][] m1 = rndMatrix(10, 2, 3);
>     int[][] m2 = rndMatrix(10, 2, 3);
>     auto c = m1[] + m2[];
> }

I think you're trying to do this:

```
int[][] m1 = rndMatrix(10, 2, 3);
int[][] m2 = rndMatrix(10, 2, 3);
int[][] m3;
m3.length = m1.length;

foreach(i; 0..m1.length)
{
    m3[i].length = m1[i].length;
    m3[i][] = m1[i][] + m2[i][];
}
```

But of course that's not the best solution :)
Feb 27 2020
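(A side note on the row-by-row loop above: it allocates a fresh destination. When one operand may be overwritten, the same per-row array op also works in place with no allocation at all. A minimal sketch; `addInPlace` is a made-up name.)

```d
// Hypothetical helper: element-wise in-place addition over an array of arrays.
void addInPlace(T)(T[][] dst, T[][] src)
{
    assert(dst.length == src.length);
    foreach (i; 0 .. dst.length)
    {
        assert(dst[i].length == src[i].length);
        dst[i][] += src[i][]; // per-row array op, no new allocation
    }
}

void main()
{
    auto a = [[1, 2], [3, 4]];
    auto b = [[10, 20], [30, 40]];
    addInPlace(a, b);
    assert(a == [[11, 22], [33, 44]]);
}
```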
On Thursday, 27 February 2020 at 14:15:26 UTC, p.shkadzko wrote:
> I'd like to sum 2D arrays. Let's create two random 2D arrays and sum them.
> [...]

Maybe this is already clear, but it is important to highlight that 2D arrays and arrays of arrays are two different things. int[][] is an array of arrays: for each outer index, the element is an array that has its own allocated memory and length. 2D arrays are not provided by the language, but they can be implemented by defining a type with the required operator overloads. Using an array of arrays for a 10000x10000 matrix requires 10001 allocations, while a dedicated 2D array implementation needs only one.

Example of an array of arrays where the inner arrays have different lengths:

```
module test;

void main()
{
    import std.stdio;
    int[][] a;
    a.length = 3;
    a[0] = [1, 2, 3];
    a[1] = [3, 4];
    a[2] = [];
    writeln(a);
}
```

Your example with a minimal 2D array:

```
module test2;

import std.random : Xorshift, unpredictableSeed, uniform;
import std.range : generate, take, chunks;
import std.array : array;
import std.stdio : writeln;

struct Matrix(T)
{
    int rows;
    T[] data;
    alias data this;
    int cols() { return cast(int) data.length / rows; }
    this(int r, int c) { data = new T[r * c]; rows = r; }
    this(int r, int c, T[] d) { assert(r * c == d.length); data = d; rows = r; }
    auto opIndex(int r, int c) { return data[rows * c + r]; }
}

auto rndMatrix(T)(T max, in int rows, in int cols)
{
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    const amount = rows * cols;
    return Matrix!T(rows, cols, generate(() => uniform(0, max, rnd)).take(amount).array);
}

void main()
{
    auto m1 = rndMatrix(10, 2, 3);
    auto m2 = rndMatrix(10, 2, 3);
    auto c = Matrix!int(2, 3);
    c[] = m1[] + m2[];
    writeln(m1[1, 2]);
    writeln(m2[1, 2]);
    writeln(c[1, 2]);
}
```

See https://dlang.org/spec/operatoroverloading.html#array-ops for a better overview of the required operators, or mir.ndslice for an nD implementation.
Feb 28 2020
On Friday, 28 February 2020 at 16:51:10 UTC, AB wrote:
> See https://dlang.org/spec/operatoroverloading.html#array-ops for a better overview of the required operators or mir.ndslice for an nD implementation.

Here's an old version of some of the things I've been using:
https://bitbucket.org/bachmeil/dmdgretl/src/67a6c5dbf95f23fa753bfd590bc464147cbdc5cc/inst/gretl/matrix.d#lines-307

It has multidimensional slicing and such.
Feb 28 2020
On Friday, 28 February 2020 at 16:51:10 UTC, AB wrote:
> struct Matrix(T)
> {
>     int rows;
>     T[] data;
>     alias data this;
>     [...]
> }

Can you please explain what the purpose of "alias data this" is in your Matrix struct? As I remember, "alias <member> this" is used for implicit type conversions, but I don't see where "data" is converted.
Feb 29 2020
On Saturday, 29 February 2020 at 19:04:12 UTC, p.shkadzko wrote:
> Can you please explain what the purpose of "alias data this" is in your Matrix struct? As I remember, "alias <member> this" is used for implicit type conversions, but I don't see where "data" is converted.

Without "alias data this" the call

    c[] = m1[] + m2[];

would have to be written as

    c.data[] = m1.data[] + m2.data[];

With "alias data this", if a member is not defined on Matrix but is available on Matrix.data, the member of Matrix.data is used. It is more about name lookup rules than about type conversions.
Mar 01 2020
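(To make the name-lookup forwarding described above concrete, here is a minimal self-contained illustration; `Wrapper` is a made-up name.)

```d
struct Wrapper
{
    int[] data;
    alias data this;
}

void main()
{
    auto w = Wrapper([1, 2, 3]);
    // `length` and slice assignment are not members of Wrapper,
    // so name lookup falls through to `data`:
    assert(w.length == 3);
    w[] = [4, 5, 6]; // same as w.data[] = [4, 5, 6];
    assert(w.data == [4, 5, 6]);
}
```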