www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Red-Black Gauss-seidel with mir

reply Christoph <alt.c4 web.de> writes:
Hi all,

I am trying to implement a sweep method for a 2D Red-black 
Gauss-Seidel Solver with the help of mir and its slices.
The fastest Version I discovered so far looks like this:
```
void sweep(T, size_t Dim : 2, Color color)(in Slice!(T*, 2) F, 
Slice!(T*, 2) U, T h2)
{
     const auto m = F.shape[0];
     const auto n = F.shape[1];
     auto UF = U.field;
     auto FF = F.field;

     foreach (i; 1 .. m - 1)
     {
         for (size_t j = 1 + (i + 1 + color) % 2; j < n - 1; j += 
2)
         {
             auto flatindex = i * m + j;
             UF[flatindex] = (
                     UF[flatindex - m] +
                     UF[flatindex + m] +
                     UF[flatindex - 1] +
                     UF[flatindex + 1] - h2 * FF[flatindex]) / 4.0;
         }
     }
}
```
Accessing the field of the mirslice directly seems to be 
incredibly fast and I assume this might be the fastest version 
for this purpose?

I have already implemented this in Python with numpy and there it 
looks like this:

```
def sweep_2D(color, F, U, h2):

     m, n = F.shape

     if color == 1:

         U[1:m - 1:2, 2:n - 1:2] = (U[0:m - 2:2, 2:n - 1:2] +
                                    U[2::2, 2:n - 1:2] +
                                    U[1:m - 1:2, 1:n - 2:2] +
                                    U[1:m - 1:2, 3::2] -
                                    F[1:m - 1:2, 2:n - 1:2] * h2) 
/ (4.0)
         U[2:m - 1:2, 1:n - 1:2] = (U[1:m - 2:2, 1:n - 1:2] +
                                    U[3::2, 1:n - 1:2] +
                                    U[2:m - 1:2, 0:n - 2:2] +
                                    U[2:m - 1:2, 2::2] -
                                    F[2:m - 1:2, 1:n - 1:2] * h2) 
/ (4.0)
...

```
My Question now is: Is there a possibility too use the mir slices 
in a way that "looks" similar to the numpy version and is similar 
fast?
Below there is fastest version I found up to now, but it is still 
much slower then the version with the two for loops. It seems to 
me, that accessing and modifying only parts of a mirslice with 
this "indexed" syntax in combination with strided is really slow. 
Is this true or am I using the slices wrong?

```
static if (color == Color.red)
{
         U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. 
m - 2, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
         U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 
.. m, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
         U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 
.. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
         U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 
.. m - 1, 2 .. n].strided!(0, 1)(2, 2) / 4.0;
         U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 
.. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;

         U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. 
m - 2, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
         U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 
.. m, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0;
         U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 
.. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2) / 4.0;
         U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 
.. m - 1, 3 .. n].strided!(0, 1)(2, 2) / 4.0;
         U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 
.. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0;

}
...

```

Thanks for your help in Advance!

Christoph
Sep 13 2020
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Sunday, 13 September 2020 at 14:48:30 UTC, Christoph wrote:
 Hi all,

 I am trying to implement a sweep method for a 2D Red-black 
 Gauss-Seidel Solver with the help of mir and its slices.
 The fastest Version I discovered so far looks like this:
 ```
 void sweep(T, size_t Dim : 2, Color color)(in Slice!(T*, 2) F, 
 Slice!(T*, 2) U, T h2)
 {
     const auto m = F.shape[0];
     const auto n = F.shape[1];
     auto UF = U.field;
     auto FF = F.field;

 [...]
Hi Christoph, More details are required. What compiler and command line has been used? The full source of the benchmark would be helpful. Kind regards, Ilya
Sep 13 2020
parent reply Christoph <alt.c4 web.de> writes:
Hi Ilya,

On Sunday, 13 September 2020 at 19:29:31 UTC, 9il wrote:
 More details are required. What compiler and command line has 
 been used?
I have tested it with dmd and ldc and called them just with $ dub build --compiler=ldc(dmd) with no more configurations in the dub.json file. I have compared them with a 100 x 100 example problem and the version with the for-loops from above takes around 1.8s compiled with ldc and 0.8s compiled with dmd. The slow version from below takes with the same problem around 18s(dmd) and 12s (ldc) on my maschine. The driver function for my Gaussseidel method looks like this: ``` /++ This is a Gauss Seidel Red Black implementation it solves AU = F, with A being a poisson matrix like this 1 1 1 1 .. 1 1 4 1 0 .. 1 1 1 4 1 .. 1 . . . 0..1 4 1 . 1 .. 1 1 1 1 so the borders of U remain unchanged Params: U = slice of dimension Dim F = slice of dimension Dim h = the distance between the grid points Returns: U +/ Slice!(T*, Dim) GS_RB(T, size_t Dim, size_t max_iter = 10_000_000, size_t norm_iter = 1_000, double eps = 1e-8) (in Slice!(T*, Dim) F, Slice!(T*, Dim) U, in T h) if (1 <= Dim && Dim <= 3 && isFloatingPoint!T) { const T h2 = h * h; foreach (it; 1 .. max_iter + 1) { if (it % norm_iter == 0) { const auto norm = compute_residual!(T, Dim)(F, U, h).nrmL2; if (norm <= eps) { break; } } // rote Halbiteration sweep!(T, Dim, Color.red)(F, U, h2); // schwarze Halbiteration sweep!(T, Dim, Color.black)(F, U, h2); } return U; } ``` And the full slow version looks like this: ``` void sweep(T, size_t Dim : 2, Color color) (in Slice!(T*, 2) F, Slice!(T*, 2) U, T h2) { const auto m = F.shape[0]; const auto n = F.shape[1]; static if (color == Color.red) { U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. m - 2, 1 .. n - 1].strided!(0, 1)(2, 2); U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m, 1 .. n - 1].strided!(0, 1)(2, 2); U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2); U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 2 .. n].strided!(0, 1)(2, 2); U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2; U[1 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] /= cast(T) 4.0; U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. m - 2, 2 .. n - 1].strided!(0, 1)(2, 2); U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 .. m, 2 .. n - 1].strided!(0, 1)(2, 2); U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2); U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 3 .. n].strided!(0, 1)(2, 2); U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2; U[2 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] /= cast(T) 4.0; } else static if (color == Color.black) { U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] = U[0 .. m - 2, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0; U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m, 2 .. n - 1].strided!(0, 1)(2, 2) / 4.0; U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 1 .. n - 2].strided!(0, 1)(2, 2) / 4.0; U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] += U[1 .. m - 1, 3 .. n].strided!(0, 1)(2, 2) / 4.0; U[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2)[] -= F[1 .. m - 1, 2 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0; U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] = U[1 .. m - 2, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0; U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[3 .. m, 1 .. n - 1].strided!(0, 1)(2, 2) / 4.0; U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 0 .. n - 2].strided!(0, 1)(2, 2) / 4.0; U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] += U[2 .. m - 1, 2 .. n].strided!(0, 1)(2, 2) / 4.0; U[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2)[] -= F[2 .. m - 1, 1 .. n - 1].strided!(0, 1)(2, 2) * h2 / 4.0; } else { static assert(false, color.stringof ~ "invalid color"); } } ``` Thank you very much for your time. Christoph
Sep 14 2020
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Monday, 14 September 2020 at 09:50:16 UTC, Christoph wrote:
 Hi Ilya,

 On Sunday, 13 September 2020 at 19:29:31 UTC, 9il wrote:
 [...]
I have tested it with dmd and ldc and called them just with $ dub build --compiler=ldc(dmd) with no more configurations in the dub.json file. [...]
On Monday, 14 September 2020 at 09:50:16 UTC, Christoph wrote: For a release performance, it should be run in release mode ``` dub build --build=release --compiler=ldc2 ``` I expect it will speed up the slow version a few times. Also, the slow version has a few times more memory access then the fast version and Python. The improvement would look more like the C code and require inner loops. Your fast version looks good too me. If it is correct, it is very good.
Sep 14 2020
parent Christoph <alt.c4 web.de> writes:
On Monday, 14 September 2020 at 14:32:08 UTC, 9il wrote:
 For a release performance, it should be run in release mode
 ```
 dub build --build=release --compiler=ldc2
 ```
 I expect it will speed up the slow version a few times.
Oh yes, that made both versions much faster! Thank you very much! Christoph
Sep 15 2020