digitalmars.D.learn - core.simd and dynamic arrays

John C. <example example.com> writes:

Hello everyone. I am a complete newbie in D and in programming in 
general, and I can't understand why dynamic arrays can't be used 
in the following D code:
```d
import std.random : uniform01;
import core.simd;

void main() {
     align(32) float[] a = new float[128];
     align(32) float[] b = new float[128];
     align(32) float[] c = new float[128];

     /* filling input arrays with random numbers in [0, 1) range */
     for (size_t i = 0; i < c.length; ++i) {
         a[i], b[i] = uniform01(), uniform01();
     }

     for (size_t i = 0; i < c.length; i += 8) {
         /* seems that segfault reason hides below */
         auto va = *cast(float8 *)(&a[i]);
         auto vb = *cast(float8 *)(&b[i]);
         auto vc = va * vb;
         *cast(float8 *)(&c[i]) = vc;
     }
}
```

I have tested the same code with static arrays of size 8 instead, 
and it worked correctly. For bigger static arrays, the code above 
even outperformed the element-by-element iterative version.
I'm using the LDC compiler 1.36.0 on an x86_64 Linux system with 
the "-w -O3 -mattr=+avx" compiler flags.

Jan 26

ryuukk_ <ryuukk.dev gmail.com> writes:

LDC 1.36 = 1 year old

latest version is LDC 1.40

with LDC 1.40, your code works on my computer

now my turn to ask a question:

why were you using a 1-year-old compiler version?

common sense would be to make sure you are up to date before 
wondering why it's broken

Jan 26

Steven Schveighoffer <schveiguy gmail.com> writes:

On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:

 common sense would be to make sure you are up to date before 
 wondering why it's broken

This is the learn forum. People are learning here. Try to be 
nicer, there is no need for this.

-Steve

Jan 26

John C. <example example.com> writes:

On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:
 LDC 1.36 = 1 year old

 latest version is LDC 1.40

 with LDC 1.40, your code works on my computer

 now my turn to ask a question:

 why were you using a 1-year-old compiler version?

 common sense would be to make sure you are up to date before 
 wondering why it's broken

Sorry, but I'm not using a rolling-release Linux distribution, and 
the only version available in the package repositories by 
default was 1.36. I tried switching to other package 
repositories and changing software sources, but among all the 
available package updates there wasn't any ldc entry. I will try 
to update the compiler to the latest available GitHub release.

Jan 26

John C. <example example.com> writes:

On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
 I'm using the LDC compiler 1.36.0 on an x86_64 Linux system with 
 the "-w -O3 -mattr=+avx" compiler flags.

I have tested this code with LDC on run.dlang.io; the segmentation 
fault occurs only if -mattr=+avx is used. Without this flag, 
no errors are produced.

Jan 26

John C. <example example.com> writes:

On Sunday, 26 January 2025 at 13:59:09 UTC, John C. wrote:
 I have tested this code with LDC on run.dlang.io; the segmentation 
 fault occurs only if -mattr=+avx is used. Without this flag, 
 no errors are produced.

Actually, if I use -mcpu=avx with DMD, no error is generated. 
However, if this flag is not specified, an "undefined identifier 
`float8`" error occurs.

Jan 26

Johan <j j.nl> writes:

On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
 Hello everyone. I am a complete newbie in D and in programming in 
 general, and I can't understand why dynamic arrays can't be used 
 in the following D code:
 ```d
 import std.random : uniform01;
 import core.simd;

 void main() {
     align(32) float[] a = new float[128];
 ...
 ```

The `align(32)` applies to the slice `a`, not the contents of `a` 
(where `a` points to).
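
To see the difference, here is a minimal sketch that checks both addresses. `isAligned` is a hypothetical helper written for this illustration, not something from druntime or Phobos. The address of the slice variable lives on the stack, while `a.ptr` is whatever the GC returned, so the two results need not agree.

```d
import std.stdio : writefln;

/// Hypothetical helper: true when `p` is a multiple of `alignment`,
/// which is assumed to be a power of two.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

void main()
{
    align(32) float[] a = new float[128];

    // &a is the address of the slice itself (length + pointer pair).
    writefln("slice variable %s, 32-byte aligned: %s", &a, isAligned(&a, 32));

    // a.ptr is the address of the first element, chosen by the GC.
    writefln("array contents %s, 32-byte aligned: %s", a.ptr, isAligned(a.ptr, 32));
}
```

Only the second line matters for the `float8` loads and stores, since they go through `&a[i]`.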

Some things to try:
- What exactly is the error reported? An out-of-bounds read/write 
would not result in a segfault. (but perhaps with optimization 
and UB for unaligned float8 access...)
- Print out the pointer to `a[0]` to verify what the actual 
alignment is.
- Does it work when you create an array of `float8`?  (`float8[] 
a = new float8[128/8];`)

By the way, `a[i], b[i] = uniform01(), uniform01();` does not do 
what you think it does. Rewrite to
```
a[i] = uniform01();
b[i] = uniform01();
```
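
A small sketch of what the original statement does, assuming a hypothetical `next()` function as a stand-in for `uniform01()` so the effect is easy to observe. The statement is a chain of comma expressions, and assignment binds tighter than the comma, so only `b[0]` is ever assigned.

```d
import std.math : isNaN;

int calls = 0;

/// Hypothetical stand-in for uniform01(): returns 1, 2, 3, ...
/// on successive calls, so the call order is visible.
float next()
{
    return ++calls;
}

void main()
{
    float[] a = new float[1];   // elements start out as float.nan
    float[] b = new float[1];

    // Parsed as: (a[0]), (b[0] = next()), (next());
    a[0], b[0] = next(), next();

    assert(isNaN(a[0]));   // a[0] was only read, never assigned
    assert(b[0] == 1);     // b[0] received the result of the first call
    assert(calls == 2);    // the second call ran, but its result was discarded
}
```

In the original loop this means `a` stays filled with NaN.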

cheers,
   Johan

Jan 26

John C. <example example.com> writes:

On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
 The `align(32)` applies to the slice `a`, not the contents of 
 `a` (where `a` points to).

Thank you, it seems that this is the reason for those errors. I 
remember that a dynamic array can be represented as a structure 
with a size_t length and a pointer to a memory location, so do we 
need to align the memory at that location rather than the dynamic 
array itself? Even if we align the dynamic array structure, we get 
five zero bits at the end of its address, but the memory location 
it points to is still unaligned, so do I have to align it 
manually? I have written this code, and it works without any 
error with LDC and DMD on run.dlang.io:
```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

T[] initAlignedArr(T)(size_t length) {
     auto arr = GC.malloc(T.sizeof * length + 32);
     return (cast(T*)(cast(size_t)(arr + 32) & ~0x01F))[0..length];
}

void main() {
     float[] a = initAlignedArr!float(1024);
     float[] b = initAlignedArr!float(1024);
     float[] c = initAlignedArr!float(1024);

     writeln(&a, " ", &b, " ", &c);
     writeln(a.ptr, " ", b.ptr, " ", c.ptr);

     writeln("Filling array...");
     for (size_t i = 0; i < c.length; ++i) {
         a[i] = uniform01();
         b[i] = uniform01();
     }

     writeln("Performing arithmetics...");
     for (size_t i = 0; i < c.length; i += 8) {
         auto va = *cast(float8 *)(&a[i]);
         auto vb = *cast(float8 *)(&b[i]);
         auto vc = va * vb;
         *cast(float8 *)(&c[i]) = vc;
     }

     writeln("Checking array...");
     for (size_t i = 0; i < c.length; i += 8) {
         if (c[i] != a[i] * b[i]) {
             writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
             break;
         }
     }
}
```
Output:
```
7FFE53D6FDC0 7FFE53D6FDB0 7FFE53D6FDA0
7F90BAC35020 7F90BAC37020 7F90BAC39020
Filling array...
Performing arithmetics...
Checking array...
```
 What exactly is the error reported? An out-of-bounds read/write 
 would not result in a segfault. (but perhaps with optimization 
 and UB for unaligned float8 access...)

It seems the optimization level does not change the error message 
(run.dlang.io LDC, only the "-mattr=+avx" flag):
```
Error: /tmp/onlineapp-223f65 failed with status: -2
        message: Segmentation fault (core dumped)
```
Without this LDC flag, no errors.
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.

If we look at the output above, the addresses on the first line are 
aligned to 32 bytes, but that does not matter, since they point to 
the dynamic array structure (size_t length first, then the 
pointer) and not to the array data itself, if I understand 
correctly? The addresses on the second line are aligned too, and 
their alignment is what matters.
 Does it work when you create an array of `float8`?  (`float8[] 
 a = new float8[128/8];`)

No. I modified the original code, and the errors are the 
same, except for DMD with the "-mcpu=avx" flag set (the error 
changed to "program killed by signal 11" on run.dlang.io).
```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

void main() {
     float8[] a = new float8[128];
     float8[] b = new float8[128];
     float8[] c = new float8[128];

     writeln(&a, " ", &b, " ", &c);
     writeln(a.ptr, " ", b.ptr, " ", c.ptr);

     writeln("Filling array...");
     for (size_t i = 0; i < c.length; ++i) {
         // If I understand correctly, the lines below assign 8 equal
         // float values to float8 (does not matter in this test?)
         a[i] = uniform01();
         b[i] = uniform01();
     }

     writeln("Performing arithmetics...");
     for (size_t i = 0; i < c.length; ++i) {
         c[i] = a[i] * b[i];
     }

     writeln("Checking array...");
     for (size_t i = 0; i < c.length; i += 8) {
         if (c[i].array != (a[i] * b[i]).array) {
             writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
             break;
         }
     }
}

```
Output:
```
7FFF602EF5A0 7FFF602EF590 7FFF602EF580
7F15CB784010 7F15CB786010 7F15CB788010
Filling array...
Error: /tmp/onlineapp-835ef2 failed with status: -2
        message: Segmentation fault (core dumped)
Error: program received signal 2 (Interrupt)
```
 By the way, `a[i], b[i] = uniform01(), uniform01();` does not 
 do what you think it does. Rewrite to

Oh, yesterday I became a little Pythonic :)

Jan 26

John C. <example example.com> writes:

On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.

 If we look at the output above, the addresses on the first line 
 are aligned to 32 bytes

Except for the address with B (1011) in the second position from the right?
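
A quick sketch to check this, using addresses copied from the two outputs earlier in the thread. An address is 32-byte aligned when its low five bits are zero, so masking with `0x1F` gives the offset past the previous 32-byte boundary. `offsetIn32` is a hypothetical helper name used only for this illustration.

```d
import std.stdio : writefln;

/// Hypothetical helper: offset of `addr` past the previous 32-byte
/// boundary (0 means the address is 32-byte aligned).
ulong offsetIn32(ulong addr)
{
    return addr & 0x1F;
}

void main()
{
    // Addresses copied from the outputs posted above.
    immutable ulong[] addrs = [
        0x7FFE53D6FDC0UL,   // &a
        0x7FFE53D6FDB0UL,   // &b
        0x7FFE53D6FDA0UL,   // &c
        0x7F90BAC35020UL,   // a.ptr from initAlignedArr
        0x7F15CB784010UL,   // a.ptr from new float8[128]
    ];

    foreach (addr; addrs)
        writefln("%X -> offset %s", addr, offsetIn32(addr));
}
```

The offsets come out as 0, 16, 0, 0 and 16. So `&b` is only 16-byte aligned, and so is the pointer returned for `new float8[128]`, while the pointer produced by `initAlignedArr` sits on a 32-byte boundary.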

Jan 26

Guillaume Piolat <first.nam_e gmail.com> writes:

On Monday, 27 January 2025 at 05:57:18 UTC, John C. wrote:
 On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.

  If we look at the output above, the addresses on the first line 
  are aligned to 32 bytes

 Except for the address with B (1011) in the second position from the right?

You may use intel-intrinsics, which:
1. guarantees float8 is there
2. has an aligned malloc, _mm_malloc
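
For comparison, a rough sketch of aligned allocation without the library, assuming a POSIX system and using `posix_memalign` from druntime's C bindings. This is not how intel-intrinsics implements `_mm_malloc`; `allocAligned32` is a hypothetical helper name, and the memory it returns is outside the GC and has to be released with `free`.

```d
import core.stdc.stdlib : free;
import core.sys.posix.stdlib : posix_memalign;

/// Hypothetical helper: allocates `length` floats on a 32-byte boundary.
/// The memory is NOT managed by the GC; release it with free(slice.ptr).
float[] allocAligned32(size_t length)
{
    void* p = null;
    // posix_memalign returns 0 on success. The alignment must be a
    // power of two and a multiple of (void*).sizeof.
    if (posix_memalign(&p, 32, length * float.sizeof) != 0)
        return null;
    return (cast(float*) p)[0 .. length];
}

void main()
{
    float[] a = allocAligned32(1024);
    assert(a.ptr !is null);
    scope (exit) free(a.ptr);

    // The first element sits on a 32-byte boundary.
    assert((cast(size_t) a.ptr & 31) == 0);

    // The memory is uninitialized, so fill it before reading.
    a[] = 0.0f;
}
```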

Jan 27

John C. <example example.com> writes:

On Monday, 27 January 2025 at 15:55:17 UTC, Guillaume Piolat 
wrote:
 You may use intel-intrinsics, which:
 1. guarantees float8 is there
 2. has an aligned malloc, _mm_malloc

I have heard about intel-intrinsics, and it's a really good idea 
to use it in my code, but I wanted to try some SIMD operations 
with core.simd. I didn't know about aligned malloc, though, thanks!

Jan 27

Johan <j j.nl> writes:

On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
 The `align(32)` applies to the slice `a`, not the contents of 
 `a` (where `a` points to).

 Thank you, it seems that this is the reason for those errors. I 
 remember that a dynamic array can be represented as a structure 
 with a size_t length and a pointer to a memory location, so do we 
 need to align the memory at that location rather than the dynamic 
 array itself? Even if we align the dynamic array structure, we 
 get five zero bits at the end of its address, but the memory 
 location it points to is still unaligned, so do I have to align 
 it manually?

Exactly.


 Does it work when you create an array of `float8`?  (`float8[] 
 a = new float8[128/8];`)

 No. I modified the original code, and the errors are the 
 same, except for DMD with the "-mcpu=avx" flag set (the error 
 changed to "program killed by signal 11" on run.dlang.io).
 ```d
 import std.stdio : writeln, writefln;
 import std.random : uniform01;
 import core.memory : GC;
 import core.simd;

 void main() {
     float8[] a = new float8[128];
     float8[] b = new float8[128];
     float8[] c = new float8[128];

     writeln(&a, " ", &b, " ", &c);
     writeln(a.ptr, " ", b.ptr, " ", c.ptr);

     writeln("Filling array...");
     for (size_t i = 0; i < c.length; ++i) {
         // If I understand correctly, the lines below assign 8 equal
         // float values to float8 (does not matter in this test?)
         a[i] = uniform01();
         b[i] = uniform01();
     }

     writeln("Performing arithmetics...");
     for (size_t i = 0; i < c.length; ++i) {
         c[i] = a[i] * b[i];
     }

     writeln("Checking array...");
     for (size_t i = 0; i < c.length; i += 8) {
         if (c[i].array != (a[i] * b[i]).array) {
             writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
             break;
         }
     }
 }

 ```
 Output:
 ```
 7FFF602EF5A0 7FFF602EF590 7FFF602EF580
 7F15CB784010 7F15CB786010 7F15CB788010
 Filling array...
 Error: /tmp/onlineapp-835ef2 failed with status: -2
        message: Segmentation fault (core dumped)
 Error: program received signal 2 (Interrupt)
 ```

This is a long-standing druntime bug: alignment of the type is 
not taken into account for GC allocations... Wow.
https://github.com/dlang/dmd/issues/17259

-Johan

Jan 27
