digitalmars.D.learn - core.simd and dynamic arrays

reply John C. <example example.com> writes:
Hello everyone. I am a complete newbie in D and in programming in 
general, and I can't understand why dynamic arrays can't be used in 
the following D code:
```d
import std.random : uniform01;
import core.simd;

void main() {
     align(32) float[] a = new float[128];
     align(32) float[] b = new float[128];
     align(32) float[] c = new float[128];

     /* filling input arrays with random numbers in [0, 1) range */
     for (size_t i = 0; i < c.length; ++i) {
         a[i], b[i] = uniform01(), uniform01();
     }

     for (size_t i = 0; i < c.length; i += 8) {
         /* seems that segfault reason hides below */
         auto va = *cast(float8 *)(&a[i]);
         auto vb = *cast(float8 *)(&b[i]);
         auto vc = va * vb;
         *cast(float8 *)(&c[i]) = vc;
     }
}
```

I have tested the same code with static arrays of size 8 instead, 
and it worked correctly. For bigger static arrays, the code above 
even outperformed the element-by-element iterative version.
I'm using the LDC compiler 1.36.0 on an x86_64 Linux system with 
the "-w -O3 -mattr=+avx" compiler flags.
Jan 26
next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
LDC 1.36 = 1 year old

latest version is LDC 1.40

with LDC 1.40, your code works on my computer

now my turn to ask a question:

why were you using a 1-year-old compiler version?

common sense would be to make sure you are up to date before 
wondering why it's broken
Jan 26
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:

 common sense would be to make sure you are up to date before 
 wondering why it's broken
This is the learn forum. People are learning here. Try to be nicer, there is no need for this. -Steve
Jan 26
prev sibling parent John C. <example example.com> writes:
On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:
 LDC 1.36 = 1 year old

 latest version is LDC 1.40

 with LDC 1.40, your code works on my computer

 now my turn to ask a question:

 why were you using a 1-year-old compiler version?

 common sense would be to make sure you are up to date before 
 wondering why it's broken
Sorry, but I'm not using a rolling-release Linux distribution, and the only version available in the package repositories by default was 1.36. I tried switching to other package repositories and changing software sources, but none of the available package updates included an ldc entry. I will try to update the compiler to the latest available GitHub release.
Jan 26
prev sibling next sibling parent reply John C. <example example.com> writes:
On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
 I'm using LDC compiler 1.36.0 on x86_64 Linux system with "-w 
 -O3 -mattr=+avx" compiler flags.
I have tested this code with LDC on run.dlang.io: the segmentation fault occurs only if -mattr=+avx is used. Without this flag, no errors are produced.
Jan 26
parent John C. <example example.com> writes:
On Sunday, 26 January 2025 at 13:59:09 UTC, John C. wrote:
 I have tested this code with LDC on run.dlang.io: the segmentation 
 fault occurs only if -mattr=+avx is used. Without this flag, 
 no errors are produced.
Actually, if I use -mcpu=avx with DMD, no error is generated. However, if this flag is not specified, an "undefined identifier `float8`" error occurs.
Jan 26
prev sibling parent reply Johan <j j.nl> writes:
On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
 Hello everyone. I am a complete newbie in D and in programming in 
 general, and I can't understand why dynamic arrays can't be used 
 in the following D code:
 ```d
 import std.random : uniform01;
 import core.simd;

 void main() {
     align(32) float[] a = new float[128];
 ...
 ```
The `align(32)` applies to the slice `a`, not the contents of `a` (where `a` points to).

Some things to try:
- What exactly is the error reported? An out-of-bounds read/write would not result in a segfault. (but perhaps with optimization and UB for unaligned float8 access...)
- Print out the pointer to `a[0]` to verify what the actual alignment is.
- Does it work when you create an array of `float8`? (`float8[] a = new float8[128/8];`)

By the way, `a[i], b[i] = uniform01(), uniform01();` does not do what you think it does. Rewrite to
```
a[i] = uniform01();
b[i] = uniform01();
```

cheers,
  Johan
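
The difference between the slice variable and the memory it refers to can be checked directly. The following is a minimal sketch, not code from this thread; the helper name `isAligned` is hypothetical. It compares the address of the slice variable itself with the address of its first element.

```d
import std.stdio : writefln;

/// Hypothetical helper: true if `p` is a multiple of `alignment` bytes.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p) % alignment == 0;
}

void main()
{
    align(32) float[] a = new float[128];

    // &a is the address of the slice itself (length + pointer, on the stack);
    // this is the only thing the align attribute is applied to.
    writefln("slice variable %s aligned to 32: %s", &a, isAligned(&a, 32));

    // a.ptr is the address of the first element (on the GC heap);
    // align(32) on the variable gives no guarantee here.
    writefln("array contents %s aligned to 32: %s", a.ptr, isAligned(a.ptr, 32));
}
```

If the second line reports `false`, a `float8` load through `cast(float8*) a.ptr` reads from an address that is not 32-byte aligned.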
Jan 26
parent reply John C. <example example.com> writes:
On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
 The `align(32)` applies to the slice `a`, not the contents of 
 `a` (where `a` points to).
Thank you, that seems to be the reason for the errors. I remember that a dynamic array can be represented as a structure with a size_t length and a pointer to a memory location, so do we need to align the memory at that location rather than the dynamic array itself? Even if we align the dynamic array structure, we get five zero bits at the end of its address, but the memory it points to is still unaligned, so do I have to align it manually?

I have written this code and it works without any error with LDC and DMD on run.dlang.io:
```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

T[] initAlignedArr(T)(size_t length) {
    // Over-allocate by 32 bytes, then round up to the next 32-byte boundary.
    auto arr = GC.malloc(T.sizeof * length + 32);
    return (cast(T*)(cast(size_t)(arr + 32) & ~ 0x01F))[0..length];
}

void main() {
    float[] a = initAlignedArr!float(1024);
    float[] b = initAlignedArr!float(1024);
    float[] c = initAlignedArr!float(1024);

    // First line: addresses of the slices; second line: addresses of the data.
    writeln(&a, " ", &b, " ", &c);
    writeln(a.ptr, " ", b.ptr, " ", c.ptr);

    writeln("Filling array...");
    for (size_t i = 0; i < c.length; ++i) {
        a[i] = uniform01();
        b[i] = uniform01();
    }

    writeln("Performing arithmetics...");
    for (size_t i = 0; i < c.length; i += 8) {
        auto va = *cast(float8 *)(&a[i]);
        auto vb = *cast(float8 *)(&b[i]);
        auto vc = va * vb;
        *cast(float8 *)(&c[i]) = vc;
    }

    writeln("Checking array...");
    for (size_t i = 0; i < c.length; i += 8) {
        if (c[i] != a[i] * b[i]) {
            writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
            break;
        }
    }
}
```
Output:
```
7FFE53D6FDC0 7FFE53D6FDB0 7FFE53D6FDA0
7F90BAC35020 7F90BAC37020 7F90BAC39020
Filling array...
Performing arithmetics...
Checking array...
```
 What exactly is the error reported? An out-of-bounds read/write 
 would not result in a segfault. (but perhaps with optimization 
 and UB for unaligned float8 access...)
Seems like the optimization level does not change the error message (run.dlang.io LDC, only the "-mattr=+avx" flag):
```
Error: /tmp/onlineapp-223f65 failed with status: -2 message: Segmentation fault (core dumped)
```
Without this LDC flag, there are no errors.
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.
If we look at the output above, the first-line addresses are aligned to 32 bytes, but that does not matter, since they are the addresses of the dynamic array structure (a size_t length first, then a pointer) and not of the array contents, if I understand correctly? The second-line addresses are aligned too, and their alignment is the one that matters.
 Does it work when you create an array of `float8`?  (`float8[] 
 a = new float8[128/8];`)
No, I have modified the original code and the errors are the same, except for dmd with the "-mcpu=avx" flag set (the error changed to "program killed by signal 11" on run.dlang.io).
```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

void main() {
    float8[] a = new float8[128];
    float8[] b = new float8[128];
    float8[] c = new float8[128];

    // First line: addresses of the slices; second line: addresses of the data.
    writeln(&a, " ", &b, " ", &c);
    writeln(a.ptr, " ", b.ptr, " ", c.ptr);

    writeln("Filling array...");
    for (size_t i = 0; i < c.length; ++i) {
        // If I understand correctly, lines below assign 8 equal float values to float8 (does not matter in this test?)
        a[i] = uniform01();
        b[i] = uniform01();
    }

    writeln("Performing arithmetics...");
    for (size_t i = 0; i < c.length; ++i) {
        c[i] = a[i] * b[i];
    }

    writeln("Checking array...");
    for (size_t i = 0; i < c.length; i += 8) {
        if (c[i].array != (a[i] * b[i]).array) {
            writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
            break;
        }
    }
}
```
Output:
```
7FFF602EF5A0 7FFF602EF590 7FFF602EF580
7F15CB784010 7F15CB786010 7F15CB788010
Filling array...
Error: /tmp/onlineapp-835ef2 failed with status: -2 message: Segmentation fault (core dumped)
Error: program received signal 2 (Interrupt)
```
 By the way, `a[i], b[i] = uniform01(), uniform01();` does not 
 do what you think it does. Rewrite to
Oh, yesterday I got a little Pythonic :)
Jan 26
next sibling parent reply John C. <example example.com> writes:
On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.
If we look at the output above, the first-line addresses are aligned to 32 bytes
Except for the address with B (binary 1011) as its second hex digit from the right?
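
An address is 32-byte aligned exactly when its five lowest bits are zero, which means its last two hex digits must be one of 00, 20, 40, 60, 80, A0, C0 or E0. A minimal sketch (not code from the thread; `offsetFrom32` is a hypothetical helper name) that applies this to the three slice addresses from the first output line:

```d
import std.stdio : writefln;

/// Remainder of an address modulo 32; 0 means 32-byte aligned.
ulong offsetFrom32(ulong address)
{
    return address & 0x1F; // keep only the five lowest bits
}

void main()
{
    // Addresses of the slice variables, copied from the first output line.
    immutable ulong[] addresses = [0x7FFE53D6FDC0, 0x7FFE53D6FDB0, 0x7FFE53D6FDA0];

    foreach (addr; addresses)
        writefln("%X -> offset %s", addr, offsetFrom32(addr));
}
```

The address ending in B0 leaves a remainder of 16, so it is 16-byte aligned but not 32-byte aligned; the other two leave a remainder of 0.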
Jan 26
parent reply Guillaume Piolat <first.nam_e gmail.com> writes:
On Monday, 27 January 2025 at 05:57:18 UTC, John C. wrote:
 On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 Print out the pointer to `a[0]` to verify what the actual 
 alignment is.
If we look at the output above, the first-line addresses are aligned to 32 bytes
Except for the address with B (binary 1011) as its second hex digit from the right?
You may use intel-intrinsics, which:
1. guarantees float8 is there
2. has an aligned malloc, `_mm_malloc`
Jan 27
parent John C. <example example.com> writes:
On Monday, 27 January 2025 at 15:55:17 UTC, Guillaume Piolat 
wrote:
 You may use intel-intrinsics who:
 1. guarantees float8 is there
 2. have aligned malloc _mm_malloc
I have heard about intel-intrinsics, and it's a really good idea to use it in my code, but I wanted to try some SIMD operations with core.simd. I didn't know about the aligned malloc though, thanks!
Jan 27
prev sibling parent Johan <j j.nl> writes:
On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
 On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
 The `align(32)` applies to the slice `a`, not the contents of 
 `a` (where `a` points to).
Thank you, that seems to be the reason for the errors. I remember that a dynamic array can be represented as a structure with a size_t length and a pointer to a memory location, so do we need to align the memory at that location rather than the dynamic array itself? Even if we align the dynamic array structure, we get five zero bits at the end of its address, but the memory it points to is still unaligned, so do I have to align it manually?
Exactly.
 Does it work when you create an array of `float8`?  (`float8[] 
 a = new float8[128/8];`)
No, I have modified the original code and the errors are the same, except for dmd with the "-mcpu=avx" flag set (the error changed to "program killed by signal 11" on run.dlang.io).
```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

void main() {
    float8[] a = new float8[128];
    float8[] b = new float8[128];
    float8[] c = new float8[128];

    // First line: addresses of the slices; second line: addresses of the data.
    writeln(&a, " ", &b, " ", &c);
    writeln(a.ptr, " ", b.ptr, " ", c.ptr);

    writeln("Filling array...");
    for (size_t i = 0; i < c.length; ++i) {
        // If I understand correctly, lines below assign 8 equal float values to float8 (does not matter in this test?)
        a[i] = uniform01();
        b[i] = uniform01();
    }

    writeln("Performing arithmetics...");
    for (size_t i = 0; i < c.length; ++i) {
        c[i] = a[i] * b[i];
    }

    writeln("Checking array...");
    for (size_t i = 0; i < c.length; i += 8) {
        if (c[i].array != (a[i] * b[i]).array) {
            writefln("Value in array c is not product (i = %s): %s != %s * %s", i, c[i], a[i], b[i]);
            break;
        }
    }
}
```
Output:
```
7FFF602EF5A0 7FFF602EF590 7FFF602EF580
7F15CB784010 7F15CB786010 7F15CB788010
Filling array...
Error: /tmp/onlineapp-835ef2 failed with status: -2 message: Segmentation fault (core dumped)
Error: program received signal 2 (Interrupt)
```
This is a long-standing druntime bug: alignment of the type is not taken into account for GC allocations... Wow.
https://github.com/dlang/dmd/issues/17259

-Johan
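
Until that issue is resolved, one possible workaround is to over-allocate and round the start address up by hand, in the same spirit as `initAlignedArr` earlier in this thread. The following is only a sketch under that assumption: `alignedArray` is a hypothetical helper name, not a druntime API, and it defaults to the element type's own alignment.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: returns a GC-allocated slice of `length` elements
/// whose first element is aligned to `alignment` bytes (a power of two).
T[] alignedArray(T)(size_t length, size_t alignment = T.alignof)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");

    // Over-allocate so that an aligned start always fits inside the block.
    void* raw = GC.malloc(T.sizeof * length + alignment - 1);

    // Round the address up to the next multiple of `alignment`.
    size_t start = (cast(size_t) raw + alignment - 1) & ~(alignment - 1);

    T[] result = (cast(T*) start)[0 .. length];
    result[] = T.init; // GC.malloc does not initialise the memory
    return result;
}

void main()
{
    // Request 32-byte alignment explicitly, as needed for float8 access.
    auto a = alignedArray!float(128, 32);
    writefln("a.ptr = %s, offset from 32-byte boundary: %s",
             a.ptr, cast(size_t) a.ptr % 32);
}
```

The returned slice points into the interior of the allocated block, so it should not be appended to or passed to `GC.free`.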
Jan 27