digitalmars.D.learn - core.simd and dynamic arrays
- John C. (28/28) Jan 26 Hello everyone. I am complete newbie in D and programming at all
- ryuukk_ (7/7) Jan 26 LDC 1.36 = 1 years old
- Steven Schveighoffer (4/6) Jan 26 This is the learn forum. People are learning here. Try to be
- John C. (7/14) Jan 26 Sorry, but I'm not using rolling-release Linux distribution and
- John C. (4/6) Jan 26 I have tested this code with LDC on run.dlang.io, segmentation
- John C. (4/7) Jan 26 Actually, if I use -mcpu=avx with DMD, no error is generated.
- Johan (19/29) Jan 26 The `align(32)` applies to the slice `a`, not the contents of `a`
- John C. (112/123) Jan 26 Thank you, seems that it is reason for that errors. I remember
- John C. (2/6) Jan 26 Except address with B(1011) at second from the right position?
- Guillaume Piolat (4/11) Jan 27 You may use intel-intrinsics who:
- John C. (5/8) Jan 27 I have heard about intel-intrinsics and it's really good idea to
- Johan (7/63) Jan 27 This is a long-standing druntime bug: alignment of the type is
Hello everyone. I am a complete newbie in D and in programming in general, and I can't understand why dynamic arrays can't be used in the following D code:

```d
import std.random : uniform01;
import core.simd;

void main()
{
    align(32) float[] a = new float[128];
    align(32) float[] b = new float[128];
    align(32) float[] c = new float[128];

    /* filling input arrays with random numbers in [0, 1) range */
    for (size_t i = 0; i < c.length; ++i)
    {
        a[i], b[i] = uniform01(), uniform01();
    }

    for (size_t i = 0; i < c.length; i += 8)
    {
        /* seems that segfault reason hides below */
        auto va = *cast(float8 *)(&a[i]);
        auto vb = *cast(float8 *)(&b[i]);
        auto vc = va * vb;
        *cast(float8 *)(&c[i]) = vc;
    }
}
```

I have tested the same code with static arrays of size 8 instead, and it worked correctly. For bigger static arrays, the code above even outperformed the one-by-one element iterative version. I'm using the LDC compiler 1.36.0 on an x86_64 Linux system with the "-w -O3 -mattr=+avx" compiler flags.
Jan 26
LDC 1.36 = 1 year old
latest version is LDC 1.40
with LDC 1.40, your code works on my computer
now my turn to ask a question: why were you using a 1 year old compiler version?
common sense would be to make sure you are up to date before wondering why it's broken
Jan 26
On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:
> common sense would be to make sure you are up to date before wondering why it's broken

This is the learn forum. People are learning here. Try to be nicer, there is no need for this.

-Steve
Jan 26
On Sunday, 26 January 2025 at 12:56:55 UTC, ryuukk_ wrote:
> LDC 1.36 = 1 year old
> latest version is LDC 1.40
> with LDC 1.40, your code works on my computer
> now my turn to ask a question: why were you using a 1 year old compiler version?
> common sense would be to make sure you are up to date before wondering why it's broken

Sorry, but I'm not using a rolling-release Linux distribution, and the only version available in the package repositories by default was 1.36. I tried switching to other package repositories and changing software sources, but none of the available package updates included an ldc entry. I will try to update the compiler to the latest available GitHub release.
Jan 26
On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
> I'm using LDC compiler 1.36.0 on x86_64 Linux system with "-w -O3 -mattr=+avx" compiler flags.

I have tested this code with LDC on run.dlang.io: the segmentation fault occurs only if -mattr=+avx is used. Without this flag, no errors are produced.
Jan 26
On Sunday, 26 January 2025 at 13:59:09 UTC, John C. wrote:
> I have tested this code with LDC on run.dlang.io, segmentation fault does occur only if -mattr=+avx is used. Without this flag no errors are produced.

Actually, if I use -mcpu=avx with DMD, no error is generated. However, if this flag is not specified, an "undefined identifier `float8`" error occurs.
Jan 26
On Sunday, 26 January 2025 at 12:45:11 UTC, John C. wrote:
> Hello everyone. I am complete newbie in D and programming at all and I can't understand why dynamic arrays can't be used within following D code:
> ```d
> import std.random : uniform01;
> import core.simd;
>
> void main()
> {
>     align(32) float[] a = new float[128];
>     ...
> ```

The `align(32)` applies to the slice `a`, not the contents of `a` (where `a` points to).

Some things to try:
- What exactly is the error reported? An out-of-bounds read/write would not result in a segfault. (but perhaps with optimization and UB for unaligned float8 access...)
- Print out the pointer to `a[0]` to verify what the actual alignment is.
- Does it work when you create an array of `float8`? (`float8[] a = new float8[128/8];`)

By the way, `a[i], b[i] = uniform01(), uniform01();` does not do what you think it does. Rewrite to
```
a[i] = uniform01();
b[i] = uniform01();
```

cheers,
Johan
Jan 26
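The distinction Johan draws between the slice and its contents can be observed directly. The following is a minimal sketch, not code from the thread: `isAligned` is a hypothetical helper name, and the printed addresses will differ between runs and compilers. It prints the address of the slice variable (which `align(32)` affects) next to the address of the elements (which comes from the GC allocator and is unaffected by the attribute).

```d
import std.stdio : writefln;

/// Hypothetical helper: true if `p` is a multiple of `alignment`.
/// `alignment` must be a power of two.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

void main()
{
    align(32) float[] a = new float[128];

    // `&a` is the address of the slice itself (a length/pointer pair);
    // this is the object that `align(32)` applies to.
    writefln("&a    = %s, 32-byte aligned: %s", &a, isAligned(&a, 32));

    // `a.ptr` is the address of the first element on the GC heap;
    // its alignment is whatever the allocator returned.
    writefln("a.ptr = %s, 32-byte aligned: %s", a.ptr, isAligned(a.ptr, 32));
}
```

If the second line reports `false`, a `float8` load through `cast(float8 *)(&a[0])` is an unaligned access, which is the situation suspected in the original post.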
On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
> The `align(32)` applies to the slice `a`, not the contents of `a` (where `a` points to).

Thank you, it seems that this is the reason for those errors. I remember that a dynamic array can be represented as a structure with a size_t length and a pointer to a memory location, so do we need to align the memory at that location rather than the dynamic array itself? Even if we align the dynamic array structure, we get five zero bits at the end of its address, but the memory it points to is still unaligned, so do I have to align it manually? I have written this code and it works without any error with LDC and DMD on run.dlang.io:

```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

T[] initAlignedArr(T)(size_t length)
{
    auto arr = GC.malloc(T.sizeof * length + 32);
    return (cast(T*)(cast(size_t)(arr + 32) & ~ 0x01F))[0..length];
}

void main()
{
    float[] a = initAlignedArr!float(1024);
    float[] b = initAlignedArr!float(1024);
    float[] c = initAlignedArr!float(1024);

    writeln(&a, " ", &b, " ", &c);
    writeln(a.ptr, " ", b.ptr, " ", c.ptr);

    writeln("Filling array...");
    for (size_t i = 0; i < c.length; ++i)
    {
        a[i] = uniform01();
        b[i] = uniform01();
    }

    writeln("Performing arithmetics...");
    for (size_t i = 0; i < c.length; i += 8)
    {
        auto va = *cast(float8 *)(&a[i]);
        auto vb = *cast(float8 *)(&b[i]);
        auto vc = va * vb;
        *cast(float8 *)(&c[i]) = vc;
    }

    writeln("Checking array...");
    for (size_t i = 0; i < c.length; i += 8)
    {
        if (c[i] != a[i] * b[i])
        {
            writefln("Value in array c is not product (i = %s): %s != %s + %s", i, c[i], a[i], b[i]);
            break;
        }
    }
}
```

Output:
```
7FFE53D6FDC0 7FFE53D6FDB0 7FFE53D6FDA0
7F90BAC35020 7F90BAC37020 7F90BAC39020
Filling array...
Performing arithmetics...
Checking array...
```

> What exactly is the error reported? An out-of-bounds read/write would not result in a segfault. (but perhaps with optimization and UB for unaligned float8 access...)

It seems that the optimization level does not change the error message (run.dlang.io LDC, only the "-mattr=+avx" flag):
```
Error: /tmp/onlineapp-223f65 failed with status: -2 message: Segmentation fault (core dumped)
```
Without this LDC flag, there are no errors.

> Print out the pointer to `a[0]` to verify what the actual alignment is.

If we look at the output above, the addresses on the first line are aligned to 32 bytes, but that does not matter, since the dynamic array holds a size_t length first and then a pointer, not the array contents, if I understand correctly? The addresses on the second line are aligned too, and their alignment is the one that matters.

> Does it work when you create an array of `float8`? (`float8[] a = new float8[128/8];`)

No, I have modified the original code and the errors are the same, except for dmd with the "-mcpu=avx" flag set (the error changed to "program killed by signal 11" on run.dlang.io).

```d
import std.stdio : writeln, writefln;
import std.random : uniform01;
import core.memory : GC;
import core.simd;

void main()
{
    float8[] a = new float8[128];
    float8[] b = new float8[128];
    float8[] c = new float8[128];

    writeln(&a, " ", &b, " ", &c);
    writeln(a.ptr, " ", b.ptr, " ", c.ptr);

    writeln("Filling array...");
    for (size_t i = 0; i < c.length; ++i)
    {
        // If I understand correctly, lines below assign 8 equal float values to float8 (does not matter in this test?)
        a[i] = uniform01();
        b[i] = uniform01();
    }

    writeln("Performing arithmetics...");
    for (size_t i = 0; i < c.length; ++i)
    {
        c[i] = a[i] * b[i];
    }

    writeln("Checking array...");
    for (size_t i = 0; i < c.length; i += 8)
    {
        if (c[i].array != (a[i] * b[i]).array)
        {
            writefln("Value in array c is not product (i = %s): %s != %s + %s", i, c[i], a[i], b[i]);
            break;
        }
    }
}
```

Output:
```
7FFF602EF5A0 7FFF602EF590 7FFF602EF580
7F15CB784010 7F15CB786010 7F15CB788010
Filling array...
Error: /tmp/onlineapp-835ef2 failed with status: -2 message: Segmentation fault (core dumped)
Error: program received signal 2 (Interrupt)
```

> By the way, `a[i], b[i] = uniform01(), uniform01();` does not do what you think it does. Rewrite to

Oh, yesterday I became a little pythonic :)
Jan 26
On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
> > Print out the pointer to `a[0]` to verify what the actual alignment is.
>
> If we look to output above, first line addresses are aligned to 32 bytes

Except for the address with B (binary 1011) in the second position from the right (7FFE53D6FDB0)?
Jan 26
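The alignment of the addresses printed in the thread can be checked by masking off the low five bits, since an address is 32-byte aligned exactly when `address & 0x1F` is zero. The sketch below applies that check to the addresses copied from the two outputs above; `offsetFrom32` is a hypothetical helper name introduced only for illustration.

```d
import std.stdio : writefln;

/// Hypothetical helper: distance of `address` past the previous
/// 32-byte boundary (0 means the address is 32-byte aligned).
ulong offsetFrom32(ulong address)
{
    return address & 0x1F;
}

void main()
{
    // Addresses of the slice variables (&a, &b, &c) from the first output.
    immutable ulong[] sliceAddresses = [0x7FFE53D6FDC0, 0x7FFE53D6FDB0, 0x7FFE53D6FDA0];
    // Data pointers returned by initAlignedArr in the first output.
    immutable ulong[] alignedData = [0x7F90BAC35020, 0x7F90BAC37020, 0x7F90BAC39020];
    // Data pointers of the `new float8[128]` arrays in the second output.
    immutable ulong[] float8Data = [0x7F15CB784010, 0x7F15CB786010, 0x7F15CB788010];

    foreach (addr; sliceAddresses)
        writefln("slice  %X -> offset %s", addr, offsetFrom32(addr));
    foreach (addr; alignedData)
        writefln("manual %X -> offset %s", addr, offsetFrom32(addr));
    foreach (addr; float8Data)
        writefln("float8 %X -> offset %s", addr, offsetFrom32(addr));
}
```

The slice address ending in `B0` has offset 16, which matches the remark about B (1011). The manually aligned data pointers ending in `20` have offset 0, while the `float8[]` data pointers ending in `10` have offset 16, so they are only 16-byte aligned.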
On Monday, 27 January 2025 at 05:57:18 UTC, John C. wrote:
> On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
> > > Print out the pointer to `a[0]` to verify what the actual alignment is.
> >
> > If we look to output above, first line addresses are aligned to 32 bytes
>
> Except address with B(1011) at second from the right position?

You may use intel-intrinsics, which:
1. guarantees float8 is there
2. has an aligned malloc, `_mm_malloc`
Jan 27
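For readers who want an aligned allocation without adding a package, Phobos also ships an aligned allocator. The sketch below swaps in `AlignedMallocator` from `std.experimental.allocator.mallocator` in place of the `_mm_malloc` mentioned above, so it is an alternative technique rather than the intel-intrinsics API. The helper names `allocAligned` and `freeAligned` are hypothetical, and the memory is not GC-managed, so it has to be freed explicitly.

```d
import std.experimental.allocator.mallocator : AlignedMallocator;

/// Hypothetical helper: allocates `length` elements of `T` whose first
/// element sits on an `alignment`-byte boundary (power of two).
/// The memory is uninitialized and must be released with `freeAligned`.
T[] allocAligned(T)(size_t length, uint alignment)
{
    void[] raw = AlignedMallocator.instance.alignedAllocate(T.sizeof * length, alignment);
    assert(raw !is null, "aligned allocation failed");
    return cast(T[]) raw;
}

/// Hypothetical helper: releases memory obtained from `allocAligned`.
void freeAligned(T)(T[] arr)
{
    AlignedMallocator.instance.deallocate(arr);
}

void main()
{
    float[] a = allocAligned!float(128, 32);
    scope (exit) freeAligned(a);

    a[] = 0.0f; // initialize before use

    assert(a.length == 128);
    assert((cast(size_t) a.ptr & 31) == 0); // data pointer is 32-byte aligned
}
```

Because the slice does not come from the GC, appending to it or letting it escape past `freeAligned` would be unsafe; the sketch only covers fixed-size buffers.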
On Monday, 27 January 2025 at 15:55:17 UTC, Guillaume Piolat wrote:
> You may use intel-intrinsics who: 1. guarantees float8 is there 2. have aligned malloc _mm_malloc

I have heard about intel-intrinsics, and it's a really good idea to use it in my code, but I wanted to try some SIMD operations with core.simd. I didn't know about the aligned malloc, though, thanks!
Jan 27
On Monday, 27 January 2025 at 05:53:09 UTC, John C. wrote:
> On Sunday, 26 January 2025 at 16:43:19 UTC, Johan wrote:
> > The `align(32)` applies to the slice `a`, not the contents of `a` (where `a` points to).
>
> Thank you, seems that it is reason for that errors. I remember that dynamic array can be represented as structure with size_t len and pointer to memory location, so do we need to align memory for this memory location, not dynamic array? Even if we align dynamic array structure, we get five zeros at the end of it's address, but memory location pointed to is still unaligned, so do I have align it manually?

Exactly.

> > Does it work when you create an array of `float8`? (`float8[] a = new float8[128/8];`)
>
> No, I have modified original code version and errors are the same, except for dmd with "-mcpu=avx" flag set (error changed to "program killed by signal 11" on run.dlang.io).
> ```d
> import std.stdio : writeln, writefln;
> import std.random : uniform01;
> import core.memory : GC;
> import core.simd;
>
> void main()
> {
>     float8[] a = new float8[128];
>     float8[] b = new float8[128];
>     float8[] c = new float8[128];
>
>     writeln(&a, " ", &b, " ", &c);
>     writeln(a.ptr, " ", b.ptr, " ", c.ptr);
>
>     writeln("Filling array...");
>     for (size_t i = 0; i < c.length; ++i)
>     {
>         // If I understand correctly, lines below assign 8 equal float values to float8 (does not matter in this test?)
>         a[i] = uniform01();
>         b[i] = uniform01();
>     }
>
>     writeln("Performing arithmetics...");
>     for (size_t i = 0; i < c.length; ++i)
>     {
>         c[i] = a[i] * b[i];
>     }
>
>     writeln("Checking array...");
>     for (size_t i = 0; i < c.length; i += 8)
>     {
>         if (c[i].array != (a[i] * b[i]).array)
>         {
>             writefln("Value in array c is not product (i = %s): %s != %s + %s", i, c[i], a[i], b[i]);
>             break;
>         }
>     }
> }
> ```
> Output:
> ```
> 7FFF602EF5A0 7FFF602EF590 7FFF602EF580
> 7F15CB784010 7F15CB786010 7F15CB788010
> Filling array...
> Error: /tmp/onlineapp-835ef2 failed with status: -2 message: Segmentation fault (core dumped)
> Error: program received signal 2 (Interrupt)
> ```

This is a long-standing druntime bug: alignment of the type is not taken into account for GC allocations... Wow.
https://github.com/dlang/dmd/issues/17259

-Johan
Jan 27