digitalmars.D.learn - SIMD under LDC
- Igor (11/11) Sep 04 2017 I found that I can't use __simd function from core.simd under LDC
- Nicholas Wilson (18/31) Sep 04 2017 You have several options:
- 12345swordy (3/36) Sep 04 2017 I've seen cases where the compiler fails to optimize for SIMD.
- Igor (5/9) Sep 05 2017 I tried it and LDC optimized build did generate SIMD instructions
- Johan Engelen (9/19) Sep 05 2017 You can use the module ldc.gccbuiltins_x86.di,
- Igor (5/26) Sep 06 2017 I'll try that this evening. Thanks! I'll also open an issue but
- Igor (18/49) Sep 06 2017 I opened a feature request on github. I also tried using the
- Johan Engelen (10/14) Sep 07 2017 That's because SSSE3 instructions are not enabled by default, so
I found that I can't use the __simd function from core.simd under LDC. LDC has ldc.simd, but I couldn't find how to implement the equivalent of this with it:

    ubyte16* masks = ...;
    foreach (ref c; pixels) {
        c = __simd(XMM.PSHUFB, c, *masks);
    }

I see it has a shufflevector function, but that only accepts constant masks and I am using a variable one. Is this possible under LDC?

BTW, shuffling channels within pixels using DMD SIMD is about 5 times faster than normal code on my machine :)
Sep 04 2017
On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use __simd function from core.simd under LDC

Correct, LDC does not support the core.simd interface.

> and that it has ldc.simd but I couldn't find how to implement equivalent to this with it:
>
>     ubyte16* masks = ...;
>     foreach (ref c; pixels) {
>         c = __simd(XMM.PSHUFB, c, *masks);
>     }
>
> I see it has shufflevector function but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

You have several options:

* write a regular for loop and let LDC's optimiser take care of the rest.

    alias mask_t = ReturnType!(equalMask!ubyte16);

    // `align` is a D keyword, so the parameter is named `alignment` here.
    pragma(LDC_intrinsic, "llvm.masked.load.v16i8.p0v16i8")
    ubyte16 llvm_masked_load(ubyte16* val, int alignment, mask_t mask, ubyte16 fallthru);

    ubyte16* masks = ...;
    foreach (ref c; pixels) {
        auto mask = equalMask!ubyte16(*masks, [-1,-1,-1, ...]);
        c = llvm_masked_load(&c, 16, mask, [0,0,0,0 ... ]);
    }

The second one might not work, because of type differences in LLVM, but it should serve as a guide to hacking the `cmpMask` IR code in ldc.simd to do what you want it to.

> BTW. Shuffling channels within pixels using DMD simd is about 5 times faster than with normal code on my machine :)

Don't underestimate LDC's optimiser ;)
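For readers following along, the first option (a regular loop left to the optimiser) might look roughly like the sketch below. The function name `shuffleChannels` and the `swapRB` mask are made up for illustration and are not from the thread; the loop simply reproduces PSHUFB's byte-selection rule in scalar code. Whether a given compiler vectorises it, and how well, has to be checked in the generated assembly.

```d
/// Hypothetical helper (not from the thread): reorders the bytes of every
/// complete 16-byte block of `pixels` according to `mask`, following PSHUFB
/// semantics: a mask byte with its high bit set writes 0, otherwise its low
/// four bits select the source byte within the block.
void shuffleChannels(ubyte[] pixels, const ubyte[16] mask)
{
    for (size_t base = 0; base + 16 <= pixels.length; base += 16)
    {
        // Copy the block first so that writes do not clobber unread sources.
        ubyte[16] src;
        src[] = pixels[base .. base + 16];

        foreach (i, m; mask)
        {
            if (m & 0x80)
                pixels[base + i] = 0;
            else
                pixels[base + i] = src[m & 0x0F];
        }
    }
}

unittest
{
    // Swap bytes 0 and 2 of every 4-byte pixel (e.g. BGRA <-> RGBA).
    immutable ubyte[16] swapRB = [2, 1, 0, 3, 6, 5, 4, 7, 10, 9, 8, 11, 14, 13, 12, 15];
    ubyte[16] pixels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];

    shuffleChannels(pixels[], swapRB);

    assert(pixels[0 .. 4] == [2, 1, 0, 3]);
    assert(pixels[12 .. 16] == [14, 13, 12, 15]);
}
```

The unittest block can be run with `-unittest -main`. Any trailing bytes that do not fill a whole 16-byte block are left untouched.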
Sep 04 2017
On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
> Don't underestimate ldc's optimiser ;)

I've seen cases where the compiler fails to optimize for SIMD.
Sep 04 2017
On Tuesday, 5 September 2017 at 01:11:29 UTC, 12345swordy wrote:
> On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
>> Don't underestimate ldc's optimiser ;)
>
> I've seen cases where the compiler fails to optimize for SIMD.

I tried it, and the LDC optimized build did generate SIMD instructions from regular code, but it used several of them to do the job, so it is about 1.4 times slower than the manual SIMD version with DMD. That is probably good enough for me.
Sep 05 2017
On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use __simd function from core.simd under LDC and that it has ldc.simd but I couldn't find how to implement equivalent to this with it:
>
>     ubyte16* masks = ...;
>     foreach (ref c; pixels) {
>         c = __simd(XMM.PSHUFB, c, *masks);
>     }
>
> I see it has shufflevector function but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

You can use the module ldc.gccbuiltins_x86.di, __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256. (also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)

Please file a feature request about shufflevector with variable mask in our (LDC) issue tracker on GitHub, with some code that you'd expect to work. Thanks.

- Johan
Sep 05 2017
On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen wrote:
> You can use the module ldc.gccbuiltins_x86.di, __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256. (also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)
>
> Please file a feature request about shufflevector with variable mask in our (LDC) issue tracker on Github; with some code that you'd expect to work. Thanks.

I'll try that this evening. Thanks! I'll also open an issue, but are you sure such a feature request is valid, since the LLVM shufflevector instruction, as far as I can see, only supports constant masks as well?
Sep 06 2017
On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
> I'll try that this evening. Thanks! I'll also open an issue but are you sure such feature request is valid since LLVM shufflevector instruction, as far as I see, only supports constant masks as well.

I opened a feature request on GitHub. I also tried using the gccbuiltins but I got this error:

    LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
      0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
        0x2199c96fb00: v16i8 = Register %vreg384
      0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
        0x2199c74ed50: v16i8 = Register %vreg385
    In function: _D7assetdb12loadBmpImageFAxaZf
    Building x64\LDCDebug\DNgin.exe failed!

You can see the code I used here: https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123

Note that if you want to try it you will need a test.bmp in a specific format where header.compression == 3, like this one: https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing
Sep 06 2017
On Wednesday, 6 September 2017 at 20:43:01 UTC, Igor wrote:
> I opened a feature request on github. I also tried using the gccbuiltins but I got this error:
>
>     LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0

That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction. Some options you have:

1. Set a CPU that has SSSE3, e.g. compile with `-mcpu=native`
2. Enable SSSE3: compile with `-mattr=+ssse3`
3. Perhaps best for your case, enable SSSE3 for that function only, by importing the ldc.attributes module and using the @target("ssse3") UDA on that function.

-Johan
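As an illustration of option 3, here is a minimal sketch that puts the builtin and the per-function attribute together. It is LDC-only and x86-only. The function name `shufflePixels` is invented for this example, and the casts assume that `__builtin_ia32_pshufb128` is declared with `byte16` operands in ldc.gccbuiltins_x86; check the module shipped with your LDC version if the casts are rejected.

```d
// LDC on x86/x86-64 only; this does not compile with DMD.
import core.simd : byte16, ubyte16;
import ldc.attributes : target;
import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;

/// Hypothetical helper: shuffles the bytes of each 16-byte vector in
/// `pixels` by a run-time `mask`. The UDA enables SSSE3 for this function
/// only, so PSHUFB can be selected without -mattr=+ssse3 on the command line.
@target("ssse3")
void shufflePixels(ubyte16[] pixels, ubyte16 mask)
{
    foreach (ref c; pixels)
    {
        // Assumption: the builtin takes and returns byte16, hence the casts.
        c = cast(ubyte16) __builtin_ia32_pshufb128(cast(byte16) c, cast(byte16) mask);
    }
}

unittest
{
    // Swap bytes 0 and 2 of every 4-byte pixel.
    ubyte16 mask = [2, 1, 0, 3, 6, 5, 4, 7, 10, 9, 8, 11, 14, 13, 12, 15];
    ubyte16[1] pixels;
    pixels[0] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];

    shufflePixels(pixels[], mask);

    assert(pixels[0].array[0 .. 4] == [2, 1, 0, 3]);
}
```

With the attribute on the function, the rest of the program is still compiled for the default CPU, so the caller remains responsible for only reaching this function on hardware that has SSSE3 (druntime's `core.cpuid.ssse3` is one way to check at run time).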
Sep 07 2017
On Thursday, 7 September 2017 at 15:24:13 UTC, Johan Engelen wrote:
> That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction. Some options you have:
> 1. Set a cpu that has ssse3, e.g. compile with `-mcpu=native`
> 2. Enable SSSE3: compile with `-mattr=+ssse3`
> 3. Perhaps best for your case, enable SSSE3 for that function, importing the ldc.attributes module and using the @target("ssse3") UDA on that function.

Thanks Johan. I tried this and now it does compile, but it crashes with an Access Violation in the debug build. In the optimized build it seems to be working, though.
Sep 07 2017
On Thursday, 7 September 2017 at 16:45:40 UTC, Igor wrote:
> Thanks Johan. I tried this and now it does compile but it crashes with Access Violation in debug build. In optimized build it seems to be working though.

I will try to reproduce this in a minimal project and open an LDC bug if successful. In the meantime, can anyone tell me how to add an attribute to a function only if something is defined, since this doesn't work:

    version(USE_SIMD_WITH_LDC) {
        import ldc.attributes;
        @target("ssse3")
    }
    void funcThatUsesSIMD() {
        ...
        version(LDC) {
            import ldc.gccbuiltins_x86;
            c = __builtin_ia32_pshufb128(c, *simdMasks);
        } else {
            c = __simd(XMM.PSHUFB, c, *simdMasks);
        }
        ...
    }
Sep 11 2017
On Monday, 11 September 2017 at 11:55:45 UTC, Igor wrote:
> In the meantime can anyone tell me how to add an attribute to a function only if something is defined, since this doesn't work: [...]

Regarding the crash in debug mode, the problem was that my masks variable wasn't properly aligned. As for the attribute, I guess the best I can do is this:

    version(LDC) import ldc.attributes;
    else private struct target { string specifier; }

    @target("ssse3")
    void funcThatUsesSIMD() {...}
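A small sketch of both conclusions from this last post, for anyone landing here later. The stand-in `target` struct is Igor's approach; the `isAligned16` helper, the `simdMasks` parameter type, and the assertion are additions made up for illustration, on the assumption that a `ubyte16*` is expected to point at 16-byte-aligned memory when it is dereferenced.

```d
version (LDC)
    import ldc.attributes : target;
else
    // Inert stand-in so that @target("...") still compiles as a plain
    // user-defined attribute on compilers without ldc.attributes.
    private struct target { string specifier; }

/// Hypothetical helper: true if `p` lies on a 16-byte boundary.
bool isAligned16(const(void)* p) pure nothrow @nogc
{
    return (cast(size_t) p & 15) == 0;
}

@target("ssse3")
void funcThatUsesSIMD(const(void)* simdMasks)
{
    // Catch a misaligned mask early with a readable message, rather than
    // with an access violation inside the vector code.
    assert(isAligned16(simdMasks), "simdMasks must be 16-byte aligned");
    // ... vector code as in the thread ...
}

unittest
{
    assert(isAligned16(cast(const(void)*) 32));
    assert(!isAligned16(cast(const(void)*) 40));
}
```

Note that the assertion is compiled out in release builds, so it only helps to locate the problem during development; the mask storage itself still has to be declared or allocated with 16-byte alignment.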
Sep 11 2017