digitalmars.D - Adapting Tree Structures for Processing with SIMD Instructions
- Andrei Alexandrescu (2/2) Sep 22 2015 A paper I found interesting:
- Iakh (5/7) Sep 22 2015 _mm_movemask_epi8 a cornerstone of the topic currently not
- David Nadlinger (6/9) Sep 22 2015 From ldc.gccbuiltins_x86:
- Iakh (11/20) Sep 22 2015 Your solution is platform dependent, isn't it?
- David Nadlinger (5/8) Sep 22 2015 Platform-dependent in what way? Yes, the intrinsic for PMOVMSKB
- Iakh (6/16) Sep 23 2015 Yes, I meant compiler dependent.
- Marco Leise (28/36) Sep 23 2015 Yes, it cannot be expressed in dmd's SIMD intrinsic "template"
A paper I found interesting: http://openproceedings.org/EDBT/2014/paper_107.pdf -- Andrei
Sep 22 2015
On Tuesday, 22 September 2015 at 13:06:39 UTC, Andrei Alexandrescu wrote:
> A paper I found interesting: http://openproceedings.org/EDBT/2014/paper_107.pdf
> -- Andrei

_mm_movemask_epi8, a cornerstone of the topic, is currently not implemented/not supported in D :(
AFAIK it has an irregular result format.
Sep 22 2015
On Tuesday, 22 September 2015 at 16:36:42 UTC, Iakh wrote:
> _mm_movemask_epi8, a cornerstone of the topic, is currently not implemented/not supported in D :(
> AFAIK it has an irregular result format.

From ldc.gccbuiltins_x86:

    int __builtin_ia32_pmovmskb128(byte16);
    int __builtin_ia32_pmovmskb256(byte32);

What am I missing?

-- David
Sep 22 2015
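A minimal sketch of how the builtin quoted above might be used to get a lane index out of a comparison result. It assumes LDC on x86-64 for the intrinsic path; the names `moveMask`, `firstSetLane` and the version identifier `HavePmovmskb` are hypothetical helpers for illustration, and the scalar fallback is only a stand-in for compilers or targets without the builtin.

```d
import core.bitop : bsf;
import core.simd : byte16;

// Hypothetical version flag: only take the builtin path on LDC targeting x86-64.
version (LDC)
{
    version (X86_64) version = HavePmovmskb;
}

version (HavePmovmskb)
{
    // The declaration quoted in the post above.
    import ldc.gccbuiltins_x86 : __builtin_ia32_pmovmskb128;

    /// Packs the top bit of each of the 16 bytes into bits 0..15 of an int.
    int moveMask(byte16 v)
    {
        return __builtin_ia32_pmovmskb128(v);
    }
}
else
{
    /// Scalar stand-in with the same result format (assumption: not tuned).
    int moveMask(byte16 v)
    {
        int mask = 0;
        foreach (i, b; v.array)
            if (b < 0)          // top bit of the byte is set
                mask |= 1 << i;
        return mask;
    }
}

/// Index of the first byte whose top bit is set, or -1 if there is none.
int firstSetLane(byte16 v)
{
    const int mask = moveMask(v);
    return mask == 0 ? -1 : bsf(cast(uint) mask);
}
```

With a comparison result in which lanes 3 and 6 are all-ones (-1), `moveMask` yields a mask with bits 3 and 6 set and `firstSetLane` reports lane 3. The result is a plain int rather than a vector, which is presumably the "irregular result format" mentioned earlier in the thread.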
On Tuesday, 22 September 2015 at 17:46:32 UTC, David Nadlinger wrote:
> On Tuesday, 22 September 2015 at 16:36:42 UTC, Iakh wrote:
>> _mm_movemask_epi8, a cornerstone of the topic, is currently not implemented/not supported in D :(
>> AFAIK it has an irregular result format.
>
> From ldc.gccbuiltins_x86:
>
>     int __builtin_ia32_pmovmskb128(byte16);
>     int __builtin_ia32_pmovmskb256(byte32);
>
> What am I missing?
>
> -- David

Your solution is platform dependent, isn't it?

The core.simd XMM enum has this opcode commented out:

    //PMOVMSKB = 0x660FD7

https://github.com/D-Programming-Language/druntime/blob/master/src/core/simd.d line 241

PMOVMSKB is the opcode of the instruction, and there is no instruction generator for this opcode like this one:

    pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, void16 op2);
Sep 22 2015
On Tuesday, 22 September 2015 at 19:45:33 UTC, Iakh wrote:
> Your solution is platform dependent, isn't it?

Platform-dependent in what way? Yes, the intrinsic for PMOVMSKB is obviously x86-only.

> core.simd XMM enum has commented this opcode
> //PMOVMSKB = 0x660FD7

__simd is similarly DMD-only.

-- David
Sep 22 2015
On Tuesday, 22 September 2015 at 20:10:36 UTC, David Nadlinger wrote:
> On Tuesday, 22 September 2015 at 19:45:33 UTC, Iakh wrote:
>> Your solution is platform dependent, isn't it?
>
> Platform-dependent in what way? Yes, the intrinsic for PMOVMSKB is obviously x86-only.

Yes, I meant compiler dependent.

>> core.simd XMM enum has commented this opcode
>> //PMOVMSKB = 0x660FD7
>
> __simd is similarly DMD-only.
>
> -- David

Sad, I didn't know that :( I thought core.simd was a built-in language feature, as it is described at http://dlang.org/simd.html

- Iakh
Sep 23 2015
On Tue, 22 Sep 2015 16:36:40 +0000, Iakh <iaktakh gmail.com> wrote:
> On Tuesday, 22 September 2015 at 13:06:39 UTC, Andrei Alexandrescu wrote:
>> A paper I found interesting: http://openproceedings.org/EDBT/2014/paper_107.pdf
>> -- Andrei
>
> _mm_movemask_epi8, a cornerstone of the topic, is currently not implemented/not supported in D :(
> AFAIK it has an irregular result format.

Yes, it cannot be expressed in dmd's SIMD intrinsic "template", so it is unsupported there. Before Walter went into challenge-accepted mode towards LLVM and GDC, speeding up algorithms with SIMD on dmd was also not really important. You would just be told to use one of the other compilers.

Manu's std.simd also doesn't attempt to support it, because he was interested in unifying SIMD instructions available on all architectures, and movemask is somewhat x86-specific. (I asked him about that instruction specifically a few years ago.)

That said, in my code for string operations I use the intrinsic for GCC and LDC2 and fall back to emulated SIMD using uint or ulong on DMD. Where movemask returns packed bits that you might scan with 'bsf', in a ulong you usually end up with one high bit per byte. If you call bsf() on it and divide the result by 8, you have the byte index in the same way as with bsf(movemask(...)).

On a related note, LLVM and GCC also offer extended inline assemblers that are transparent to the optimizer. You just ask for registers and/or stack memory to use and tell the compiler which registers or memory locations will be overwritten. The compiler can then hand you a few spare registers and knows which registers it needs to save before the asm block. As a result there are absolutely no seams where you placed your asm, unlike earlier generations of inline asm.

-- Marco
Sep 23 2015
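The ulong fallback described above can be sketched roughly as follows. This is not Marco's actual code: the function name `indexOfByte` and the choice of the classic "zero byte" bit trick are assumptions made for illustration. It only relies on `core.bitop.bsf`, so it should build with any of the three compilers.

```d
import core.bitop : bsf;

/// Returns the index (0..7) of the first byte in `word` equal to `needle`,
/// counting from the least significant byte, or -1 if no byte matches.
/// (Hypothetical helper; the name and signature are not from the thread.)
int indexOfByte(ulong word, ubyte needle) pure nothrow @nogc @safe
{
    enum ulong lows  = 0x0101_0101_0101_0101UL;
    enum ulong highs = 0x8080_8080_8080_8080UL;

    // Broadcast the needle to all 8 bytes; XOR turns matching bytes into 0x00.
    const ulong x = word ^ (lows * needle);

    // Zero-byte test: the lowest set bit of `mask` is the high bit of the
    // first zero byte of `x`. Bytes above that one may be flagged spuriously,
    // which does not matter because only the lowest set bit is used.
    const ulong mask = (x - lows) & ~x & highs;
    if (mask == 0)
        return -1;

    // One high bit per byte, so the bit index divided by 8 is the byte index.
    return bsf(mask) / 8;
}
```

For a word loaded from memory on a little-endian machine, the returned index is the offset of the matching byte within those 8 bytes, which is the same information bsf(movemask(...)) gives for a 16-byte vector.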
On Wednesday, 23 September 2015 at 09:58:39 UTC, Marco Leise wrote:
> On Tue, 22 Sep 2015 16:36:40 +0000, Iakh <iaktakh gmail.com> wrote:
> [...]

Thanks for the workaround(s).
Sep 23 2015
On Wednesday, 23 September 2015 at 09:58:39 UTC, Marco Leise wrote:
> On Tue, 22 Sep 2015 16:36:40 +0000, Iakh <iaktakh gmail.com> wrote:
> [...]

Implementation of a SIMD find algorithm: http://forum.dlang.org/post/hwjbyqnovwbyibjusqem forum.dlang.org
Oct 25 2015