digitalmars.D - Adapting Tree Structures for Processing with SIMD Instructions

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

A paper I found interesting: 
http://openproceedings.org/EDBT/2014/paper_107.pdf -- Andrei

Sep 22 2015

Iakh <iaktakh gmail.com> writes:

On Tuesday, 22 September 2015 at 13:06:39 UTC, Andrei 
Alexandrescu wrote:
 A paper I found interesting: 
 http://openproceedings.org/EDBT/2014/paper_107.pdf -- Andrei

_mm_movemask_epi8, a cornerstone of the topic, is currently not 
implemented/supported in D :(
AFAIK it has an irregular result format.

Sep 22 2015

David Nadlinger <code klickverbot.at> writes:

On Tuesday, 22 September 2015 at 16:36:42 UTC, Iakh wrote:
 _mm_movemask_epi8, a cornerstone of the topic, is currently not 
 implemented/supported in D :(
 AFAIK it has an irregular result format.

From ldc.gccbuiltins_x86:

int __builtin_ia32_pmovmskb128(byte16);
int __builtin_ia32_pmovmskb256(byte32);

What am I missing?

  — David

Sep 22 2015

Iakh <iaktakh gmail.com> writes:

On Tuesday, 22 September 2015 at 17:46:32 UTC, David Nadlinger 
wrote:
 On Tuesday, 22 September 2015 at 16:36:42 UTC, Iakh wrote:
 _mm_movemask_epi8, a cornerstone of the topic, is currently not 
 implemented/supported in D :(
 AFAIK it has an irregular result format.

 From ldc.gccbuiltins_x86:

 int __builtin_ia32_pmovmskb128(byte16);
 int __builtin_ia32_pmovmskb256(byte32);

 What am I missing?

  — David

Your solution is platform dependent, isn't it?

The core.simd XMM enum has this opcode commented out:
//PMOVMSKB = 0x660FD7

https://github.com/D-Programming-Language/druntime/blob/master/src/core/simd.d
line 241

PMOVMSKB is the opcode of the instruction, and there is no 
instruction generator for this opcode like this one:
pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, 
void16 op2);

Sep 22 2015

David Nadlinger <code klickverbot.at> writes:

On Tuesday, 22 September 2015 at 19:45:33 UTC, Iakh wrote:
 Your solution is platform dependent, isn't it?

Platform-dependent in what way? Yes, the intrinsic for PMOVMSKB 
is obviously x86-only.

 The core.simd XMM enum has this opcode commented out:
 //PMOVMSKB = 0x660FD7

__simd is similarly DMD-only.

  – David

Sep 22 2015

Iakh <iaktakh gmail.com> writes:

On Tuesday, 22 September 2015 at 20:10:36 UTC, David Nadlinger 
wrote:
 On Tuesday, 22 September 2015 at 19:45:33 UTC, Iakh wrote:
 Your solution is platform dependent, isn't it?

 Platform-dependent in what way? Yes, the intrinsic for PMOVMSKB 
 is obviously x86-only.

 The core.simd XMM enum has this opcode commented out:
 //PMOVMSKB = 0x660FD7

 __simd is similarly DMD-only.

  – David

Yes, I meant compiler dependent.

 __simd is similarly DMD-only.

  – David

Sad, I didn't know that :( I thought core.simd was a built-in 
language feature, as it is described at http://dlang.org/simd.html

  - Iakh

Sep 23 2015

Marco Leise <Marco.Leise gmx.de> writes:

On Tue, 22 Sep 2015 16:36:40 +0000,
Iakh <iaktakh gmail.com> wrote:

 On Tuesday, 22 September 2015 at 13:06:39 UTC, Andrei 
 Alexandrescu wrote:
 A paper I found interesting: 
 http://openproceedings.org/EDBT/2014/paper_107.pdf -- Andrei

 
 _mm_movemask_epi8, a cornerstone of the topic, is currently not 
 implemented/supported in D :(
 AFAIK it has an irregular result format.

Yes, it cannot be expressed in dmd's SIMD intrinsic "template",
so it is unsupported there. Before Walter went into
challenge-accepted mode towards LLVM and GDC, speeding up
algorithms with SIMD on dmd was also not really a priority. You
would just be told to use one of the other compilers.

Manu's std.simd also doesn't attempt to support it, because he
was interested in unifying SIMD instructions available to all
architectures and movemask is somewhat x86 specific. (I asked
him about that instruction specifically a few years ago.)

That said, in my code for string operations I use the intrinsic
for GCC and LDC2 and fall back to emulated SIMD using uint or
ulong on DMD. Where movemask returns packed bits that you
might scan with 'bsf', in a ulong you usually end up with one
high bit per byte. If you call bsf() on it and divide the
result by 8 you have the byte index in the same way as with
bsf(movemask(...)).
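
The ulong fallback described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
function name indexOfByte and its constants are hypothetical and are
not taken from the string code mentioned in this post or from any
library. The only library call used is bsf from core.bitop. Byte 0
is the least significant byte, which matches memory order when the
ulong was loaded from memory on a little-endian machine.

```d
import core.bitop : bsf;

/// Hypothetical helper: index (0..7) of the first byte in `word`
/// that equals `needle`, or -1 if no byte matches.
/// Byte 0 is the least significant byte of `word`.
int indexOfByte(ulong word, ubyte needle) pure nothrow @nogc @safe
{
    enum ulong lows  = 0x0101_0101_0101_0101; // 0x01 in every byte
    enum ulong highs = 0x8080_8080_8080_8080; // 0x80 in every byte

    // Broadcast the needle to all 8 bytes; XOR turns matches into 0x00.
    const ulong x = word ^ (lows * needle);

    // Per byte: adding 0x7F to the low 7 bits sets the high bit iff
    // those bits are nonzero (no carry crosses a byte boundary).
    // OR-ing with x covers bytes whose high bit was already set.
    // After negation the high bit is set exactly for the zero bytes.
    const ulong t = ~(((x & ~highs) + ~highs) | x) & highs;

    // One high bit per matching byte: the bit index divided by 8
    // is the byte index, analogous to bsf(movemask(...)).
    return t == 0 ? -1 : bsf(t) / 8;
}
```

The zero-byte test used here is the exact variant, so every matching
byte gets its high bit set and nothing else does. Only the lowest set
bit matters for a find, but the exact mask also allows counting or
iterating over all matches.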

On a related note, LLVM and GCC also offer extended inline
assemblers that are transparent to the optimizer. You just ask
for registers and/or stack memory to use and tell the compiler
what registers or memory locations will be overwritten. The
compiler can then hand you a few spare registers and knows
what registers it needs to save before the asm block. As a
result there are absolutely no seams where you placed your
asm, unlike earlier generations of inline asm.

-- 
Marco

Sep 23 2015

Iakh <iaktakh gmail.com> writes:

On Wednesday, 23 September 2015 at 09:58:39 UTC, Marco Leise 
wrote:
 On Tue, 22 Sep 2015 16:36:40 +0000,
 Iakh <iaktakh gmail.com> wrote:

 [...]

Thanks for the workaround(s).

Sep 23 2015

Iakh <Iakh github.com> writes:

On Wednesday, 23 September 2015 at 09:58:39 UTC, Marco Leise 
wrote:
 On Tue, 22 Sep 2015 16:36:40 +0000,
 Iakh <iaktakh gmail.com> wrote:

 [...]

Implementation of a SIMD find algorithm:
http://forum.dlang.org/post/hwjbyqnovwbyibjusqem@forum.dlang.org

Oct 25 2015
