digitalmars.D - How do I do a 64 bits mulhi in D?
- deadalnix (13/13) Nov 26 2022 Mulhi is an instruction that is effectively available on any 64
- Vladimir Panteleev (9/11) Nov 26 2022 For x86 I made this:
- deadalnix (4/15) Nov 26 2022 That's a clever trick, but that only work on x86, but also will
- kinke (15/17) Nov 26 2022 A solution with LDC, using inline-(LLVM-)IR:
- deadalnix (4/21) Nov 27 2022 I wasn't aware of that feature. Not very portable, but way more
- deadalnix (4/7) Nov 27 2022 There is still a problem, but this is getting better (assuming
- Iain Buclaw (22/26) Nov 27 2022 For completeness sake, the X86 backend has built-in functions
- Vladimir Panteleev (4/6) Nov 27 2022 This is great to have (and I wish we had more standardized
- Vladimir Panteleev (4/6) Nov 27 2022 For DMD, yes. For GDC and LDC, this uses the syntax which
- Walter Bright (3/9) Nov 27 2022 Should work that into:
Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following: ```d ulong mulhi(ulong a, ulong b) { return (ucent(a) * ucent(b)) >> 64; } ``` And this is how you'd implement it in say, C++, using `__int128`. This operation is absolutely capital to do hashing fast, as it allows to do a ton of shit and adds in one go. You'd ask, isn't that what the good old regular multiplication does too? and no, it isn't, because it never shift right, so high bits can never influence lowers ones.
Nov 26 2022
On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d There's also a BSL-licensed copy in DustMite, used for polynomial hashing. BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.
Nov 26 2022
On Sunday, 27 November 2022 at 01:53:11 UTC, Vladimir Panteleev wrote:On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d There's also a BSL-licensed copy in DustMite, used for polynomial hashing. BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.
Nov 26 2022
On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.A solution with LDC, using inline-(LLVM-)IR: ``` ulong mulhi(ulong a, ulong b) { import ldc.llvmasm; return __ir!(` %a = zext i64 %0 to i128 %b = zext i64 %1 to i128 %r = mul i128 %a, %b %r2 = lshr i128 %r, 64 %r3 = trunc i128 %r2 to i64 ret i64 %r3`, ulong)(a, b); } ```
Nov 26 2022
On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.A solution with LDC, using inline-(LLVM-)IR: ``` ulong mulhi(ulong a, ulong b) { import ldc.llvmasm; return __ir!(` %a = zext i64 %0 to i128 %b = zext i64 %1 to i128 %r = mul i128 %a, %b %r2 = lshr i128 %r, 64 %r3 = trunc i128 %r2 to i64 ret i64 %r3`, ulong)(a, b); } ```
Nov 27 2022
On Sunday, 27 November 2022 at 14:22:28 UTC, deadalnix wrote:I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.There is still a problem, but this is getting better (assuming one uses LDC): https://github.com/llvm/llvm-project/issues/59217
Nov 27 2022
On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too). ``` import gcc.builtins; ulong mulhi(ulong a, ulong b) { alias __m64 = __vector(short[4]); __m64 res = __builtin_ia32_pmulhw(*cast(__m64*)&a, *cast(__m64*)&b); return *cast(ulong*)&res; } __EOF__ _D3vec5mulhiFmmZm: .cfi_startproc movq %rdi, %xmm0 movq %rsi, %xmm1 pmulhw %xmm1, %xmm0 movq %xmm0, %rax ret .cfi_endproc ```That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.A solution with LDC, using inline-(LLVM-)IR:
Nov 27 2022
On Sunday, 27 November 2022 at 19:20:16 UTC, Iain Buclaw wrote:For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too).This is great to have (and I wish we had more standardized intrinsics across compilers) but I think this is doable without MMX.
Nov 27 2022
On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.For DMD, yes. For GDC and LDC, this uses the syntax which declares input and output registers, so it's inlinable and should compile to one (inlined) instruction.
Nov 27 2022
On 11/26/2022 5:53 PM, Vladimir Panteleev wrote:On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:Should work that into: https://github.com/dlang/dmd/blob/master/druntime/src/core/int128.d#L407Mulhi is an instruction that is effectively available on any 64 bits CPU that multiply that effectively does the following:For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d
Nov 27 2022