www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - How do I do a 64 bits mulhi in D?

reply deadalnix <deadalnix gmail.com> writes:
Mulhi is an instruction that is effectively available on any 64 
bits CPU that multiply that effectively does the following:

```d
ulong mulhi(ulong a, ulong b) {
     return (ucent(a) * ucent(b)) >> 64;
}
```

And this is how you'd implement it in say, C++, using `__int128`.

This operation is absolutely capital to do hashing fast, as it 
allows to do a ton of shit and adds in one go. You'd ask, isn't 
that what the good old regular multiplication does too? and no, 
it isn't, because it never shift right, so high bits can never 
influence lowers ones.
Nov 26 2022
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 64 
 bits CPU that multiply that effectively does the following:
For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d There's also a BSL-licensed copy in DustMite, used for polynomial hashing. BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.
Nov 26 2022
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 27 November 2022 at 01:53:11 UTC, Vladimir Panteleev 
wrote:
 On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 
 64 bits CPU that multiply that effectively does the following:
For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d There's also a BSL-licensed copy in DustMite, used for polynomial hashing. BTW, not that any of us are students new to D, but I think such questions would be better suitable for the D.learn group. E.g., Ali often asks questions about how to achieve non-trivial goals in D there.
That's a clever trick, but that only work on x86, but also will blind the optimizer as to what's going on.
Nov 26 2022
next sibling parent reply kinke <noone nowhere.com> writes:
On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also will 
 blind the optimizer as to what's going on.
A solution with LDC, using inline-(LLVM-)IR: ``` ulong mulhi(ulong a, ulong b) { import ldc.llvmasm; return __ir!(` %a = zext i64 %0 to i128 %b = zext i64 %1 to i128 %r = mul i128 %a, %b %r2 = lshr i128 %r, 64 %r3 = trunc i128 %r2 to i64 ret i64 %r3`, ulong)(a, b); } ```
Nov 26 2022
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:
 On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also 
 will blind the optimizer as to what's going on.
A solution with LDC, using inline-(LLVM-)IR: ``` ulong mulhi(ulong a, ulong b) { import ldc.llvmasm; return __ir!(` %a = zext i64 %0 to i128 %b = zext i64 %1 to i128 %r = mul i128 %a, %b %r2 = lshr i128 %r, 64 %r3 = trunc i128 %r2 to i64 ret i64 %r3`, ulong)(a, b); } ```
I wasn't aware of that feature. Not very portable, but way more portable than the previous solution. Still compiler specific, but at least not plateform specific anymore.
Nov 27 2022
parent deadalnix <deadalnix gmail.com> writes:
On Sunday, 27 November 2022 at 14:22:28 UTC, deadalnix wrote:
 I wasn't aware of that feature. Not very portable, but way more 
 portable than the previous solution. Still compiler specific, 
 but at least not plateform specific anymore.
There is still a problem, but this is getting better (assuming one uses LDC): https://github.com/llvm/llvm-project/issues/59217
Nov 27 2022
prev sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:
 On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also 
 will blind the optimizer as to what's going on.
A solution with LDC, using inline-(LLVM-)IR:
For completeness sake, the X86 backend has built-in functions with GDC (LDC might have the same too). ``` import gcc.builtins; ulong mulhi(ulong a, ulong b) { alias __m64 = __vector(short[4]); __m64 res = __builtin_ia32_pmulhw(*cast(__m64*)&a, *cast(__m64*)&b); return *cast(ulong*)&res; } __EOF__ _D3vec5mulhiFmmZm: .cfi_startproc movq %rdi, %xmm0 movq %rsi, %xmm1 pmulhw %xmm1, %xmm0 movq %xmm0, %rax ret .cfi_endproc ```
Nov 27 2022
parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 27 November 2022 at 19:20:16 UTC, Iain Buclaw wrote:
 For completeness sake, the X86 backend has built-in functions 
 with GDC (LDC might have the same too).
This is great to have (and I wish we had more standardized intrinsics across compilers) but I think this is doable without MMX.
Nov 27 2022
prev sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also will 
 blind the optimizer as to what's going on.
For DMD, yes. For GDC and LDC, this uses the syntax which declares input and output registers, so it's inlinable and should compile to one (inlined) instruction.
Nov 27 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/26/2022 5:53 PM, Vladimir Panteleev wrote:
 On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 64 bits CPU that 
 multiply that effectively does the following:
For x86 I made this: https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d
Should work that into: https://github.com/dlang/dmd/blob/master/druntime/src/core/int128.d#L407
Nov 27 2022