digitalmars.D - How do I do a 64 bits mulhi in D?

deadalnix (13/13) Nov 26 2022 Mulhi is an instruction that is effectively available on any 64

Vladimir Panteleev (9/11) Nov 26 2022 For x86 I made this:

deadalnix (4/15) Nov 26 2022 That's a clever trick, but that only work on x86, but also will

kinke (15/17) Nov 26 2022 A solution with LDC, using inline-(LLVM-)IR:

deadalnix (4/21) Nov 27 2022 I wasn't aware of that feature. Not very portable, but way more

deadalnix (4/7) Nov 27 2022 There is still a problem, but this is getting better (assuming

Iain Buclaw (22/26) Nov 27 2022 For completeness sake, the X86 backend has built-in functions

Vladimir Panteleev (4/6) Nov 27 2022 This is great to have (and I wish we had more standardized

Vladimir Panteleev (4/6) Nov 27 2022 For DMD, yes. For GDC and LDC, this uses the syntax which

Walter Bright (3/9) Nov 27 2022 Should work that into:

deadalnix <deadalnix gmail.com> writes:

Mulhi is an instruction that is effectively available on any 64 
bits CPU that multiply that effectively does the following:

```d
ulong mulhi(ulong a, ulong b) {
     return (ucent(a) * ucent(b)) >> 64;
}
```

And this is how you'd implement it in say, C++, using `__int128`.

This operation is absolutely capital to do hashing fast, as it 
allows to do a ton of shit and adds in one go. You'd ask, isn't 
that what the good old regular multiplication does too? and no, 
it isn't, because it never shift right, so high bits can never 
influence lowers ones.

Nov 26 2022

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 64 
 bits CPU that multiply that effectively does the following:

For x86 I made this: 
https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d

There's also a BSL-licensed copy in DustMite, used for polynomial 
hashing.

BTW, not that any of us are students new to D, but I think such 
questions would be better suitable for the D.learn group. E.g., 
Ali often asks questions about how to achieve non-trivial goals 
in D there.

Nov 26 2022

deadalnix <deadalnix gmail.com> writes:

On Sunday, 27 November 2022 at 01:53:11 UTC, Vladimir Panteleev 
wrote:
 On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 
 64 bits CPU that multiply that effectively does the following:

 For x86 I made this: 
 https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d

 There's also a BSL-licensed copy in DustMite, used for 
 polynomial hashing.

 BTW, not that any of us are students new to D, but I think such 
 questions would be better suitable for the D.learn group. E.g., 
 Ali often asks questions about how to achieve non-trivial goals 
 in D there.

That's a clever trick, but that only work on x86, but also will 
blind the optimizer as to what's going on.

Nov 26 2022

kinke <noone nowhere.com> writes:

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also will 
 blind the optimizer as to what's going on.

A solution with LDC, using inline-(LLVM-)IR:

```
ulong mulhi(ulong a, ulong b)
{
     import ldc.llvmasm;
     return __ir!(`
         %a = zext i64 %0 to i128
         %b = zext i64 %1 to i128
         %r = mul i128 %a, %b
         %r2 = lshr i128 %r, 64
         %r3 = trunc i128 %r2 to i64
         ret i64 %r3`, ulong)(a, b);
}
```

Nov 26 2022

deadalnix <deadalnix gmail.com> writes:

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:
 On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also 
 will blind the optimizer as to what's going on.

 A solution with LDC, using inline-(LLVM-)IR:

 ```
 ulong mulhi(ulong a, ulong b)
 {
     import ldc.llvmasm;
     return __ir!(`
         %a = zext i64 %0 to i128
         %b = zext i64 %1 to i128
         %r = mul i128 %a, %b
         %r2 = lshr i128 %r, 64
         %r3 = trunc i128 %r2 to i64
         ret i64 %r3`, ulong)(a, b);
 }
 ```

I wasn't aware of that feature. Not very portable, but way more 
portable than the previous solution. Still compiler specific, but 
at least not plateform specific anymore.

Nov 27 2022

deadalnix <deadalnix gmail.com> writes:

On Sunday, 27 November 2022 at 14:22:28 UTC, deadalnix wrote:
 I wasn't aware of that feature. Not very portable, but way more 
 portable than the previous solution. Still compiler specific, 
 but at least not plateform specific anymore.

There is still a problem, but this is getting better (assuming 
one uses LDC):

https://github.com/llvm/llvm-project/issues/59217

Nov 27 2022

Iain Buclaw <ibuclaw gdcproject.org> writes:

On Sunday, 27 November 2022 at 03:26:30 UTC, kinke wrote:
 On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also 
 will blind the optimizer as to what's going on.

 A solution with LDC, using inline-(LLVM-)IR:

For completeness sake, the X86 backend has built-in functions 
with GDC (LDC might have the same too).
```
import gcc.builtins;

ulong mulhi(ulong a, ulong b)
{
     alias __m64 = __vector(short[4]);
     __m64 res = __builtin_ia32_pmulhw(*cast(__m64*)&a, 
*cast(__m64*)&b);
     return *cast(ulong*)&res;
}
__EOF__
_D3vec5mulhiFmmZm:
	.cfi_startproc
	movq	%rdi, %xmm0
	movq	%rsi, %xmm1
	pmulhw	%xmm1, %xmm0
	movq	%xmm0, %rax
	ret
	.cfi_endproc
```

Nov 27 2022

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 27 November 2022 at 19:20:16 UTC, Iain Buclaw wrote:
 For completeness sake, the X86 backend has built-in functions 
 with GDC (LDC might have the same too).

This is great to have (and I wish we had more standardized 
intrinsics across compilers) but I think this is doable without 
MMX.

Nov 27 2022

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 27 November 2022 at 01:55:45 UTC, deadalnix wrote:
 That's a clever trick, but that only work on x86, but also will 
 blind the optimizer as to what's going on.

For DMD, yes. For GDC and LDC, this uses the syntax which 
declares input and output registers, so it's inlinable and should 
compile to one (inlined) instruction.

Nov 27 2022

Walter Bright <newshound2 digitalmars.com> writes:

On 11/26/2022 5:53 PM, Vladimir Panteleev wrote:
 On Sunday, 27 November 2022 at 01:26:36 UTC, deadalnix wrote:
 Mulhi is an instruction that is effectively available on any 64 bits CPU that 
 multiply that effectively does the following:

 
 For x86 I made this: 
 https://github.com/cybershadow/ae/blob/master/utils/math/longmul.d


Should work that into:

https://github.com/dlang/dmd/blob/master/druntime/src/core/int128.d#L407

Nov 27 2022

D Programming

C/C++ Programming

Other

digitalmars.D - How do I do a 64 bits mulhi in D?