digitalmars.D.ldc - Emulate 64-bit mulh instruction
- Kagamin (26/26) Mar 13 2019 Apparently this has no intrinsic, so wrote this code for x86 to
- Johan Engelen (4/10) Mar 13 2019 I think the compiler can't recognize it, judging from other posts
- lithium iodate (19/21) Mar 13 2019 I cannot help you with your code directly, but I can propose an
Apparently this has no intrinsic, so wrote this code for x86 to compute 128 bit product: ulong[2] mul(ulong a, ulong b) { import ldc.intrinsics; ulong a1=cast(uint)a, a2=a>>32; ulong b1=cast(uint)b, b2=b>>32; ulong c1=a1*b1; //0+64 ulong c2=a1*b2; //32+64 ulong c3=a2*b1; //32+64 ulong c4=a2*b2; //64+64 auto d1o=llvm_uadd_with_overflow(c1,c2<<32); ulong d1=d1o.result; c4+=d1o.overflow; auto d2o=llvm_uadd_with_overflow(d1,c3<<32); ulong d2=d2o.result; c4+=d2o.overflow; //ulong d1=c1+(c2<<32); //ulong d2=d1+(c3<<32); ulong d3=c4+(c2>>32); ulong d4=d3+(c3>>32); return [d4,d2]; } but the compiler doesn't recognize it as multiplication and doesn't generate single imul instruction. Is the code wrong or the compiler can't recognize it?
Mar 13 2019
On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:Apparently this has no intrinsic, so wrote this code for x86 to compute 128 bit product: ... but the compiler doesn't recognize it as multiplication and doesn't generate single imul instruction. Is the code wrong or the compiler can't recognize it?I think the compiler can't recognize it, judging from other posts online. -Johan
Mar 13 2019
On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:Apparently this has no intrinsic, so wrote this code for x86 to compute 128 bit product:I cannot help you with your code directly, but I can propose an alternative: import ldc.intrinsics; pragma(LDC_inline_ir) R inlineIR(string s, R, P...)(P); ulong[2] mul(ulong a, ulong b) { ulong[2] result; inlineIR!(` %a = zext i64 %0 to i128 %b = zext i64 %1 to i128 %c = mul i128 %a, %b %d = bitcast [2 x i64]* %2 to i128* store i128 %c, i128* %d ret void`, void)(a, b, &result); return result; } This is optimized down to mul.
Mar 13 2019