
digitalmars.D.ldc - Emulate 64-bit mulh instruction

Kagamin <spam here.lot> writes:
Apparently this has no intrinsic, so I wrote this code for x86 to
compute the 128-bit product:

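ulong[2] mul(ulong a, ulong b)
{
     import ldc.intrinsics;
     // Split both operands into 32-bit halves:
     // a = a2*2^32 + a1, b = b2*2^32 + b1.
     ulong a1=cast(uint)a, a2=a>>32;
     ulong b1=cast(uint)b, b2=b>>32;
     // Four partial products; the comments give bit offset + width.
     ulong c1=a1*b1; //0+64
     ulong c2=a1*b2; //32+64
     ulong c3=a2*b1; //32+64
     ulong c4=a2*b2; //64+64
     // Low word: add the low halves of c2 and c3 at offset 32 and
     // carry each overflow into the high word.
     auto d1o=llvm_uadd_with_overflow(c1,c2<<32);
     ulong d1=d1o.result;
     c4+=d1o.overflow;
     auto d2o=llvm_uadd_with_overflow(d1,c3<<32);
     ulong d2=d2o.result;
     c4+=d2o.overflow;
     // The same two additions without carry handling:
     //ulong d1=c1+(c2<<32);
     //ulong d2=d1+(c3<<32);
     // High word: add the high halves of c2 and c3.
     ulong d3=c4+(c2>>32);
     ulong d4=d3+(c3>>32);
     return [d4,d2]; // [high, low]
}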

but the compiler doesn't recognize it as a multiplication and
doesn't generate a single mul instruction. Is the code wrong, or
can the compiler just not recognize it?
Mar 13
Johan Engelen <j j.nl> writes:
On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:
 Apparently this has no intrinsic, so I wrote this code for x86 to
 compute the 128-bit product:
 ...
 but the compiler doesn't recognize it as a multiplication and
 doesn't generate a single mul instruction. Is the code wrong, or
 can the compiler just not recognize it?
I think the compiler can't recognize it, judging from other posts online. -Johan
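For what it's worth, the arithmetic in the original function can be checked independently of the codegen question. The following is a minimal sketch and not code from the thread: it repeats the same 32-bit split in plain D and replaces llvm_uadd_with_overflow with a wrap-around comparison, so it should build with any D compiler. The name mul128 and the test values are assumptions made for illustration; like the original, it returns [high, low].

```d
// Hypothetical portable variant of the function quoted above.
// Returns the 128-bit product of a and b as [high, low].
ulong[2] mul128(ulong a, ulong b)
{
    // Split both operands into 32-bit halves.
    ulong a1 = cast(uint) a, a2 = a >> 32;
    ulong b1 = cast(uint) b, b2 = b >> 32;

    // Four 64-bit partial products.
    ulong c1 = a1 * b1; // weight 2^0
    ulong c2 = a1 * b2; // weight 2^32
    ulong c3 = a2 * b1; // weight 2^32
    ulong c4 = a2 * b2; // weight 2^64

    // Low word. An unsigned sum that is smaller than one of its
    // operands has wrapped around, which means a carry into the high word.
    ulong d1 = c1 + (c2 << 32);
    c4 += (d1 < c1) ? 1 : 0;
    ulong lo = d1 + (c3 << 32);
    c4 += (lo < d1) ? 1 : 0;

    // High word: add the upper halves of the two middle products.
    ulong hi = c4 + (c2 >> 32) + (c3 >> 32);
    return [hi, lo];
}

unittest
{
    // 0 * x == 0
    auto r0 = mul128(0, 12345);
    assert(r0[0] == 0 && r0[1] == 0);

    // 2^32 * 2^32 == 2^64
    auto r1 = mul128(1UL << 32, 1UL << 32);
    assert(r1[0] == 1 && r1[1] == 0);

    // (2^64 - 1) * 2 == 2^65 - 2
    auto r2 = mul128(ulong.max, 2);
    assert(r2[0] == 1 && r2[1] == ulong.max - 1);

    // (2^64 - 1)^2 == 2^128 - 2^65 + 1
    auto r3 = mul128(ulong.max, ulong.max);
    assert(r3[0] == ulong.max - 1 && r3[1] == 1);
}
```

Compiling with the -unittest and -main switches runs the checks. Passing them on these inputs is consistent with the code being correct and the optimizer simply not pattern-matching it, though a handful of test values is of course not a proof.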
Mar 13
lithium iodate <whatdoiknow doesntexist.net> writes:
On Wednesday, 13 March 2019 at 16:06:34 UTC, Kagamin wrote:
 Apparently this has no intrinsic, so I wrote this code for x86 to
 compute the 128-bit product:
I cannot help you with your code directly, but I can propose an
alternative:

import ldc.intrinsics;
pragma(LDC_inline_ir) R inlineIR(string s, R, P...)(P);

ulong[2] mul(ulong a, ulong b)
{
     ulong[2] result;
     inlineIR!(`
          %a = zext i64 %0 to i128
          %b = zext i64 %1 to i128
          %c = mul i128 %a, %b
          %d = bitcast [2 x i64]* %2 to i128*
          store i128 %c, i128* %d
          ret void`, void)(a, b, &result);
     return result;
}

This is optimized down to mul.
Mar 13