digitalmars.D - Inline Assembler rox

"Marco Leise" <Marco.Leise gmx.de> writes:
Porting some code from C to D I found the inline assembler very  
convenient. This is the C code (using an external NASM file):

	// dot_product returns dot product t*w of n elements.  n is rounded
	// up to a multiple of 8.  Result is scaled down by 8 bits.
	#ifdef NOASM  // no assembly language
	int dot_product(short *t, short *w, int n) {
	  int sum=0;
	  n=(n+7)&-8;
	  for (int i=0; i<n; i+=2) {
	    sum+=(t[i]*w[i]+t[i+1]*w[i+1]) >> 8;
	  }
	  return sum;
	}
	#else  // The NASM version uses MMX and is about 8 times faster.
	extern "C" int dot_product(short *t, short *w, int n);  // in NASM
	#endif

In D, I can move the ASM inside the function, so there is no need for two  
declarations:

	extern (C) int dot_product(short *t, short *w, const int n) {
	    version (D_InlineAsm_X86_64) asm {
	        naked;
	        mov RCX, RDX;            // n
	        mov RAX, RDI;            // a
	        mov RDX, RSI;            // b
	        cmp RCX, 0;
	        jz done;
	        sub RAX, 16;
	        sub RDX, 16;
	        pxor XMM0, XMM0;         // sum = 0
	    loop:                        // each pass: 8 products, 4 partial sums
	        movdqa XMM1, [RAX+RCX*2];// put partial sums of vector product in xmm1
	        pmaddwd XMM1, [RDX+RCX*2];
	        psrad XMM1, 8;
	        paddd XMM0, XMM1;
	        sub RCX, 8;
	        ja loop;
	        movdqa XMM1, XMM0;       // add 4 parts of xmm0 and return in eax
	        psrldq XMM1, 8;
	        paddd XMM0, XMM1;
	        movdqa XMM1, XMM0;
	        psrldq XMM1, 4;
	        paddd XMM0, XMM1;
	        movq RAX, XMM0;
	    done:
	        ret;
	    } else {
	        int sum = 0;
	        for (int i = 0; i < n; i += 4) {
	            sum += (t[i  ]*w[i  ] + t[i+1]*w[i+1]) >> 8;
	            sum += (t[i+2]*w[i+2] + t[i+3]*w[i+3]) >> 8;
	        }
	        return sum;
	    }
	}

This example also shows why 'naked' should probably not be applied to the
function declaration: the function contains non-asm code as well. (It could
be "naked asm" though.) For compatibility with GDC (and in fact the
original NASM code), I used extern(C) here as the parameter passing
convention.
This may also serve as a practical use case for vector operations.
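
To make the vector use case concrete, here is a scalar C model of what the SSE2 loop above appears to compute, lane by lane. It is only a sketch, not the original NASM code. The name dot_product_lanes is hypothetical, and the model assumes that n is a positive multiple of 8 (the asm reads whole 16-byte blocks) and that >> on a negative int is an arithmetic shift, as it is on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: the NOASM path from the post, without the debug print.
   The caller is expected to pad both arrays to a multiple of 8 elements. */
int dot_product_ref(const short *t, const short *w, int n) {
    int sum = 0;
    n = (n + 7) & -8;                    /* round n up to a multiple of 8 */
    for (int i = 0; i < n; i += 2)
        sum += (t[i] * w[i] + t[i + 1] * w[i + 1]) >> 8;
    return sum;
}

/* Hypothetical lane-by-lane model of the SSE2 loop (illustrative name).
   Assumption: n is a positive multiple of 8, as the asm requires. */
int dot_product_lanes(const short *t, const short *w, int n) {
    int32_t lane[4] = {0, 0, 0, 0};      /* models the four dwords of XMM0 */
    assert(n > 0 && n % 8 == 0);
    /* Like the asm, walk the arrays from the last 8-element block down. */
    for (int base = n - 8; base >= 0; base -= 8) {
        for (int k = 0; k < 4; k++) {
            int i = base + 2 * k;
            /* pmaddwd: multiply two adjacent pairs and add them */
            int32_t pair = t[i] * w[i] + t[i + 1] * w[i + 1];
            /* psrad by 8, then paddd into the accumulator lane */
            lane[k] += pair >> 8;
        }
    }
    /* Horizontal add: models the psrldq/paddd sequence after the loop */
    return lane[0] + lane[1] + lane[2] + lane[3];
}
```

Because each pair sum is shifted before it is accumulated, the order of accumulation does not matter, so both functions should agree for any input that satisfies the assumptions and does not overflow the 32-bit lanes.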
Jan 22 2012
Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2012 2:38 AM, Marco Leise wrote:
> Porting some code from C to D I found the inline assembler very convenient.
> This is the C code (using an external NASM file):
The original impetus for doing an inline assembler was all the nightmares we had trying to support the endless variety of assemblers that users had, all of which had different bugs and different syntax.
Jan 22 2012