digitalmars.D.learn - ASM access to array
- Heinz (23/23) Feb 01 2011 Hi there,
- bearophile (34/34) Feb 01 2011 Heinz:
- Heinz (42/42) Feb 01 2011 Wow, thanks for the reply. Changing the 'enum dsize' to 'uint dsize = 4;...
- bearophile (7/11) Feb 01 2011 Someday D will run on other CPUs, like 64 bit ones, so 32 bit X86 will n...
- Matthias Pleh (28/28) Feb 02 2011 As bearophile noted, unit tests are always good!!
- Joel Christensen (34/34) Feb 02 2011 What about my edited version:
- Heinz (33/33) Feb 03 2011 Thanks 4 the code (this goes to Matthias Pleh too):
- bearophile (6/26) Feb 04 2011 I see you have removed the asm guard I have shown you.
- Heinz (24/24) Feb 04 2011 bearophile,
- bearophile (8/12) Feb 04 2011 That may be slow, so you need to benchmark.
Hi there,

Although I've been coding in D for the last 5 years, I had never got my hands into ASM until now. I'm trying to use the inline assembler to apply opcodes to a class member, a uint array, but I don't know how. This is what I've been trying to accomplish:

    class MyClass
    {
        private uint[] array;

        private void MyFunc()
        {
            asm
            {
                rol array[1], 8;
                rol array[2], 16;
            }
        }
    }

The above code complains about type/size. However, it seems to work with single uint variables local to the function. I've tried moving the values to EBX and ECX and then applying rol to these registers. The code compiles and runs but does nothing at all.

Any ideas or guides? A good doc about asm with D?

BIG THXS!!!
Feb 01 2011
Heinz:

Not tested much:

    import std.stdio: writeln;

    class MyClass
    {
        uint[] array;

        this()
        {
            array = new typeof(array)(4);
            array = [10, 20, 30, 40];
        }

        void myFunc()
        {
            version (D_InlineAsm_X86)
            {
                auto aptr = array.ptr;                  // raw pointer to the first element
                enum dsize = typeof(array[0]).sizeof;   // element size in bytes
                asm
                {
                    mov EAX, aptr;                      // EAX = base address of the data

                    mov ECX, [EAX + dsize * 1];         // load array[1]
                    rol ECX, 8;                         // rotate left by 8 bits
                    mov [EAX + dsize * 1], ECX;         // store it back

                    mov ECX, [EAX + dsize * 2];         // load array[2]
                    rol ECX, 16;                        // rotate left by 16 bits
                    mov [EAX + dsize * 2], ECX;         // store it back
                }
            }
            else
                assert(0);
        }
    }

    void main()
    {
        auto c = new MyClass();
        writeln(c.array);
        c.myFunc();
        writeln(c.array);
    }

Bye,
bearophile
Feb 01 2011
Wow, thanks for the reply. Changing the 'enum dsize' to 'uint dsize = 4;' seems to produce some results... I guess it's working now (I have no way to verify it is working, but it looks like bits are being rotated by rol). One thing: if I replace dsize with 4 in all lines, the code compiles but again it does nothing. Weird, huh?

Is D_InlineAsm_X86 really needed? I saw it is listed in the predefined versions table. Is this version for custom use, or is it used internally by the compiler to specify whether inline ASM is available? Inline ASM is always available, right?

Anyway, thanks for the code. I was wondering: is there any other way to directly use class members (array items in this case) as operands? I mean, for variables local to functions I use the variables as if they were in a D code statement, but in your code we have to create local variables and move them between registers, so the values are moved too.

    // LOCAL VARIABLE EXAMPLE
    void MyFunct()
    {
        uint temp;
        asm
        {
            rol temp, 8;
        }
    }

If I'm going to create local pointers and constants and move them across registers anyway, would it be better (or the same) to create 2 local ints, rol them and then move them back to their respective places? This way the compiler might generate better code:

    // EXAMPLE
    class MyClass
    {
        private uint[] array;

        private void MyFunc()
        {
            uint a = array[0], b = array[1];
            asm
            {
                rol a, 8;
                rol b, 16;
            }
            array[0] = a;
            array[1] = b;
        }
    }

Thanks for everything!!!
Feb 01 2011
Heinz:

> (i have no way to verify it is working but looks like bits are being rotated by rol).

Create a unittest and test whether the input-output pairs are correct.

> Is D_InlineAsm_X86 really needed?

Someday D will run on other CPUs, like 64 bit ones, where 32 bit X86 asm will not work. To avoid problems it's better to get used to protecting asm blocks with that version guard. So it's not necessary, but it's a good habit and it doesn't hurt.

> Inline ASM is always available right?

I think the (partial) D implementation for Dotnet didn't support asm. But even if all implementations supported it, there are various kinds of CPUs, and each one needs different asm.

> If i'm going to create local pointers and constants and move them across registers then, to accomplish the same result, would it be better (or the same) to create 2 local ints, rol them and then move them to their respective place (this way the compiler might generate better code):

When the D compiler sees asm code, it locally switches off optimizations. I suggest you compile your two versions and take a look at the asm the D compiler produces (with obj2asm, or with a free disassembler on Linux).

Bye,
bearophile
Feb 01 2011
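A minimal sketch of the unit test suggested above. The helper name `rotl` and the test values are illustrative assumptions, not code from the thread; the idea is only to pin down a few input/output pairs worked out by hand, so that any asm version can later be checked against the same pairs.

```d
// Portable rotate-left in plain D (hypothetical helper, not from the thread).
// Both shift counts are masked so they always stay in 0..31.
uint rotl(uint value, uint count)
{
    count &= 31;
    return (value << count) | (value >> ((32 - count) & 31));
}

unittest
{
    // Input/output pairs computed by hand.
    assert(rotl(0x0000_0001, 8)  == 0x0000_0100);
    assert(rotl(0x8000_0000, 1)  == 0x0000_0001); // top bit wraps around to bit 0
    assert(rotl(0x1234_5678, 16) == 0x5678_1234); // the two halves swap
    assert(rotl(0xC0DE_F00D, 0)  == 0xC0DE_F00D); // zero rotation changes nothing
}
```

Compiling with `-unittest` runs the block before `main`; a failed pair stops the program with an assertion error pointing at the offending line.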
As bearophile noted, unit tests are always good!!
But for a start, I have always liked to make some pretty-printing functions ...
... so this is my version:

    import std.stdio;

    uint rotl_d(uint value, ubyte rotation)
    {
        // The mask keeps the right-shift count in 0..31, so rotation == 0 is well defined.
        return (value << rotation) | (value >> ((value.sizeof * 8 - rotation) & 31));
    }

    uint rotl_asm(uint value, ubyte rotation)
    {
        asm
        {
            mov EAX, value;     // get first argument
            mov CL , rotation;  // how many bits to move
            rol EAX, CL;
        } // return with result in EAX
    }

    void bin_writeln(string info, uint value, bool nl)
    {
        writefln("%1s: %02$32b%3$s", info, value, nl ? "\n" : "");
    }

    int main(string[] argv)
    {
        uint a = 0xc0def00d;
        bin_writeln("value a", a, false);
        bin_writeln("value b", rotl_d(a, 1), true);
        //
        bin_writeln("value a", a, false);
        bin_writeln("value b", rotl_asm(a, 1), true);
        return 0;
    }

greets
Matthias
Feb 02 2011
What about my edited version:

    import std.stdio;

    uint rotl_d(uint value, ubyte rotation)
    {
        // The mask keeps the right-shift count in 0..31, so rotation == 0 is well defined.
        return (value << rotation) | (value >> ((value.sizeof * 8 - rotation) & 31));
    }

    uint rotl_asm(uint value, ubyte rotation)
    {
        asm
        {
            mov EAX, value;     // get first argument
            mov CL , rotation;  // how many bits to move
            rol EAX, CL;
        } // return with result in EAX
    }

    void bin_writeln(string info, uint value, bool nl)
    {
        writefln("%1s: %02$32b%3$s", info, value, nl ? "\n" : "");
    }

    int main(string[] argv)
    {
        uint a = 0xc0def00d;
        bin_writeln("value a", a, false);
        bin_writeln("value b", rotl_d(a, 1), true);
        //
        bin_writeln("value a", a, false);
        bin_writeln("value b", rotl_asm(a, 1), true);

        uint b;
        ubyte c = 0;
        while ( 1 == 1 ) // Press Ctrl + C to quit
        {
            b = rotl_asm(0xc0def00d, c);
            // Reprint the same line many times to slow the animation down.
            foreach (rst; 0 .. 5_000 )
                writef("%032b %2d\r", b, c);
            c = cast(ubyte)( c + 1 == 32 ? 0 : c + 1 );
        }
        return 0;
    }
Feb 02 2011
Thanks for the code (this goes to Matthias Pleh too):

Both programs work amazingly well after modifying them a bit to work with DMD 1.030. This helped me a lot!

I still don't know how the asm version implicitly returns a value (no return keyword needed). It seems that the returned value is EAX, not the variable "value". To return "value", the content of EAX should first be moved to "value", right? (mov value, EAX;)

I found this site: http://www.swansontec.com/sregisters.html
It helped me work out optimal register usage. It also details which registers can be used with offsets.

I finally ended up with this ASM code:

    // I have to rotate every int of an uint[3]. For some reason I can't directly
    // reference the array pointer, so I create a variable local to the function.
    void myFunct()
    {
        uint* p = myarray.ptr;
        asm
        {
            mov EBX, p;             // EBX = base address of the array data

            mov EAX, [EBX + 4];     // load, rotate left by 8, store
            rol EAX, 8;
            mov [EBX + 4], EAX;

            mov EAX, [EBX + 8];     // load, rotate left by 16, store
            rol EAX, 16;
            mov [EBX + 8], EAX;

            mov EAX, [EBX + 12];    // load, rotate left by 24, store
            rol EAX, 24;
            mov [EBX + 12], EAX;
        }
    }

Hope this helps someone else.

Cheers.
Heinz
Feb 03 2011
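On the implicit-return question: on 32-bit x86 the D ABI returns a `uint` in EAX, which is why the `rotl_asm` versions above can leave the result in that register and fall off the end of the function. A hedged sketch of the more explicit alternative follows: store EAX into a local and return it normally. The names `rotlExplicit` and `result` are hypothetical, and the plain-D branch is an assumed fallback for targets without 32-bit x86 inline asm.

```d
// Rotate left, returning the result through an ordinary D return statement.
uint rotlExplicit(uint value, ubyte rotation)
{
    version (D_InlineAsm_X86)
    {
        uint result;
        asm
        {
            mov EAX, value;     // load the argument
            mov CL, rotation;   // a variable rotate count has to live in CL
            rol EAX, CL;
            mov result, EAX;    // copy the register into a D variable
        }
        return result;
    }
    else
    {
        // Assumed fallback: the same rotation in plain D.
        immutable uint n = rotation & 31;
        return (value << n) | (value >> ((32 - n) & 31));
    }
}
```

This costs one extra store and load compared with relying on EAX, but it does not depend on the calling convention and reads like normal D.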
Heinz:

> void myFunct()
> {
>     uint* p = myarray.ptr;
>     asm
>     {
>         mov EBX, p;
>         mov EAX, [EBX + 4];
>         rol EAX, 8;
>         mov [EBX + 4], EAX;
>         mov EAX, [EBX + 8];
>         rol EAX, 16;
>         mov [EBX + 8], EAX;
>         mov EAX, [EBX + 12];
>         rol EAX, 24;
>         mov [EBX + 12], EAX;
>     }
> }

I see you have removed the asm guard I have shown you.

I suggest you benchmark it against a normal D function. Keep in mind that asm blocks kill inlining. Also try a load-load-load processing-processing-processing store-store-store order instead of load-processing-store load-processing-store load-processing-store, because this often helps the pipelining of the processor (especially when you use SSE/AVX registers).

Bye,
bearophile
Feb 04 2011
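The reordering suggested above might look like the sketch below. It is untested on real 32-bit hardware and makes several assumptions: the function name `rotateWords` is hypothetical, the slice must hold at least four elements because the offsets 4, 8 and 12 from the posted code touch indices 1 to 3, and the three values fit in EAX, ECX and EDX so nothing is spilled to the stack. The guard and a plain-D fallback are included so the function still works where `D_InlineAsm_X86` is not defined.

```d
// Rotate words[1], words[2], words[3] left by 8, 16 and 24 bits.
void rotateWords(uint[] words)
{
    assert(words.length >= 4, "offsets 4, 8 and 12 need four elements");

    version (D_InlineAsm_X86)
    {
        uint* p = words.ptr;
        asm
        {
            mov EBX, p;
            // load, load, load
            mov EAX, [EBX + 4];
            mov ECX, [EBX + 8];
            mov EDX, [EBX + 12];
            // process, process, process
            rol EAX, 8;
            rol ECX, 16;
            rol EDX, 24;
            // store, store, store
            mov [EBX + 4], EAX;
            mov [EBX + 8], ECX;
            mov [EBX + 12], EDX;
        }
    }
    else
    {
        // Fallback in plain D; n is never 0 here, so 32 - n is a valid shift.
        static uint rotl(uint v, uint n) { return (v << n) | (v >> (32 - n)); }
        words[1] = rotl(words[1], 8);
        words[2] = rotl(words[2], 16);
        words[3] = rotl(words[3], 24);
    }
}
```

Whether the grouped order is actually faster than the interleaved one depends on the CPU, so it is worth timing both.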
bearophile,

Thank you so much for all your help. It seems you're very into ASM.

I kept the D_InlineAsm_X86 guard in my code as you suggested. The code I gave here was just an example. My actual implementation looks like this:

    version(D_InlineAsm_X86)
    {
        // ASM Code.
    }
    else
    {
        // D code.
    }

This results in much more robust code. You were right about it.

You are right too about the "load-load-load processing-processing-processing store-store-store instead of a load-processing-store load-processing-store load-processing-store" thing. I'll modify my code to follow this model, though it will require moving some elements to the stack. No big deal, I think this won't hurt performance as it is designed to work this way.

- Does ASM kill inlining for the function where the asm block is present, or for the whole compilation?
- In your opinion, how bad is it if function inlining is not available?

Some docs from the net: http://www.parashift.com/c++-faq-lite/inline-functions.html

Cheers,
Heinz
Feb 04 2011
Heinz:

> This results in a much robust code.

That's the right way to do it, with a D fallback.

> You are right too about the "load-load-load processing-processing-processing store-store-store instead a load-processing-store load-processing-store load-processing-store" thing. I'll modify my code to this model, though it will require to move some elements to the stack but no big deal, i think this won't hurt performance as it is designed to work this way.

That may be slow, so you need to benchmark.

> -Does ASM kill inlining for the function where the asm block is present or for the whole compilation?

It only prevents the function that contains the assembly from being inlined.

> -In your opinion, How badly can be if function inlining is not present?

Inlining is an important optimization if your function does very little; otherwise it's not important, or it even makes the code slower. Your function contains only a few asm instructions, so inlining becomes important. This is why I suggest you benchmark your asm code against equivalent D code compiled with -O -release -inline. The D code plus inlining may turn out to be faster. In my opinion this is the most probable outcome here.

If you use the LDC compiler there are two different ways (pragma(allow_inline) and inline asm expressions) to get inlining even when you use asm code, so the situation is better.

Bye,
bearophile
Feb 04 2011
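One way to run the comparison recommended above is sketched here. It assumes a current Phobos, where `benchmark` lives in `std.datetime.stopwatch` (releases from the time of this thread kept it elsewhere), and every name in it (`word`, `stepD`, `stepAsm`, `compareSteps`) is made up for illustration. Where `D_InlineAsm_X86` is not defined, `stepAsm` simply forwards to `stepD`, so the timing is only meaningful on 32-bit x86.

```d
import core.time : Duration;
import std.datetime.stopwatch : benchmark;

uint word = 0xC0DEF00D;   // module-level state that both candidates update

// Candidate 1: plain D, which the compiler is free to inline and optimize.
void stepD()
{
    word = (word << 8) | (word >> 24);
}

// Candidate 2: the same rotation through inline asm, with a D fallback.
void stepAsm()
{
    version (D_InlineAsm_X86)
    {
        uint w = word;    // work on a local, as in the thread's examples
        asm
        {
            mov EAX, w;
            rol EAX, 8;
            mov w, EAX;
        }
        word = w;
    }
    else
        stepD();
}

// Runs each candidate `iterations` times; result[0] is D, result[1] is asm.
Duration[2] compareSteps(uint iterations = 1_000_000)
{
    return benchmark!(stepD, stepAsm)(iterations);
}
```

Calling `compareSteps()` from `main` and printing both durations, with the program built using `-O -release -inline` as suggested in the thread, gives a first rough comparison; repeating the run a few times helps smooth out noise.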