digitalmars.D.learn - Help needed on inline assembly
- Hendrik Renken (40/40) Jan 30 2008 Hi,
- downs (5/23) Jan 30 2008 Nope :)
- Hendrik Renken (13/36) Jan 30 2008 ok. yeah. didnt posted the right example (i have a bunch of
- downs (5/25) Jan 30 2008 The problem is that parameters to movaps need to be aligned on a 16-byte...
- Jarrett Billingsley (25/35) Jan 30 2008 If you're using the newest DMD, this should work, it does for me. If yo...
- Jarrett Billingsley (22/40) Jan 30 2008 A third way is to wrap the second way in a function, allowing you to
- Hendrik Renken (6/54) Jan 30 2008 Now i've updated to 1.026. And the above example doesnt work (still not
- Jarrett Billingsley (6/8) Jan 30 2008 Ah, it's probably because of Linux. That code works on Windows. I forg...
- Hendrik Renken (10/21) Jan 31 2008 i did some more testing, it seems that dynamically allocated data is
- Don Clugston (11/32) Jan 31 2008 That's what I found on Windows, and persuaded Walter to fix it. I didn't...
Hi,

I'd like to work with the SSE instructions in inline assembly. I wrote some test routines (with my limited knowledge). Some of them work, others don't, and I'd like to know why. Can someone help me out?

First thing:

#void main()

doesn't work. Uncomment the line // float t; and it works. Why? Does the assembler code need to be aligned to something? If so, how can I do this without allocating another float on the stack?

Second thing: I don't know how to address a public const variable. With my limited knowledge I would do something like this:

#float[4] array = [ 1f, 2f, 3f, 4f ];
#void main()

But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX, which is then used to access the array.

I'm using DMD 1.024 (since later versions broke Derelict on my platform).
Jan 30 2008
> #float[4] array = [ 1f, 2f, 3f, 4f ];
> #void main()
> But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX.

Nope :) Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX. So EAX now contains the first value in array, 1f. Trying to dereference that floating point number understandably leads to a segfault.

 --downs
Jan 30 2008
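To make downs's point concrete: when the value 1f lands in EAX and is then used as an address, the CPU sees the float's raw bit pattern. A minimal sketch in plain D (no inline asm), assuming a current D compiler; floatBits is a hypothetical helper name introduced only for this illustration:

```d
import std.stdio : writefln;

/// Returns the raw 32-bit pattern of a float (hypothetical helper).
uint floatBits(float f)
{
    // Reinterpret the float's storage as an unsigned integer.
    return *cast(uint*) &f;
}

void main()
{
    // IEEE 754 single-precision 1.0 is 0x3F800000.
    assert(floatBits(1f) == 0x3F800000);
    // This is the "address" a following [EAX] access would try to read.
    writefln("1f viewed as an address: 0x%08X", floatBits(1f));
}
```

An address like that is very unlikely to be mapped in a small test program, which matches the segfault described in the thread.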
downs wrote:
>> #float[4] array = [ 1f, 2f, 3f, 4f ];
>> #void main()
>> But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX.
> Nope :) Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX. So EAX now contains the first value in array, 1f. Trying to dereference that floating point number understandably leads to a segfault.

OK, yeah. I didn't post the right example (I have a bunch of asm test files). But this doesn't work either:

float[4] array = [ 1f, 2f, 3f, 4f ];

void main()
{
    float* a = &array[0];
    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
}
Jan 30 2008
The problem is that memory operands to movaps need to be aligned on a 16-byte boundary. That's what the 'a' in movaps means, "aligned". So you can either use movups (unaligned), which is slower, or explicitly allocate your memory to lie on a 16-byte boundary. Example:

import std.gc, std.stdio;

// Over-allocate by 15 bytes, then round the pointer up to the next 16-byte boundary.
void* malloc_align16(size_t count)
{
    void* res = malloc(count + 15).ptr;
    // Clearing the low four bits this way works for any pointer width.
    return cast(void*) ((cast(size_t)(res + 15)) & ~cast(size_t) 15);
}

float[4] array = [ 1f, 2f, 3f, 4f ];

void main()
{
    // Aligned copy of the static array.
    auto _array = (cast(float*) malloc_align16(4 * float.sizeof))[0 .. 4];
    _array[] = array;
    auto a = &_array[0];
    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
    writefln("Done");
}

Hope it helps.

 --downs
Jan 30 2008
"Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message news:fnprum$6oh$1 digitalmars.com...
> float[4] array = [ 1f, 2f, 3f, 4f ];
>
> void main()
> {
>     float* a = &array[0];
>     asm
>     {
>         mov EAX, a;
>         movaps XMM1, [EAX];
>     }
> }

If you're using the newest DMD, this should work, it does for me. If you're using anything older than 1.023 (like, hm, 1.015? GRRGH), this will probably fail. 1.023 made anything in the static data segment >= 16 bytes paragraph aligned, so that data is already aligned properly.

I don't know what GDC does in this case.

Another way to get an aligned allocation is to use a struct with the float[4] in it.

struct vec
{
    float[4] array;
}

void main()
{
    vec* v = new vec;
    // ptr will get you the pointer to the 0th element too
    float* a = v.array.ptr;
    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
}

This also doesn't rely on any standard library stuff.
Jan 30 2008
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:fnq21n$mvk$1 digitalmars.com...
> Another way to get an aligned allocation is to use a struct with the float[4] in it.
>
> struct vec
> {
>     float[4] array;
> }
>
> void main()
> {
>     vec* v = new vec;
>     // ptr will get you the pointer to the 0th element too
>     float* a = v.array.ptr;
>     asm
>     {
>         mov EAX, a;
>         movaps XMM1, [EAX];
>     }
> }
>
> This also doesn't rely on any standard library stuff.

A third way is to wrap the second way in a function, allowing you to allocate statically-sized arrays directly:

T* alloc(T)()
{
    struct S
    {
        T t;
    }
    return &(new S).t;
}

void main()
{
    float[4]* array = alloc!(float[4]);
    float* a = array.ptr;
    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
}
Jan 30 2008
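Whichever allocation strategy from the posts above is used, the resulting pointer can be checked before it is handed to movaps. A minimal sketch, assuming a current D compiler; isAligned16 and alignUp16 are hypothetical helper names for this illustration, not Phobos functions:

```d
import std.stdio : writefln;

/// True if p lies on a 16-byte boundary, as movaps requires for memory operands.
bool isAligned16(void* p)
{
    return (cast(size_t) p & 15) == 0;
}

/// Rounds an address up to the next multiple of 16 (hypothetical helper).
size_t alignUp16(size_t addr)
{
    return (addr + 15) & ~cast(size_t) 15;
}

void main()
{
    assert(alignUp16(0x1000) == 0x1000); // already aligned: unchanged
    assert(alignUp16(0x1001) == 0x1010); // rounded up to the next boundary
    assert(alignUp16(0x100F) == 0x1010);

    // An over-sized buffer always contains an aligned float[4] slot.
    float[8] buffer;
    void* aligned = cast(void*) alignUp16(cast(size_t) buffer.ptr);
    assert(isAligned16(aligned));
    writefln("buffer at %s, first 16-byte boundary at %s", buffer.ptr, aligned);
}
```

If such a check fails at run time, movups is the safe fallback, as downs notes earlier in the thread.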
Jarrett Billingsley wrote:
> "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message news:fnprum$6oh$1 digitalmars.com...
>> float[4] array = [ 1f, 2f, 3f, 4f ];
>>
>> void main()
>> {
>>     float* a = &array[0];
>>     asm
>>     {
>>         mov EAX, a;
>>         movaps XMM1, [EAX];
>>     }
>> }
> If you're using the newest DMD, this should work, it does for me.

Now I've updated to 1.026, and the above example doesn't work (still not aligned). movups works.

> If you're using anything older than 1.023 (like, hm, 1.015? GRRGH), this will probably fail.

Yeah. I used 1.015 before...

> 1.023 made anything in the static data segment >= 16 bytes paragraph aligned, so that data is already aligned properly.

Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment is broken again in 1.026...

> I don't know what GDC does in this case.
>
> Another way to get an aligned allocation is to use a struct with the float[4] in it.
>
> struct vec
> {
>     float[4] array;
> }
>
> void main()
> {
>     vec* v = new vec;
>     // ptr will get you the pointer to the 0th element too
>     float* a = v.array.ptr;
>     asm
>     {
>         mov EAX, a;
>         movaps XMM1, [EAX];
>     }
> }
>
> This also doesn't rely on any standard library stuff.
Jan 30 2008
"Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1 digitalmars.com...
> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment is broken again in 1.026...

Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\
Jan 30 2008
Jarrett Billingsley wrote:
> "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1 digitalmars.com...
>> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment is broken again in 1.026...
> Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\

I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data, until it is aligned ;)

Thanks for the help, got it working - and a speedup factor of 240! Yeah, from 480 millisec down to 2 millisec with SSE instructions. That rocks!

regards
Hendrik
Jan 31 2008
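The "allocate 1 to 3 floats before the data" workaround comes down to simple address arithmetic. A minimal sketch, assuming a current D compiler and 4-byte fillers; fillersNeeded is a hypothetical helper name, and the addresses are made-up examples rather than values from the thread:

```d
/// Number of 4-byte fillers needed in front of data at addr so that the
/// data starts on a 16-byte boundary (hypothetical helper).
size_t fillersNeeded(size_t addr)
{
    assert(addr % 4 == 0, "float data is assumed to be 4-byte aligned");
    return ((16 - addr % 16) % 16) / 4;
}

void main()
{
    assert(fillersNeeded(0x1000) == 0); // already on a boundary
    assert(fillersNeeded(0x1004) == 3); // three fillers push the data to 0x1010
    assert(fillersNeeded(0x1008) == 2);
    assert(fillersNeeded(0x100C) == 1);
}
```

This only helps as long as the compiler and linker keep the declarations in source order, which is an assumption here and not something the language guarantees, so the padding may need revisiting whenever surrounding declarations change.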
Hendrik Renken wrote:
> Jarrett Billingsley wrote:
>> "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1 digitalmars.com...
>>> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment is broken again in 1.026...
>> Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\
> I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data, until it is aligned ;)

That's what I found on Windows, and persuaded Walter to fix it. I didn't realise it wasn't working on Linux yet.

I hope that eventually we'll get stack data properly aligned; if it gets into the D ABI, then we only have to worry about callbacks from C -- i.e., only extern() functions would need to align the stack.

> Thanks for the help, got it working - and a speedup factor of 240! Yeah, from 480 millisec down to 2 millisec with SSE instructions. That rocks!

Oh yeah! DMD's floating point code generation is very basic; it does almost no optimisation; and its excellent support for inline asm makes asm particularly attractive. But a factor of 240 is pretty extreme.
Jan 31 2008