digitalmars.D.learn - Mixin in Inline Assembly
- Chris M. (20/20) Jan 08 2017 Right now I'm working on a project where I'm implementing a VM in
- Stefan Koch (2/23) Jan 08 2017 Yes make the whole inline asm a mixin.
- Chris M. (2/5) Jan 08 2017 Awesome, got it working. Thanks to both replies.
- ketmar (5/7) Jan 08 2017 yep. iasm is completely independent from other frontend, it has
- Adam D. Ruppe (11/23) Jan 08 2017 You should be able to break it up too
- Basile B. (4/21) Jan 10 2017 don't forget to flag
- Guillaume Piolat (2/5) Jan 10 2017 Why?
- Basile B. (8/16) Jan 10 2017 It's an empirical observation. In september I tried to get why an
- Guillaume Piolat (2/19) Jan 10 2017 Interesting, thanks.
- Chris M (17/34) Jan 10 2017 Huh, that's really interesting, thanks for posting. I guess my
- Basile B. (2/26) Jan 10 2017 The game changer for the performances is just "nothrow".
- Era Scarecrow (4/7) Jan 10 2017 Suddenly reminds me some of the speedup assembly I was writing
- Guillaume Piolat (4/7) Jan 11 2017 I'm a taker if you have some algorithm to reuse 32-bit divide in
- Era Scarecrow (9/17) Jan 11 2017 I remember the divide was giving me some trouble. The idea was
- Era Scarecrow (62/64) May 22 2017 Decided I'd give my hand at writing a 'ScaledInt' which is
- Era Scarecrow (27/31) Jun 01 2017 More experiments and i think it comes down to static arrays.
- Era Scarecrow (11/14) Jun 02 2017 Well as a side note a simple yet not happy workaround is making
Right now I'm working on a project where I'm implementing a VM in D. I'm on the rotate instructions, and realized I could *almost* abstract the ror and rol instructions with the following function:

    private void rot(string ins)(int *op1, int op2)
    {
        int tmp = *op1;
        asm
        {
            mov EAX, tmp; // I'd also like to know if I could just load *op1 directly into EAX
            mov ECX, op2[EBP];
            mixin(ins ~ " EAX, CL;"); // Issue here
            mov tmp, EAX;
        }
        *op1 = tmp;
    }

However, the inline assembler doesn't like me trying to do a mixin. Is there a way around this?

(There is a reason op1 is a pointer instead of a ref int, please don't ask about it)
Jan 08 2017
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
> Right now I'm working on a project where I'm implementing a VM in D. I'm on the rotate instructions, and realized I could *almost* abstract the ror and rol instructions with the following function
> [...]
> However, the inline assembler doesn't like me trying to do a mixin. Is there a way around this?

Yes, make the whole inline asm a mixin.
Jan 08 2017
On Monday, 9 January 2017 at 02:38:01 UTC, Stefan Koch wrote:
> On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
>> [...]
> Yes make the whole inline asm a mixin.

Awesome, got it working. Thanks to both replies.
Jan 08 2017
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
> However, the inline assembler doesn't like me trying to do a mixin.

yep. iasm is completely independent from the rest of the frontend; it has its own lexer, parser and so on. don't expect those things to work. the only way is to mixin the whole iasm block, including `asm{}`.
Jan 08 2017
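A minimal sketch of the "mixin the whole block" approach from the replies above. It assumes DMD-style inline assembly on x86 or x86_64 and falls back to `core.bitop.rol`/`ror` elsewhere. The `hasInlineAsm` flag and the value-returning signature are illustrative choices, not the original poster's code.

```d
import core.bitop : rol, ror;

// Illustrative compile-time flag: is DMD-style x86 inline asm available?
version (D_InlineAsm_X86)
    enum hasInlineAsm = true;
else version (D_InlineAsm_X86_64)
    enum hasInlineAsm = true;
else
    enum hasInlineAsm = false;

/// Rotates `value` by `count` bits; `ins` selects "rol" or "ror".
uint rot(string ins)(uint value, uint count)
    if (ins == "rol" || ins == "ror")
{
    static if (hasInlineAsm)
    {
        // Copy into locals so the asm operands are plain stack variables.
        uint tmp = value;
        uint cnt = count;
        // The instruction name is spliced into the string first, and the
        // complete asm statement (including `asm { }`) is then mixed in.
        mixin("asm pure nothrow @nogc { mov EAX, tmp; mov ECX, cnt; "
              ~ ins ~ " EAX, CL; mov tmp, EAX; }");
        return tmp;
    }
    else
    {
        // Portable fallback: call core.bitop.rol or core.bitop.ror.
        mixin("return " ~ ins ~ "(value, count);");
    }
}

void main()
{
    assert(rot!"rol"(1, 4) == 16);
    assert(rot!"ror"(16, 4) == 1);
    assert(rot!"rol"(0x8000_0001, 1) == 3);
}
```

The string is assembled at compile time, so each instantiation (`rot!"rol"`, `rot!"ror"`) contains an ordinary asm block with the instruction already in place.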
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
>     asm
>     {
>         mov EAX, tmp; // I'd also like to know if I could just load *op1 directly into EAX
>         mov ECX, op2[EBP];
>         mixin(ins ~ " EAX, CL;"); // Issue here
>         mov tmp, EAX;
>     }
>     *op1 = tmp;
> }
>
> However, the inline assembler doesn't like me trying to do a mixin. Is there a way around this?

You should be able to break it up too:

    asm { mov EAX, tmp; }
    mixin("asm { " ~ ins ~ " EAX, CL; }");
    asm { mov tmp, EAX; }

you get the idea. It should compile to the same thing.
Jan 08 2017
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
> [...]
>     private void rot(string ins)(int *op1, int op2)
>     {
>         int tmp = *op1;
>         asm
>         {
>             mov EAX, tmp; // I'd also like to know if I could just load *op1 directly into EAX
>             mov ECX, op2[EBP];
>             mixin(ins ~ " EAX, CL;"); // Issue here
>             mov tmp, EAX;
>         }
>         *op1 = tmp;
>     }

don't forget to flag

    asm pure nothrow {}

otherwise it's slow.
Jan 10 2017
On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
> don't forget to flag asm pure nothrow {} otherwise it's slow.

Why?
Jan 10 2017
On Tuesday, 10 January 2017 at 11:38:43 UTC, Guillaume Piolat wrote:
> On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
>> don't forget to flag asm pure nothrow {} otherwise it's slow.
> Why?

It's an empirical observation. In September I tried to work out why an inline asm function was slow. What happened was that I hadn't marked the asm block as nothrow:

https://forum.dlang.org/post/xznocpxtalpayvkrwxey forum.dlang.org

I opened an issue asking the specifications to explain that clearly.
Jan 10 2017
On Tuesday, 10 January 2017 at 13:13:17 UTC, Basile B. wrote:
> On Tuesday, 10 January 2017 at 11:38:43 UTC, Guillaume Piolat wrote:
>> [...]
> It's an empirical observation. In september I tried to get why an inline asm function was slow. What happened was that I didn't mark the asm block as nothrow https://forum.dlang.org/post/xznocpxtalpayvkrwxey forum.dlang.org I opened an issue asking the specifications to explain that clearly.

Interesting, thanks.
Jan 10 2017
On Tuesday, 10 January 2017 at 13:13:17 UTC, Basile B. wrote:
> On Tuesday, 10 January 2017 at 11:38:43 UTC, Guillaume Piolat wrote:
>> [...]
> It's an empirical observation. In september I tried to get why an inline asm function was slow. What happened was that I didn't mark the asm block as nothrow https://forum.dlang.org/post/xznocpxtalpayvkrwxey forum.dlang.org I opened an issue asking the specifications to explain that clearly.

Huh, that's really interesting, thanks for posting. I guess my other question would be: how do I determine if a block of assembly is pure?

I also figured out moving *op1 directly into RAX, guess it makes sense that a 64-bit value would need a 64-bit register :)

    private void rot(string ins)(int *op1, int op2)
    {
        mixin(" asm { mov RAX, op1; mov ECX, op2[EBP];" ~ ins ~ " [RAX], CL; } ");
    }
Jan 10 2017
On Wednesday, 11 January 2017 at 00:11:50 UTC, Chris M wrote:
> On Tuesday, 10 January 2017 at 13:13:17 UTC, Basile B. wrote:
>> [...]
> Huh, that's really interesting, thanks for posting. I guess my other question would be how do I determine if a block of assembly is pure?

The game changer for the performance is just "nothrow".
Jan 10 2017
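A small sketch of what the attributes on an asm statement look like in practice, assuming DMD-style inline assembly on x86 or x86_64. As far as I understand it, the compiler does not inspect the instructions: `pure`, `nothrow` and `@nogc` on the block are promises made by the programmer. A reasonable rule of thumb for `pure` is that the block only reads and writes its own operands and registers and touches no global or static state. The names `addAsm` and `hasInlineAsm` are made up for this illustration.

```d
// Illustrative compile-time flag: is DMD-style x86 inline asm available?
version (D_InlineAsm_X86)
    enum hasInlineAsm = true;
else version (D_InlineAsm_X86_64)
    enum hasInlineAsm = true;
else
    enum hasInlineAsm = false;

/// Adds two ints. The enclosing function carries the attributes, so the
/// asm block inside it has to be marked with them as well.
int addAsm(int a, int b) pure nothrow @nogc
{
    static if (hasInlineAsm)
    {
        int x = a;
        int y = b;
        // Only locals and registers are touched here, which is the
        // programmer's justification for marking the block `pure`.
        asm pure nothrow @nogc
        {
            mov EAX, x;
            add EAX, y;
            mov x, EAX;
        }
        return x;
    }
    else
    {
        return a + b; // portable fallback
    }
}

void main()
{
    assert(addAsm(2, 3) == 5);
    assert(addAsm(-7, 7) == 0);
}
```

Putting the attributes on the enclosing function as well means the compiler should complain if the block is left unmarked, which is one way to avoid forgetting `nothrow`.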
On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
> don't forget to flag asm pure nothrow {} otherwise it's slow.

Suddenly reminds me of some of the speedup assembly I was writing for wideint, but it seems I lost my code. Too bad, the 128bit multiply had sped up and the division needed some work.
Jan 10 2017
On Wednesday, 11 January 2017 at 06:14:35 UTC, Era Scarecrow wrote:
> Suddenly reminds me some of the speedup assembly I was writing for wideint, but seems I lost my code. too bad, the 128bit multiply had sped up and the division needed some work.

I'm a taker if you have some algorithm to reuse 32-bit divide in wideint division instead of scanning bits :)
Jan 11 2017
On Wednesday, 11 January 2017 at 15:39:49 UTC, Guillaume Piolat wrote:
> On Wednesday, 11 January 2017 at 06:14:35 UTC, Era Scarecrow wrote:
>> Suddenly reminds me some of the speedup assembly I was writing for wideint, but seems I lost my code. too bad, the 128bit multiply had sped up and the division needed some work.
> I'm a taker if you have some algorithm to reuse 32-bit divide in wideint division instead of scanning bits :)

I remember the divide was giving me some trouble. The idea was to try and use the built-in registers and limits of the assembly to take advantage of full 128bit division. Unfortunately, if the result is too large to fit in a 64bit result it breaks, rather than giving me half the result and letting me work with it.

Still, I think I'll implement my own version, and then if it's faster I'll submit it.
Jan 11 2017
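For reference, a sketch in plain D of the schoolbook way around the overflow described above. Dividing limb by limb from the most significant end keeps the running remainder below the divisor, so every partial quotient fits in 32 bits. This only covers a 32-bit divisor and is not the wideint code being discussed. The name `divSmall` and the little-endian `uint[4]` layout are assumptions for the example.

```d
/// Divides a 128-bit value stored as four little-endian 32-bit limbs by a
/// 32-bit divisor, in place. Returns the remainder.
uint divSmall(ref uint[4] limbs, uint divisor) pure nothrow @nogc
{
    assert(divisor != 0, "division by zero");
    ulong rem = 0;
    // Walk from the most significant limb down to the least significant.
    foreach_reverse (ref limb; limbs)
    {
        // rem < divisor, so cur / divisor always fits in 32 bits.
        const ulong cur = (rem << 32) | limb;
        limb = cast(uint)(cur / divisor);
        rem = cur % divisor;
    }
    return cast(uint) rem;
}

void main()
{
    uint[4] n = [0, 1, 0, 0];          // 2^32
    assert(divSmall(n, 2) == 0);
    assert(n == [0x8000_0000, 0, 0, 0]); // 2^31

    uint[4] m = [5, 0, 0, 0];
    assert(divSmall(m, 3) == 2);
    assert(m[0] == 1);
}
```

Each loop step is a 64-by-32 divide, which is the shape the hardware `div` instruction handles without faulting as long as the high half is smaller than the divisor.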
On Wednesday, 11 January 2017 at 17:32:35 UTC, Era Scarecrow wrote:
> Still I think I'll impliment my own version and then if it's faster I'll submit it.

Decided I'd try my hand at writing a 'ScaledInt', which is intended to basically allow any larger unsigned type. Coming across some assembly confusion.

Using mixin with assembly, here's the 'result' of the mixin (as a final result):

    alias UCent = ScaledInt!(uint, 4);

    struct ScaledInt(I, int Size)
    if (isUnsigned!(I) && Size > 1)
    {
        I[Size] val;

        ScaledInt opBinary(string op)(const ScaledInt rhs) const
        if (op == "+")
        {
            ScaledInt t;
            asm pure nothrow
            {
                // mixin generated from another function, for simplicity
                mov EBX, this;
                clc;
                mov EAX, rhs[EBP+0];
                adc EAX, val[EBX+0];
                mov t[EBP+0], EAX;
                mov EAX, rhs[EBP+4];
                adc EAX, val[EBX+4];
                mov t[EBP+4], EAX;
                mov EAX, rhs[EBP+8];
                adc EAX, val[EBX+8];
                mov t[EBP+8], EAX;
                mov EAX, rhs[EBP+12];
                adc EAX, val[EBX+12];
                mov t[EBP+12], EAX;
            }
            return t;
        }
    }

Raw disassembly for my asm code shows this:

    mov EBX,-4[EBP]
    clc
    mov EAX,0Ch[EBP]
    adc EAX,[EBX]
    mov -014h[EBP],EAX
    mov EAX,010h[EBP]
    adc EAX,4[EBX]
    mov -010h[EBP],EAX
    mov EAX,014h[EBP]
    adc EAX,8[EBX]
    mov -0Ch[EBP],EAX
    mov EAX,018h[EBP]
    adc EAX,0Ch[EBX]
    mov -8[EBP],EAX

From what I'm seeing, it should be 8, 0ch, 10h, then 14h, all positive. I'm really scratching my head why I'm having this issue...

Doing an add of t[0] = val[0] + rhs[0]; I get this disassembly:

    mov EDX,-4[EBP]   //mov EDX, this;
    mov EBX,[EDX]     //val[0]
    add EBX,0Ch[EBP]  //+ rhs.val[0]
    mov ECX,8[EBP]    //mov ECX, ???[???]
    mov [ECX],EBX     //t.val[0] =

If I do "mov ECX,t[EBP]", I get "mov ECX,-014h[EBP]". If I try to reference the exact variable val within t, it complains it doesn't know it at compile time (although it's a fixed location).

What am I missing here?
May 22 2017
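As a point of comparison, here is what the `adc` chain above computes, written as a plain D sketch with no inline assembly. It can serve as a reference when checking the asm version's results. The function name `addLimbs` and the fixed four-limb little-endian layout are assumptions made for this example, not part of the ScaledInt code.

```d
/// Adds two 128-bit values held as four little-endian 32-bit limbs,
/// propagating the carry between limbs the way a clc/adc chain does.
/// Any carry out of the top limb is discarded (wrap-around).
uint[4] addLimbs(const uint[4] a, const uint[4] b) pure nothrow @nogc
{
    uint[4] r;
    uint carry = 0; // plays the role of the CPU carry flag after `clc`
    foreach (i; 0 .. 4)
    {
        // Widen to 64 bits so the carry out of bit 31 is kept.
        const ulong sum = cast(ulong) a[i] + b[i] + carry;
        r[i] = cast(uint) sum;
        carry = cast(uint)(sum >> 32);
    }
    return r;
}

void main()
{
    const uint[4] a = [uint.max, uint.max, 0, 0];
    const uint[4] b = [1, 0, 0, 0];
    // The carry ripples through the first two limbs into the third.
    assert(addLimbs(a, b) == [0, 0, 1, 0]);
}
```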
On Tuesday, 23 May 2017 at 03:33:38 UTC, Era Scarecrow wrote:
> From what I'm seeing, it should be 8, 0ch, 10h, then 14h, all positive. I'm really scratching my head why I'm having this issue...
>
> What am i missing here?

More experiments, and I think it comes down to static arrays. The following function code

    int[4] fun2()
    {
        int[4] x = void;
        asm
        {
            mov dword ptr x, 100;
        }
        x[0] = 200; // get example of real offset
        return x;
    }

produces the following (from obj2asm):

    int[4] x.fun2() comdat
            assume CS:int[4] x.fun2()
            enter 014h,0
            mov -4[EBP],EAX
            mov dword ptr -014h[EBP],064h
            mov EAX,-4[EBP]
            mov dword ptr [EAX],0C8h // x[0]=200, offset +0
            mov EAX,-4[EBP]
            leave
            ret
    int[4] x.fun2() ends

So why is the offset off by 14h (20 bytes)? It's not like we need to set a ptr first.

Go figure, I probably found a bug...
Jun 01 2017
On Thursday, 1 June 2017 at 12:00:45 UTC, Era Scarecrow wrote:
> So why is the offset off by 14h (20 bytes)? It's not like we need a to set a ptr first.
>
> Go figure i probably found a bug...

Well, as a side note, a simple yet not happy workaround is making a new array slice of the memory and then using that pointer directly.

Looking at the Intel opcode and memory addressing conventions, I could have used a very compact instruction set with scaling. Instead I'm forced to ignore scaling, and I'm also forced to push/pop the flags to save the carry when advancing the two pointers in parallel. Plus there are 3 instructions that don't need to be there.

Yeah, this is probably nitpicking... I can't help wanting to be as optimized and small as possible.
Jun 02 2017
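A sketch of the pointer workaround mentioned above, assuming DMD-style inline assembly on x86 or x86_64. Instead of naming the static array inside the asm block, the address is taken in D code and the block goes through that pointer, so it no longer depends on the offset the assembler picks for `x`. The pointer variable `p` and the zero initialisation are additions for this example. This is not the poster's actual code.

```d
int[4] fun2()
{
    int[4] x = 0;      // zero-filled so the result is deterministic
    int* p = x.ptr;    // address computed by the compiler, not the assembler

    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, p;                 // load the pointer value
            mov dword ptr [EAX], 100;   // store through it: x[0] = 100
        }
    }
    else version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RAX, p;                 // pointers are 64-bit here
            mov dword ptr [RAX], 100;
        }
    }
    else
    {
        *p = 100;                       // portable fallback
    }
    return x;
}

void main()
{
    auto r = fun2();
    assert(r[0] == 100);
    assert(r[1] == 0);
}
```

The cost is the one already noted in the post: an extra load of the pointer into a register, and the `name[reg+offset]` addressing form can no longer be used directly on the array.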