digitalmars.D.learn - asm woes...

Era Scarecrow <rtcvb32 yahoo.com> writes:

  Well decided I should dig my hand in assembly just to see if it 
would work. Using wideint.d as a starting point I thought I would 
do the simplest operation I could do, an increment.

   
https://github.com/d-gamedev-team/gfm/blob/master/integers/gfm/integers/wideint.d
   https://dlang.org/spec/iasm.html

  Most of my code was failing outright until I looked at the 
integrated assembler page, which TDPL doesn't go into at all. To 
access variables for example I have to do var[ESP] or var[RSP] to 
access it from the stack frame. Unintuitive, but sure I can work 
with it.

  So the code for incrementing is pretty simple...

   @nogc void increment() pure nothrow
   {
     ++lo;
     if (lo == 0) ++hi;
   }

  That's pretty simple to work with. I know the assembly 
instructions can be done 1 of 2 ways.

    add lo, 1
    adc hi, 0

  OR

    inc lo
    jnc L1 //jump if not carry
    inc hi


  So I've tried. Considering the wideint basically is self calling 
if you want to make a larger type than 128bit, then that means I 
need to leave the original code alone if it's a type that's too 
large, but only inject assembly if it's the right time and size. 
Thankfully bits is there to tell us.

So, add version
   @nogc void increment() pure nothrow
   {
     static if (bits > 128) {
       ++lo;
       if (lo == 0) ++hi;
     } else {
       version(X86) {
         asm pure @nogc nothrow {
           add lo[ESP], 1;
           adc hi[ESP], 0;
         }
       } else {
         ++lo;
         if (lo == 0) ++hi;
       }
     }
   }

  I compile and get: Error: asm statements cannot be interpreted 
at compile time

  The whole thing now fails, rather than compiling to do the 
unittests... Doing the inc version gives the same error..

         asm pure @nogc nothrow {
           inc lo[ESP];
           jnc L1;
           inc hi[ESP];
           L1:;
         }

  Naturally it wasn't very specific about if I should rely on RSP 
or ESP or what, but since it's X86 rather than X86_64 I guess 
that answers it... would be easy to write the x64 version, if it 
would let me.

  So I figured I would add a check for __ctfe, which avoids the 
assembly when the function is evaluated at compile time. So...

     version(X86) {
       @nogc void increment() pure nothrow
       {
         if (!__ctfe && bits == 128) {
           asm pure @nogc nothrow {
             add lo[ESP], 1;
             adc hi[ESP], 0;
           }
         } else {
           ++lo;
           if (lo == 0) ++hi;
         }
       }
     } else {
       //original declaration
     }

  Now it compiles, however it hangs the program when doing the 
unittest. Why does it hang the program? I have no clue. Tried 
changing the ESP to EBP just in case that was actually what it 
wanted, but that doesn't seem to be the case. I can tell how I will be 
refactoring the code, assuming I can figure out what's wrong in 
the first place...

  Anyone with inline assembly experience who can help me out a 
little? 2 add instructions shouldn't cause it to hang...

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 08:20:02 UTC, Era Scarecrow wrote:
  Anyone with inline assembly experience who can help me out a 
 little? 2 add instructions shouldn't cause it to hang...

  Hmmm it just occurs to me I made a big assumption. I assumed 
that if the CPU supports 64bit operations, that it would be 
compiled to use 64bit registers when possible. I'm assuming this 
is not the case. As such the tests I was doing will probably be 
of little help _unless_ it was X86_64 code, or a check that 
verifies it's 64bit hardware?

  Does this mean the 64bit types are emulated rather than using 
hardware?

May 27 2016

Guillaume Piolat <first.last gmail.com> writes:

On Friday, 27 May 2016 at 09:11:01 UTC, Era Scarecrow wrote:
  Hmmm it just occurs to me I made a big assumption. I assumed 
 that if the CPU supports 64bit operations, that it would be 
 compiled to use 64bit registers when possible. I'm assuming 
 this is not the case. As such the tests I was doing will 
 probably be of little help _unless_ it was X86_64 code, or a 
 check that verifies it's 64bit hardware?

You have to write your code three times, one for

version(D_InlineAsm_X86)
version (D_InlineAsm_X86_64)
and a version without assembly.

In rare cases you can merge the D_InlineAsm_X86 and 
D_InlineAsm_X86_64 versions. Unfortunately, D provides less 
support than C++ for writing code that is valid in both, which 
causes lots of duplication </rant>


TBH I don't know how to access members in assembly, and I think you 
shouldn't ever do that. It will depend heavily on the particular 
calling convention used.
Just put these fields in local variables.

void increment()
{
     auto lo_local = lo;
     auto hi_local = hi;
     asm
     {
         add dword ptr lo_local, 1;
         adc dword ptr hi_local, 0;
     }
     lo = lo_local;
     hi = hi_local;
}

The compiler will replace with the right register-indexed stuff.
But honestly I doubt it will be any faster because on the other 
hand you mess with the optimizer.
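
A minimal sketch of that three-way split, combined with a __ctfe guard and the "copy into a local first" advice. The free function `inc128` and the `ulong[2]` layout (low half at index 0) are assumptions made for illustration, not code from wideint.d, and the asm paths are only meant for compilers that define the `D_InlineAsm_*` identifiers.

```d
// Hypothetical helper: add 1 to a 128-bit value stored as two 64-bit
// halves, w[0] = low half, w[1] = high half (assumed layout).
void inc128(ref ulong[2] w)
{
    if (__ctfe)
    {
        // CTFE cannot interpret asm, so take the portable path here.
        if (++w[0] == 0) ++w[1];
        return;
    }

    version (D_InlineAsm_X86_64)
    {
        ulong* p = w.ptr;               // local the assembler can refer to by name
        asm
        {
            mov RAX, p;
            add qword ptr [RAX], 1;     // low half
            adc qword ptr [RAX + 8], 0; // high half plus carry
        }
    }
    else version (D_InlineAsm_X86)
    {
        ulong* p = w.ptr;
        asm
        {
            mov EAX, p;
            add dword ptr [EAX], 1;      // four 32-bit pieces,
            adc dword ptr [EAX + 4], 0;  // carry rippling upward
            adc dword ptr [EAX + 8], 0;
            adc dword ptr [EAX + 12], 0;
        }
    }
    else
    {
        // No DMD-style inline assembler available: plain D.
        if (++w[0] == 0) ++w[1];
    }
}

void main()
{
    ulong[2] v = [ulong.max, 0];
    inc128(v);
    assert(v == [0UL, 1UL]);

    // The same function still works at compile time thanks to the guard.
    enum ulong[2] ct = () { ulong[2] t = [ulong.max, 0]; inc128(t); return t; }();
    static assert(ct[0] == 0 && ct[1] == 1);
}
```

The explicit size on every memory operand (`qword ptr`, `dword ptr`) is deliberate, so the assembler never has to guess the operand width.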

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 09:22:49 UTC, Guillaume Piolat wrote:
 On Friday, 27 May 2016 at 09:11:01 UTC, Era Scarecrow wrote:
  Hmmm it just occurs to me I made a big assumption. I assumed 
 that if the CPU supports 64bit operations, that it would be 
 compiled to use 64bit registers when possible. I'm assuming 
 this is not the case. As such the tests I was doing will  
 probably be of little help _unless_ it was X86_64 code, or a 
 check that verifies it's 64bit hardware?

 You have to write your code three times, one for

 version(D_InlineAsm_X86)
 version (D_InlineAsm_X86_64)
 and a version without assembly.

  If longs are emulated, then only X86_64 and the version without 
assembly would be considered, as there would be no benefit to doing 
the X86 version. If I can do it, the two will be identical, except 
for which stack register is used. (A lot of wasted space for so 
little to add).

 TBH I don't know how to access members in assembly, I think you 
 shouldn't ever do that. It will depend heavily on the 
 particular calling convention called.
 Just put these fields in local variables.

 <snip>

 The compiler will replace with the right register-indexed stuff.
 But honestly I doubt it will be any faster because on the other 
 hand you mess with the optimizer.

  Hmmm tried it as you have it listed. Still hangs. Tried it 
directly with qword with and without [ESP], still hangs.

  The listed inline assembler here on Dlang says to use 
'variableName[ESP]', which then becomes obvious it's a variable 
and even probably inserts type-size information as appropriate. 
Although I did it manually as you had listed but it still hangs. 
I suppose there's the requirement to have a register pointing to 
this, which then would be mov EAX, this, and then add lo[EAX], 
1...

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 09:39:36 UTC, Era Scarecrow wrote:
 I suppose there's the requirement to have a register pointing 
 to this, which then would be mov EAX, this, and then add 
 lo[EAX], 1...

  Nope, still hangs...

May 27 2016

Guillaume Piolat <first.last gmail.com> writes:

On Friday, 27 May 2016 at 09:44:47 UTC, Era Scarecrow wrote:
 On Friday, 27 May 2016 at 09:39:36 UTC, Era Scarecrow wrote:
 I suppose there's the requirement to have a register pointing 
 to this, which then would be mov EAX, this, and then add 
 lo[EAX], 1...

  Nope, still hangs...

We can't know why your code hangs if you don't post any code.

https://dpaste.dzfl.pl/4026d9e6d3c0

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 09:51:36 UTC, Guillaume Piolat wrote:
 On Friday, 27 May 2016 at 09:44:47 UTC, Era Scarecrow wrote:
  Nope, still hangs...

 We can't know why your code hangs if you don't post any code.

  Considering I'd have to include the whole of wideint.d, that is 
highly implausible to do. But already got a possible answer which 
I'm about to test.

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 09:22:49 UTC, Guillaume Piolat wrote:
 You have to write your code three times, one for

 version(D_InlineAsm_X86)
 version (D_InlineAsm_X86_64)
 and a version without assembly.

  Rather than make a new thread, I wonder if struct inheritance 
wouldn't solve this, as trying to manage specific versions, lack 
of versions, and checks for CTFE all became a headache, and bloated a 
4 line function (2 of which were the opening/declaration) to 
something like 20 lines that looks like a huge mess.

  So...

  Let's assume structs as they are don't (otherwise) change.
  Let's assume structs can be inherited.
  Let's assume inherited structs change _behavior_ only 
(overridden functions as final), but don't add/expand any new 
data (non-polymorphic, no vtables).

  Then I could do something like this!

   //contains plain portable version
   struct base {}

   version(X86) {
     struct inherited : base {
       //only adds or replaces functions, no data changes
       //all asm injection is known to be 32bit x86
     }
   }
   version(X86_64) {
     ...
   }

  Truthfully going with my example, only a couple functions would 
be considered, namely multiply and divide as they would be the 
slowest ones, while everything else has very little to improve 
on, at least based on how wideint.d was implemented.

May 28 2016

ZombineDev <petar.p.kirov gmail.com> writes:

On Saturday, 28 May 2016 at 08:10:50 UTC, Era Scarecrow wrote:
 On Friday, 27 May 2016 at 09:22:49 UTC, Guillaume Piolat wrote:
 You have to write your code three times, one for

 version(D_InlineAsm_X86)
 version (D_InlineAsm_X86_64)
 and a version without assembly.

  Rather than make a new thread I wonder if struct inheritance 
 wouldn't solve this, as trying to manage specific versions, 
 lack of versions, checks for CTFE all became a headache. and 
 bloated a 4 line function (2 of which were the 
 opening/declaration) to something like 20 lines and looks like 
 a huge mess.

  So...

  Let's assume structs as they are don't (otherwise) change.
  Let's assume structs can be inherited.
  Let's assume inherited structs change _behavior_ only 
 (overridden functions as final), but don't add/expand any new 
 data (non-polymorphic, no vtables).

  Then I could do something like this!

   //contains plain portable version
   struct base {}

   version(X86) {
     struct inherited : base {
       //only adds or replaces functions, no data changes
       //all asm injection is known to be 32bit x86
     }
   }
   version(X86_64) {
     ...
   }

  Truthfully going with my example, only a couple functions 
 would be considered, namely multiply and divide as they would 
 be the slowest ones, while everything else has very little to 
 improve on, at least based on how wideint.d was implemented.

The great thing about D's UFCS is that it allows exactly that:

void main()
{
     WideInt myInt;
     myInt.inc(); // looks like a member function
     myInt++; // can be hidden behind operator overloading
}

struct WideInt
{
     ulong[2] data;

     void opUnary(string s)()
     {
         static if (s == "++")
             this.inc();
     }
}

version(D_InlineAsm_X86_64)
{
     void inc(ref WideInt w) { /* 64-bit increment implementation */ }
}
else version(D_InlineAsm_X86)
{
     void inc(ref WideInt w) { /* 32-bit increment implementation */ }
}
else
{
     void inc(ref WideInt w) { /* generic increment implementation */ }
}

Also, you can implement inc() in terms of ulong[2] - void inc(ref 
ulong[2] w), which makes it applicable to other types with the 
same memory representation.
E.g. cent - (*cast(ulong[2]*)&cent).inc(), arrays - ulong[] arr; 
arr[0..2].inc(), and so on.

May 28 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Saturday, 28 May 2016 at 10:10:19 UTC, ZombineDev wrote:
 The great thing about D's UFCS is that it allows exactly that:

 <snip>

 Also, you can implement inc() in terms of ulong[2] - void 
 inc(ref ulong[2] w), which makes it applicable for other types, 
 with the same memory representation. E.g. cent - 
 (cast(ulong[2]*)&cent).inc(), arrays - ulong[] arr; 
 arr[0..2].inc(), and so on.

  Hmmm if wideint weren't a template I'd agree with you. 
Then again, the way you have it listed, the increment would 
probably call the version that's generated and wouldn't require a 
specific template instantiation to work.

  I don't know, personally to me it makes more sense to replace 
functions rather than export them and add an unknown generated 
type. If inherited structs worked (as i have them listed) then 
you could export all the CPU specific code to another file and 
never have to even know it exists. And if my impressions of code 
management and portability are accurate, then having 
OS/Architecture specific details should be separate from what is 
openly shared.

  Besides, I'd like to leave the original source completely 
untouched if I can, while applying updates/changes that don't add 
any confusion to the existing source code. Plus it's more of an 
opt-in option at that point, where you can hopefully have both 
active at the same time to unittest one against the other. (A is 
known to be correct, so B's output is tested against A).
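
The "test B against A" idea can be sketched without any assembly at all. Below, `incReference` mirrors the original two-line increment and `incCandidate` is only a stand-in for an optimized version (here it uses `addu` from druntime's `core.checkedint`); both names and the `ulong[2]` layout are assumptions for illustration.

```d
import core.checkedint : addu;

// A: the known-good portable logic.
void incReference(ref ulong[2] w)
{
    if (++w[0] == 0) ++w[1];
}

// B: stand-in for the version under test. A real candidate would
// contain the CPU-specific code instead.
void incCandidate(ref ulong[2] w)
{
    bool carry = false;
    w[0] = addu(w[0], 1UL, carry); // sets carry when the low half wraps
    if (carry) ++w[1];
}

void main()
{
    // Walk through edge cases, including the carry out of the low half.
    foreach (lo; [0UL, 5UL, ulong.max - 1, ulong.max])
    {
        foreach (hi; [0UL, 7UL, ulong.max])
        {
            ulong[2] a = [lo, hi];
            ulong[2] b = a;
            incReference(a);
            incCandidate(b);
            assert(a == b); // B must agree with A on every input
        }
    }
}
```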

May 28 2016

rikki cattermole <rikki cattermole.co.nz> writes:

On 27/05/2016 8:20 PM, Era Scarecrow wrote:
  Well decided I should dig my hand in assembly just to see if it would
 work. Using wideint.d as a starting point I thought I would do the
 simplest operation I could do, an increment.


 https://github.com/d-gamedev-team/gfm/blob/master/integers/gfm/integers/wideint.d

   https://dlang.org/spec/iasm.html

  Most of my code was failing outright until I looked at the integrated
 assembler page, which TDPL doesn't go into at all. To access variables
 for example I have to do var[ESP] or var[RSP] to access it from the
 stack frame. Unintuitive, but sure I can work with it.

  So the code for incrementing is pretty simple...

   @nogc void increment() pure nothrow
   {
     ++lo;
     if (lo == 0) ++hi;
   }

  That's pretty simple to work with. I know the assembly instructions can
 be done 1 of 2 ways.

    add lo, 1
    adc hi, 0

  OR

    inc lo
    jnc L1 //jump if not carry
    inc hi


  So I've tried. Considering the wideint basically is self calling if you
 want to make a larger type than 128bit, then that means I need to leave
 the original code alone if it's a type that's too large, but only inject
 assembly if it's the right time and size. Thankfully bits is there to
 tell us.

 So, add version
   @nogc void increment() pure nothrow
   {
     static if (bits > 128) {
       ++lo;
       if (lo == 0) ++hi;
     } else {
       version(X86) {
         asm pure @nogc nothrow {
           add lo[ESP], 1;
           adc hi[ESP], 0;
         }
       } else {
         ++lo;
         if (lo == 0) ++hi;
       }
     }
   }

  I compile and get: Error: asm statements cannot be interpreted at
 compile time

  The whole thing now fails, rather than compiling to do the unittests...
 Doing the inc version gives the same error..

         asm pure @nogc nothrow {
           inc lo[ESP];
           jnc L1;
           inc hi[ESP];
           L1:;
         }

  Naturally it wasn't very specific about if I should rely on RSP or ESP
 or what, but since it's X86 rather than X86_64 I guess that answers
 it... would be easy to write the x64 version, if it would let me.

  So i figure i put a check for __ctfe and that will avoid the assembly
 calls if that's the case. So...

     version(X86) {
       @nogc void increment() pure nothrow
       {
         if (!__ctfe && bits == 128) {
           asm pure @nogc nothrow {
             add lo[ESP], 1;
             adc hi[ESP], 0;
           }
         } else {
           ++lo;
           if (lo == 0) ++hi;
         }
       }
     } else {
       //original declaration
     }

  Now it compiles, however it hangs the program when doing the unittest.
 Why does it hang the program? I have no clue. Tried changing the ESP to
 EBP just in case that was actually what it wanted, but doesn't seem to
 be the case. I can tell how I will be refactoring the code, assuming i
 can figure out what's wrong in the first place...

  Anyone with inline assembly experience who can help me out a little? 2
 add instructions shouldn't cause it to hang...

Me and p0nce solved this on IRC.

struct Foo {
         int x;

         void foobar() {
                 asm {
                         mov EAX, this;
                         inc [EAX+Foo.x.offsetof];
                 }
         }
}

void main() {
         import std.stdio;

         Foo foo = Foo(8);
         foo.foobar;

         writeln(foo.x);
}

You have to reference the field via a register.

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 09:51:56 UTC, rikki cattermole wrote:
 Me and p0nce solved this on IRC.

 struct Foo {
   int x;

   void foobar() {
     asm {
       mov EAX, this;
       inc [EAX+Foo.x.offsetof];
     }
   }
 }

 void main() {
   import std.stdio;

   Foo foo = Foo(8);
   foo.foobar;

   writeln(foo.x);
 }

 You have to reference the field via a register.

  This is good progress. The inline assembler doesn't have many 
documentation examples of how to do things; I guess the x[ESP] 
example on the iasm page was totally useless.

May 27 2016

Guillaume Piolat <first.last gmail.com> writes:

On Friday, 27 May 2016 at 10:00:40 UTC, Era Scarecrow wrote:
 On Friday, 27 May 2016 at 09:51:56 UTC, rikki cattermole wrote:

  This is good progress. Using the assembler doesn't have many 
 documentation examples of how to do things, guess the x[ESP] 
 example was totally useless on the iasm page.

Referencing EBP or ESP yourself is indeed dangerous. Not sure why 
the documentation would advise that. Using "this", names of 
parameters/locals/field offset is much safer.

May 27 2016

Marco Leise <Marco.Leise gmx.de> writes:

Am Fri, 27 May 2016 10:06:28 +0000
schrieb Guillaume Piolat <first.last gmail.com>:

 Referencing EBP or ESP yourself is indeed dangerous. Not sure why 
 the documentation would advise that. Using "this", names of 
 parameters/locals/field offset is much safer.

DMD makes sure that the EBP relative access of parameters and
stack variables works by copying everything to the stack
that's in registers when you have an asm block in the
function. Using var[EBP] or just plain var will then
dereference that memory location.

-- 
Marco

May 31 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 10:00:40 UTC, Era Scarecrow wrote:
 On Friday, 27 May 2016 at 09:51:56 UTC, rikki cattermole wrote:
 struct Foo {
   int x;

   void foobar() {
     asm {
       mov EAX, this;
       inc [EAX+Foo.x.offsetof];
     }
   }
 }

 You have to reference the field via a register.

  This is good progress. Using the assembler doesn't have many 
 documentation examples of how to do things

  Hmmm actually this is incorrect...

void main() {
   import std.stdio;

   Foo foo = Foo(-1);
   writeln(foo.x);
   foo.foobar;
   writeln(foo.x);
}

-1
-256

  It's assuming a byte obviously for the size. So this is the 
correct instruction:

   inc dword ptr [EAX+Foo.x.offsetof];

  However trying it with a long and a qword shows it reverts to a 
byte again, meaning 64 bit instructions are inaccessible.
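
Why -1 turns into -256: a byte-sized inc only touches the lowest byte of the int, which wraps from 0xFF to 0x00 without carrying into the upper three bytes, leaving 0xFFFFFF00. A small sketch that reproduces the effect in plain D (the helper name `incLowByteOnly` is made up for illustration):

```d
// Emulates what a byte-sized `inc` does to a 32-bit value: only the
// least significant byte changes, and no carry reaches the other bytes.
int incLowByteOnly(int value)
{
    auto bytes = cast(ubyte*) &value;
    version (LittleEndian)
    {
        ++bytes[0];               // low byte comes first on x86
    }
    else
    {
        ++bytes[int.sizeof - 1];  // low byte comes last on big-endian
    }
    return value;
}

void main()
{
    assert(incLowByteOnly(-1) == -256); // 0xFFFFFFFF -> 0xFFFFFF00
    assert(incLowByteOnly(8) == 9);     // no wrap, so it looks correct

    int full = -1;
    ++full;                             // a full dword increment
    assert(full == 0);
}
```

This is also why the first test with Foo(8) looked fine: the bug only shows up once the low byte wraps.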

May 27 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Friday, 27 May 2016 at 10:14:31 UTC, Era Scarecrow wrote:
   inc dword ptr [EAX+Foo.x.offsetof];


  So just tested it, and it didn't hang, meaning all unittests 
also passed.

  Final solution is:

   asm pure @nogc nothrow {
     mov EAX, this;
     add dword ptr [EAX+wideIntImpl.lo.offsetof], 1;
     adc dword ptr [EAX+wideIntImpl.lo.offsetof+4], 0;
     adc dword ptr [EAX+wideIntImpl.hi.offsetof], 0;
     adc dword ptr [EAX+wideIntImpl.hi.offsetof+4], 0;
   }
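
For anyone following along, the add/adc chain treats the 128-bit value as four 32-bit pieces and ripples the carry from the lowest to the highest. A portable sketch of the same arithmetic, assuming a little-endian `uint[4]` view of the value (the name `incLimbs` is hypothetical):

```d
// Plain-D model of "add ..., 1" followed by three "adc ..., 0".
void incLimbs(ref uint[4] limbs)
{
    uint carry = 1;                    // the initial "add 1"
    foreach (ref limb; limbs)
    {
        immutable ulong sum = cast(ulong) limb + carry;
        limb = cast(uint) sum;         // keep the low 32 bits
        carry = cast(uint)(sum >> 32); // 1 only if this piece wrapped
    }
}

void main()
{
    uint[4] v = [uint.max, uint.max, 0, 7];
    incLimbs(v);
    assert(v == [0u, 0u, 1u, 7u]);     // carry stopped at the third piece

    uint[4] all = [uint.max, uint.max, uint.max, uint.max];
    incLimbs(all);
    assert(all == [0u, 0u, 0u, 0u]);   // wraps around to zero
}
```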

May 27 2016

Marco Leise <Marco.Leise gmx.de> writes:

Am Fri, 27 May 2016 10:16:48 +0000
schrieb Era Scarecrow <rtcvb32 yahoo.com>:

 On Friday, 27 May 2016 at 10:14:31 UTC, Era Scarecrow wrote:
   inc dword ptr [EAX+Foo.x.offsetof];  

 
 
   So just tested it, and it didn't hang, meaning all unittests 
 also passed.
 
   Final solution is:
 
    asm pure @nogc nothrow {
      mov EAX, this;
      add dword ptr [EAX+wideIntImpl.lo.offsetof], 1;
      adc dword ptr [EAX+wideIntImpl.lo.offsetof+4], 0;
      adc dword ptr [EAX+wideIntImpl.hi.offsetof], 0;
      adc dword ptr [EAX+wideIntImpl.hi.offsetof+4], 0;
    }

The 'this' pointer is usually in some register already. On
Linux 32-bit for example it is in EAX, on Linux 64-bit it is in
RDI. What DMD does when it encounters an asm block is, it
stores every parameter (including the implicit this) on the
stack and when you do "mov EAX, this;" it loads it back from
there using EBP as the base pointer to the stack variables. The
boilerplate will look like this on 32-bit Linux:

   push   EBP                     // Save what's currently in EBP
   mov    EBP,ESP                 // Remember current stack pointer as base for variables
   push   EAX                     // Save implicit 'this' parameter on the stack
   mov    EAX,DWORD PTR [EBP-0x4] // Load 'this' into EAX as you requested
   <add and adc code here>
   mov    ESP,EBP                 // Restore stack to what it was before saving parameters and variables
   pop    EBP                     // Restore EBP register
   ret                            // Return from function

Remember that this works only for x86 32-bit in DMD and LDC.
GDC passes inline asm right through to an arbitrary external
assembler after doing some template replacements. It will not
understand any of the asm you feed it, but will forward the
external assembler's error messages.

On the other hand GDC's and LDC's extended assemblers free you
from manually loading stuff into registers. You just use a
placeholder and tell the compiler to put 'this' into some
register. The compiler will realize it is already in EAX or
RDI and do nothing but use that register instead of EAX in
your code above. Sometimes that has the additional benefit that
the same asm code works on both 32-bit and 64-bit.
Also, extended asm is transparent to the optimizer. The code
can be inlined and already loaded variables reused.

By the way, you are right that 32-bit does not have access to
64-bit machine words (actually kind of obvious), but your idea
wasn't far-fetched, since there is the X32 architecture at
least for Linux. It uses 64-bit machine words, but 32-bit
pointers and allows for compact and fast programs.

-- 
Marco

May 31 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Tuesday, 31 May 2016 at 18:52:16 UTC, Marco Leise wrote:
 The 'this' pointer is usually in some register already. On 
 Linux 32-bit for example it is in EAX, on Linux 64-bit is in 
 RDI.

  The AX register seems like a bad choice, since you require the 
AX/DX registers when you do multiplication and division (although 
all other registers are general purpose, some instructions are 
still tied to specific registers). SI/DI are a much better choice.

 By the way, you are right that 32-bit does not have access to 
 64-bit machine words (actually kind of obvious), but your idea 
 wasn't far fetched, since there is the X32 architecture at 
 least for Linux. It uses 64-bit machine words, but 32-bit 
 pointers and allows for compact and fast programs.

  As I recall, the switch to use the larger registers is a simple 
per-instruction prefix, something like 60h, 66h or 67h. I 
forget which one exactly; I recall writing assembly programs 
for 16bit DOS that used 32bit registers with that trick (built 
into the assembler). Although using the lower registers by 
themselves required the same switch, so...

May 31 2016
