digitalmars.D.learn - VLA in Assembler


"Foo" <Foo test.de> writes:

Hi,
Could someone explain to me whether, and how, it is possible to allocate a 
variable length array with inline assembly?
Somewhat like
----
int[] arr;
int n = 42;
asm {
     // allocate n stack space for arr
}
----
I know it is dangerous and all that, but I just want to know. ;)

Dec 17 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Foo:

 Hi,
 Could someone explain me, if and how it is possible to allocate 
 a variable length array with inline assembly?

Doing it with alloca is simpler:


void main() @nogc {
     import core.stdc.stdlib: alloca, exit;

     alias T = int;
     enum n = 42;

     auto ptr = cast(T*)alloca(T.sizeof * n);
     if (ptr == null)
         exit(1); // Or throw a memory error.
     auto arr = ptr[0 .. n];
}


Bye,
bearophile

Dec 17 2014

"Foo" <Foo test.de> writes:

On Wednesday, 17 December 2014 at 10:59:09 UTC, bearophile wrote:
 Doing it with alloca is simpler:

Yes, I know, but I really want it in inline assembly. It's for 
learning purposes. :)

Dec 17 2014

"uri" <uri.grill gmail.com> writes:

On Wednesday, 17 December 2014 at 11:39:43 UTC, Foo wrote:
 Yes I know, but I really want it in inline assembly. It's for 
 learning purpose. :)

You could look at the disassembly.

Dec 17 2014

"Foo" <Foo test.de> writes:

On Wednesday, 17 December 2014 at 12:15:23 UTC, uri wrote:
 You could look at the disassembly.

And how? I'm on Windows.

Dec 17 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Wednesday, 17 December 2014 at 12:29:53 UTC, Foo wrote:
 And how? I'm on Windows.

Digital Mars sells an obj2asm tool that will disassemble 
dmd-generated code. I think it is in the $15 basic utility package.

But VLA/alloca is more complex than a regular function - the 
compiler needs to know about it to adjust for the changed stack. 
It'll take longer to write this up, so I'll do it in a separate 
post.

Dec 17 2014

"btdc" <btdc nowhere.fr> writes:

On Wednesday, 17 December 2014 at 10:35:39 UTC, Foo wrote:
 Hi,
 Could someone explain me, if and how it is possible to allocate 
 a variable length array with inline assembly?

It's probably something like that:

module runnable;

import std.stdio;
import std.c.stdlib;

ubyte[] newArr(size_t aLength)
{
     asm
     {
         naked;

         mov ECX, EAX;       // saves aLength in ECX

         push ECX;
         call malloc;        // .ptr =  malloc(aLength);
         mov ECX,[EAX];      // saved the .ptr of our array

         mov EAX, 8;         // an array is a struct with length and ptr,
                             // so 8 bytes in 32 bit
         call malloc;        // EAX points to the first byte of the struct

         mov [EAX + 4], ECX; // .ptr
         pop ECX;
         mov [EAX], ECX;     // .length
         mov EAX, [EAX];     // currently EAX is a ref, so need to dig...

         ret;
     }
}

try and see ;) Actually it may be wrong

Dec 17 2014

"btdc" <btdc nowhere.fr> writes:

On Wednesday, 17 December 2014 at 12:54:44 UTC, btdc wrote:
 try and see ;) Actually it may be wrong

fuck...the comments are once again cut...

Dec 17 2014

"Foo" <foo test.de> writes:

And it is using malloc... ;)
I wanted something that increases the stack pointer ESP.

e.g.
----
void main()
{
	int[] arr;
	int n = 42;
	
	writeln(arr.length);
	writeln(arr.ptr);
	
	asm {
		mov EAX, n;
		mov [arr + 8], ESP;
		sub [ESP], EAX;
		mov [arr + 0], EAX;
	}
	
	writeln(arr.length);
	//writeln(arr[0]);
}
----
but that does not work...

Dec 17 2014

"btdc" <btdc nowhere.fr> writes:

On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:
 but that does not work...

You can't always get what you want. Try more, speak less.

Dec 17 2014

"Namespaces" <rswhite4 gmail.com> writes:

On Wednesday, 17 December 2014 at 15:20:28 UTC, btdc wrote:
 You cant always get what you want. try more, speak less.

Very helpful. And soo friendly! ;)

Dec 17 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:
 	asm {
 		mov EAX, n;
 		mov [arr + 8], ESP;
 		sub [ESP], EAX;
 		mov [arr + 0], EAX;
 	}
 but that does not work...

That wouldn't work even with malloc... remember, an integer is more 
than one byte long, so your subtract is 1/4 the size it needs to 
be! Also, since the stack grows downward, you're storing the 
pointer to the end of the array instead of the beginning of it.
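
To make the size point concrete, here is a minimal sketch that is not from the thread. It assumes the slice layout described in the D ABI (a length word followed by a pointer); `RawSlice` is a hypothetical name used only for illustration. It checks that layout and computes the byte count for n ints:

```d
import std.stdio;

// Hypothetical mirror of the dynamic-array layout from the D ABI:
// the length comes first, then the pointer.
struct RawSlice
{
    size_t length;
    void*  ptr;
}

void main()
{
    int[4] storage = [10, 20, 30, 40];
    int[] arr = storage[];

    // Reinterpret the slice as its two raw words.
    auto raw = *cast(RawSlice*) &arr;
    assert(raw.length == arr.length);
    assert(raw.ptr is arr.ptr);

    // Bytes needed for n ints: n * int.sizeof, not n.
    size_t n = 42;
    size_t nbytes = n * int.sizeof;
    writeln(raw.length, " ", nbytes); // prints "4 168"
}
```

This is why the asm has to write the length to `[arr]` and the pointer to `[arr + size_t.sizeof]`, and why the subtraction from ESP needs n scaled by 4.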


NOTE: I've never actually done this before, so I'm figuring it 
out as I go too. This might be buggy or otherwise mistaken at 
points. (Personally, I prefer to use a static array sized to the 
maximum I'll probably need and slice that instead of using alloca...)
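
The static-array alternative mentioned in the note could look roughly like the sketch below. It is an illustration only: `maxLen` and `sumFirst` are hypothetical names, and the bound of 64 is an arbitrary assumption.

```d
import std.stdio;

enum maxLen = 64; // assumed upper bound, chosen only for illustration

// Hypothetical helper: fills the first n slots and returns their sum.
int sumFirst(size_t n)
{
    int[maxLen] buffer;          // fixed-size storage in the stack frame
    assert(n <= maxLen, "n exceeds the fixed buffer");
    int[] arr = buffer[0 .. n];  // slice of runtime length n, no allocation

    foreach (i, ref a; arr)
        a = cast(int) i;

    int total = 0;
    foreach (a; arr)
        total += a;
    return total;
}

void main()
{
    writeln(sumFirst(8)); // 0 + 1 + ... + 7 = 28
}
```

The compiler knows the frame size up front, so nothing about the stack needs fixing up by hand; the cost is that the maximum has to be picked in advance.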


Here's some code that runs successfully (in 32 bit!):

void vla(int n) {
	int[] arr;

	asm {
		mov EAX, [n];
		// the first word in an array is the length, store that
		mov [arr], EAX;
		shl EAX, 2; // number of bytes == n * int.sizeof
		sub ESP, EAX; // allocate the bytes
		// store the beginning of it in the arr.ptr
		mov [arr + size_t.sizeof], ESP;
	}

	import std.stdio;
	writeln(arr.length);
	writeln(arr.ptr);

	// initialize the data...
	foreach(i, ref a; arr)
		a = i;

	writeln(arr); // and print it back out
}

void main() {
	vla(8);
}


This looks right... but it isn't: we changed the stack and didn't 
put it back. That's usually a no-no. If we disassemble the 
function, we can take a look at the end and see something scary:

  8084ec6:       e8 9d 6a 00 00          call   808b968 <_D3std5stdio15__T7writelnTAiZ7writelnFAiZv>  // our final writeln call
  8084ecb:       5e                      pop    esi  // uh oh
  8084ecc:       5b                      pop    ebx
  8084ecd:       c9                      leave
  8084ece:       c3                      ret



Before the leave instruction, which puts the stack back how it was 
at the beginning of the function (and so saves us from a random EIP 
being restored by the ret instruction), the compiler put in a 
few pop instructions.

main() will have different values in esi and ebx than it expects! 
Running it in the debugger shows these values changed too:

before

(gdb) info registers
[...]
ebx            0xffffd4f4       -11020
[...]
esi            0x80916e8        134813416


after

ebx            0x1      1
esi            0x0      0


It popped the values of our array. According to the ABI: "EBX, 
ESI, EDI, EBP must be preserved across function calls." 
http://dlang.org/abi.html

They are pushed for a reason - the compiler assumes they remain 
the same.


In this little test program, nothing went wrong because no more 
code was run after vla returned. But, if we were using, say a 
struct, it'd probably fault when it tried to access `this`. It'd 
probably mess up other local variables too. No good!


So, we'll need to store and restore the stack pointer... can we 
use the stack's push and pop instructions? Nope, we're changing 
the stack! Our own pop would grab the wrong data too.

We could save it in a local variable. How do we restore it 
though? scope(exit) won't work, it won't happen at the right time 
and will corrupt the stack even worse.

Gotta do it ourselves - which means we can't do the alloca even 
as a single mixin, since it needs code added before any return 
point too!

(There might be other, better ways to do this... and indeed, 
there is, as we'll see later on. I peeked at the druntime source 
code and it does it differently. Continue reading...)




Here's code that we can verify in the debugger leaves everything 
how it should be and doesn't crash:

void vla(int n) {
	int[] arr;
	void* saved_esp;

	asm {
		mov EAX, [n];
		mov [arr], EAX;
		shl EAX, 2; // number of bytes == n * int.sizeof

                 // NEW LINE
		mov [saved_esp], ESP; // save it for later

		sub ESP, EAX;
		mov [arr + size_t.sizeof], ESP;
	}

	import std.stdio;
	writeln(arr.length);
	writeln(arr.ptr);

	foreach(i, ref a; arr)
		a = i;

	writeln(arr);

         // NEW LINE
	asm { mov ESP, [saved_esp]; } // restore it before we return
}




Note that this still isn't quite right - the allocated size 
should be aligned too. It works for the simple case of 8 ints 
since 32 bytes happens to be aligned, but if we were allocating, 
say, 3 bytes, it would mess things up. The size has to be rounded 
up to a multiple of 4, or 16 on some systems.
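
A minimal sketch of that rounding, assuming the alignment is a power of two; `alignUp` is a hypothetical helper, not something from druntime:

```d
// Hypothetical helper: round nbytes up to the next multiple of alignment.
// The alignment must be a power of two for the mask trick to be valid.
size_t alignUp(size_t nbytes, size_t alignment = 16)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (nbytes + alignment - 1) & ~(alignment - 1);
}

void main()
{
    assert(alignUp(3, 4) == 4);    // 3 bytes become one 4-byte word
    assert(alignUp(32, 16) == 32); // 8 ints are already aligned
    assert(alignUp(33, 16) == 48);
    assert(alignUp(0) == 0);
}
```

In the inline asm version the same effect would come from adding `alignment - 1` to the byte count and masking it before the `sub ESP, EAX`.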

hmm, I'm looking at the alloca source and there's a touch of a 
guard page on Windows too. Check out the file: 
dmd2/src/druntime/src/rt/alloca.d, it is written in mostly inline 
asm.

Note the comment though:

  * This is a 'magic' function that needs help from the compiler to
  * work right, do not change its name, do not call it from other compilers.




So, how does this compare with alloca? Let's make a really simple 
example to compare and contrast with malloc to make the asm more 
readable:

import core.stdc.stdlib;

void vla(int n) {
         int[] arr;
         arr = (cast(int*)alloca(n * int.sizeof))[0 .. n];
}


Program runs, let's see the code.

0805f3f0 <_D3vla3vlaFiZv>:
  805f3f0:       55                      push   ebp
  805f3f1:       8b ec                   mov    ebp,esp
  805f3f3:       83 ec 10                sub    esp,0x10
  805f3f6:       c7 45 f0 10 00 00 00    mov    DWORD PTR [ebp-0x10],0x10
  805f3fd:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
  805f400:       c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
  805f407:       c7 45 f8 00 00 00 00    mov    DWORD PTR [ebp-0x8],0x0
  805f40e:       8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
  805f411:       50                      push   eax
  805f412:       c1 e0 02                shl    eax,0x2
  805f415:       50                      push   eax
  805f416:       8d 4d f0                lea    ecx,[ebp-0x10]
  805f419:       e8 e2 01 00 00          call   805f600 <__alloca>
  805f41e:       89 c1                   mov    ecx,eax
  805f420:       83 c4 04                add    esp,0x4
  805f423:       58                      pop    eax
  805f424:       89 45 f4                mov    DWORD PTR [ebp-0xc],eax
  805f427:       89 4d f8                mov    DWORD PTR [ebp-0x8],ecx
  805f42a:       c9                      leave
  805f42b:       c3                      ret


Change alloca to malloc:

0805f3f0 <_D3vla3vlaFiZv>:
  805f3f0:       55                      push   ebp
  805f3f1:       8b ec                   mov    ebp,esp
  805f3f3:       83 ec 0c                sub    esp,0xc
  805f3f6:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
  805f3f9:       c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
  805f400:       c7 45 f8 00 00 00 00    mov    DWORD PTR [ebp-0x8],0x0
  805f407:       8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
  805f40a:       50                      push   eax
  805f40b:       c1 e0 02                shl    eax,0x2
  805f40e:       50                      push   eax
  805f40f:       e8 0c fc ff ff          call   805f020 <malloc@plt>
  805f414:       89 c1                   mov    ecx,eax
  805f416:       83 c4 04                add    esp,0x4
  805f419:       58                      pop    eax
  805f41a:       89 45 f4                mov    DWORD PTR [ebp-0xc],eax
  805f41d:       89 4d f8                mov    DWORD PTR [ebp-0x8],ecx
  805f420:       c9                      leave
  805f421:       c3                      ret


Differences?


We can see on line 3 (sub esp,0x10 versus sub esp,0xc) that there's 
an extra word allocated for a local variable with alloca. It is 
loaded with the size of the local variables - 0x10. A pointer to 
that is passed to alloca (the lea ecx,[ebp-0x10]).

If we go back to the druntime source code:


  *              This is adjusted upon return to reflect the additional
  *              size of the stack frame.


It is used in that function:
         // Copy down to [ESP] the temps on the stack.
         // The number of temps is (EBP - ESP - locals).
  // snip

         sub     ECX,[EDX]       ; // ECX = number of temps (bytes) to move.
         add     [EDX],ESI       ; // adjust locals by nbytes for next call to alloca()
  // snip
         rep                     ;
         movsd                   ;




So, instead of restoring the stack pointer upon function return 
like I did, this copies the relevant data that was pushed onto 
the stack to the new location, so a subsequent pop will find what 
it expects, then it adjusts the hidden local size variable so 
next time, it can repeat the process. Cool - that's something my 
solution wouldn't have done super easily (it totally could, just 
don't overwrite that variable once it is initialized).


I guess there is a better way than I had figured above :)





We can use that same trick the compiler did by declaring a local 
variable and moving the magic __LOCAL_SIZE (see: 
http://dlang.org/iasm.html ) value into it up front, then calling 
alloca exactly as the C does. The implementation can be the same 
as from druntime too.


That's why it is a magic function: it needs to put the stack how 
it expects, somehow. My way was to add a store. The way actually 
used in druntime is to store the size of the locals in a hidden 
variable. Either way, if you do an iasm alloca yourself, you'll 
have to account for it as well.


Otherwise, remember to store the right pointer and allocate the 
right number of bytes and you've got it.

Dec 17 2014

"Foo" <Foo test.de> writes:

On Wednesday, 17 December 2014 at 16:10:40 UTC, Adam D. Ruppe 
wrote:

That is an awesome explanation! :)
Thank you for your time, I will experiment with your code.

Dec 17 2014
