digitalmars.D.learn - VLA in Assembler

"Foo" <Foo test.de> writes:
Hi,
Could someone explain to me whether and how it is possible to allocate a
variable length array with inline assembly?
Something like
----
int[] arr;
int n = 42;
asm {
     // allocate n stack space for arr
}
----
I know it is dangerous and all that, but I just want to know. ;)
Dec 17 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Doing it with alloca is simpler:

----
void main() @nogc {
    import core.stdc.stdlib: alloca, exit;

    alias T = int;
    enum n = 42;

    auto ptr = cast(T*)alloca(T.sizeof * n);
    if (ptr == null)
        exit(1); // Or throw a memory error.

    auto arr = ptr[0 .. n];
}
----

Bye,
bearophile
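The alloca snippet above uses a compile-time `n`. As a rough sketch of the same idea with a length that is only known at run time (the function name `sumOfSquares` and the `-1` failure value are made up for illustration, not part of the thread), it might look like this:

```d
import core.stdc.stdlib : alloca;
import std.stdio : writeln;

// Sums the squares 0^2 .. (n-1)^2 using a stack-allocated scratch array.
// Returns -1 if alloca reports failure (hypothetical error convention).
int sumOfSquares(size_t n)
{
    // alloca has to be called in the function that uses the memory:
    // the block is released automatically when this function returns.
    auto ptr = cast(int*) alloca(int.sizeof * n);
    if (ptr is null)
        return -1;

    int[] arr = ptr[0 .. n]; // slice the raw block into a D array
    foreach (i, ref a; arr)
        a = cast(int)(i * i);

    int total = 0;
    foreach (a; arr)
        total += a;
    return total;
}

void main()
{
    writeln(sumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
}
```

Because the slice points into the current stack frame, it must not be returned or stored anywhere that outlives the function.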
Dec 17 2014
"Foo" <Foo test.de> writes:
On Wednesday, 17 December 2014 at 10:59:09 UTC, bearophile wrote:
 Doing it with alloca is simpler: [...]
Yes, I know, but I really want to do it in inline assembly. It's for learning purposes. :)
Dec 17 2014
"uri" <uri.grill gmail.com> writes:
On Wednesday, 17 December 2014 at 11:39:43 UTC, Foo wrote:
Yes I know, but I really want it in inline assembly. It's for learning purpose. :)
You could look at the disassembly.
Dec 17 2014
"Foo" <Foo test.de> writes:
On Wednesday, 17 December 2014 at 12:15:23 UTC, uri wrote:
You could look at the disassembly.
And how? I'm on Windows.
Dec 17 2014
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 17 December 2014 at 12:29:53 UTC, Foo wrote:
 And how? I'm on Windows.
Digital Mars sells an obj2asm tool that will disassemble dmd-generated code. I think it is in the $15 basic utility package.

But VLA/alloca is more complex than a regular function - the compiler needs to know about it to adjust for the changed stack. It'll take more space to write this up, so I'll do it in a separate post.
Dec 17 2014
"btdc" <btdc nowhere.fr> writes:
On Wednesday, 17 December 2014 at 10:35:39 UTC, Foo wrote:
 Hi,
 Could someone explain to me whether and how it is possible to allocate
 a variable length array with inline assembly?
 Something like
 ----
 int[] arr;
 int n = 42;
 asm {
     // allocate n stack space for arr
 }
 ----
 I know it is dangerous and all that, but I just want to know. ;)
It's probably something like that:

----
module runnable;

import std.stdio;
import std.c.stdlib;

ubyte[] newArr(size_t aLength)
{
    asm
    {
        naked;
        mov ECX, EAX;       // saves aLength in ECX
        push ECX;
        call malloc;        // .ptr = malloc(aLength);
        mov ECX,[EAX];      // saved the .ptr of our array
        mov EAX, 8;         // an array is a struct with length and ptr
                            // so 8 bytes in 32 bit
        call malloc;        // EAX points to the first byte of the struct
        mov [EAX + 4], ECX; // .ptr
        pop ECX;
        mov [EAX], ECX;     // .length
        mov EAX, [EAX];     // currently EAX is a ref, so need to dig...
        ret;
    }
}
----

try and see ;) Actually it may be wrong
Dec 17 2014
"btdc" <btdc nowhere.fr> writes:
On Wednesday, 17 December 2014 at 12:54:44 UTC, btdc wrote:
 It's probably something like that: [...]
fuck...the comments are once again cut...
Dec 17 2014
"Foo" <foo test.de> writes:
And it is using malloc... ;)
I wanted something that increases the stack pointer ESP.

e.g.
----
void main()
{
	int[] arr;
	int n = 42;
	
	writeln(arr.length);
	writeln(arr.ptr);
	
	asm {
		mov EAX, n;
		mov [arr + 8], ESP;
		sub [ESP], EAX;
		mov [arr + 0], EAX;
	}
	
	writeln(arr.length);
	//writeln(arr[0]);
}
----
but that does not work...
Dec 17 2014
"btdc" <btdc nowhere.fr> writes:
On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:
 but that does not work...
You can't always get what you want. Try more, speak less.
Dec 17 2014
"Namespaces" <rswhite4 gmail.com> writes:
On Wednesday, 17 December 2014 at 15:20:28 UTC, btdc wrote:
You can't always get what you want. Try more, speak less.
Very helpful. And soo friendly! ;)
Dec 17 2014
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:
 	asm {
 		mov EAX, n;
 		mov [arr + 8], ESP;
 		sub [ESP], EAX;
 		mov [arr + 0], EAX;
 	}
 but that does not work...
That wouldn't work even with malloc.... remember, an integer is more than one byte long, so your subtract is 1/4 the size it needs to be! Also, since the stack grows downward, you're storing the pointer to the end of the array instead of the beginning of it.

NOTE: I've never actually done this before, so I'm figuring it out as I go too. This might be buggy or otherwise mistaken at points. (Personally, I prefer to use a static array sized to the max thing I'll probably need that I slice instead of alloca...)

Here's some code that runs successfully (in 32 bit!):

----
void vla(int n) {
    int[] arr;

    asm {
        mov EAX, [n];
        // the first word in an array is the length, store that
        mov [arr], EAX;
        shl EAX, 2; // number of bytes == n * int.sizeof
        sub ESP, EAX; // allocate the bytes
        mov [arr + size_t.sizeof], ESP; // store the beginning of it in the arr.ptr
    }

    import std.stdio;
    writeln(arr.length);
    writeln(arr.ptr);

    // initialize the data...
    foreach(i, ref a; arr)
        a = i;

    writeln(arr); // and print it back out
}

void main() {
    vla(8);
}
----

This looks right.... but isn't, we changed the stack and didn't put it back. That's usually a no-no. If we disassemble the function, we can take a look at the end and see something scary:

----
 8084ec6: e8 9d 6a 00 00    call   808b968 <_D3std5stdio15__T7writelnTAiZ7writelnFAiZv> // our final writeln call
 8084ecb: 5e                pop    esi // uh oh
 8084ecc: 5b                pop    ebx
 8084ecd: c9                leave
 8084ece: c3                ret
----

Before the leave instruction, which puts the stack back how it was at the beginning of the function - which saves us from a random EIP being restored upon the ret instruction - the compiler put in a few pop instructions.

main() will have different values in esi and ebx than it expects! Running it in the debugger shows these values changed too:

before

----
(gdb) info registers
[...]
ebx            0xffffd4f4       -11020
[...]
esi            0x80916e8        134813416
----

after

----
ebx            0x1      1
esi            0x0      0
----

It popped the values of our array. According to the ABI: "EBX, ESI, EDI, EBP must be preserved across function calls."
http://dlang.org/abi.html

They are pushed for a reason - the compiler assumes they remain the same. In this little test program, nothing went wrong because no more code was run after vla returned. But, if we were using, say, a struct, it'd probably fault when it tried to access `this`. It'd probably mess up other local variables too. No good!

So, we'll need to store and restore the stack pointer... can we use the stack's push and pop instructions? Nope, we're changing the stack! Our own pop would grab the wrong data too.

We could save it in a local variable. How do we restore it though? scope(exit) won't work, it won't happen at the right time and will corrupt the stack even worse.

Gotta do it ourselves - which means we can't do the alloca even as a single mixin, since it needs code added before any return point too!

(There might be other, better ways to do this... and indeed, there is, as we'll see later on. I peeked at the druntime source code and it does it differently. Continue reading...)

Here's code that we can verify in the debugger leaves everything how it should be and doesn't crash:

----
void vla(int n) {
    int[] arr;
    void* saved_esp;

    asm {
        mov EAX, [n];
        mov [arr], EAX;
        shl EAX, 2; // number of bytes == n * int.sizeof
        // NEW LINE
        mov [saved_esp], ESP; // save it for later
        sub ESP, EAX;
        mov [arr + size_t.sizeof], ESP;
    }

    import std.stdio;
    writeln(arr.length);
    writeln(arr.ptr);

    foreach(i, ref a; arr)
        a = i;

    writeln(arr);

    // NEW LINE
    asm { mov ESP, [saved_esp]; } // restore it before we return
}
----

Note that this still isn't quite right - the allocated size should be aligned too. It works for the simple case of 8 ints since that's coincidentally aligned, but if we were doing like 3 bytes, it would mess things up. Gotta be rounded up to a multiple of 4 or 16 on some systems.

hmm, I'm looking at the alloca source and there's a touch of a guard page on Windows too. Check out the file: dmd2/src/druntime/src/rt/alloca.d, it is written in mostly inline asm.
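On the alignment point above: a minimal sketch of the rounding step, written in plain D rather than asm. This is not taken from druntime; the helper name `alignUp` and the default of 16 are assumptions for illustration, and the bit trick only works when the alignment is a power of two.

```d
import std.stdio : writeln;

// Rounds nbytes up to the next multiple of alignment.
// Assumes alignment is a power of two (4, 8, 16, ...).
size_t alignUp(size_t nbytes, size_t alignment = 16) pure nothrow @nogc @safe
{
    return (nbytes + alignment - 1) & ~(alignment - 1);
}

void main()
{
    writeln(alignUp(3));              // 16
    writeln(alignUp(3, 4));           // 4
    writeln(alignUp(8 * int.sizeof)); // 32, already a multiple of 16
}
```

In the inline asm version, the equivalent for 16-byte alignment would presumably be something like `add EAX, 15; and EAX, -16;` placed before the `sub ESP, EAX;`.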
Note the comment in that file though:

----
 * This is a 'magic' function that needs help from the compiler to
 * work right, do not change its name, do not call it from other compilers.
----

So, how does this compare with alloca? Let's make a really simple example to compare and contrast with malloc to make the asm more readable:

----
import core.stdc.stdlib;

void vla(int n) {
    int[] arr;
    arr = (cast(int*)alloca(n * int.sizeof))[0 .. n];
}
----

Program runs, let's see the code.

----
0805f3f0 <_D3vla3vlaFiZv>:
 805f3f0: 55                      push   ebp
 805f3f1: 8b ec                   mov    ebp,esp
 805f3f3: 83 ec 10                sub    esp,0x10
 805f3f6: c7 45 f0 10 00 00 00    mov    DWORD PTR [ebp-0x10],0x10
 805f3fd: 89 45 fc                mov    DWORD PTR [ebp-0x4],eax
 805f400: c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
 805f407: c7 45 f8 00 00 00 00    mov    DWORD PTR [ebp-0x8],0x0
 805f40e: 8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
 805f411: 50                      push   eax
 805f412: c1 e0 02                shl    eax,0x2
 805f415: 50                      push   eax
 805f416: 8d 4d f0                lea    ecx,[ebp-0x10]
 805f419: e8 e2 01 00 00          call   805f600 <__alloca>
 805f41e: 89 c1                   mov    ecx,eax
 805f420: 83 c4 04                add    esp,0x4
 805f423: 58                      pop    eax
 805f424: 89 45 f4                mov    DWORD PTR [ebp-0xc],eax
 805f427: 89 4d f8                mov    DWORD PTR [ebp-0x8],ecx
 805f42a: c9                      leave
 805f42b: c3                      ret
----

Change alloca to malloc:

----
0805f3f0 <_D3vla3vlaFiZv>:
 805f3f0: 55                      push   ebp
 805f3f1: 8b ec                   mov    ebp,esp
 805f3f3: 83 ec 0c                sub    esp,0xc
 805f3f6: 89 45 fc                mov    DWORD PTR [ebp-0x4],eax
 805f3f9: c7 45 f4 00 00 00 00    mov    DWORD PTR [ebp-0xc],0x0
 805f400: c7 45 f8 00 00 00 00    mov    DWORD PTR [ebp-0x8],0x0
 805f407: 8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
 805f40a: 50                      push   eax
 805f40b: c1 e0 02                shl    eax,0x2
 805f40e: 50                      push   eax
 805f40f: e8 0c fc ff ff          call   805f020 <malloc@plt>
 805f414: 89 c1                   mov    ecx,eax
 805f416: 83 c4 04                add    esp,0x4
 805f419: 58                      pop    eax
 805f41a: 89 45 f4                mov    DWORD PTR [ebp-0xc],eax
 805f41d: 89 4d f8                mov    DWORD PTR [ebp-0x8],ecx
 805f420: c9                      leave
 805f421: c3                      ret
----

Differences? We can see at the third instruction (the sub esp) that there's an extra word allocated for a local variable with alloca.
It is loaded with the size of the local variables - 0x10. A pointer to that is passed to alloca. If we go back to the druntime source code:

----
 * This is adjusted upon return to reflect the additional
 * size of the stack frame.
----

It is used in that function:

----
// Copy down to [ESP] the temps on the stack.
// The number of temps is (EBP - ESP - locals).
// snip
sub ECX,[EDX] ; // ECX = number of temps (bytes) to move.
add [EDX],ESI ; // adjust locals by nbytes for next call to alloca()
// snip
rep ;
movsd ;
----

So, instead of restoring the stack pointer upon function return like I did, this copies the relevant data that was pushed onto the stack to the new location, so a subsequent pop will find what it expects, then it adjusts the hidden local size variable so next time, it can repeat the process. Cool - that's something my solution wouldn't have done super easily (it totally could, just don't overwrite that variable once it is initialized).

I guess there is a better way than I had figured above :)

We can use that same trick the compiler did by declaring a local variable and moving the magic __LOCAL_SIZE (see: http://dlang.org/iasm.html ) value into it up front, then calling alloca exactly as the C does. The implementation can be the same as from druntime too.

That's why it is a magic function: it needs to put the stack how it expects, somehow. My way was to add a store. The way actually used in druntime is to store the size of the locals in a hidden variable. Either way, if you do an iasm alloca yourself, you'll have to account for it as well.

Otherwise, remember to store the right pointer and allocate the right number of bytes and you've got it.
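For comparison, the static-array alternative mentioned in passing near the top of this post (a fixed-size buffer that is sliced down to the needed length) could be sketched roughly like this. The bound `maxLen` and the function name `sumFirst` are hypothetical, chosen only for illustration:

```d
import std.stdio : writeln;

enum maxLen = 64; // hypothetical upper bound, chosen for illustration

// Fills the first n slots of a fixed-size stack buffer and returns their sum.
int sumFirst(size_t n)
{
    assert(n <= maxLen, "n exceeds the fixed buffer");

    int[maxLen] buffer = void;  // lives in this stack frame, left uninitialized
    int[] arr = buffer[0 .. n]; // the "variable length" view onto it

    foreach (i, ref a; arr)
        a = cast(int) i;

    int total = 0;
    foreach (a; arr)
        total += a;
    return total;
}

void main()
{
    writeln(sumFirst(8)); // 0 + 1 + ... + 7 = 28
}
```

This trades some wasted stack space for not having to touch ESP at all, so the compiler's own prologue and epilogue keep working unchanged.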
Dec 17 2014
"Foo" <Foo test.de> writes:
On Wednesday, 17 December 2014 at 16:10:40 UTC, Adam D. Ruppe 
wrote:
 [...] Otherwise, remember to store the right pointer and allocate the right number of bytes and you've got it.
That is an awesome explanation! :) Thank you for your time, I will experiment with your code.
Dec 17 2014