digitalmars.D.learn - D on lm32-CPU: string argument on stack instead of register

Michael Reese <michaelate gmail.com> writes:

Hi all,

at work we put embedded lm32 soft-core CPUs in FPGAs and write 
the firmware in C.
At home I enjoy writing small projects in D from time to time, 
but I don't consider myself a D expert.

Now, I'm trying to run some toy examples in D on the lm32 cpu. 
I'm using a recent gcc-elf-lm32. I succeeded in compiling and 
running some code and it works fine.

But I noticed, when calling a function with a string argument, 
the string is not stored in registers, but on the stack.
Consider a simple function (below) that writes bytes to a 
peripheral (that forwards the data to the host computer via USB). 
I've two versions, an idiomatic D one, and another version where 
pointer and length are two distinct function parameters.
I also show the generated assembly code. The string version is 4 
instructions longer, just because of the stack manipulation. In 
addition, it is also slower because it needs to access the RAM, 
and it needs more stack space.

My question: Is there a way I can tell the D compiler to use 
registers instead of stack for string arguments, or any other 
trick to reduce code size while maintaining an idiomatic D 
code style?

Best regards
Michael


// idiomatic D version
void write_to_host(in string msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	foreach(ch; msg) {
		*usb_slave = ch;
	}
}
// resulting assembly code (compiled with -Os) 12 instructions
_D10firmware_d13write_to_hostFxAyaZv:
	addi     sp, sp, -8
	addi     r3, r0, 4096
	sw       (sp+4), r1
	sw       (sp+8), r2
	add      r1, r2, r1
.L3:
	be     r2,r1,.L1
	lbu      r4, (r2+0)
	addi     r2, r2, 1
	sb       (r3+0), r4
	bi       .L3
.L1:
	addi     sp, sp, 8
	b        ra

// C-like version
void write_to_hostC(const char *msg, int len) {
	char *ptr = cast(char*)msg;
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	while (len--) {
		*usb_slave = *ptr++;
	}
}
// resulting assembly code (compiled with -Os) 8 instructions
_D10firmware_d14write_to_hostCFxPaiZv:
	add      r2, r1, r2
	addi     r3, r0, 4096
.L7:
	be     r1,r2,.L5
	lbu      r4, (r1+0)
	addi     r1, r1, 1
	sb       (r3+0), r4
	bi       .L7
.L5:
	b        ra

Jul 31 2020

Chad Joan <chadjoan gmail.com> writes:

On Friday, 31 July 2020 at 10:22:20 UTC, Michael Reese wrote:
 [...]

Hi Michael!

Last time I checked, D doesn't have any specific type attributes 
or special ways to force variables to enregister. But I could be 
poorly informed. Maybe there are GDC-specific hints or something. 
I hope that if anyone else knows better, they will toss in an 
answer.

THAT SAID, I think there are things to try and I hope we can get 
you what you want.

If you're willing to entertain more experimentation, here are my 
thoughts:

---------------------------------------
(1) Try writing "in string" as "in const(char)[]" instead:

// idiomatic D version
void write_to_host(in const(char)[] msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	foreach(ch; msg) {
		*usb_slave = ch;
	}
}

Explanation:

The "string" type is an alias for "immutable(char)[]".

In D, "immutable" is a stronger guarantee than "const". The 
"const" modifier, like in C, tells the compiler that this 
function shall not modify the data referenced by this 
pointer/array/whatever. The "immutable" modifier is a bit 
different, as it says that NO ONE will modify the data referenced 
by this pointer/array/whatever, including other functions that 
may or may not be concurrently executing alongside the one you're 
in. So "const" constrains the callee, while "immutable" 
constrains both the callee AND the caller. This makes it more 
useful for some multithreaded code, because if you can accept the 
potential inefficiency of needing to do more copying of data (if 
you can't modify, usually you must copy instead), then you can 
have more deterministic behavior and sometimes even much better 
total efficiency by way of parallelization. This might not be a 
guarantee you care about though, at which point you can just toss 
it out completely and see if the compiler generates better code 
now that it sees the same type qualifier as in the other example.

I'd actually be surprised if using "immutable" causes /less/ 
efficient code in this case, because it should be even /safer/ to 
use the argument as-is. But it IS a difference between the two 
examples, and one that might not be benefiting your cause (though 
that's totally up to you).
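To make the const/immutable distinction concrete, here is a minimal sketch meant for a desktop D compiler with druntime (not the bare lm32 toolchain). The function names countConst and countImmutable are made up for illustration and are not part of the original code.

```d
// Accepts mutable, const and immutable character data alike.
size_t countConst(const(char)[] msg) { return msg.length; }

// Accepts only immutable data (string is an alias for immutable(char)[]).
size_t countImmutable(string msg) { return msg.length; }

unittest {
    char[] buf = ['a', 'b', 'c'];   // mutable buffer
    string lit = "abc";             // immutable literal

    assert(countConst(buf) == 3);   // mutable converts implicitly to const
    assert(countConst(lit) == 3);   // immutable converts implicitly to const
    assert(countImmutable(lit) == 3);

    // A mutable buffer is rejected where immutable data is required ...
    static assert(!__traits(compiles, countImmutable(buf)));
    // ... unless it is copied into immutable storage first.
    assert(countImmutable(buf.idup) == 3);
}
```

The copy made by .idup is the "more copying" cost mentioned above; a const(char)[] parameter avoids it for callers that hold mutable buffers.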

---------------------------------------
(2) Try keeping the string argument, but make the function more 
closely identical in semantics:

// idiomatic D version
void write_to_host(string msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	while(msg.length > 0) {
		*usb_slave = msg[0];
		msg = msg[1 .. $];
	}
}

Explanation:

First of all, I wouldn't expect you to keep this, especially if 
you need utf-8 autodecoding behavior (more on that later). But it 
might be revealing if this leads to different assembly output.

The idea behind this one is to see if the regression is actually 
caused by the foreach construct, rather than the parameter type. 
I did have to change the parameter slightly by removing the "in" 
qualifier. It shouldn't make much difference though, because the 
'string' type's pointer and length are copied from the caller, so 
any modifications to "msg" (that don't affect "msg"'s array 
elements) will be contained within the function and will not be 
observable anywhere else. In other words, the "in" qualifier is 
largely redundant with "string"'s immutability guarantees plus 
function argument copying semantics.
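The copying behaviour described above can be checked with a small sketch. The function name drain is hypothetical; it only counts elements instead of writing to a peripheral, and it is meant to be run on a desktop compiler.

```d
// Consumes its own copy of the slice; the caller's slice is unaffected.
size_t drain(string msg) {
    size_t n = 0;
    while (msg.length > 0) {
        msg = msg[1 .. $];  // shrinks only the local pointer+length pair
        ++n;
    }
    return n;
}

unittest {
    string s = "hello";
    assert(drain(s) == 5);
    assert(s == "hello");   // the caller still sees the full string
}
```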

---------------------------------------
(3) Try a different type of while-loop in the D-style version:

// idiomatic D version
void write_to_host(in string msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	size_t i = 0;
	while(i < msg.length) {
		*usb_slave = msg[i++];
	}
}

Explanation:


This version introduces an extra index 
variable, so I don't have high hopes. But the compiler might 
optimize that out and make it look like the C-style version. 
Again, I don't expect you to use this version if it discards one 
of D's features that you hope to use, but it might at least help 
you identify where your expenses are coming from.

---------------------------------------
(4) Try having these examples use "const ubyte* msg" and 
"immutable(ubyte)[] msg" instead of "const char* msg" and "string 
msg".

// idiomatic D version
void write_to_host(in immutable(ubyte)[] msg) {
	// a fixed address to get bytes to the host via usb
	ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
	foreach(ch; msg) {
		*usb_slave = ch;
	}
}

// C-like version
void write_to_hostC(const ubyte *msg, int len) {
	ubyte *ptr = cast(ubyte*)msg;
	ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
	while (len--) {
		*usb_slave = *ptr++;
	}
}

Explanation:

The "string" type is an alias for "immutable(char)[]", which 
seems like it would be very similar to "immutable(ubyte)[]", but 
the 'char' element type communicates a requirement that the 
'ubyte' element type does not: utf-8 awareness. And that can have 
a cost.

In D, char[] arrays are defined as containing utf-8 text. This is 
rather different from C, where the 'char' type is more like D's 
'byte' or 'ubyte' types and just happens to also be used to store 
text data in any encoding the author feels like. When I see 
"foreach(ch; msg)" and msg's element type is "char", then I 
expect "ch" to be of type 'dchar' (instead of 'char') and I 
expect the foreach loop to auto-decode the utf-8 text in the 
string (or immutable(char)[]) type into whole unicode codepoints 
that are then placed into the 'dchar'. If you are only dealing 
with ASCII text (or any 8-bit-or-less encoding that isn't utf-8), 
then you may just want to use the 'byte' or 'ubyte' types 
instead. In everyday D, this changes the semantics of the foreach 
loop, because no autodecoding is done on types like byte[] or 
ubyte[], and it may "behave" (from an implementor perspective) 
more like the while-loop in your second example.

You probably won't see a lot of text-processing through byte[] or 
ubyte[] in normal D code, but that's because most programmers 
will want their programs to be able to process utf-8 text, while 
in the embedded programming space you might not have to worry 
about utf-8 at all.

Now, I actually didn't see any autodecoding of utf-8 in the 
assembly you posted. Maybe I could be wrong though; I am not 
experienced in lm32 assembly. Nonetheless, I'd expect to see 
some sort of conditional call or, at the very least, some kind of 
masking of the highest bit of every char (to detect utf-8 
sequences). Maybe it's a bug in your (cross?) compiler, or even 
just an intentional configuration choice that I didn't expect. At 
any rate, I don't think your code is larger or less efficient due 
to utf-8 decoding, because I don't see the utf-8 decoding.
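A likely explanation for the missing decoding: as far as the language rules go, a foreach over a string with an inferred loop variable iterates UTF-8 code units (char), and decoding to code points only happens when the loop variable is declared as dchar (or when Phobos range functions are used). The sketch below illustrates this on a desktop compiler; the function names are made up for the example.

```d
// Inferred element type: the loop visits one UTF-8 code unit per iteration.
size_t countCodeUnits(string s) {
    size_t n = 0;
    foreach (ch; s) {
        static assert(ch.sizeof == 1);  // ch is a char, not a dchar
        ++n;
    }
    return n;
}

// Explicit dchar: the compiler inserts UTF-8 decoding into the loop.
size_t countCodePoints(string s) {
    size_t n = 0;
    foreach (dchar ch; s) {
        ++n;
    }
    return n;
}

unittest {
    string s = "a\u00E9";  // U+00E9 takes two code units in UTF-8
    assert(countCodeUnits(s) == 3);
    assert(countCodePoints(s) == 2);
}
```

So the byte-wise loop in the posted assembly is what one would expect for foreach(ch; msg), and switching to ubyte should not change the iteration itself.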

Still, I'm curious to see if changing up the types causes the 
compiler to choose different codepaths for its codegen, even for 
inane reasons. Maybe the autodecoding is turned off, but it still 
thinks it needs to allocate extra space for the autodecoder's 
"dchar" or something, and then that exceeds some threshold for 
passing enregistered arguments. Maybe for similar reasons it 
thinks it needs to keep a copy of that string around. Compilers 
are mysterious beasts sometimes. *shrug*

---------------------------------------
(5) And for maximum curiosity, what happens if you write the 
C-like version this way instead?

// C-like version
// msg parameter change: "const char *msg" -> "const(char)* msg"
void write_to_hostC(const(char)* msg, int len) {
	// cast() statement removed.
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	while (len--) {
		*usb_slave = *msg++;
	}
}

Explanation:

I realize the difference is subtle, but "const char *msg" says 
that both the pointed-to chars can't be modified and also that 
the /pointer itself/ cannot be modified. In the other case, with 
"const(char)* msg", the constraint is looser but still very 
useful: the pointed-to chars can't be modified, but the pointer 
can be modified. Because the pointer (but not the referred data) 
is a copy of the caller's pointer, any modifications to the 
pointer (increments and such) are only visible within the scope 
of this function.
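The difference between the two spellings can be verified at compile time. This is a small sketch for a desktop compiler; lengthUntilNul is a hypothetical helper, not something from the firmware.

```d
// Mutable pointer to const data: the pointer itself may advance.
size_t lengthUntilNul(const(char)* p) {
    size_t n = 0;
    while (*p != '\0') {
        ++p;
        ++n;
    }
    return n;
}

unittest {
    const(char)* a = "hi".ptr;  // D string literals are NUL-terminated
    const char*  b = a;         // same type as const(char*): pointer is const too

    static assert(is(typeof(b) == const(char*)));
    static assert( __traits(compiles, ++a));  // pointer may be changed
    static assert(!__traits(compiles, ++b));  // pointer may not be changed

    assert(lengthUntilNul(a) == 2);
}
```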

The C-like version is already the more optimal one, but if making 
this change causes it to regress to generating assembly similar 
to the D-like version, then it might suggest that the additional 
assignment statement is actually helpful somehow. It'd be 
unintuitive, but you never know.

---------------------------------------

(6) And what if you 
extended the D-idiomatic-version's immutability guarantee to the 
whole array value and not just the array elements?

// idiomatic D version
void write_to_host(immutable(char[]) msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	foreach(ch; msg) {
		*usb_slave = ch;
	}
}

And to make it even more like the C-style version without being 
C-style, it might also be worth stacking it with the 
immutable->const change:

// idiomatic D version
void write_to_host(const(char[]) msg) {
	// a fixed address to get bytes to the host via usb
	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
	foreach(ch; msg) {
		*usb_slave = ch;
	}
}

After all, if the original C-style version isn't allowed to 
change its argument's pointer, then we could try making the 
D-idiomatic version behave that way too, and see if this minor 
alteration makes the difference.

---------------------------------------

Just to be safe, I also want to point out the difference between 
"char *ptr" and "char* ptr": in a single-variable declaration, 
there is none, but if there is more than one, the pointer binds 
more strongly to the type than to the variable in D.

Consider a declaration like:

char* str0, str1;

In C, this would make str0 a pointer, and str1 a char.
In D, this means that both str0 and str1 are pointers.

Thus, in D, it is more conventional to write the * character next 
to the type than it is to write it next to the 
variable/identifier. This reinforces the notion that pointer-ness 
is (syntactically) part of the type, rather than part of the 
variables.

There's a similar example in this article:
https://dlang.org/blog/2018/10/17/interfacing-d-with-c-arrays-part-1/

If you already knew that, don't mind me. I realize that a lot of 
C code gets copied into D without changing this thing, and unless 
there are multiple variables in the same declaration, it really 
doesn't matter.
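For anyone who wants to confirm the declaration rule, here is a tiny check; the function name bothArePointers is made up for the example.

```d
// Returns true when both variables of a shared declaration are pointers.
bool bothArePointers() {
    char* str0, str1;   // in D the * belongs to the type, so both are char*
    return is(typeof(str0) == char*) && is(typeof(str1) == char*);
}

unittest {
    assert(bothArePointers());
}
```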



Good luck with your lm32/FPGA coding. That sounds like cool stuff!

Jul 31 2020

Michael Reese <michaelate gmail.com> writes:

On Friday, 31 July 2020 at 15:13:29 UTC, Chad Joan wrote:
 THAT SAID, I think there are things to try and I hope we can 
 get you what you want.

 If you're willing to entertain more experimentation, here are 
 my thoughts:

Thanks a lot for the suggestions and explanations. I tried them 
all but the only time I got different assembly code was your 
suggestion (3) which produced the following.

 (3) Try a different type of while-loop in the D-style version:

 // idiomatic D version
 void write_to_host(in string msg) {
 	// a fixed address to get bytes to the host via usb
 	char *usb_slave = cast(char*)BaseAdr.ft232_slave;
 	size_t i = 0;
 	while(i < msg.length) {
 		*usb_slave = msg[i++];
 	}
 }

_D10firmware_d13write_to_hostFAyaZv:
	addi     sp, sp, -8
	addi     r4, r0, 4096
	sw       (sp+8), r2
	sw       (sp+4), r1
	addi     r2, r0, 0
.L3:
	or       r5, r2, r0
	be     r2,r1,.L1
	lw       r3, (sp+8)
	addi     r2, r2, 1
	add      r3, r3, r5
	lbu      r3, (r3+0)
	sb       (r4+0), r3
	bi       .L3
.L1:
	addi     sp, sp, 8
	b        ra

 At any rate, I don't think your code is larger or less 
 efficient due to utf-8 decoding, because I don't see the utf-8 
 decoding.

Agreed, I think there's no code for autodecoding being generated.

I did some more experiments:
Trying to put pointer and length into a struct and pass that to 
the function. Same result, the argument ended up on the stack.

Then, I wrote the function in C and compiled it with the 
C-compiler of lm32-elf-gcc. It also puts the 64-bit POD structure 
on the stack. It seems the only arguments passed in registers are 
primitive data types. However, if I pass a uint64_t argument, it 
is passed in registers r1 and r2. So the compiler knows 
how to use r1 and r2 for arguments. I checked again in the lm32 
manual 
(https://www.latticesemi.com/view_document?document_id=52077), 
and it says:

"As illustrated in Table 3 on page 8, the first eight function 
arguments are
passed in registers. Any remaining arguments are passed on the 
stack, as
illustrated in Figure 12."

So strings and structs should be passed on the stack and this 
seems to be more an issue of the gcc lm32 backend than a D issue.

But I just found a workaround using a wrapper function.

void write_to_host(in string msg) {
         write_to_hostC(msg.ptr, msg.length);
}

I checked the assembly code on the caller side, and the call 
write_to_host("Hello, World!\n") is inlined. There is only one 
call to write_to_hostC. This is still not nice, but I think I can 
live with that for now. Now I have to figure out how to make the 
cast back from a pointer-length pair into a string. I'm sure I 
read that somewhere before, but forgot it and was unable to find 
it now on a quick Google search...
And since this is D: is there maybe some CTFE magic that allows 
creating these wrappers automatically? Something like

fix_stack!write_to_host("Hello, World!\n");


 Good luck with your lm32/FPGA coding. That sounds like cool 
 stuff!

I'm doing this mainly to improve my understanding of how embedded 
processors work, and how to write linker scripts for a given 
environment. Although I do have actual hardware where I can see 
if everything runs in the real world, I mainly use simulations. 
The coolest thing in my opinion is, nowadays it can be done using 
only open source tools (mainly ghdl and verilator, the lm32 
source code is open source, too). The complete system is 
simulated and you can look at every logic signal in the cpu or in 
the ram while the program executes.

Aug 01 2020

Chad Joan <chadjoan gmail.com> writes:

On Saturday, 1 August 2020 at 08:58:03 UTC, Michael Reese wrote:
 [...]
 So the compiler knows how to use r1 and r2 for arguments. I 
 checked again in the lm32 manual 
 (https://www.latticesemi.com/view_document?document_id=52077), 
 and it says:

 "As illustrated in Table 3 on page 8, the first eight function 
 arguments are
 passed in registers. Any remaining arguments are passed on the 
 stack, as
 illustrated in Figure 12."

 So strings and structs should be passed on the stack and this 
 seems to be more an issue of the gcc lm32 backend than a D 
 issue.

Nice find!

Though if the compiler is allowed to split a single uint64_t into 
two registers, I would expect it to split struct/string into two 
registers as well. At least, the manual doesn't seem to 
explicitly mention higher-level constructs like structs. It does 
suggest a one-to-one relationship between arguments and registers 
(up to a point), but GCC seems to have decided otherwise for 
certain uint64_t's. (Looking at Table 3...) It even gives you two 
registers for a return value: enough for a string or an array. 
And if the backend/ABI weren't up for it, it would be 
theoretically possible to have the frontend lower strings 
(dynamic arrays) and small structs into their components before 
function calls and then also insert code on the other side to 
cast them back into their original form. I'm not sure if anyone 
would want to write it, though. o.O

 But I just found a workaround using a wrapper function.

 void write_to_host(in string msg) {
         write_to_hostC(msg.ptr, msg.length);
 }

 I checked the assembly code on the caller side, and the call 
 write_to_host("Hello, World!\n") is inlined. There is only one 
 call to write_to_hostC. This is still not nice, but I think I 
 can live with that for now. Now I have to figure out how to make 
 the cast back from a pointer-length pair into a string. I'm 
 sure I read that somewhere before, but forgot it and was unable 
 to find it now on a quick Google search...

That's pretty clever. I like it.

Getting from pointer-length to string might be pretty easy, 
provided ptr is an immutable(char)* (slicing a const(char)* 
gives a const(char)[] instead):
string foo = ptr[0 .. len];

D allows pointers to be indexed, like in C. But unlike C, D has 
slices, and pointers can be "sliced". The result of a slice 
operation, at least for primitive arrays+pointers, is always an 
array (the "dynamic" ptr+length kind). Hope that helps.
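A minimal sketch of pointer slicing, runnable on a desktop compiler. The helper name toSlice is hypothetical. Note that slicing a pointer is not bounds-checked, so the length has to be trusted.

```d
// Rebuilds a slice from a pointer-length pair without copying any data.
const(char)[] toSlice(const(char)* ptr, size_t len) {
    return ptr[0 .. len];
}

unittest {
    string msg = "Hello";
    auto view = toSlice(msg.ptr, msg.length);

    assert(view == "Hello");
    assert(view.ptr is msg.ptr);  // same memory, no copy

    // Casting back to string is only justified when the data
    // is known to be immutable, as it is here.
    string again = cast(string) view;
    assert(again == msg);
}
```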


 And since this is D: is there maybe some CTFE magic that allows 
 creating these wrappers automatically? Something like

 fix_stack!write_to_host("Hello, World!\n");

It ended up being a little more complicated than I thought it 
would be. Hope I didn't ruin the fun. ;)

https://pastebin.com/y6e9mxre
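For readers without access to the paste, here is a much simpler sketch of the general idea. It does not reproduce the linked code. It needs no Phobos, and the names splitSlice and sumBytes are hypothetical; sumBytes merely stands in for a pointer+length routine such as write_to_hostC so that the example can be run on a desktop compiler.

```d
// Stand-in for a low-level routine that takes pointer and length separately.
// It sums the bytes instead of writing them to a peripheral.
uint sumBytes(const(char)* p, size_t len) {
    uint total = 0;
    foreach (i; 0 .. len)
        total += p[i];
    return total;
}

// Generic wrapper: accepts a slice and forwards its two components to fn.
// The inline pragma asks the compiler to leave no call to the wrapper itself.
pragma(inline, true)
auto splitSlice(alias fn)(in const(char)[] msg) {
    return fn(msg.ptr, msg.length);
}

unittest {
    assert(splitSlice!sumBytes("AB") == 65 + 66);
}
```

Whether the wrapper really disappears from the generated code depends on the compiler and optimization level, so it is worth checking the assembly as before.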


Also, that part where you mentioned a 64-bit integer being passed 
as a pair of registers made me start to wonder if unions could be 
(ab)used to juke the ABI:

https://pastebin.com/eGfZN0SL


 Good luck with your lm32/FPGA coding. That sounds like cool 
 stuff!

 I'm doing this mainly to improve my understanding of how 
 embedded processors work, and how to write linker scripts for a 
 given environment. Although I do have actual hardware where I 
 can see if everything runs in the real world, I mainly use 
 simulations. The coolest thing in my opinion is, nowadays it 
 can be done using only open source tools (mainly ghdl and 
 verilator, the lm32 source code is open source, too). The 
 complete system is simulated and you can look at every logic 
 signal in the cpu or in the ram while the program executes.

Thanks for the insights; I've done just a little hobby electrical 
stuff here-and-there, and having some frame of reference for tool 
and component choice makes me feel good, even if I don't plan on 
buying any lm32s or FPGAs anytime soon :)  Maybe I can Google 
some of that later and geek out at images of other people's 
debugging sessions or something. I'm curious how they manage the 
complexity that happens when circuits and massive swarms of logic 
gates do their, uh, complexity thing. o.O

Aug 01 2020

Michael Reese <michaelate gmail.com> writes:

On Saturday, 1 August 2020 at 23:08:38 UTC, Chad Joan wrote:
 Though if the compiler is allowed to split a single uint64_t 
 into two registers, I would expect it to split struct/string 
 into two registers as well. At least, the manual doesn't seem 
 to explicitly mention higher-level constructs like structs. It 
 does suggest a one-to-one relationship between arguments and 
 registers (up to a point), but GCC seems to have decided 
 otherwise for certain uint64_t's. (Looking at Table 3...) It 
 even gives you two registers for a return value: enough for a 
 string or an array. And if the backend/ABI weren't up for it, 
 it would be theoretically possible to have the frontend 
 lower strings (dynamic arrays) and small structs into their 
 components before function calls and then also insert code on 
 the other side to cast them back into their original form. I'm 
 not sure if anyone would want to write it, though. o.O

Right, I think at some point one should fix the backend. C 
programs would also benefit from it when passing structs as 
arguments. However in C it is more common to just pass pointers 
and they go into registers. I guess this is why I never noticed 
before that struct passing is needlessly expensive.

 Getting from pointer-length to string might be pretty easy:
 string foo = ptr[0 .. len];

Ah cool! I did know about array slicing, but wasn't aware that it 
works on pointers, too.

 It ended up being a little more complicated than I thought it 
 would be. Hope I didn't ruin the fun. ;)

 https://pastebin.com/y6e9mxre

Thanks :) I'll have to look into that more closely. But this is 
the kind of stuff that I hope to make use of in the future on the 
embedded CPU. But for now I cannot use it yet because I don't 
have phobos and druntime in my toolchain right now... just naked 
D.

 Also, that part where you mentioned a 64-bit integer being 
 passed as a pair of registers made me start to wonder if unions 
 could be (ab)used to juke the ABI:

 https://pastebin.com/eGfZN0SL

Thanks for suggesting! I tried, and the union works as well, i.e. 
the function args are passed in registers. But I noticed another 
thing about all workarounds so far:
Even if calls are inlined and arguments end up on the stack, the 
linker puts the code of the wrapper function in my final binary even 
if it is never explicitly called. So until I find a way to strip 
uncalled functions from the binary (not sure the linker can do 
it), the workarounds don't solve the size problem. But they still 
make the code run faster.

Aug 04 2020

FeepingCreature <feepingcreature gmail.com> writes:

On Tuesday, 4 August 2020 at 17:36:53 UTC, Michael Reese wrote:
 Thanks for suggesting! I tried, and the union works as well, 
 i.e. the function args are passed in registers. But I noticed 
 another thing about all workarounds so far:
 Even if calls are inlined and arguments end up on the stack, 
 the linker puts the code of the wrapper function in my final 
 binary even if it is never explicitly called. So until I find a 
 way to strip uncalled functions from the binary (not sure the 
 linker can do it), the workarounds don't solve the size 
 problem. But they still make the code run faster.

Try -ffunction-sections -Wl,--gc-sections. The first flag writes 
every function into a separate section, and the second tells the 
linker to remove all unreferenced sections. Together that should 
remove all unreferenced functions.

Aug 05 2020

Johan <j j.nl> writes:

On Friday, 31 July 2020 at 10:22:20 UTC, Michael Reese wrote:
 My question: Is there a way I can tell the D compiler to use 
 registers instead of stack for string arguments, or any other 
 trick to reduce code size while maintaining an idiomatic D 
 code style?

A D string is a "slice", which is a struct (pointer + length). 
Depending on the function call ABI, structs are passed in 
registers or on the stack.
On x86, the D calling convention is to put small POD structs in 
registers, similar to the C++ calling convention.
I don't know whether GDC has attributes to change the calling 
convention of functions (besides extern(C/C++/Windows/etc.)), but 
that's where you'd need to look.
Otherwise, file a bug with GDC. Slice arguments are common enough 
to require enregistering them in the D calling convention on your 
lm32 platform.
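The pointer+length layout can be observed directly. This is a small sketch for a desktop compiler; the function name samePartsAfterRebuild is made up for the example.

```d
// A slice occupies exactly two machine words: a length and a pointer.
static assert(string.sizeof == 2 * size_t.sizeof);

// Takes a slice apart and puts it back together from its two fields.
bool samePartsAfterRebuild(string s) {
    auto rebuilt = s.ptr[0 .. s.length];
    return rebuilt is s;  // identical pointer and identical length
}

unittest {
    assert(samePartsAfterRebuild("abc"));
}
```

On a 32-bit target such as lm32 that makes a slice 8 bytes, the same size as the uint64_t that the backend already passes in r1 and r2.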

-Johan

Jul 31 2020
