digitalmars.D.learn - Is there a way to get a list of functions that get inlined by dmd?

reply Trass3r <un known.com> writes:
Would be interesting.
Feb 08 2010
parent reply Scorn <scorn trash-mail.com> writes:
Trass3r wrote:
 Would be interesting.
Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods, similar to this one for .NET (http://blogs.msdn.com/ericgu/archive/2004/01/29/64717.aspx), would be really nice. Sure, you can figure this out on your own by studying the compiler sources, but a simple list of rules (it does not have to be exhaustive) on http://www.digitalmars.com/d/1.0/lex.html or on a separate page about optimizations for D would be very much appreciated. The only thing I have figured out so far is that functions do not seem to get inlined across modules (I don't know if this is the case in general), which would be really bad. I also believe that functions / methods with ref parameters are never inlined at all, which is again really annoying, since passing huge structs by value is not a clever alternative.
Feb 08 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Scorn:
 The only things i figured out so far is that functions across modules do
 not seem to get inlined (i don't know if this is the case in general)
 which would be really bad. Another thing which i believe is that
 functions / methods which contain ref parameters are never inlined at
 all (which is again really annoying since it's not a clever way passing
 huge structs by value).
Use LDC (D1), you will note a significant improvement over DMD. Bye, bearophile
Feb 08 2010
next sibling parent Trass3r <un known.com> writes:
 Use LDC (D1), you will note a significant improvement over DMD.
I believe that, but... D1. Also LDC doesn't seem to get much attention (development-wise) recently.
Feb 08 2010
prev sibling next sibling parent reply Scorn <scorn trash-mail.com> writes:
bearophile wrote:
 Scorn:
 The only things i figured out so far is that functions across modules do
 not seem to get inlined (i don't know if this is the case in general)
 which would be really bad. Another thing which i believe is that
 functions / methods which contain ref parameters are never inlined at
 all (which is again really annoying since it's not a clever way passing
 huge structs by value).
Use LDC (D1), you will note a significant improvement over DMD. Bye, bearophile
Hi bearophile. Thanks for your advice (I might try out ldc in the future, but right now I need Phobos and not Tango as the standard library). At the moment I am using gdc, where I nearly always see a significant improvement in the speed of the generated code compared to dmd. But since all three (dmd, ldc and gdc) use the same frontend, Trass3r's question still remains: under which conditions are functions/methods inlined? Do you know if ldc inlines functions/methods across modules? (dmd doesn't seem to do it, and neither does gdc.) That is bad, since for an actual project I have made a separate module with a lot of small utility math functions which should be inlined but are not because of this. When I inline them manually or with mixins I get an overall speedup of about 20%.
Feb 08 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Scorn:

 Hi bearophile. Thanks for your advice (i might try out ldc in the future
 but now i need phobos and not tango as standard library).
I like DMD for some of the D2 features, for its speed, for allowing exceptions to be used on Windows too, for its built-in profiler and code coverage analyser that are not present in LDC, and for other small things. But I like how LDC feels more like a real-world compiler: it has smaller features like force_inline that look like they come from a more practical compiler. When I optimize code on LDC I see predictable performance improvements, while with DMD it's like a shot in the dark, and I usually have to avoid several tweaks of the code, otherwise I get a negative improvement.
 But since all three dmd, ldc and gdc use the same frontend, the question
 from Trass3r still remains:
 Under which conditions are functions/methods inlined ?
I think LDC doesn't use the inliner of the front-end and just uses the much better inliner of the back-end. So the inlining rules are probably all different (but in theory the front-end knows more about the D semantics, so LDC has to work even more to regain the lost semantics). programmers don't look obsessed with performance, this is often positive), but it's not easy to find them on most C/C++ compilers I know of. Have you ever seen the exact inlining rules of C code compiled with GCC 4.4.3?
 Do you know if ldc inlines functions/methods across modules ? (dmd
 doesn't seem to do it and neither does gdc).
I have just done a test: normally LDC is not able to inline across modules. This is a shitty situation. But with LDC you can also perform Link-Time Optimization (you have to ask for it!), which in the test I've just done is able to inline across modules.
 And that is bad since for an actual project i have made a separate
 module with a lot of small utility math functions which should be
 inlined but don't because of this. When i inline them using mixins or
 manually i get an overall speed up of about 20%.
If you explain this problem to Walter he will surely tell you that such inlining can't be done because mumble mumble separate compilation mumble mumble was done fifteen years ago mumble mumble mumble (even if LDC is currently doing it) :-) Good luck, bearophile
Feb 09 2010
parent reply Scorn <scorn trash-mail.com> writes:
 Scorn:
 
 Hi bearophile. Thanks for your advice (i might try out ldc in the future
 but now i need phobos and not tango as standard library).
I like DMD for some of the D2 features, for its speed, for allowing exceptions to be used on Windows too, for its built-in profiler and code coverage analyser that are not present in LDC, and for other small things, but I like how LDC feels more like a real-world compiler, it has smaller features like force_inline that look like coming out of a more practical compiler. When I optimize code on LDC, I see predictable improvements of the performance, while with DMD it's like a shoot in the dark, and I usually have to avoid several tweaks of the code, otherwise I get a negative improvement.
I absolutely understand this :-) That is why I am using gdc at the moment and not dmd (with all of the very nice optimization flags of the gcc backend, which improve speed substantially compared to dmd).
 But since all three dmd, ldc and gdc use the same frontend, the question
 from Trass3r still remains:
 Under which conditions are functions/methods inlined ?
I think LDC doesn't use the inliner of the front-end and just uses the much better inliner of the back-end. So the inling rules are probably all different (but in theory the front-end knows more about the D semantics, so LDC has to work even more to regain the lost semantics).
That would be nice. You have nearly convinced me to port my project to Tango and ldc.


programmers don't look obsessed with performance, this is often positive), but
it's not easy to find them on most C/C++ compilers I know of.
application-development language and it's very good for these kinds of things. It's not meant to be used for high-performance numerical computing, but astonishingly it is sometimes not so bad when you use it for this purpose. But D, as a systems programming language with the ambition to be an alternative to C / C++, has to compete with C / C++ on these things, at least in the long term. So when I consider using D for a project I do it because I am looking makes me always very sad when I figure out that I can't use D successfully because of little flaws like this. And that means another chance of using D for a project is lost. That's why I like your posts about benchmarking so much. They might not be liked by parts of the community, because they put the finger where it hurts most and might be annoying sometimes, but please carry on :-) (you have at least one fan here :-) ). And please make a collection of all the issues you have found so far and put them on a web page somewhere.
 Have you ever seen the exact inlining rules of C code compiled with GCC 4.4.3?
 
No, I have not yet seen the exact inlining rules of C code compiled with GCC 4.4.3. But I don't need them and I don't care. Why? Because gcc does its job well without me knowing anything about the heuristics it uses for inlining. For D it's different. I have found so many times that I can increase the speed of the generated code in D tremendously just by inlining small pieces of code manually (or using mixins). That is something I have never seen with any C++ compiler (they do it so much better than I ever could). So I think that's why Trass3r and I are so interested in the rules D uses for inlining.
 
 Do you know if ldc inlines functions/methods across modules ? (dmd
 doesn't seem to do it and neither does gdc).
I have just done a test, normally LDC is not able to inline across modules. This is a shitty situation.
You are absolutely right here. That is a shitty situation for all D compilers, because it basically means that you cannot follow decent software engineering practices (a clean separation of functionality into different modules) without sacrificing performance. That is a situation you have in no other language I know of, and it is a big design flaw of the module system, making it nearly senseless.
 But with LDC you can perform Link-Time Optimization too (but you have to ask
for it!), that in my test I've just seen is able to inline across modules.
 
Which compiler switches do you need for this ?
 
 And that is bad since for an actual project i have made a separate
 module with a lot of small utility math functions which should be
 inlined but don't because of this. When i inline them using mixins or
 manually i get an overall speed up of about 20%.
If you explain this problem to Walter he will surely tell you that such inlining can't be done because mumble mumble separate compilation mumble mumble was done fifteen years ago mumble mumble mumble (even if LDC is currently doing it) :-)
:-) (Big smile.) Yes. That's what I would expect. But the whole thing is easily solvable: instead of just compiling the modules separately and then linking them together, include the source code of a module file in every module file which imports it (like a C or C++ compiler includes files). Then every function / method needed is visible while compiling the module, and the compiler can decide with its heuristics whether a function / method is worth inlining or not. The compile times might be a bit longer, but since the grammar of D is much simpler than that of C++, it would not hurt compile time so much.
 
 Good luck,
 bearophile
Thanks. Good luck to you too. But for this project I think I will abandon D and stick with C++ again (even if I hate to do so), because the only options would be using mixins all over the place or putting everything in one big file (and both solutions are ugly). Thanks for your help.
Feb 09 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Scorn:

But D as a system-programming language with the ambition to be an alternative
to C / C++ has, at least in the long term, to compete with C / C++ regarding
these things.<
I understand. The problem is that D is a new language with new compilers, so it can't optimize as well as GCC, which has been compiling C code for so many years. LDC compiles C-like D1 code well enough, about as well as GCC or better.
And please make a collection of all the issues you have found so far and but
them on a web page somewhere.<
http://www.fantascienza.net/leonardo/js/slow_d.zip
Which compiler switches do you need for this ?<
Found after few hours of tests of mine plus a suggestion from the LLVM lead developer :-) For example you have a "temp.d" main module and a "mo.d" imported module:

ldc -O5 -release -inline -output-bc temp.d
ldc -O5 -release -inline -output-bc mo.d
opt -std-compile-opts temp.bc > tempo.bc
opt -std-compile-opts mo.bc > moo.bc
llvm-ld -L/usr/lib/d -native -ltango-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=tempo tempo.bc moo.bc
But the whole thing is easy solvable by not doing just a separate compilation
of the modules and then linking them together<
If you take a look at my previous post ( http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=18822 ) you can see I have found that DMD does inline functions from other modules (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter). Bye, bearophile
Feb 09 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
(while LDC doesn't do it, so I have to report this to the ldc devs and not to
Walter).<
larsivi and Deew have told me that compiling with -inline -singleobj solves the problem with ldc, and my test shows it's true. Bye, bearophile
Feb 09 2010
parent Scorn <scorn trash-mail.com> writes:
bearophile wrote:
 (while LDC doesn't do it, so I have to report this to the ldc devs and not to
Walter).<
larsivi and Deew have told me that compiling with -inline -singleobj solves the problem with ldc, and my test shows it's true. Bye, bearophile
Shouldn't that be the default for the ldc compiler when compiling in release mode, then?
Feb 09 2010
prev sibling parent Scorn <scorn trash-mail.com> writes:
bearophile wrote:
 Scorn:
 
 But D as a system-programming language with the ambition to be an alternative
to C / C++ has, at least in the long term, to compete with C / C++ regarding
these things.<
I understand. The problem is that D is a new language with new compilers, so it can't optimize as well as GCC that is compiling C code for so many years. LDC compiles C-like D1 code well enough, about as well as GCC or better.
Sure. And that is something I do not expect yet. Since Walter is doing nearly all the compiler work alone, I would never ever blame him for anything. And it's not that I am unwilling to sacrifice speed for the productivity gains which programming in D gives me. But it's always annoying when simple things like this do not seem to work, things which work in virtually every other compiled language out there. This sometimes costs a lot of confidence in the compiler.
 And please make a collection of all the issues you have found so far and but
them on a web page somewhere.<
http://www.fantascienza.net/leonardo/js/slow_d.zip
This test code is very nice. But some kind of web page with a collection of all the performance problems you have found so far in the D compilers would be great, so that none of your research gets lost.
 
 But the whole thing is easy solvable by not doing just a separate compilation
of the modules and then linking them together<
 If you take a look at my precedent post ( http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=18822 ) you can see I have found DMD does inline functions from other modules (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter).
This is good news. But hopefully the compiler does it not only for your test case but always tries to inline functions across modules when it is worth it (which was the purpose of the original question). Is it constrained by the size of the function or by the parameters? Trass3r did some nice research regarding this.
 Bye,
 bearophile
Thanks for your hints, Bye.
Feb 09 2010
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:hkpiai$2kt6$1 digitalmars.com...
 Scorn:
 The only things i figured out so far is that functions across modules do
 not seem to get inlined (i don't know if this is the case in general)
 which would be really bad. Another thing which i believe is that
 functions / methods which contain ref parameters are never inlined at
 all (which is again really annoying since it's not a clever way passing
 huge structs by value).
Use LDC (D1), you will note a significant improvement over DMD.
Unless you're on windows.
Feb 08 2010
parent reply Trass3r <un known.com> writes:
 Use LDC (D1), you will note a significant improvement over DMD.
Unless you're on windows.
Yep, second big problem.
Feb 08 2010
parent Scorn <scorn trash-mail.com> writes:
 Use LDC (D1), you will note a significant improvement over DMD.
Unless you're on windows.
Yep, second big problem.
And unless you're using CodeBlocks (like me at the moment) for development ...
Feb 08 2010
prev sibling parent reply Trass3r <un known.com> writes:
On 08.02.2010 at 16:33, Scorn <scorn trash-mail.com> wrote:

 Trass3r wrote:
 Would be interesting.
Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods would be really nice.
Well if I read the code correctly the following is not supported:
- nested inline?
- variadic functions (T t, ...)
- synchronized
- imported functions
- functions with closure vars
- virtual functions that aren't final
- functions with out, ref or static array parameters
- functions with more than 250 elementary expressions

Created my own little inline dumping patch:

Index: inline.c
===================================================================
--- inline.c (revision 363)
+++ inline.c (working copy)
@@ -1126,6 +1126,7 @@
         if (fd && fd != iss->fd && fd->canInline(0))
         {
             e = fd->doInline(iss, NULL, arguments);
+            printf("Inlined function %s.\n", fd->toPrettyChars());
         }
     }
     else if (e1->op == TOKdotvar)
@@ -1145,7 +1146,10 @@
                 ;
             }
             else
-                e = fd->doInline(iss, dve->e1, arguments);
+            {
+                e = fd->doInline(iss, dve->e1, arguments);
+                printf("Inlined method %s.\n", fd->toPrettyChars());
+            }
         }
     }
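
To make the categories in the list above concrete, here is a minimal sketch in D2 syntax of functions that would fall into some of them. All names are made up for illustration, and the comments only repeat what the list says; the snippet does not verify what any particular dmd version actually inlines.

```d
import core.stdc.stdio : printf;

// Small, non-variadic, by-value parameters: not excluded by the list.
int square(int x) { return x * x; }

// 'ref' parameter: listed as not supported by the inliner.
void scale(ref double v, double f) { v *= f; }

// 'out' parameters: same category as 'ref' in the list.
void divmod(int a, int b, out int q, out int r) { q = a / b; r = a % b; }

// Variadic function: listed as not supported.
int count(int n, ...) { return n; }

class Shape
{
    // Methods are virtual by default in D: listed as not inlinable.
    double area() { return 0.0; }

    // 'final' removes the virtual dispatch, so the list does not exclude it.
    final int sides() { return 0; }
}

void main()
{
    double v = 2.0;
    scale(v, 1.5);

    int q, r;
    divmod(7, 2, q, r);

    auto s = new Shape;

    assert(square(4) == 16);
    assert(v == 3.0);
    assert(q == 3 && r == 1);
    assert(count(2, 10, 20) == 2);
    assert(s.area() == 0.0 && s.sides() == 0);
    printf("ok\n");
}
```

Compiling a file like this with a dmd patched as above and `-inline` would be one way to see which of these calls get reported.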
Feb 09 2010
parent reply Scorn <scorn trash-mail.com> writes:
Trass3r wrote:
 On 08.02.2010 at 16:33, Scorn <scorn trash-mail.com> wrote:
 
 Trass3r wrote:
 Would be interesting.
Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods would be really nice.
 Well if I read the code correctly the following is not supported:
 - nested inline?
 - variadic functions (T t, ...)
 - synchronized
 - imported functions
 - functions with closure vars
 - virtual functions that aren't final
 - functions with out, ref or static array parameters
 - functions with more than 250 elementary expressions
 Created my own little inline dumping patch: [...]
Thanks for your work. This is very interesting. It would be good to have this on a separate page about optimizations in D. But a big problem is that inlining seems to be done only inside one module and not across modules, which makes modules with things like small helper or math functions that are used across different modules senseless.
Feb 09 2010
parent reply Trass3r <un known.com> writes:
 But a big problem is that inlining seems to be done only inside one
 module and not across modules which makes modules with something like
 small helper or math functions which are used across different modules
 senseless.
Yeah, I think so as well, since each module is compiled separately. Nevertheless my patch lists functions that aren't used in the same module. Maybe I've missed something.
Feb 09 2010
parent reply Don <nospam nospam.com> writes:
Trass3r wrote:
 But a big problem is that inlining seems to be done only inside one
 module and not across modules which makes modules with something like
 small helper or math functions which are used across different modules
 senseless.
Yeah, I think so as well as each module is compiled separately. Nevertheless my patch lists functions that aren't used in the same module. Maybe I've missed something.
I don't think so. It's hard to believe that the inliner would be so limited. It'd be great to assemble some important test cases that currently fail. Probably 'ref' arguments are the main culprit.
Feb 09 2010
parent reply Scorn <scorn trash-mail.com> writes:
Don wrote:
 Trass3r wrote:
 But a big problem is that inlining seems to be done only inside one
 module and not across modules which makes modules with something like
 small helper or math functions which are used across different modules
 senseless.
Yeah, I think so as well as each module is compiled separately. Nevertheless my patch lists functions that aren't used in the same module. Maybe I've missed something.
I don't think so. It's hard to believe that the inliner would be so limited. It'd be great to assemble some important test cases that currently fail. Probably 'ref' arguments are the main culprit.
Thank you very much for your answer, Don. I think it might be interesting for you too, since from what I know you are also using D for numerical stuff. I still hope that the inliner is not so limited. But I am sure I once read a post from Walter where he said that optimizations are done only per module. I hope that does not apply to inlining too, because otherwise the whole module system would be, in my humble opinion, totally unusable. Just think about things like setter / getter methods in classes, little math functions which you would put in a separate math module, or operator overloading in a complex number or vector / matrix class module. When things like these are used in a program which does a lot of numerical computations inside a big loop, this would be really, really bad.

But from my experience (which I have to admit is not that big) small utility functions like the following are not inlined when they are in a separate module, but are inlined when they are in the same module in which they are called:

double min(double a, double b, double c)
{
    return a < b && a < c ? a : b < c ? b : c;
}

At the moment I help myself with mixins, because they just get copied and pasted into the code:

template Min(char[] a, char[] b, char[] c)
{
    const char[] Min = a~"<"~b~"&&"~a~"<"~c~"?"~a~":"~b~"<"~c~"?"~b~":" ~ c;
}

min.x = mixin(Min!("vertex0.x", "vertex1.x", "vertex2.x"));
min.y = mixin(Min!("vertex0.y", "vertex1.y", "vertex2.y"));
min.z = mixin(Min!("vertex0.z", "vertex1.z", "vertex2.z"));

This (and a little bit more) runs in a tight loop which runs about 10000000 times. With these "optimizations" I get a speed increase of about 20%. And I get the same increase in speed when I just copy the function into the module in which it is called. It is not only this function where I notice this behaviour, but all of my tiny helper functions in the separate math module.

So for the moment it seems I have the alternatives of copying every math helper function into every module which needs it (so it gets inlined), which is a software-engineering nightmare because I spread the same functionality over and over in my code, or using mixins to death. This is why I was so interested in the question of under which conditions functions are inlined (it sometimes is very strange). I still hope it's not true that inlining in D is so limited, but from my experience it seems to be (at least in many cases).
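
The string-mixin workaround described in this post can be sketched in D2 syntax roughly as follows. This is an illustrative sketch, not the code from the actual project: `minExpr`, `Vec3` and the vertex values are hypothetical, and it uses a compile-time function returning a `string` instead of the D1 `char[]` template shown in the post.

```d
import core.stdc.stdio : printf;

// Hypothetical helper: builds the source text of a three-way minimum.
// It is evaluated at compile time (CTFE) when used inside mixin(...).
string minExpr(string a, string b, string c)
{
    return "((" ~ a ~ " <= " ~ b ~ ") ? (" ~ a ~ " <= " ~ c ~ " ? " ~ a ~ " : " ~ c ~ ")"
         ~ " : (" ~ b ~ " <= " ~ c ~ " ? " ~ b ~ " : " ~ c ~ "))";
}

// Hypothetical vertex type standing in for the one used in the post.
struct Vec3 { double x, y, z; }

void main()
{
    auto vertex0 = Vec3(3.0, -1.0, 5.0);
    auto vertex1 = Vec3(1.0,  4.0, 2.0);
    auto vertex2 = Vec3(2.0,  0.5, 9.0);

    Vec3 lo;
    // Each mixin pastes the expression text at the call site, so there is
    // no function call left for the compiler to inline (or fail to inline).
    lo.x = mixin(minExpr("vertex0.x", "vertex1.x", "vertex2.x"));
    lo.y = mixin(minExpr("vertex0.y", "vertex1.y", "vertex2.y"));
    lo.z = mixin(minExpr("vertex0.z", "vertex1.z", "vertex2.z"));

    assert(lo.x == 1.0 && lo.y == -1.0 && lo.z == 2.0);
    printf("%f %f %f\n", lo.x, lo.y, lo.z);
}
```

The cost of this approach is the one the post complains about: the arguments are pasted as text, so they are evaluated more than once and are not type-checked until the generated expression is compiled.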
Feb 09 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Scorn:

 double min(double a, double b, double c)
 {
     return a < b && a < c ? a : b < c ? b : c;
 }
Don't write code like that, add some parentheses like this:

return (a < b && a < c) ? a : (b < c ? b : c);

because the compiler is able to sort out those operator precedences, but the programmer that comes after you and reads that code will have problems. A compiler compiles that code with 3 FP tests, while I think two suffice, so there are better ways to write that.
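
A small sketch (D2 syntax, with made-up function names) that checks both points: the unparenthesized and the parenthesized expressions group the same way, and a variant with at most two comparisons returns the same values. The test values are arbitrary and NaN is deliberately left out.

```d
import core.stdc.stdio : printf;

// The original expression, relying on operator precedence alone.
double minPlain(double a, double b, double c)
{
    return a < b && a < c ? a : b < c ? b : c;
}

// The same expression with the grouping made explicit.
double minParens(double a, double b, double c)
{
    return (a < b && a < c) ? a : (b < c ? b : c);
}

// A variant that needs exactly two comparisons per call.
double minTwo(double a, double b, double c)
{
    double m = (a < b) ? a : b;
    return (m < c) ? m : c;
}

void main()
{
    double[] vals = [-2.0, 0.0, 1.5, 3.0];
    foreach (a; vals)
        foreach (b; vals)
            foreach (c; vals)
            {
                assert(minPlain(a, b, c) == minParens(a, b, c));
                assert(minPlain(a, b, c) == minTwo(a, b, c));
            }
    printf("all combinations agree\n");
}
```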
 This is (and a little bit more) is running in a tight loop which runs
 about 10000000 times.
 With these "optimizations" i get a speed increase about 20% percent. 
---------------------

I have created a module named "mo" and a main module named "temp":

module mo;

int foo(int x) {
    return x * x;
}

double min3(double a, double b, double c) {
    return (a <= b) ? (a <= c ? a : c) : (b <= c ? b : c);
}

---------------------

module temp; // main module

version (Tango) {
    import tango.stdc.stdio: printf;
    import tango.stdc.stdlib: atoi, atof;
} else {
    import std.c.stdio: printf;
    import std.c.stdlib: atoi, atof;
}

import mo: foo, min3;

void main() {
    int x = atoi("12");
    printf("%d\n", foo(x));

    double x1 = atof("10");
    double x2 = atof("20");
    double x3 = atof("30");
    printf("%f\n", min3(x1, x2, x3));
}

---------------------

From my tests it seems LDC isn't able to inline those functions, while DMD is able to inline them :-)

ldc -O5 -release -output-s -inline temp.d mo.d

08049600 <_Dmain>:
 8049600:   83 ec 34                sub    $0x34,%esp
 8049603:   c7 04 24 e8 8c 05 08    movl   $0x8058ce8,(%esp)
 804960a:   e8 99 fd ff ff          call   80493a8 <atoi@plt>
 804960f:   e8 9c 00 00 00          call   80496b0 <_D2mo3fooFiZi>
 8049614:   89 44 24 04             mov    %eax,0x4(%esp)
 8049618:   c7 04 24 eb 8c 05 08    movl   $0x8058ceb,(%esp)
 804961f:   e8 64 fd ff ff          call   8049388 <printf@plt>
 8049624:   c7 04 24 ef 8c 05 08    movl   $0x8058cef,(%esp)
 804962b:   e8 98 fd ff ff          call   80493c8 <atof@plt>
 8049630:   db 7c 24 28             fstpt  0x28(%esp)
 8049634:   c7 04 24 f2 8c 05 08    movl   $0x8058cf2,(%esp)
 804963b:   e8 88 fd ff ff          call   80493c8 <atof@plt>
 8049640:   db 7c 24 1c             fstpt  0x1c(%esp)
 8049644:   c7 04 24 f5 8c 05 08    movl   $0x8058cf5,(%esp)
 804964b:   e8 78 fd ff ff          call   80493c8 <atof@plt>
 8049650:   db 6c 24 28             fldt   0x28(%esp)
 8049654:   dd 5c 24 10             fstpl  0x10(%esp)
 8049658:   db 6c 24 1c             fldt   0x1c(%esp)
 804965c:   dd 5c 24 08             fstpl  0x8(%esp)
 8049660:   dd 1c 24                fstpl  (%esp)
 8049663:   e8 58 00 00 00          call   80496c0 <_D2mo4min3FdddZd>
 8049668:   83 ec 18                sub    $0x18,%esp
 804966b:   dd 5c 24 04             fstpl  0x4(%esp)
 804966f:   c7 04 24 f8 8c 05 08    movl   $0x8058cf8,(%esp)
 8049676:   e8 0d fd ff ff          call   8049388 <printf@plt>
 804967b:   31 c0                   xor    %eax,%eax
 804967d:   83 c4 34                add    $0x34,%esp
 8049680:   c2 08 00                ret    $0x8
 8049683:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
 8049689:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

-----------------

dmd -O -release -inline temp.d mo.d

__Dmain comdat
L0:     sub ESP,038h
        mov EAX,offset FLAT:_DATA
        push    EBX
        push    ESI
        push    EDI
        push    EAX
        call    near ptr _atoi
        add ESP,4
        mov EBX,EAX
        mov ECX,EAX
        imul    ECX,ECX
        mov EDX,offset FLAT:_DATA[4]
        push    ECX
        push    EDX
        call    near ptr _printf
        mov ESI,offset FLAT:_DATA[8]
        push    ESI
        call    near ptr _atof
        mov EDI,offset FLAT:_DATA[0Ch]
        fstp    qword ptr 018h[ESP]
        push    EDI
        call    near ptr _atof
        mov EAX,offset FLAT:_DATA[010h]
        fstp    qword ptr 024h[ESP]
        push    EAX
        call    near ptr _atof
        add ESP,4
        fld qword ptr 01Ch[ESP]
        fxch    ST1
        fstp    qword ptr 02Ch[ESP]
        fcomp   qword ptr 024h[ESP]
        fstsw   AX
        sahf
        ja  L83
        jp  L83
        fld qword ptr 01Ch[ESP]
        fcomp   qword ptr 02Ch[ESP]
        fstsw   AX
        sahf
        ja  L7D
        jp  L7D
        fld qword ptr 01Ch[ESP]
        jmp short   L9C
L7D:        fld qword ptr 02Ch[ESP]
        jmp short   L9C
L83:        fld qword ptr 024h[ESP]
        fcomp   qword ptr 02Ch[ESP]
        fstsw   AX
        sahf
        ja  L98
        jp  L98
        fld qword ptr 024h[ESP]
        jmp short   L9C
L98:        fld qword ptr 02Ch[ESP]
L9C:        sub ESP,8
        mov ECX,offset FLAT:_DATA[014h]
        fstp    qword ptr [ESP]
        push    ECX
        call    near ptr _printf
        add ESP,01Ch
        xor EAX,EAX
        pop EDI
        pop ESI
        pop EBX
        add ESP,038h
        ret

-----------------

Using Link-Time optimization LDC is able to inline those functions.
So here it seems LDC is worse :-(

Bye,
bearophile
Feb 09 2010
parent Scorn <scorn trash-mail.com> writes:
 Scorn:
 
 double min(double a, double b, double c)
 {
     return a < b && a < c ? a : b < c ? b : c;
 }
Don't write code like that, add some parenthesys like this: return (a < b && a < c) ? a : (b < c ? b : c); because the compiler is able to sort out those operator precedences, but the programmer that comes after you and reads that code will have problems.
OK. The next time I post an example I will take care that it is more readable :-)
 A compiler compiles that code with 3 FP tests, while I think two suffice, so
there are better ways to write that.
:-) Sure. Yes, you are right. Since I do not want to sort the values a, b and c (have a total order of things) I could, of course, write something longer and a bit more efficient like this:

double max(double a, double b, double c)
{
    if (a >= b)
    {
        if (a >= c) return a; else return c;
    }
    else
    {
        if (b >= c) return b; else return c;
    }
}

which uses just two comparisons instead of three. But trust me: that bad code from above is not the explanation for the lack of speed in my program, and it would be a bit longer to write as a mixin. ;-) But here comes the interesting part:
 
 This is (and a little bit more) is running in a tight loop which runs
 about 10000000 times.
 With these "optimizations" i get a speed increase about 20% percent. 
 I have created a module named "mo" and a main module named "temp": [...]
 From my tests it seems LDC isn't able to inline those functions, while DMD is able to inline them :-)
And gdc does not seem to inline those functions either :-(
 
 Using Link-Time optimization LDC is able to inline those functions.
 So here it seems LDC is worse :-(
I have to try it with gdc too.
 
 Bye,
 bearophile
Thank you very much for your research, bearophile. It's very much appreciated. But now the interesting question is why the different compilers inline functions so differently: is it because of other versions of the frontend (has Walter changed something?) or because they use different backends (which should not matter so much, since inlining is normally best done in the frontend)? And of course Trass3r's original question, under which conditions functions are inlined, still remains. Are setters/getters inlined? Overloaded operators? Short helper functions? Functions with ref or out parameters? In which cases does it simply not work when it should?
Feb 09 2010