digitalmars.D - Problems with dmd inlining
- Craig Black (11/11) Dec 11 2010 I did some benchmarking with a simple quick sort algorithm and was very
- Brad Roberts (10/22) Dec 11 2010 There's a number of things that currently stop dmd from inlining. Sever...
- Andrei Alexandrescu (7/29) Dec 11 2010 Seconded. I think it's great to address whatever keeps bona fide
- Craig Black (3/6) Dec 11 2010 20% slower would be acceptable if I didn't have to do my own inlining.
- Jason House (2/11) Dec 11 2010
- Jason House (2/14) Dec 11 2010 I should add that I strongly suspect failure to inline as a cause. The C...
- Walter Bright (3/9) Dec 11 2010 50 times slower is not likely to be a problem with inlining, it's likely...
- Jason House (2/12) Dec 12 2010 Normally, yes, I'd agree. But in this case, it's merely a port of the C+...
- so (5/11) Dec 12 2010 If you already haven't, i suggest you to profile it and share related
- Michel Fortin (7/18) Dec 12 2010 Another interesting metric would be to compile the C++ code with
- Walter Bright (2/20) Dec 12 2010 There's something funky going on. Inlining can't explain anywhere near a...
- Simen kjaeraas (4/20) Dec 12 2010 If no-one else has stepped up, I'm willing to have a look.
- Andrei Alexandrescu (5/25) Dec 12 2010 That would be a great help to the community. I did look at that code and...
- Jason House (2/11) Dec 12 2010 To be fair, back when I e-mailed it to you, I was banging my head trying...
- Jason House (9/18) Dec 12 2010 Thanks Simen. I sent a reply to the e-mail you gave in this newsgroup w...
- Walter Bright (6/8) Dec 13 2010 I see the problem. You need to compile with the
- Craig Black (4/13) Dec 13 2010 I don't need a -winbenchmark switch since I already have an easy button.
- so (6/17) Dec 11 2010 As you know inlining is very important for numeric coding, D doesn't hav...
- Andrej Mitrovic (3/33) Dec 11 2010 Show us the code and how you invoked DMD. I'm sure there are experts
I did some benchmarking with a simple quick sort algorithm and was very disappointed that dmd was over twice as slow as Visual C++. Investigation revealed most of the slowness was due to the fact that dmd was not inlining a simple function that returned a reference. After hand-inlining some code, I got within 20% of the performance of Visual C++.

I don't see this as acceptable. The main reason that I want to use D is so that my code will be cleaner. If I have to inline my own functions then this will not result in clean code.

Anyway, has anyone else had problems with dmd's inliner? Should I post a bug report or has someone else already complained about this?

-Craig
Dec 11 2010
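The situation described in the opening post can be sketched as a small test case. This is only an illustration under stated assumptions: the original benchmark was not posted, so the `Buffer` type, the `at` accessor, and both sort functions are hypothetical names. It shows a quick sort written against a tiny function that returns a reference, next to the same algorithm with the accessor inlined by hand.

```d
import std.stdio : writeln;

// Hypothetical wrapper type; name and layout are assumptions for illustration.
struct Buffer
{
    int[] data;

    // Small accessor returning a reference: the kind of function
    // the post reports dmd failing to inline.
    ref int at(size_t i)
    {
        return data[i];
    }
}

// Quick sort (Hoare-style partition) written against the accessor.
void quickSort(ref Buffer b, ptrdiff_t lo, ptrdiff_t hi)
{
    if (lo >= hi)
        return;
    immutable pivot = b.at(lo + (hi - lo) / 2);
    ptrdiff_t i = lo, j = hi;
    while (i <= j)
    {
        while (b.at(i) < pivot) ++i;
        while (b.at(j) > pivot) --j;
        if (i <= j)
        {
            immutable tmp = b.at(i);
            b.at(i) = b.at(j);
            b.at(j) = tmp;
            ++i;
            --j;
        }
    }
    quickSort(b, lo, j);
    quickSort(b, i, hi);
}

// The same algorithm with the accessor replaced by direct indexing,
// i.e. what "hand-inlining" amounts to here.
void quickSortInlined(int[] a, ptrdiff_t lo, ptrdiff_t hi)
{
    if (lo >= hi)
        return;
    immutable pivot = a[lo + (hi - lo) / 2];
    ptrdiff_t i = lo, j = hi;
    while (i <= j)
    {
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j)
        {
            immutable tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
            ++i;
            --j;
        }
    }
    quickSortInlined(a, lo, j);
    quickSortInlined(a, i, hi);
}

void main()
{
    auto viaAccessor = Buffer([5, 3, 9, 1, 7, 2, 8]);
    int[] byHand = viaAccessor.data.dup;

    quickSort(viaAccessor, 0, cast(ptrdiff_t) viaAccessor.data.length - 1);
    quickSortInlined(byHand, 0, cast(ptrdiff_t) byHand.length - 1);

    // Both variants must agree; only their speed is expected to differ.
    assert(viaAccessor.data == byHand);
    writeln(byHand);
}
```

Both functions produce the same ordering, so any timing difference between them on a large array would point at call overhead for `at` rather than at the algorithm.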
On 12/11/2010 8:22 PM, Craig Black wrote:
> I did some benchmarking with a simple quick sort algorithm and was very disappointed that dmd was over twice as slow as Visual C++. Investigation revealed most of the slowness was due to the fact that dmd was not inlining a simple function that returned a reference. After hand-inlining some code, I got within 20% of the performance of Visual C++. I don't see this as acceptable. The main reason that I want to use D is so that my code will be cleaner. If I have to inline my own functions then this will not result in clean code. Anyway, has anyone else had problems with dmd's inliner? Should I post a bug report or has someone else already complained about this? -Craig

There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case.

As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with.

Later,
Brad
Dec 11 2010
On 12/11/10 10:36 PM, Brad Roberts wrote:
> There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case. As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with. Later, Brad

Seconded. I think it's great to address whatever keeps bona fide potential users from using D over competitor languages.

One more thing - to clarify, Craig, are you implying that it's acceptable for performance to be within 20%? If not, there are tweaks on the algorithmic side we can do to improve sorting.

Andrei
Dec 11 2010
> One more thing - to clarify, Craig, are you implying that it's acceptable for performance to be within 20%? If not, there are tweaks on the algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable.

-Craig
Dec 11 2010
Craig Black Wrote:
> 20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable. -Craig

I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been too busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.
Dec 11 2010
Jason House Wrote:
> I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been too busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

I should add that I strongly suspect failure to inline as a cause. The C++ code has lots of mini functions returning compile-time constants. I know that the C++ code started out as low level code aimed at maximum performance and then gradually got cleaned up. Any cleanup that confused gcc's optimizer was rejected/reworked. The code may be closely tied to gcc's optimization/inlining, but dmd should come close. 20% slower would be acceptable.
Dec 11 2010
Jason House wrote:
> I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been too busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

50 times slower is not likely to be a problem with inlining, it's likely to be an algorithmic one.
Dec 11 2010
Walter Bright Wrote:
> 50 times slower is not likely to be a problem with inlining, it's likely to be an algorithmic one.

Normally, yes, I'd agree. But in this case, it's merely a port of the C++ source code, so all algorithms are identical. The only change I made initially was to use ranges, but even after replacing those with mixins, the performance was equally bad. There are also no memory allocations, so the GC isn't an issue either. There are also benchmarks on behavior that make me fairly confident the behavior is comparable.
Dec 12 2010
> Normally, yes, I'd agree. But in this case, it's merely a port of the C++ source code, so all algorithms are identical. The only change I made initially was to use ranges, but even after replacing those with mixins, the performance was equally bad. There are also no memory allocations, so the GC isn't an issue either. There are also benchmarks on behavior that make me fairly confident the behavior is comparable.

If you haven't already, I suggest you profile it and share the relevant parts with us. That is unacceptable.

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Dec 12 2010
On 2010-12-12 11:09:24 -0500, so <so so.do> said:
> If you haven't already, I suggest you profile it and share the relevant parts with us. That is unacceptable.

Another interesting metric would be to compile the C++ code with inlining disabled and compare with the D code with inlining disabled.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 12 2010
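The measurements suggested in the posts above can be approximated with a small timing harness. This is a minimal sketch, assuming a current Phobos that provides `std.datetime.stopwatch` (it is not what dmd 2.047 shipped with); the workload and the names `square`, `sumViaCall`, `sumByHand`, and `elapsedMsecs` are hypothetical. Building it once with and once without the `-inline` switch mentioned later in the thread would give the call-versus-hand-inlined comparison for the D side.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical tiny helper, a candidate for inlining.
int square(int x)
{
    return x * x;
}

// Workload that goes through the helper on every iteration.
long sumViaCall(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
        total += square(i % 100);
    return total;
}

// The same workload with the helper expanded by hand.
long sumByHand(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
    {
        immutable v = i % 100;
        total += v * v;
    }
    return total;
}

// Runs fn once, stores its result, and returns the elapsed milliseconds.
long elapsedMsecs(long delegate() fn, out long result)
{
    auto sw = StopWatch(AutoStart.yes);
    result = fn();
    sw.stop();
    return sw.peek.total!"msecs";
}

void main()
{
    enum n = 50_000_000;
    long a, b;
    immutable tCall = elapsedMsecs(() => sumViaCall(n), a);
    immutable tHand = elapsedMsecs(() => sumByHand(n), b);

    // Same answer either way; only the timings are of interest.
    assert(a == b);
    writefln("via call: %s ms, by hand: %s ms", tCall, tHand);
}
```

A single run like this is noisy, so repeating the measurement and comparing the best times is a safer basis for any conclusion about inlining.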
Jason House wrote:
> Normally, yes, I'd agree. But in this case, it's merely a port of the C++ source code, so all algorithms are identical. The only change I made initially was to use ranges, but even after replacing those with mixins, the performance was equally bad. There are also no memory allocations, so the GC isn't an issue either. There are also benchmarks on behavior that make me fairly confident the behavior is comparable.

There's something funky going on. Inlining can't explain anywhere near a 50x change.
Dec 12 2010
Jason House <jason.james.house gmail.com> wrote:
> I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been too busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

If no-one else has stepped up, I'm willing to have a look.

--
Simen
Dec 12 2010
On 12/12/10 5:06 PM, Simen kjaeraas wrote:
> If no-one else has stepped up, I'm willing to have a look.

That would be a great help to the community. I did look at that code and nothing jumped out at me. But then I didn't have enough time to profile it properly.

Andrei
Dec 12 2010
Andrei Alexandrescu Wrote:
> That would be a great help to the community. I did look at that code and nothing jumped out at me. But then I didn't have enough time to profile it properly. Andrei

To be fair, back when I e-mailed it to you, I was banging my head trying to find a bug in my string mixin version. I had isolated it down to some ~50 lines, but my "proof" was pretty light and it wasn't obvious how that small bit fit into the larger whole. I forget the finer details now, but as it turns out, I was doing something like mixing in the right half of an assignment so the code was simply not doing anything.

The version I sent Simen gives correct output with both versions of the program... Just really really slowly.
Dec 12 2010
Simen kjaeraas Wrote:
> If no-one else has stepped up, I'm willing to have a look.

Thanks Simen. I sent a reply to the e-mail you gave in this newsgroup with the following things:
  .tar.gz with c++ source (2178 lines)
  .tar.gz with D2 source + ranges (1690 lines)
  .tar.gz with D2 source + string mixins (1696 lines)
  dmd's -profile output for the range-based version
  Very basic description of where the source came from and what it's doing
  An svg showing c++ dependency tree built from the #includes

Benchmarking again, it appears I exaggerated. The D2 code compiled with -gc -release -inline -noboundscheck -O is only 33x slower (not 50x). My test this evening was with dmd 2.047 and g++ 4.4.5.
Dec 12 2010
Jason House wrote:
> The D2 code compiled with -gc -release -inline -noboundscheck -O is only 33x slower (not 50x). My test this evening was with dmd 2.047 and g++ 4.4.5.

I see the problem. You need to compile with the -winbenchmark switch. This switch enables sophisticated optimizer technology, capable of recognizing benchmark code and replacing it with:

    printf("1899 primes\n");
Dec 13 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ie4mit$m2r$1 digitalmars.com...
> I see the problem. You need to compile with the -winbenchmark switch. This switch enables sophisticated optimizer technology, capable of recognizing benchmark code and replacing it with: printf("1899 primes\n");

I don't need a -winbenchmark switch since I already have an easy button.

-Craig
Dec 13 2010
> There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case. As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with.

As you know, inlining is very important for numeric coding. D doesn't have hints (inline) or constraints (the non-standard forceinline), which is just saying "the compiler knows best". Is this always the case?

Thank you!
Dec 11 2010
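On the question of inline hints: D releases made well after the dmd 2.047 discussed in this thread added `pragma(inline, ...)`, which was not available to the posters. The sketch below assumes such a later compiler; the function names `axpy` and `axpyNoInline` are hypothetical. It marks one small numeric helper as "please inline" and a second, identical one as "never inline", which also gives a way to compare the two cases without touching command-line switches.

```d
import std.stdio : writeln;

// Request inlining of a tiny numeric helper.
// Assumption: a compiler new enough to support pragma(inline).
pragma(inline, true)
double axpy(double a, double x, double y)
{
    return a * x + y;
}

// Ask the compiler to keep this one as a real call,
// e.g. to measure what the call overhead costs.
pragma(inline, false)
double axpyNoInline(double a, double x, double y)
{
    return a * x + y;
}

void main()
{
    double inlined = 0;
    double called = 0;
    foreach (i; 0 .. 4)
    {
        inlined = axpy(2.0, i, inlined);
        called = axpyNoInline(2.0, i, called);
    }

    // The pragma changes code generation only, never the result.
    assert(inlined == called);
    writeln(inlined);
}
```

Whether a given function is actually inlined still depends on the compiler and its settings, so a hint like this complements a bug report about the inliner and does not replace one.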
Show us the code and how you invoked DMD. I'm sure there are experts lurking around here ready to investigate. ;)

On 12/12/10, Brad Roberts <braddr puremagic.com> wrote:
> On 12/11/2010 8:22 PM, Craig Black wrote:
>> I did some benchmarking with a simple quick sort algorithm and was very disappointed that dmd was over twice as slow as Visual C++. Investigation revealed most of the slowness was due to the fact that dmd was not inlining a simple function that returned a reference. After hand-inlining some code, I got within 20% of the performance of Visual C++. I don't see this as acceptable. The main reason that I want to use D is so that my code will be cleaner. If I have to inline my own functions then this will not result in clean code. Anyway, has anyone else had problems with dmd's inliner? Should I post a bug report or has someone else already complained about this? -Craig
>
> There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case. As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with. Later, Brad
Dec 11 2010