digitalmars.D - DMD is slow for matrix maths?

reply Etienne Cimon <etcimon gmail.com> writes:
I've been playing around with perf and my web server and found 
that the bottleneck is by far the math module of Botan: 
https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d

I'm probably a bit naive but I was wishing for some inlining to 
happen. I see LOTS of CPU time spent on "pop" instructions to 
return from a simple multiply function, and the pragma(inline, 
true) was refused on all of these. So, should I wait for an 
inline? Should I import another library? Should I rewrite all the 
maths in assembly manually for each processor? Should I write 
another library that must be compiled with LDC/release for maths?

I think the best option would be for an inline feature in DMD 
that works, but I'm wondering what the stance is right now about 
the subject?
Oct 25 2015
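For context, the kind of helper being discussed can be sketched roughly as follows. This is a hypothetical 32-bit multiply-add written for illustration only, not the code from Botan's mp_core.d; the names `wordMadd` and `mulByWord` are assumptions. `pragma(inline, true)` asks the compiler to always inline the function, and a compiler that cannot do so is expected to complain, which is presumably the "refused" behaviour described above.

```d
// Hypothetical sketch, NOT the code from Botan's mp_core.d.
// Computes a * b + carry on 32-bit words using a 64-bit intermediate.
pragma(inline, true)
uint wordMadd(uint a, uint b, ref uint carry) pure nothrow @nogc @safe
{
    immutable ulong t = cast(ulong) a * b + carry; // cannot overflow 64 bits
    carry = cast(uint)(t >> 32); // high half becomes the next carry
    return cast(uint) t;         // low half is the result word
}

// Multiplies every word of x (least significant word first) by y in place
// and returns the final carry. The name and layout are assumptions.
uint mulByWord(uint[] x, uint y) pure nothrow @nogc @safe
{
    uint carry = 0;
    foreach (ref w; x)
        w = wordMadd(w, y, carry);
    return carry;
}

void main()
{
    // 0x1_FFFF_FFFF * 2 == 0x3_FFFF_FFFE
    uint[2] n = [0xFFFF_FFFF, 0x0000_0001];
    immutable top = mulByWord(n[], 2);
    assert(n == [0xFFFF_FFFE, 0x0000_0003]);
    assert(top == 0);
}
```

If the call in `mulByWord` is not inlined, every word pays for a full call and return, which would match the time spent on "pop" instructions reported above.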
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via Digitalmars-d wrote:
 I've been playing around with perf and my web server and found that
 the bottleneck is by far the math module of Botan:
 https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d
 
 I'm probably a bit naive but I was wishing for some inlining to
 happen. I see LOTS of CPU time spent on "pop" instructions to return
 from a simple multiply function, and the pragma(inline, true) was
 refused on all of these.  So, should I wait for an inline? Should I
 import another library? Should I rewrite all the maths in assembly
 manually for each processor? Should I write another library that must
 be compiled with LDC/release for maths?
 
 I think the best option would be for an inline feature in DMD that
 works, but I'm wondering what the stance is right now about the
 subject?
For an immediate solution to performance-related issues, I recommend using GDC or LDC (with maximum optimization options) instead of DMD.

If you must use DMD, I recommend filing an enhancement request and bothering Walter about it.

T

--
MS Windows: 64-bit rehash of 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand 1-bit of competition.
Oct 25 2015
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via 
 Digitalmars-d wrote:

 If you must use DMD, I recommend filing an enhancement request 
 and bothering Walter about it.


 T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
Oct 26 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Oct 26, 2015 at 11:37:16AM +0000, Etienne Cimon via Digitalmars-d wrote:
 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via Digitalmars-d
wrote:

If you must use DMD, I recommend filing an enhancement request and
bothering Walter about it.


T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
Sure, but filing a bug with DMD and prodding Walter to do something about it will only benefit everybody in the end, and not just for your current project. DMD could use some major improvements in its optimizer.

T

--
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
Oct 26 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 26-Oct-2015 16:44, H. S. Teoh via Digitalmars-d wrote:
 On Mon, Oct 26, 2015 at 11:37:16AM +0000, Etienne Cimon via Digitalmars-d
wrote:
 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via Digitalmars-d
 wrote:

 If you must use DMD, I recommend filing an enhancement request and
 bothering Walter about it.


 T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
Sure, but filing a bug with DMD and prodding Walter to do something about it will only benefit everybody in the end, and not just for your current project. DMD could use some major improvements in its optimizer.
The last few tweaks to the backend seem to noticeably improve performance. And there is plenty of stuff to enable in inliner, with few more specific cases added recently (I think 2.069 will have them). -- Dmitry Olshansky
Oct 26 2015
prev sibling next sibling parent reply rsw0x <anonymous anonymous.com> writes:
On Monday, 26 October 2015 at 11:37:17 UTC, Etienne Cimon wrote:
 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via 
 Digitalmars-d wrote:

 If you must use DMD, I recommend filing an enhancement request 
 and bothering Walter about it.


 T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.
Oct 26 2015
next sibling parent reply default0 <Kevin.Labschek gmx.de> writes:
On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
 dmd will never reach gdc/ldc performance, gcc and LLVM have 
 entire teams of people that actively contribute to their 
 compilers.
We have a Walter though :-)
Oct 26 2015
parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Monday, 26 October 2015 at 21:29:47 UTC, default0 wrote:
 On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
 dmd will never reach gdc/ldc performance, gcc and LLVM have 
 entire teams of people that actively contribute to their 
 compilers.
We have a Walter though :-)
Walter has a job as well as trying to manage the programming language as a whole. He also makes contributions to Phobos and druntime, which eat up time that he would otherwise spend on dmd. Also, Walter seems to be the only one who understands the backend code of dmd, so even if we left aside the licensing issues (which are huge), that really limits the quality of the backend, as generally only one set of eyes is on the code. Compare that to the LLVM team, which has people working full time on it as well as major contributions from some of the largest tech companies.
Oct 26 2015
parent reply Laeeth Isharc <Laeeth.nospam nospam-laeeth.com> writes:
On Monday, 26 October 2015 at 23:13:22 UTC, Jack Stouffer wrote:
 On Monday, 26 October 2015 at 21:29:47 UTC, default0 wrote:
 On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
 dmd will never reach gdc/ldc performance, gcc and LLVM have 
 entire teams of people that actively contribute to their 
 compilers.
We have a Walter though :-)
Walter has a job as well as trying to manage the programming language as a whole. He also makes contributions to Phobos and druntime which eat up time that he would otherwise spend on dmd. Also, Walter seems to be the only one who understands the backend code of dmd, so even if we left aside the licensing issues (which are huge), that really limits the quality of the backend as generally only one set of eyes are on the code. Compare that to the team of LLVM, which has people working full time on it as well as major contributions from some of the largest tech companies.
Someone who says never is certainly making a bold claim in an uncertain world at a time when the basic factors governing the fate of D are visibly shifting in a positive direction. In addition, it's mildly demoralising to others to say such things, no matter how good one's intent might be.

I don't know much about compiler backends, but it doesn't sound like significantly improving the backend code in the medium term is something utterly fantastic, like sending a man to the moon, umm, I mean private sector space travel, hmmm, well, better pick a different example. You maybe need some increase in contributions, and maybe also some companies that have an interest in using dmd and that can pay for some resources to help. Was Weka not using DMD, for example? (Maybe I misremember.) These things will change over time - look at the spurt of hiring lately, and I am doing my small part too (beginnings are often small).

So I would suggest it isn't productive to think about whether DMD will catch up. Maybe, maybe not. Doesn't matter. Making it faster will help many, whether or not it catches up. This defeatist attitude of 'why bother - just use GDC or LDC' may not be the best long-term strategy (remember that these too are volunteer projects, and they don't have an army, especially GDC, and you shouldn't assume that magically they'll always manage just because they have so far). There's always an opportunity cost, but so what.

One should treat dmd performance as an interesting challenge, not something that's set in stone and can never much change.
Oct 26 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Oct 27, 2015 at 02:21:38AM +0000, Laeeth Isharc via Digitalmars-d wrote:
[...]
 So I would suggest it isn't productive to think about whether Dmd will
 catch up.  Maybe, maybe not.  Doesn't matter.  Making it faster will
 help many - this defeatist attitude of 'why bother - just use GDC or
 LDC' may not be the best long-term strategy (remember that these too
 are volunteer projects, and they don't have an army, especially GDC,
 and you shouldn't assume that magically they'll always manage just
 because they have so far), whether or not it catches up.  There's
 always an opportunity cost, but so what.
 
 One should treat dmd performance as an interesting challenge, not
 something that's set in stone, and can never much change.
While I agree that improving dmd performance will at the end of the day reap benefits for all parties involved, I'm also not holding my breath for that to happen.

I haven't measured the performance of the latest DMD git HEAD against GDC, but the last time I checked (which was sometime earlier this year, not that long ago), GDC-generated code still outperformed DMD-generated code by 20%-30%. Judging from the assembly code, this difference mainly arises from the DMD inliner being overly conservative, and from the lack of advanced loop optimization techniques such as aggressive code hoisting, strength reduction, unrolling, etc.

Fixing the inliner will probably be relatively easy; loop optimizations will be more challenging, especially if Walter continues to insist on compilation speed. Some of the problems that have to be solved in advanced loop optimization require more expensive algorithms, which may incur a penalty on compilation time.

T

--
Creativity is not an excuse for sloppiness.
Oct 26 2015
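To make the loop optimizations named above concrete, here is a minimal sketch of what code hoisting and strength reduction amount to when done by hand. The functions `scaleNaive` and `scaleHoisted` are hypothetical illustrations; they say nothing about what DMD, GDC or LDC actually emit for any given loop.

```d
// Straightforward version: the loop-invariant product a * b and the
// index multiplication i * stride are written out on every iteration.
void scaleNaive(int[] dst, const int[] src, size_t stride, int a, int b)
    pure nothrow @nogc @safe
{
    foreach (i; 0 .. dst.length)
        dst[i] = src[i * stride] * (a * b);
}

// The same loop after applying two of the transformations by hand.
void scaleHoisted(int[] dst, const int[] src, size_t stride, int a, int b)
    pure nothrow @nogc @safe
{
    immutable k = a * b; // code hoisting: the invariant is computed once
    size_t j = 0;        // strength reduction: i * stride is replaced by
    foreach (i; 0 .. dst.length)
    {
        dst[i] = src[j] * k;
        j += stride;     // ... a running addition
    }
}

void main()
{
    const int[] src = [1, 2, 3, 4, 5, 6];
    int[3] x, y;
    scaleNaive(x[], src, 2, 3, 4);
    scaleHoisted(y[], src, 2, 3, 4);
    assert(x == [12, 36, 60]);
    assert(x == y);
}
```

An optimizer that performs these transformations lets the first form run like the second; one that does not leaves the work to the programmer.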
parent Laeeth Isharc <Laeeth.nospam nospam-laeeth.com> writes:
On Tuesday, 27 October 2015 at 03:10:43 UTC, H. S. Teoh wrote:
 On Tue, Oct 27, 2015 at 02:21:38AM +0000, Laeeth Isharc via 
 Digitalmars-d wrote: [...]
 So I would suggest it isn't productive to think about whether 
 Dmd will catch up.  Maybe, maybe not.  Doesn't matter.  Making 
 it faster will help many - this defeatist attitude of 'why 
 bother - just use GDC or LDC' may not be the best long-term 
 strategy (remember that these too are volunteer projects, and 
 they don't have an army, especially GDC, and you shouldn't 
 assume that magically they'll always manage just because they 
 have so far), whether or not it catches up.  There's always an 
 opportunity cost, but so what.
 
 One should treat dmd performance as an interesting challenge, 
 not something that's set in stone, and can never much change.
While I agree that improving dmd performance will at the end of the day reap benefits for all parties involved, I'm also not holding my breath for that to happen. I haven't measured the performance of the latest DMD git HEAD against GDC, but the last time I checked (which is sometime earlier this year, not that long ago), GDC-generated code still outperforms DMD-generated code by 20%-30%. Judging from the assembly code, this difference mainly arises from the DMD inliner being overly conservative, and the lack of advanced loop optimization techniques such as aggressive code hoisting, strength reduction, unrolling, etc.. Fixing the inliner will probably be relatively easy; loop optimizations will be more challenging, especially if Walter continues to insist on compilation speed. Some of the problems that have to be solved in advanced loop optimization require more expensive algorithms, which may incur a penalty on compilation time. T
Thanks for the insight. If the optimisations are configurable by the user, you pick your poison, and live with that tradeoff, no?
Oct 26 2015
prev sibling next sibling parent reply rsw0x <anonymous anonymous.com> writes:
On Tuesday, 27 October 2015 at 02:21:40 UTC, Laeeth Isharc wrote:
 On Monday, 26 October 2015 at 23:13:22 UTC, Jack Stouffer wrote:
 [...]
Someone who says never is certainly making a bold claim in an uncertain world at a time when the basic factors governing the fate of D are visibly shifting in a positive direction. In addition, it's mildly demoralising to others to say such things, no matter how good ones intent might be. [...]
Nobody besides Walter will work on the backend because
1) It's under a proprietary license
2) Nobody besides Walter understands it
3) Two faster backends already exist and get maintenance for free.

D already has a strained, split developer community.
Oct 26 2015
parent reply Laeeth Isharc <Laeeth.nospam nospam-laeeth.com> writes:
On Tuesday, 27 October 2015 at 03:14:55 UTC, rsw0x wrote:
 On Tuesday, 27 October 2015 at 02:21:40 UTC, Laeeth Isharc 
 wrote:
 On Monday, 26 October 2015 at 23:13:22 UTC, Jack Stouffer 
 wrote:
 [...]
Someone who says never is certainly making a bold claim in an uncertain world at a time when the basic factors governing the fate of D are visibly shifting in a positive direction. In addition, it's mildly demoralising to others to say such things, no matter how good ones intent might be. [...]
Nobody besides Walter will work on the backend because 1) It's under a proprietary license 2) Nobody besides Walter understands it 3) Two faster backends already exist and get maintenance for free. D already has a strained, split developer community.
1) And so? There is proprietary and proprietary. Maybe I simply don't understand the practical reasons why someone who understands the pragmatic constraints and realities of what the license actually means would be put off, but it would be interesting to know why it is such a deal killer in your eyes.
2) Is it beyond the reach of anyone else to learn?
3) What exactly is the opportunity cost of Iain working on gdc? It's hardly a fixed-size cake.

And I don't know, but the relationships between the different backend people don't, judging from public postings, strike me as strained.
Oct 26 2015
parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 27 Oct 2015 4:25 am, "Laeeth Isharc via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 3) what exactly is the opportunity cost of Iain working on gdc ?  It's
hardly a fixed size cake. And I don't know, but the relationships between different back end people doesn't from public postings strike me as strained.

Cost?  https://www.openhub.net/p/gdc/estimated_cost. :-)

We get somewhere in the vicinity of there eventually on matters of
implementation detail (to a lesser extent we agree on design when it comes
to platform-dependent details).

There's never anything short of healthy debate between us.

Iain.
Oct 27 2015
prev sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 27 October 2015 at 02:21:40 UTC, Laeeth Isharc wrote:
 In addition, it's mildly demoralising to others to say such 
 things, no matter how good ones intent might be.
My intentions are to call things as they are. If people are demoralized after learning that one person working in his spare time can't match the productivity of several people working full time, then they need a reality check.
 You maybe need some increase in contributions
People are perfectly willing to contribute to the front-end of dmd as it's Boost licensed and not esoteric. The back-end, as I mentioned before, is really only understood by two or three people. Also, the dmd back-end struggles to be classified as open source and is certainly in no way free software. Those two things don't really inspire many people to sink their teeth into dmd's backend, especially since all of your work will become copyrighted by a private company after you submit it.
 Was Weka not using DMD, for example ?  (Maybe I misremember).  
 These things will change over time - look at the spurt of 
 hiring lately, and I am doing my small part too (beginnings are 
 often small).
Companies using dmd are unlikely to influence dmd's codegen in any way. The biggest thing that companies will complain about are compiler and Phobos bugs, as it actively stops them from doing any work.
 So I would suggest it isn't productive to think about whether 
 Dmd will catch up.  Maybe, maybe not.  Doesn't matter.  Making 
 it faster will help many - this defeatist attitude of 'why 
 bother - just use GDC or LDC' may not be the best long-term 
 strategy (remember that these too are volunteer projects, and 
 they don't have an army, especially GDC, and you shouldn't 
 assume that magically they'll always manage just because they 
 have so far), whether or not it catches up.
No one is saying use gdc or ldc, people are saying use gcc or llvm.
 There's always an opportunity cost, but so what.
Either you don't understand the concept of opportunity cost or you're falling into the sunk cost fallacy.
 One should treat dmd performance as an interesting challenge, 
 not something that's set in stone, and can never much change.
goto opportunity_cost;
Oct 26 2015
parent reply burjui <bytefu gmail.com> writes:
On Tuesday, 27 October 2015 at 05:27:22 UTC, Jack Stouffer wrote:
 My intentions are to call things as they are. If people are 
 demoralized after learning that one person working in his spare 
 time can't match the productivity of several people working 
 full time, then they need a reality check.
Can't agree more. It's unrealistic to expect Walter to work on the backend full-time just to catch up with the GCC and LLVM teams, let alone support architectures other than x86 as well.
Oct 27 2015
parent ponce <contact gam3sfrommars.fr> writes:
On Tuesday, 27 October 2015 at 11:23:37 UTC, burjui wrote:
 On Tuesday, 27 October 2015 at 05:27:22 UTC, Jack Stouffer 
 wrote:
 My intentions are to call things as they are. If people are 
 demoralized after learning that one person working in his 
 spare time can't match the productivity of several people 
 working full time, then they need a reality check.
Can't agree more. It's unrealistic to expect Walter work on the backend full-time just to catch up with GCC and LLVM teams, let alone support architectures other than x86 as well.
Moreover, given the frequent backend regressions it might be better not to touch it too much. As of now relying on DMD for optimized builds gives constant work.
Oct 27 2015
prev sibling parent reply Etienne Cimon <etcimon gmail.com> writes:
On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
 On Monday, 26 October 2015 at 11:37:17 UTC, Etienne Cimon wrote:
 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via 
 Digitalmars-d wrote:

 If you must use DMD, I recommend filing an enhancement 
 request and bothering Walter about it.


 T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.
LDC couldn't inline it either. My only options at this point are to write the assembly or link to a C library.
Oct 27 2015
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon wrote:
 On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
 On Monday, 26 October 2015 at 11:37:17 UTC, Etienne Cimon 
 wrote:
 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 [...]
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.
LDC couldn't inline it either. My only options at this point is to write the assembly or link to a C library.
Btw, DMD and LDC had similar performance.
Oct 27 2015
parent reply David Nadlinger <code klickverbot.at> writes:
On Tuesday, 27 October 2015 at 18:19:38 UTC, Etienne Cimon wrote:
 On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon 
 wrote:
 LDC couldn't inline it either. My only options at this point 
 is to write the assembly or link to a C library.
Btw, DMD and LDC had similar performance.
This would be very strange in numerical code. Are you building all the relevant modules at once, and with `ldc2 -singleobj` or `ldmd` (which enables -singleobj by default)? LDC will not do cross-module inlining otherwise. — David
Oct 27 2015
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 27 Oct 2015 9:15 pm, "David Nadlinger via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 On Tuesday, 27 October 2015 at 18:19:38 UTC, Etienne Cimon wrote:
 On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon wrote:
 LDC couldn't inline it either. My only options at this point is to
write the assembly or link to a C library.
 Btw, DMD and LDC had similar performance.
This would be very strange in numerical code. Are you building all the
relevant modules at once, and with `ldc2 -singleobj` or `ldmd` (which enables -singleobj by default)? LDC will not do cross-module inlining otherwise.
  -- David
There was a recent talk on optimisations using a sumMatrix function and about 9 variants of it. Unfortunately the graphs of LDC didn't fit the screen, but the speaker assured there was nothing interesting to see (for the most part, the half graphs of LDC only started to look more performant than DMD when vectors started being used). Ignoring the vector implementations, it was found that DMD did best when the loop was hand unrolled, and GDC outperformed the vectorized versions (and all other compilers) by using an obscure one-liner combination of std.algorithm functions with unsafe math optimisations turned on. std.algorithm.sum was the slowest of the bunch. I'll have to see if anything was published. Iain.
Oct 29 2015
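The talk's code does not appear in this thread, so the following is only a hypothetical sketch of two of the variants described above: an obvious version built on `std.algorithm` and a hand-unrolled loop. The names `sumMatrixSimple` and `sumMatrixUnrolled`, the element type and the unroll factor of four are all assumptions; integers are used so that both versions give exactly the same result.

```d
import std.algorithm.iteration : map, sum;

// Most obvious version: sum each row, then sum the row totals.
long sumMatrixSimple(const int[][] m)
{
    return m.map!(row => row.sum(0L)).sum(0L);
}

// Hand-unrolled version: four independent accumulators per pass.
long sumMatrixUnrolled(const int[][] m) pure nothrow @nogc @safe
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    foreach (row; m)
    {
        size_t i = 0;
        for (; i + 4 <= row.length; i += 4)
        {
            s0 += row[i];
            s1 += row[i + 1];
            s2 += row[i + 2];
            s3 += row[i + 3];
        }
        for (; i < row.length; ++i) // leftover elements of the row
            s0 += row[i];
    }
    return s0 + s1 + s2 + s3;
}

void main()
{
    const int[][] m = [[1, 2, 3, 4, 5], [6, 7]];
    assert(sumMatrixSimple(m) == 28);
    assert(sumMatrixUnrolled(m) == 28);
}
```

Which of the two is faster depends on the compiler and flags, which is exactly what the talk was measuring; no timings are implied here.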
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 29 October 2015 at 08:24:19 UTC, Iain Buclaw wrote:
 std.algorithm.sum was the slowest of the bunch.
I would be a little careful making comparisons with std.algorithm.sum because it uses a variety of different algorithms.
Oct 29 2015
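As background to the caution above: the Phobos documentation describes `sum` as using pairwise summation for floating-point random-access ranges and Kahan summation for other floating-point ranges, so it does more work per element than a plain accumulating loop in exchange for better accuracy. A minimal sketch of the comparison, where `naiveSum` is a hypothetical helper and the values are chosen to be exactly representable so that both approaches agree:

```d
import std.algorithm.iteration : sum;

// Plain left-to-right accumulation, for comparison with sum().
double naiveSum(const double[] xs) pure nothrow @nogc @safe
{
    double acc = 0;
    foreach (x; xs)
        acc += x;
    return acc;
}

void main()
{
    // All partial sums are exact in binary floating point,
    // so the summation order cannot change the result here.
    const double[] xs = [0.5, 0.25, 4.0, 8.0];
    assert(xs.sum == 12.75);
    assert(naiveSum(xs) == 12.75);
}
```

A benchmark that pits `sum` against a hand-written loop is therefore comparing different algorithms as well as different code generation.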
parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 29 October 2015 at 14:41, jmh530 via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Thursday, 29 October 2015 at 08:24:19 UTC, Iain Buclaw wrote:

 std.algorithm.sum was the slowest of the bunch.
I would be a little careful making comparisons with std.algorithm.sum because it uses a variety of different algorithms.
Well, the point of the talk was about simplicity vs. performance. As the implementation was being written interactively, std.algorithm.sum was tested first because it's the most obvious way to achieve the end goal.
Oct 29 2015
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 26 Oct 2015 11:37:16 +0000
schrieb Etienne Cimon <etcimon gmail.com>:

 On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
 On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via 
 Digitalmars-d wrote:

 If you must use DMD, I recommend filing an enhancement request 
 and bothering Walter about it.


 T
I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
Remember though, that classic inline asm is not transparent to the compiler and doesn't inline. -- Marco
Oct 28 2015
prev sibling next sibling parent Márcio Martins <marcioapm gmail.com> writes:
On Monday, 26 October 2015 at 02:37:18 UTC, Etienne Cimon wrote:
 I've been playing around with perf and my web server and found 
 that the bottleneck is by far the math module of Botan: 
 https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d

 [...]
http://dlang.org/pragma.html#inline not working as expected? Writing inline assembly should really be the last resort...
Oct 26 2015
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/25/2015 7:37 PM, Etienne Cimon wrote:
 I think the best option would be for an inline feature in DMD that works, but
 I'm wondering what the stance is right now about the subject?
There have been some recent changes to improve inlining in dmd:

https://github.com/D-Programming-Language/dmd/pull/5153
https://github.com/D-Programming-Language/dmd/pull/5150
https://github.com/D-Programming-Language/dmd/pull/5136

More work needs to be done, but it is progress.
Oct 27 2015