digitalmars.D - DMD is slow for matrix maths?
- Etienne Cimon (13/13) Oct 25 2015 I've been playing around with perf and my web server and found
- H. S. Teoh via Digitalmars-d (8/23) Oct 25 2015 For an immediate solution to performance-related issues, I recommend
- Etienne Cimon (4/9) Oct 26 2015 I'd really like the performance benefits to be available to DMD
- H. S. Teoh via Digitalmars-d (7/20) Oct 26 2015 Sure, but filing a bug with DMD and prodding Walter to do something
- Dmitry Olshansky (6/23) Oct 26 2015 The last few tweaks to the backend seem to noticeably improve
- rsw0x (4/16) Oct 26 2015 dmd will never reach gdc/ldc performance, gcc and LLVM have
- default0 (2/5) Oct 26 2015 We have a Walter though :-)
- Jack Stouffer (11/17) Oct 26 2015 Walter has a job as well as trying to manage the programming
- Laeeth Isharc (28/46) Oct 26 2015 Someone who says never is certainly making a bold claim in an
- H. S. Teoh via Digitalmars-d (19/30) Oct 26 2015 While I agree that improving dmd performance will at the end of the day
- Laeeth Isharc (4/35) Oct 26 2015 Thanks for the insight.
- rsw0x (6/14) Oct 26 2015 Nobody besides Walter will work on the backend because
- Laeeth Isharc (11/29) Oct 26 2015 1) and so? There is proprietary and proprietary, and maybe I
- Iain Buclaw via Digitalmars-d (11/12) Oct 27 2015 hardly a fixed size cake. And I don't know, but the relationships betwe...
- Jack Stouffer (22/40) Oct 26 2015 My intentions are to call things as they are. If people are
- Etienne Cimon (3/20) Oct 27 2015 LDC couldn't inline it either. My only options at this point is
- Etienne Cimon (2/17) Oct 27 2015 Btw, DMD and LDC had similar performance.
- David Nadlinger (6/11) Oct 27 2015 This would be very strange in numerical code. Are you building
- Iain Buclaw via Digitalmars-d (18/28) Oct 29 2015 write the assembly or link to a C library.
- jmh530 (4/5) Oct 29 2015 I would be a little careful making comparisons with
- Iain Buclaw via Digitalmars-d (5/12) Oct 29 2015 Well, the point of the talk was about simplicity vs. performance. As the
- Marco Leise (6/19) Oct 28 2015 Remember though, that classic inline asm is not transparent to
- Márcio Martins (3/7) Oct 26 2015 http://dlang.org/pragma.html#inline not working as expected?
- Walter Bright (6/8) Oct 27 2015 There have been some recent changes to improve inlining in dmd:
I've been playing around with perf and my web server and found that the bottleneck is by far the math module of Botan: https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d

I'm probably a bit naive, but I was hoping for some inlining to happen. I see LOTS of CPU time spent on "pop" instructions to return from a simple multiply function, and pragma(inline, true) was refused on all of these.

So, should I wait for an inliner? Should I import another library? Should I rewrite all the maths in assembly manually for each processor? Should I write another library that must be compiled with LDC/release for maths?

I think the best option would be an inline feature in DMD that works, but I'm wondering what the stance is right now on the subject?
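To make the question concrete, here is a minimal D sketch of the kind of "simple multiply function" involved, annotated with pragma(inline, true). It is not Botan's mp_core.d code: the `Wide` struct and the `mulWide` name are hypothetical, and building a 128-bit product from four 32-bit partial products is just one portable way to avoid inline asm, leaving the body as plain arithmetic that an inliner can expand at call sites.

```d
import std.stdio : writefln;

/// Hypothetical result type: a 128-bit product split into two 64-bit words.
struct Wide
{
    ulong hi;
    ulong lo;
}

/// Hypothetical helper (not Botan's actual code): full 64x64 -> 128-bit
/// multiply built from four 32x32 -> 64-bit partial products.
pragma(inline, true)
Wide mulWide(ulong a, ulong b) pure nothrow @nogc @safe
{
    enum ulong mask = 0xFFFF_FFFF;
    immutable ulong aLo = a & mask, aHi = a >> 32;
    immutable ulong bLo = b & mask, bHi = b >> 32;

    immutable ulong ll = aLo * bLo; // weight 2^0
    immutable ulong lh = aLo * bHi; // weight 2^32
    immutable ulong hl = aHi * bLo; // weight 2^32
    immutable ulong hh = aHi * bHi; // weight 2^64

    // Middle 32-bit column: carry-in from ll plus the low halves of the
    // cross terms. At most 3 * (2^32 - 1), so it cannot overflow a ulong.
    immutable ulong mid = (ll >> 32) + (lh & mask) + (hl & mask);

    return Wide(hh + (lh >> 32) + (hl >> 32) + (mid >> 32),
                (mid << 32) | (ll & mask));
}

void main()
{
    immutable p = mulWide(ulong.max, ulong.max);
    writefln("hi=%016x lo=%016x", p.hi, p.lo);
    // prints "hi=fffffffffffffffe lo=0000000000000001"
}
```

Whether a given compiler version actually honours the pragma for a body like this is exactly what the thread is debating; the sketch only shows the shape of the code, not a guarantee of inlining.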
Oct 25 2015
On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via Digitalmars-d wrote:
> I've been playing around with perf and my web server and found that the bottleneck is by far the math module of Botan: https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d [...]
> I think the best option would be for an inline feature in DMD that works, but I'm wondering what the stance is right now about the subject?

For an immediate solution to performance-related issues, I recommend using GDC or LDC (with maximum optimization options) instead of DMD. If you must use DMD, I recommend filing an enhancement request and bothering Walter about it.

T

--
MS Windows: 64-bit rehash of 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand 1-bit of competition.
Oct 25 2015
On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
> On Mon, Oct 26, 2015 at 02:37:16AM +0000, Etienne Cimon via Digitalmars-d wrote:
> [...]
> If you must use DMD, I recommend filing an enhancement request and bothering Walter about it.
>
> T

I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...
Oct 26 2015
On Mon, Oct 26, 2015 at 11:37:16AM +0000, Etienne Cimon via Digitalmars-d wrote:
> On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
>> [...]
>
> I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...

Sure, but filing a bug with DMD and prodding Walter to do something about it will only benefit everybody in the end, and not just for your current project. DMD could use some major improvements in its optimizer.

T

--
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
Oct 26 2015
On 26-Oct-2015 16:44, H. S. Teoh via Digitalmars-d wrote:
> [...]
> Sure, but filing a bug with DMD and prodding Walter to do something about it will only benefit everybody in the end, and not just for your current project. DMD could use some major improvements in its optimizer.

The last few tweaks to the backend seem to noticeably improve performance. And there is plenty of stuff to enable in the inliner, with a few more specific cases added recently (I think 2.069 will have them).

--
Dmitry Olshansky
Oct 26 2015
On Monday, 26 October 2015 at 11:37:17 UTC, Etienne Cimon wrote:
> On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
>> [...]
>
> I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...

dmd will never reach gdc/ldc performance; gcc and LLVM have entire teams of people that actively contribute to their compilers.
Oct 26 2015
On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
> dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.

We have a Walter though :-)
Oct 26 2015
On Monday, 26 October 2015 at 21:29:47 UTC, default0 wrote:
> On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
>> dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.
>
> We have a Walter though :-)

Walter has a job as well as trying to manage the programming language as a whole. He also makes contributions to Phobos and druntime, which eat up time that he would otherwise spend on dmd.

Also, Walter seems to be the only one who understands the backend code of dmd, so even if we left aside the licensing issues (which are huge), that really limits the quality of the backend, as generally only one set of eyes is on the code. Compare that to the LLVM team, which has people working full time on it as well as major contributions from some of the largest tech companies.
Oct 26 2015
On Monday, 26 October 2015 at 23:13:22 UTC, Jack Stouffer wrote:
> [...]
> Walter has a job as well as trying to manage the programming language as a whole. [...] Compare that to the team of LLVM, which has people working full time on it as well as major contributions from some of the largest tech companies.

Someone who says never is certainly making a bold claim in an uncertain world, at a time when the basic factors governing the fate of D are visibly shifting in a positive direction. In addition, it's mildly demoralising to others to say such things, no matter how good one's intent might be.

I don't know much about compiler backends, but it doesn't sound like significantly improving the backend code in the medium term is something utterly fantastic, like sending a man to the moon (umm, I mean private-sector space travel... hmm, well, better pick a different example).

You maybe need some increase in contributions, and maybe also some companies that have an interest in using dmd and that can pay for some resources to help. Was Weka not using DMD, for example? (Maybe I misremember.) These things will change over time: look at the spurt of hiring lately, and I am doing my small part too (beginnings are often small).

So I would suggest it isn't productive to think about whether dmd will catch up. Maybe, maybe not. Doesn't matter. Making it faster will help many. This defeatist attitude of 'why bother, just use GDC or LDC' may not be the best long-term strategy (remember that these too are volunteer projects, and they don't have an army, especially GDC, and you shouldn't assume that they'll magically always manage just because they have so far), whether or not it catches up. There's always an opportunity cost, but so what. One should treat dmd performance as an interesting challenge, not something that's set in stone and can never much change.
Oct 26 2015
On Tue, Oct 27, 2015 at 02:21:38AM +0000, Laeeth Isharc via Digitalmars-d wrote:
[...]
> So I would suggest it isn't productive to think about whether Dmd will catch up. Maybe, maybe not. Doesn't matter. Making it faster will help many [...] One should treat dmd performance as an interesting challenge, not something that's set in stone, and can never much change.

While I agree that improving dmd performance will at the end of the day reap benefits for all parties involved, I'm also not holding my breath for that to happen. I haven't measured the performance of the latest DMD git HEAD against GDC, but the last time I checked (sometime earlier this year, not that long ago), GDC-generated code still outperformed DMD-generated code by 20%-30%. Judging from the assembly code, this difference mainly arises from the DMD inliner being overly conservative, and from the lack of advanced loop optimization techniques such as aggressive code hoisting, strength reduction, unrolling, etc.

Fixing the inliner will probably be relatively easy; loop optimizations will be more challenging, especially if Walter continues to insist on compilation speed. Some of the problems that have to be solved in advanced loop optimization require more expensive algorithms, which may incur a penalty on compilation time.

T

--
Creativity is not an excuse for sloppiness.
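Two of the loop optimizations named above, code hoisting and strength reduction, are easy to show by hand. The following is a minimal sketch with hypothetical function names: a row-major matrix loop before and after applying both transformations manually. It illustrates what the transformations do to a loop, not how DMD, GDC, or LDC implement them.

```d
import std.stdio : writeln;

// Straightforward form: the index r * cols + c costs a multiply per
// element, and scale * gain is recomputed although it never changes.
long sumScaledNaive(const long[] m, size_t rows, size_t cols,
                    long scale, long gain)
{
    assert(m.length == rows * cols);
    long acc = 0;
    foreach (r; 0 .. rows)
        foreach (c; 0 .. cols)
            acc += m[r * cols + c] * (scale * gain);
    return acc;
}

// The same loop after two transformations an optimiser may apply:
//  - loop-invariant code hoisting: scale * gain is computed once;
//  - strength reduction: r * cols becomes a running row offset that is
//    advanced by an addition at the end of each row.
long sumScaledHoisted(const long[] m, size_t rows, size_t cols,
                      long scale, long gain)
{
    assert(m.length == rows * cols);
    immutable long k = scale * gain;
    long acc = 0;
    size_t rowStart = 0;
    foreach (_; 0 .. rows)
    {
        foreach (c; 0 .. cols)
            acc += m[rowStart + c] * k;
        rowStart += cols;
    }
    return acc;
}

void main()
{
    long[] m = [1, 2, 3, 4, 5, 6]; // 2 x 3, row-major
    writeln(sumScaledNaive(m, 2, 3, 2, 5), " ", sumScaledHoisted(m, 2, 3, 2, 5));
    // prints "210 210"
}
```

A good optimiser should turn the first function into roughly the second on its own; writing the second form by hand is the workaround when it does not.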
Oct 26 2015
On Tuesday, 27 October 2015 at 03:10:43 UTC, H. S. Teoh wrote:
> [...]
> Fixing the inliner will probably be relatively easy; loop optimizations will be more challenging, especially if Walter continues to insist on compilation speed. Some of the problems that have to be solved in advanced loop optimization require more expensive algorithms, which may incur a penalty on compilation time.

Thanks for the insight. If the optimisations are configurable by the user, you pick your poison and live with that tradeoff, no?
Oct 26 2015
On Tuesday, 27 October 2015 at 02:21:40 UTC, Laeeth Isharc wrote:
> On Monday, 26 October 2015 at 23:13:22 UTC, Jack Stouffer wrote:
>> [...]
>
> Someone who says never is certainly making a bold claim in an uncertain world at a time when the basic factors governing the fate of D are visibly shifting in a positive direction. In addition, it's mildly demoralising to others to say such things, no matter how good ones intent might be. [...]

Nobody besides Walter will work on the backend because
1) It's under a proprietary license
2) Nobody besides Walter understands it
3) Two faster backends already exist and get maintenance for free.

D already has a strained, split developer community.
Oct 26 2015
On Tuesday, 27 October 2015 at 03:14:55 UTC, rsw0x wrote:
> [...]
> Nobody besides Walter will work on the backend because
> 1) It's under a proprietary license
> 2) Nobody besides Walter understands it
> 3) Two faster backends already exist and get maintenance for free.
> D already has a strained, split developer community.

1) And so? There is proprietary and proprietary, and maybe I simply don't understand the practical reasons why someone who understands the pragmatic constraints and realities of what the license actually means would be put off, but it would be interesting to know why it is such a deal-killer in your eyes.
2) Is it beyond the reach of anyone else to learn?
3) What exactly is the opportunity cost of Iain working on gdc? It's hardly a fixed-size cake. And I don't know, but the relationships between the different back-end people don't, from public postings, strike me as strained.
Oct 26 2015
On 27 Oct 2015 4:25 am, "Laeeth Isharc via Digitalmars-d" <digitalmars-d puremagic.com> wrote:
> 3) what exactly is the opportunity cost of Iain working on gdc ? It's hardly a fixed size cake. And I don't know, but the relationships between different back end people doesn't from public postings strike me as strained.

Cost? https://www.openhub.net/p/gdc/estimated_cost. :-)

We eventually get somewhere in the vicinity of agreement on matters of implementation detail (and, to a lesser extent, on design when it comes to platform-dependent details). There's never anything short of healthy debate between us.

Iain.
Oct 27 2015
On Tuesday, 27 October 2015 at 02:21:40 UTC, Laeeth Isharc wrote:
> In addition, it's mildly demoralising to others to say such things, no matter how good ones intent might be.

My intentions are to call things as they are. If people are demoralized after learning that one person working in his spare time can't match the productivity of several people working full time, then they need a reality check.

> You maybe need some increase in contributions

People are perfectly willing to contribute to the front-end of dmd, as it's Boost-licensed and not esoteric. The back-end, as I mentioned before, is really only understood by two or three people. Also, the dmd back-end struggles to be classified as open source and is certainly in no way free software. Those two things don't really inspire many people to sink their teeth into dmd's backend, especially since all of your work will become copyrighted by a private company after you submit it.

> Was Weka not using DMD, for example ? (Maybe I misremember). These things will change over time - look at the spurt of hiring lately, and I am doing my small part too (beginnings are often small).

Companies using dmd are unlikely to influence dmd's codegen in any way. The biggest things that companies will complain about are compiler and Phobos bugs, as those actively stop them from doing any work.

> So I would suggest it isn't productive to think about whether Dmd will catch up. Maybe, maybe not. Doesn't matter. Making it faster will help many - this defeatist attitude of 'why bother - just use GDC or LDC' may not be the best long-term strategy (remember that these too are volunteer projects, and they don't have an army, especially GDC, and you shouldn't assume that magically they'll always manage just because they have so far), whether or not it catches up.

No one is saying use gdc or ldc; people are saying use gcc or llvm.

> There's always an opportunity cost, but so what.

Either you don't understand the concept of opportunity cost or you're falling into the sunk cost fallacy.

> One should treat dmd performance as an interesting challenge, not something that's set in stone, and can never much change.

goto opportunity_cost;
Oct 26 2015
On Tuesday, 27 October 2015 at 05:27:22 UTC, Jack Stouffer wrote:
> My intentions are to call things as they are. If people are demoralized after learning that one person working in his spare time can't match the productivity of several people working full time, then they need a reality check.

Can't agree more. It's unrealistic to expect Walter to work on the backend full-time just to catch up with the GCC and LLVM teams, let alone support architectures other than x86 as well.
Oct 27 2015
On Tuesday, 27 October 2015 at 11:23:37 UTC, burjui wrote:
> On Tuesday, 27 October 2015 at 05:27:22 UTC, Jack Stouffer wrote:
>> My intentions are to call things as they are. If people are demoralized after learning that one person working in his spare time can't match the productivity of several people working full time, then they need a reality check.
>
> Can't agree more. It's unrealistic to expect Walter work on the backend full-time just to catch up with GCC and LLVM teams, let alone support architectures other than x86 as well.

Moreover, given the frequent backend regressions, it might be better not to touch it too much. As of now, relying on DMD for optimized builds makes for constant work.
Oct 27 2015
On Monday, 26 October 2015 at 20:30:51 UTC, rsw0x wrote:
> On Monday, 26 October 2015 at 11:37:17 UTC, Etienne Cimon wrote:
>> [...]
>
> dmd will never reach gdc/ldc performance, gcc and LLVM have entire teams of people that actively contribute to their compilers.

LDC couldn't inline it either. My only options at this point are to write the assembly or link to a C library.
Oct 27 2015
On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon wrote:
> [...]
> LDC couldn't inline it either. My only options at this point is to write the assembly or link to a C library.

Btw, DMD and LDC had similar performance.
Oct 27 2015
On Tuesday, 27 October 2015 at 18:19:38 UTC, Etienne Cimon wrote:
> On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon wrote:
>> LDC couldn't inline it either. My only options at this point is to write the assembly or link to a C library.
>
> Btw, DMD and LDC had similar performance.

This would be very strange in numerical code. Are you building all the relevant modules at once, and with `ldc2 -singleobj` or `ldmd` (which enables -singleobj by default)? LDC will not do cross-module inlining otherwise.

— David
Oct 27 2015
On 27 Oct 2015 9:15 pm, "David Nadlinger via Digitalmars-d" <digitalmars-d puremagic.com> wrote:
> On Tuesday, 27 October 2015 at 18:19:38 UTC, Etienne Cimon wrote:
>> On Tuesday, 27 October 2015 at 18:18:36 UTC, Etienne Cimon wrote:
>>> LDC couldn't inline it either. My only options at this point is to write the assembly or link to a C library.
>>
>> Btw, DMD and LDC had similar performance.
>
> This would be very strange in numerical code. Are you building all the relevant modules at once, and with `ldc2 -singleobj` or `ldmd` (which enables -singleobj by default)? LDC will not do cross-module inlining otherwise.

There was a recent talk on optimisations using a sumMatrix function and about 9 variants of it. Unfortunately the LDC graphs didn't fit on the screen, but the speaker assured us there was nothing interesting to see (for the most part, the partially visible LDC graphs only started to look more performant than DMD once vectors were used). Ignoring the vector implementations, it was found that DMD did best when the loop was hand-unrolled, and GDC outperformed the vectorized versions (and all other compilers) by using an obscure one-liner combination of std.algorithm functions with unsafe math optimisations turned on. std.algorithm.sum was the slowest of the bunch. I'll have to see if anything was published.

Iain.
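The talk's sumMatrix variants are not reproduced in the thread, so the following is only a hypothetical D sketch of what "hand-unrolled" means for such a function: a plain nested loop next to a version unrolled by 4 with independent accumulators. The function names and the jagged `double[][]` layout are assumptions. Splitting the sum across accumulators reassociates the floating-point additions, which can change rounding; that is why compilers generally only do this automatically when unsafe or fast-math style optimisations are enabled.

```d
import std.stdio : writeln;

// Hypothetical baseline: plain nested-loop sum over a jagged matrix.
double sumMatrixNaive(const double[][] m)
{
    double acc = 0;
    foreach (row; m)
        foreach (x; row)
            acc += x;
    return acc;
}

// Hypothetical hand-unrolled variant: four independent accumulators, so
// consecutive additions do not all depend on the same running total.
double sumMatrixUnrolled(const double[][] m)
{
    double a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    foreach (row; m)
    {
        size_t i = 0;
        for (; i + 4 <= row.length; i += 4)
        {
            a0 += row[i];
            a1 += row[i + 1];
            a2 += row[i + 2];
            a3 += row[i + 3];
        }
        for (; i < row.length; ++i) // leftover 0 to 3 elements
            a0 += row[i];
    }
    return (a0 + a1) + (a2 + a3);
}

void main()
{
    auto m = [[1.0, 2, 3, 4, 5], [6.0, 7, 8, 9, 10]];
    writeln(sumMatrixNaive(m), " ", sumMatrixUnrolled(m));
    // prints "55 55" (small integer values sum exactly in either order)
}
```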
Oct 29 2015
On Thursday, 29 October 2015 at 08:24:19 UTC, Iain Buclaw wrote:
> std.algorithm.sum was the slowest of the bunch.

I would be a little careful making comparisons with std.algorithm.sum because it uses a variety of different algorithms.
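jmh530's caution can be seen directly in Phobos. According to the std.algorithm.iteration.sum documentation, sum picks its algorithm from the element and range types: pairwise summation for floating-point random-access ranges with length and slicing, Kahan summation for other finite floating-point input ranges, and plain element-by-element addition otherwise. The sketch below uses made-up data to show what that extra care buys; the accuracy work is one plausible reason, not a measured one, for sum trailing hand-written loops in a benchmark.

```d
import std.algorithm.iteration : sum;
import std.stdio : writeln;

void main()
{
    // Integral elements: sum() just adds element by element.
    int[] ints = [1, 2, 3, 4];
    writeln(ints.sum); // prints "10"

    // Floating-point elements in an array (random access, length, slicing):
    // per the Phobos docs, sum() uses pairwise summation and, for float
    // input, accumulates in double. One large term is followed by 10_000
    // tiny ones whose total (about 1e-4) should not be lost.
    float[] fs = new float[10_001];
    fs[0] = 1.0f;
    fs[1 .. $] = 1e-8f;
    immutable double s = fs.sum; // close to 1.0001

    // Naive float accumulator for comparison. On typical SSE-based targets
    // each 1e-8f is below half an ulp of 1.0f and is rounded away, so this
    // usually stays at 1; the exact result depends on the target's
    // floating-point evaluation precision.
    float naive = 0;
    foreach (x; fs)
        naive += x;

    writeln(s, " ", naive);
}
```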
Oct 29 2015
On 29 October 2015 at 14:41, jmh530 via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> On Thursday, 29 October 2015 at 08:24:19 UTC, Iain Buclaw wrote:
>> std.algorithm.sum was the slowest of the bunch.
>
> I would be a little careful making comparisons with std.algorithm.sum because it uses a variety of different algorithms.

Well, the point of the talk was about simplicity vs. performance. As the implementation was being written interactively, std.algorithm.sum was tested first because it's the most obvious way to achieve the end goal.
Oct 29 2015
On Mon, 26 Oct 2015 11:37:16 +0000, Etienne Cimon <etcimon gmail.com> wrote:
> On Monday, 26 October 2015 at 04:48:09 UTC, H. S. Teoh wrote:
>> [...]
>
> I'd really like the performance benefits to be available to DMD users as well. I think I'll have to write it all with inline assembler just to be sure...

Remember, though, that classic inline asm is not transparent to the compiler and doesn't inline.

--
Marco
Oct 28 2015
On Monday, 26 October 2015 at 02:37:18 UTC, Etienne Cimon wrote:
> I've been playing around with perf and my web server and found that the bottleneck is by far the math module of Botan: https://github.com/etcimon/botan/blob/master/source/botan/math/mp/mp_core.d [...]

http://dlang.org/pragma.html#inline not working as expected? Writing inline assembly should really be the last resort...
Oct 26 2015
On 10/25/2015 7:37 PM, Etienne Cimon wrote:
> I think the best option would be for an inline feature in DMD that works, but I'm wondering what the stance is right now about the subject?

There have been some recent changes to improve inlining in dmd:

https://github.com/D-Programming-Language/dmd/pull/5153
https://github.com/D-Programming-Language/dmd/pull/5150
https://github.com/D-Programming-Language/dmd/pull/5136

More work needs to be done, but it is progress.
Oct 27 2015