digitalmars.D.announce - Increasing D Compiler Speed by Over 75%
- Walter Bright (1/1) Jul 25 2013 http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_com...
- Brad Anderson (3/4) Jul 25 2013 I propose we always refer to compiling as "doing the nasty" from
- Nick Sabalausky (3/8) Jul 25 2013 Yea, that's just absolutely classic :)
- dennis luehring (4/5) Jul 26 2013 do you compare dmc based and visualc based dmd builds?
- Walter Bright (3/6) Jul 26 2013 It would be most interesting to see just what it was that made the vc bu...
- Temtaime (4/4) Jul 30 2013 DMC is ugly compiler.
- Brad Anderson (6/10) Jul 30 2013 I'm willing to bet Walter would accept pull requests to add
- Walter Bright (2/3) Jul 30 2013 I'm sad that I never got the opportunity to be insulted by Jobs.
- dennis luehring (5/11) Jul 30 2013 ugly means bad or miss-designed, but please show me a better 16/32(64)
- Walter Bright (2/3) Jul 31 2013 That's an old number now. Someone want to try it with the current HEAD?
- dennis luehring (48/51) Jul 31 2013 tried to but failed
- Rainer Schuetze (17/20) Jul 31 2013 I have just tried yesterdays dmd to build Visual D (it builds some
- Walter Bright (9/24) Jul 31 2013 That makes it clear that the dmc malloc() was the dominator, not code ge...
- Richard Webb (5/14) Aug 02 2013 It still appears that the DMC malloc is a big reason for the difference
- Walter Bright (3/7) Aug 02 2013 Yes, I agree, the DMC malloc is clearly a large performance problem. I h...
- dennis luehring (3/11) Jul 31 2013 can you also give us also timings for
- Rainer Schuetze (7/21) Jul 31 2013 std.algorithm -unittest -main:
- dennis luehring (3/28) Jul 31 2013 so we can "still" say das msc builds are around two times faster - or
- dennis luehring (4/29) Aug 01 2013 results from mingw, vs2012(13) and llvm-clang builds would be also very
- Walter Bright (2/2) Aug 01 2013 I've now upgraded dmc so dmd builds can take advantage of improved code ...
- Rainer Schuetze (8/11) Aug 02 2013 Although my laptop got quite a bit faster overnight (I guess it was
- Walter Bright (55/63) Aug 02 2013 The two dmc times shouldn't be the same. I see a definite improvement.
- Rainer Schuetze (23/40) Aug 02 2013 My disassembly looks exactly the same. I don't think that a single div
- Walter Bright (2/8) Aug 02 2013 I'm using an AMD FX-6100.
- Rainer Schuetze (5/18) Aug 02 2013 This processor seems to do a little better with the mov reg,imm
- Daniel Murphy (7/13) Aug 02 2013 "Rainer Schuetze" wrote in message
- Walter Bright (2/6) Aug 02 2013 Hmm, very interesting!
- Dmitry Olshansky (8/16) Aug 02 2013 Made a pull to provide an implementation of rmem.c on top of Win32 Heap ...
- Don (7/8) Jul 26 2013 I just reported this compile speed killer:
http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
Jul 25 2013
On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
I propose we always refer to compiling as "doing the nasty" from this moment forward.
Jul 25 2013
On Thu, 25 Jul 2013 20:04:10 +0200 "Brad Anderson" <eco gnuk.net> wrote:
> On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
>> http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
> I propose we always refer to compiling as "doing the nasty" from this moment forward.
Yea, that's just absolutely classic :)
Jul 25 2013
On 25.07.2013 20:03, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
do you compare dmc based and visualc based dmd builds? the vc dmd build seems to be always two times faster - how does that look with your optimization?
Jul 26 2013
On 7/26/2013 1:25 AM, dennis luehring wrote:
> do you compare dmc based and visualc based dmd builds? the vc dmd build seems to be always two times faster - how does that look with your optimization?
It would be most interesting to see just what it was that made the vc build faster. But that won't help on Linux/FreeBSD/OSX.
Jul 26 2013
DMC is ugly compiler. It will be much nicer if you'll use mingw for that purpose on Windows. GCC usually generates more faster code that VC does. http://sourceforge.net/projects/mingwbuilds/
Jul 30 2013
On Tuesday, 30 July 2013 at 09:04:10 UTC, Temtaime wrote:
> DMC is ugly compiler. It will be much nicer if you'll use mingw for that purpose on Windows. GCC usually generates more faster code that VC does. http://sourceforge.net/projects/mingwbuilds/
I'm willing to bet Walter would accept pull requests to add support for mingw like he did with VC. Be sure to document the build process when you make the changes.
Sidenote: Insulting Walter's work isn't a great way to get him to do your a favor.
Jul 30 2013
On 7/30/2013 11:16 AM, Brad Anderson wrote:
> Sidenote: Insulting Walter's work isn't a great way to get him to do your a favor.
I'm sad that I never got the opportunity to be insulted by Jobs.
Jul 30 2013
On 30.07.2013 11:04, Temtaime wrote:
> DMC is ugly compiler.
ugly means bad or miss-designed, but please show me a better 16/32(64) bit full c/c++ compiler out there
> GCC usually generates more faster code that VC does.
currently the vc builded dmd is about 2 times faster in compiling, do you think that a mingw build will even top this?
Jul 30 2013
On 7/30/2013 11:40 PM, dennis luehring wrote:
> currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
Jul 31 2013
On 31.07.2013 09:00, Walter Bright wrote:
> On 7/30/2013 11:40 PM, dennis luehring wrote:
>> currently the vc builded dmd is about 2 times faster in compiling
> That's an old number now. Someone want to try it with the current HEAD?
tried to but failed
downloaded dmd-master.zip (from github)
downloaded dmd.2.063.2.zip
build dmd-master with vs2010
copied the produced dmd_msc.exe to dmd.2.063.2\dmd2\windows\bin
dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd.exe std\algorithm -unittest -main
gives
Error: cannot read file ûmain.d
(what is this "û" in front of main.d?)
dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd_msc.exe std\algorithm -unittest -main
gives
std\datetime.d(31979): Error: pure function 'std.datetime.enforceValid!"hours".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13556): Error: template instance std.datetime.enforceValid!"hours" error instantiating
std\datetime.d(31984): Error: pure function 'std.datetime.enforceValid!"minutes".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13557): Error: template instance std.datetime.enforceValid!"minutes" error instantiating
std\datetime.d(31989): Error: pure function 'std.datetime.enforceValid!"seconds".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13558): Error: template instance std.datetime.enforceValid!"seconds" error instantiating
std\datetime.d(33284): called from here: (TimeOfDay __ctmp1990; , __ctmp1990).this(0, 0, 0)
std\datetime.d(33293): Error: CTFE failed because of previous errors in this
std\datetime.d(31974): Error: pure function 'std.datetime.enforceValid!"months".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8994): Error: template instance std.datetime.enforceValid!"months" error instantiating
std\datetime.d(32012): Error: pure function 'std.datetime.enforceValid!"days".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8995): Error: template instance std.datetime.enforceValid!"days" error instantiating
std\datetime.d(33389): called from here: (Date __ctmp1999; , __ctmp1999).this(-3760, 9, 7)
std\datetime.d(33458): Error: CTFE failed because of previous errors in this
Error: undefined identifier '_xopCmp'
and a compiler crash
my former benchmarks were done the same way and it worked without any problems - this master seems to have problems
Jul 31 2013
On 31.07.2013 09:00, Walter Bright wrote:
> On 7/30/2013 11:40 PM, dennis luehring wrote:
>> currently the vc builded dmd is about 2 times faster in compiling
> That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec
"std new" is the version without the "block allocator".
Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40
The release builds use "-release -O -inline" and need a bit more than 1 GB memory for two of the libraries (I still had to patch dmd_dmc to be large-address-aware).
This shows that removing most of the allocations was a good optimization for the dmc-Runtime, but does not have a large, but still notable impact on a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use "new" a lot, but plain "malloc" calls, so they still suffer from the slow runtime.
Jul 31 2013
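The "block allocator" that the timings above switch on and off is not shown anywhere in the thread. As a rough illustration only, an allocator of that general kind can be sketched as a bump-pointer scheme: grab large chunks from the runtime and hand out pieces by advancing a pointer, so the slow malloc() is called rarely. Every name and constant below (block_alloc, kChunkSize, the 1 MiB chunk size) is hypothetical and is not taken from dmd's rmem.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch; not dmd's actual allocator.
namespace sketch {

constexpr std::size_t kChunkSize = 1u << 20;                 // 1 MiB per chunk (arbitrary)
constexpr std::size_t kAlign = alignof(std::max_align_t);    // alignment malloc guarantees

static char* chunk_ptr = nullptr;    // next free byte in the current chunk
static std::size_t chunk_left = 0;   // bytes still free in the current chunk

// Hand out memory by bumping a pointer. Nothing is freed individually,
// which suits a short-lived process such as a compiler run.
void* block_alloc(std::size_t n) {
    if (n == 0) n = 1;
    if (n > SIZE_MAX - (kAlign - 1)) return nullptr;   // rounding would overflow
    n = (n + kAlign - 1) & ~(kAlign - 1);              // round up to the alignment

    if (n > chunk_left) {
        if (n > kChunkSize) {
            return std::malloc(n);   // oversized request: go straight to the runtime
        }
        // Start a new chunk; the unused tail of the old one is abandoned.
        char* fresh = static_cast<char*>(std::malloc(kChunkSize));
        if (fresh == nullptr) return nullptr;
        chunk_ptr = fresh;
        chunk_left = kChunkSize;
    }

    assert(n <= chunk_left);
    void* p = chunk_ptr;
    chunk_ptr += n;
    chunk_left -= n;
    return p;
}

}  // namespace sketch
```

With a scheme like this, the cost of the underlying malloc() is paid once per chunk instead of once per object, which would be consistent with the dmc-built compiler gaining far more from it than the msc-built one.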
Thanks for doing this, this is good information.
On 7/31/2013 2:24 PM, Rainer Schuetze wrote:
> I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
> Debug build dmd_dmc: 23 sec, std new 43 sec
> Debug build dmd_msc: 19 sec, std new 20 sec
That makes it clear that the dmc malloc() was the dominator, not code gen.
> "std new" is the version without the "block allocator".
> Release build dmd_dmc: 3 min 30, std new 5 min 25
> Release build dmd_msc: 1 min 32, std new 1 min 40
> The release builds use "-release -O -inline" and need a bit more than 1 GB memory for two of the libraries (I still had to patch dmd_dmc to be large-address-aware).
> This shows that removing most of the allocations was a good optimization for the dmc-Runtime, but does not have a large, but still notable impact on a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use "new" a lot, but plain "malloc" calls, so they still suffer from the slow runtime.
Actually, dmc still should give a better showing. All the optimizations I've put into dmd also went into dmc, and do result in significantly better code speed. For example, the hash modulus optimization has a significant impact, but I haven't released that dmc yet.
Optimized builds have an entirely different profile than debug builds, and I haven't investigated that.
Jul 31 2013
On 01/08/2013 00:32, Walter Bright wrote:
> Thanks for doing this, this is good information.
> On 7/31/2013 2:24 PM, Rainer Schuetze wrote:
>> I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
>> Debug build dmd_dmc: 23 sec, std new 43 sec
>> Debug build dmd_msc: 19 sec, std new 20 sec
> That makes it clear that the dmc malloc() was the dominator, not code gen.
It still appears that the DMC malloc is a big reason for the difference between DMC and MSVC builds when compiling the algorithm unit tests.
(a very quick test suggests that changing the global new in rmem.c to call HeapAlloc instead of malloc gives a large speedup).
Aug 02 2013
On 8/2/2013 4:18 AM, Richard Webb wrote:
> It still appears that the DMC malloc is a big reason for the difference between DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test suggests that changing the global new in rmem.c to call HeapAlloc instead of malloc gives a large speedup).
Yes, I agree, the DMC malloc is clearly a large performance problem. I had not realized this.
Aug 02 2013
On 31.07.2013 23:24, Rainer Schuetze wrote:
> I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
can you also give us also timings for
(dmd_dmc|dmd_msc) std\algorithm -unittest -main
Jul 31 2013
On 01.08.2013 07:33, dennis luehring wrote:
> can you also give us also timings for
> (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main:
dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec
std.algorithm -unittest -main -O:
dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec
Jul 31 2013
On 01.08.2013 08:16, Rainer Schuetze wrote:
> std.algorithm -unittest -main:
> dmd_dmc 20 sec, std new 61 sec
> dmd_msc 11 sec, std new 13 sec
> std.algorithm -unittest -main -O:
> dmd_dmc 27 sec, std new 68 sec
> dmd_msc 16 sec, std new 18 sec
so we can "still" say the msc builds are around two times faster - or even faster
Jul 31 2013
On 01.08.2013 08:16, Rainer Schuetze wrote:
> std.algorithm -unittest -main:
> dmd_dmc 20 sec, std new 61 sec
> dmd_msc 11 sec, std new 13 sec
> std.algorithm -unittest -main -O:
> dmd_dmc 27 sec, std new 68 sec
> dmd_msc 16 sec, std new 18 sec
results from mingw, vs2012(13) and llvm-clang builds would be also very interesting, but i don't know if dmd can be build with mingw or clang out of the box under windows
Aug 01 2013
I've now upgraded dmc so dmd builds can take advantage of improved code generation. http://www.digitalmars.com/download/freecompiler.html
Aug 01 2013
On 02.08.2013 00:36, Walter Bright wrote:
> I've now upgraded dmc so dmd builds can take advantage of improved code generation.
> http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change:
std.algorithm -main -unittest
dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec
BTW: I usually use VS2008, but now also tried VS2010 - no difference.
Aug 02 2013
On 8/2/2013 12:57 AM, Rainer Schuetze wrote:
> Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change:
> std.algorithm -main -unittest
> dmc85?: 12.5 sec
> dmc857: 12.5 sec
> msc: 7 sec
> BTW: I usually use VS2008, but now also tried VS2010 - no difference.
The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:
?_aaGetRvalue YAPAXPAUAA PAX Z:
        push EBX
        mov EBX,0Ch[ESP]
        push ESI
        cmp dword ptr 0Ch[ESP],0
        je L184
        mov EAX,0Ch[ESP]
        mov ECX,4[EAX]
        cmp ECX,4
        jne L139
        mov ESI,EBX
        and ESI,3
        jmp short L166
L139:   cmp ECX,01Fh
        jne L15E
======== note this section does not have a div instruction in it ==============
        mov EAX,EBX
        mov EDX,08421085h
        mov ECX,EBX
        mul EDX
        mov EAX,ECX
        sub EAX,EDX
        shr EAX,1
        lea EDX,[EAX][EDX]
        shr EDX,4
        imul EAX,EDX,01Fh
        sub ECX,EAX
        mov ESI,ECX
==========================================================================
        jmp short L166
L15E:   mov EAX,EBX
        xor EDX,EDX
        div ECX
        mov ESI,EDX
L166:   mov ECX,0Ch[ESP]
        mov ECX,[ECX]
        mov EDX,[ESI*4][ECX]
        test EDX,EDX
        je L184
L173:   cmp 4[EDX],EBX
        jne L17E
        mov EAX,8[EDX]
        pop ESI
        pop EBX
        ret
L17E:   mov EDX,[EDX]
        test EDX,EDX
        jne L173
L184:   pop ESI
        xor EAX,EAX
        pop EBX
        ret
Aug 02 2013
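For readers who do not want to trace the assembly, the div-free section in the listing above can be transcribed into C++ roughly as follows. This is a reading of the instruction sequence, not code from dmd or dmc; the function names are made up for illustration. The constant works because 2^32 + 0x08421085 = 4433514629, which is 2^37 / 31 rounded up, so the multiply plus the two shifts (32 + 1 + 4 = 37 bits in total) yields the quotient x / 31, and the final imul/sub turns that into the remainder.

```cpp
#include <cassert>
#include <cstdint>

// x % 31 for unsigned 32-bit x, written without a division.
// Follows the mul / sub / shr / lea / shr / imul / sub sequence in the listing.
inline std::uint32_t mod31(std::uint32_t x) {
    // High 32 bits of x * 0x08421085: what `mul EDX` leaves in EDX.
    std::uint32_t hi = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * 0x08421085u) >> 32);
    // The (x - hi) >> 1 step accounts for the implicit 2^32 term of the
    // 33-bit multiplier; the result q equals x / 31.
    std::uint32_t q = (hi + ((x - hi) >> 1)) >> 4;
    return x - q * 31u;   // remainder: x - 31 * (x / 31)
}

// Spot-check against the built-in operator (hypothetical helper).
inline void check_mod31() {
    const std::uint32_t samples[] = {0u, 1u, 30u, 31u, 32u, 61u, 62u,
                                     1000003u, 0x7FFFFFFFu, 0xFFFFFFFEu, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(mod31(x) == x % 31u);
    }
}
```

The surrounding branches in the listing pick between three cases: a table size of 4 uses a plain AND, a size of 31 uses this multiply sequence, and any other size falls back to div.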
On 02.08.2013 10:24, Walter Bright wrote:
> The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:
My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration.
======== note this section does not have a div instruction in it ==============
        mov EAX,EBX
        mov EDX,08421085h       ; latency 3
        mov ECX,EBX
        mul EDX                 ; latency 5
        mov EAX,ECX
        sub EAX,EDX             ; latency 1
        shr EAX,1               ; latency 1
        lea EDX,[EAX][EDX]      ; latency 1
        shr EDX,4               ; latency 1
        imul EAX,EDX,01Fh       ; latency 3
        sub ECX,EAX             ; latency 1
        mov ESI,ECX
==========================================================================
Aug 02 2013
On 8/2/2013 2:47 AM, Rainer Schuetze wrote:
> My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration.
I'm using an AMD FX-6100.
Aug 02 2013
On 02.08.2013 18:37, Walter Bright wrote:
> I'm using an AMD FX-6100.
This processor seems to do a little better with the mov reg,imm operation but otherwise is similar. The DIV operation has larger worst-case latency, though (16-48 cycles).
Better to just use a power of 2 for the array sizes anyway...
Aug 02 2013
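The power-of-2 suggestion in the post above amounts to replacing the modulus with a bit mask, which is what the `and ESI,3` branch of the earlier listing already does for a table of size 4. A minimal sketch, with hypothetical function names that do not come from dmd:

```cpp
#include <cassert>
#include <cstddef>

// True if n is a nonzero power of two.
inline bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Bucket index for a table whose size is a power of two:
// hash % table_size reduces to a single AND with (table_size - 1).
inline std::size_t bucket_index(std::size_t hash, std::size_t table_size) {
    assert(is_power_of_two(table_size));
    return hash & (table_size - 1);
}
```

The trade-off is that only the low bits of the hash select the bucket, so this relies on those bits being well distributed, whereas a prime size such as 31 mixes in all bits of the hash.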
"Rainer Schuetze" <r.sagitario gmx.de> wrote in message news:ktbvam$dvf$1 digitalmars.com...
> large-address-aware).
> This shows that removing most of the allocations was a good optimization for the dmc-Runtime, but does not have a large, but still notable impact on a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use "new" a lot, but plain "malloc" calls, so they still suffer from the slow runtime.
On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling "dmd std\range -unittest -main") with a release build of dmd.
Aug 02 2013
On 8/2/2013 8:18 AM, Daniel Murphy wrote:
> On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling "dmd std\range -unittest -main") with a release build of dmd.
Hmm, very interesting!
Aug 02 2013
On 02-Aug-2013 20:40, Walter Bright wrote:
> On 8/2/2013 8:18 AM, Daniel Murphy wrote:
>> On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling "dmd std\range -unittest -main") with a release build of dmd.
> Hmm, very interesting!
Made a pull to provide an implementation of rmem.c on top of Win32 Heap API.
https://github.com/D-Programming-Language/dmd/pull/2445
Also noting that global new/delete are not reentrant already, added NO_SERIALIZE flag to save on locking/unlocking of heap. For me this gets from 13 to 8 seconds.
--
Dmitry Olshansky
Aug 02 2013
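The pull request itself is not reproduced in this thread, so the following is only a guess at the general shape of "allocation on top of the Win32 Heap API without serialization". It assumes a private heap created with HeapCreate and the HEAP_NO_SERIALIZE flag; whether the actual change used a private heap or the process heap is not stated above. The wrapper names are hypothetical, and on non-Windows platforms the sketch falls back to malloc/free purely so that it still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical sketch; not the code from the pull request.
namespace sketch {

#ifdef _WIN32
// One private, growable heap. HEAP_NO_SERIALIZE removes the per-call lock,
// which is only safe if a single thread ever touches this heap.
inline HANDLE private_heap() {
    static HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    return heap;
}
#endif

inline void* heap_alloc(std::size_t n) {
#ifdef _WIN32
    HANDLE h = private_heap();
    return h ? HeapAlloc(h, HEAP_NO_SERIALIZE, n) : nullptr;
#else
    return std::malloc(n);   // fallback so the sketch builds elsewhere
#endif
}

inline void heap_free(void* p) {
    if (p == nullptr) return;
#ifdef _WIN32
    HeapFree(private_heap(), HEAP_NO_SERIALIZE, p);
#else
    std::free(p);
#endif
}

}  // namespace sketch
```

Skipping serialization is reasonable here only because, as the post notes, the compiler's global new/delete were not reentrant in the first place; in a multithreaded program the flag would have to be dropped.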
On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
I just reported this compile speed killer:
http://d.puremagic.com/issues/show_bug.cgi?id=10716
It has a big impact on some of the tests in the DMD test suite. It might also be responsible for a significant part of the compilation time of Phobos, since array literals tend to be widely used inside unittest functions.
Jul 26 2013