digitalmars.D - pi benchmark on ldc and dmd
- Walter Bright (2/2) Aug 01 2011 http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_...
- bearophile (5/8) Aug 01 2011 Do you mean code similar to this one, or code that uses std.bigint?
- Adam D. Ruppe (6/7) Aug 01 2011 It was something that used bigint. I whipped it up myself earlier
- bearophile (6/9) Aug 01 2011 OK.
- Adam D. Ruppe (39/44) Aug 01 2011 Actually, I don't think that would be relevant here.
- Andrew Wiley (11/55) Aug 01 2011 Yes, GDC takes forever and a half to build. That's true of anything in G...
- Adam D. Ruppe (8/10) Aug 02 2011 Building dmd from the zip took 37 *seconds* for me just now, after
- Adam D. Ruppe (159/159) Aug 02 2011 I think I have it: 64 bit registers. I got ldc to work
- Walter Bright (3/6) Aug 02 2011 dmd does use all the registers on the x64, but it seems to not be enregi...
- bearophile (30/31) Aug 02 2011 The D code is about 2.8 times slower than the Haskell version, and it ha...
- Marco Leise (2/14) Aug 02 2011 Is this Indonesian cast to ASCII? :p
- bearophile (27/44) Aug 02 2011 I agree it's very bad looking, it isn't idiomatic Haskell code. But it c...
- Walter Bright (56/59) Aug 02 2011 When I compile it, it uses the registers:
- Adam D. Ruppe (13/14) Aug 02 2011 hmm.... this is my error, but might be a bug too.
- Brad Roberts (6/28) Aug 02 2011 Ok.. I'm pretty sure that's a bug I discovered the other day in the
- Brad Roberts (7/39) Aug 02 2011 https://github.com/D-Programming-Language/dmd/pull/287
- Trass3r (7/18) Aug 02 2011 ...
- Walter Bright (7/8) Aug 02 2011 No. They'll get removed shortly.
- Andrew Wiley (30/40) Aug 02 2011 For the record, I'm fine with the current arrangement and just playing
- Trass3r (4/7) Aug 02 2011 Make sure you disable bootstrapping. Compiling gdc works pleasantly fast...
- Jason House (2/5) Aug 02 2011
- Walter Bright (16/18) Aug 02 2011 Often when I see benchmark results like this, I wait to see what the act...
- KennyTM~ (4/9) Aug 02 2011 Let dmd have an '-O9999' flag to as a synonym of '-O -inline
- Adam D. Ruppe (5/5) Aug 02 2011 On the flags: I did use them, but didn't write it all out and
- Iain Buclaw (3/15) Aug 02 2011 -Ofast sounds better. ;)
- Andrew Wiley (2/19) Aug 02 2011
- simendsjo (3/24) Aug 02 2011 How about replacing -w with -9001?
- Robert Clipsham (8/10) Aug 02 2011 I was talking to David Nadlinger the other day, and there was some sort
- David Nadlinger (6/12) Aug 02 2011 Nope, this turned out to be a bug in my program, where some memory chunk...
http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98

Anyone care to examine the assembler output and figure out why?
Aug 01 2011
Walter:
> http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
>
> Anyone care to examine the assembler output and figure out why?

Do you mean code similar to this one, or code that uses std.bigint?
http://shootout.alioth.debian.org/debian/program.php?test=pidigits&lang=gdc&id=3

Bye,
bearophile
Aug 01 2011
bearophile wrote:
> Do you mean code similar to this one, or code that uses std.bigint?

It was something that used bigint. I whipped it up myself earlier
this morning, but left the code on my laptop. I'll post it when I
have a chance.

I ran obj2asm on it myself, but was a little short on time, so I
haven't really analyzed it yet.
Aug 01 2011
Adam D. Ruppe:
> It was something that used bigint. I whipped it up myself earlier
> this morning, but left the code on my laptop. I'll post it when I
> have a chance.

OK. In such situations it's never enough to compare the D code
compiled with DMD to the D code compiled with LDC. You also need a
reference point, like a C version compiled with GCC (here using GMP
bignums). Such reference points are necessary to anchor performance
discussions to something.

Bye,
bearophile
Aug 01 2011
bearophile wrote:
> In such situations it's never enough to compare the D code compiled
> with DMD to the D code compiled with LDC. You also need a reference
> point, like a C version compiled with GCC (here using GMP bignums).
> Such reference points are necessary to anchor performance
> discussions to something.

Actually, I don't think that would be relevant here. The thread
started with someone saying the DMD backend is garbage and should be
abandoned. I'm sick and tired of hearing people say that. The Digital
Mars code has many, many advantages over the others*.

But, it was challenged specifically on the optimizer, so to check
that out, I wanted all other things to be equal. Same code, same
front end, same computer, as close to the same runtime and library as
possible with different compilers.

The only difference should be the backend so we can draw conclusions
about it without other factors skewing the results.

So for this, I just wanted to compare the dmd backend to the ldc and
gdc backends, so I didn't worry too much about absolute numbers or
other languages.

(Actually, one of the reasons I picked the pi one was after the
embarrassing defeat in floating point, I was hoping dmd could score a
second victory and I could follow up on that "prove it" post with
satisfaction. Alas, the facts didn't work out that way. Though, I
still do find dmd to beat g++ on a lot of real world code - things
like slices actually make a sizable difference.)

But regardless, it was just about comparing backends, not doing
language comparisons.

===

* To name a huge one. Today was the first time I ever got ldc or gdc
to actually work on my computer, and it took a long, long time to do
it. I've tried in the past, and failed, so this was a triumph. Big
success.

I was waiting over an hour just for gcc+gdc to compile! In the time
it takes for gcc's configure script to run, you can make clean, build
dmd, druntime and phobos.

It's a huge hassle to get the code together too. I had to go to
*four* different sites to get gdc's stuff together (like 80 MB of
crap, compressed!), and two different ones to get even the ldc binary
to work.

Pain in my ASS. And this is on Linux too. I pity the fool who tries
to do this on Windows, knowing how so much linux software treats
their Windows "ports".

I'd like to contrast to dmd: unzip and play with wild abandon.
Aug 01 2011
On Mon, Aug 1, 2011 at 8:38 PM, Adam D. Ruppe <destructionator gmail.com> wrote:
> Today was the first time I ever got ldc or gdc to actually work on
> my computer, and it took a long, long time to do it. [...]
> I was waiting over an hour just for gcc+gdc to compile!

Yes, GDC takes forever and a half to build. That's true of anything
in GCC, and it's just because they don't trust the native C compiler
at all. LDC builds in under a half hour, even on my underpowered ARM
SoC, so I don't see how you could be having trouble there. As for
Windows, Daniel Green (hopefully I'm remembering right) has been
posting GDC binaries.

I do respect that DMD generates reasonably fast executables
recklessly fast, but it also doesn't exist outside x86 and x86_64,
and the debug symbols (at least on Linux) are just hilariously bad.

Now if I could just get GDC to pad structs correctly on ARM...
Aug 01 2011
> LDC builds in under a half hour, even on my underpowered ARM SoC, so
> I don't see how you could be having trouble there.

Building dmd from the zip took 37 *seconds* for me just now, after
running a make clean (this is on Linux).

gdc and ldc have their advantages, but they have disadvantages too. I
think the people saying "abandon dmd" don't know the other side of
the story.

Basically, I think the more compilers we have for D the better. gdc
is good. ldc is good. And so is dmd. We shouldn't abandon any of
them.
Aug 02 2011
I think I have it: 64 bit registers. I got ldc to work in 32 bit
(didn't have that yesterday, so I was doing 64 bit only) and
compiled. No difference in timing between ldc 32 bit and dmd 32 bit.

The disassembly isn't identical but the time is. (The disassembly
seems to mainly order things differently, but ldc has fewer jump
instructions too.)

Anyway. In 64 bit, ldc gets a speedup over dmd. Looking at the asm
output, it looks like dmd doesn't use any of the new registers,
whereas ldc does. (dmd's 64 bit looks mostly like 32 bit code with r
instead of e.)

Here's the program. It's based on one of the Python ones.

====
import std.bigint;
import std.stdio;

alias BigInt number;

void main() {
    auto N = 10000;
    number i, k, ns;
    number k1 = 1;
    number n, a, d, t, u;
    n = 1;
    d = 1;
    while (1) {
        k += 1;
        t = n << 1;
        n *= k;
        a += t;
        k1 += 2;
        a *= k1;
        d *= k1;
        if (a >= n) {
            t = (n*3 + a) / d;
            u = (n*3 + a) % d;
            u += n;
            if (d > u) {
                ns = ns*10 + t;
                i += 1;
                if (i % 10 == 0) {
                    debug writefln("%010d\t:%d", ns, i);
                    ns = 0;
                }
                if (i >= N) {
                    break;
                }
                a -= d*t;
                a *= 10;
                n *= 10;
            }
        }
    }
}
====

BigInt's calls aren't inlined, but that's a frontend issue. Let's
eliminate that by switching to long in that alias. The result will be
wrong, but that's beside the point for now. I just want to see
integer math. (this is why the writefln is debug too)

With optimizations turned on, ldc again wins by the same ratio - it
runs in about 2/3 the time - and the code is much easier to look at.
Let's see what's going on.

The relevant loop from DMD (64 bit):

====
L47:    inc     qword ptr -040h[RBP]
        mov     RAX,-028h[RBP]
        add     RAX,RAX
        mov     -010h[RBP],RAX
        mov     RAX,-040h[RBP]
        imul    RAX,-028h[RBP]
        mov     -028h[RBP],RAX
        mov     RAX,-010h[RBP]
        add     -020h[RBP],RAX
        add     qword ptr -030h[RBP],2
        mov     RAX,-030h[RBP]
        imul    RAX,-020h[RBP]
        mov     -020h[RBP],RAX
        mov     RAX,-030h[RBP]
        imul    RAX,-018h[RBP]
        mov     -018h[RBP],RAX
        mov     RAX,-020h[RBP]
        cmp     RAX,-028h[RBP]
        jl      L47
        mov     RAX,-028h[RBP]
        lea     RAX,[RAX*2][RAX]
        add     RAX,-020h[RBP]
        mov     -058h[RBP],RAX
        cqo
        idiv    qword ptr -018h[RBP]
        mov     -010h[RBP],RAX
        mov     RAX,-058h[RBP]
        cqo
        idiv    qword ptr -018h[RBP]
        mov     -8[RBP],RDX
        mov     RAX,-028h[RBP]
        add     -8[RBP],RAX
        mov     RAX,-018h[RBP]
        cmp     RAX,-8[RBP]
        jle     L47
        mov     RAX,-038h[RBP]
        lea     RAX,[RAX*4][RAX]
        add     RAX,RAX
        add     RAX,-010h[RBP]
        mov     -038h[RBP],RAX
        inc     qword ptr -048h[RBP]
        mov     RAX,-048h[RBP]
        mov     RCX,0Ah
        cqo
        idiv    RCX
        test    RDX,RDX
        jne     L109
        mov     qword ptr -038h[RBP],0
L109:   cmp     qword ptr -048h[RBP],02710h
        jge     L137
        mov     RAX,-018h[RBP]
        imul    RAX,-010h[RBP]
        sub     -020h[RBP],RAX
        imul    EAX,-020h[RBP],0Ah
        mov     -020h[RBP],RAX
        imul    EAX,-028h[RBP],0Ah
        mov     -028h[RBP],RAX
        jmp     L47
====

and from ldc 64 bit:

====
L20:    add     RDI,2
        inc     RCX
        lea     R9,[R10*2][R9]
        imul    R9,RDI
        imul    R8,RDI
        imul    R10,RCX
        cmp     R9,R10
        jl      L20
        lea     RAX,[R10*2][R10]
        add     RAX,R9
        cqo
        idiv    R8
        add     RDX,R10
        cmp     R8,RDX
        jle     L20
        cmp     RSI,0270Fh
        jg      L73
        imul    RAX,R8
        sub     R9,RAX
        add     R9,R9
        lea     R9,[R9*4][R9]
        inc     RSI
        add     R10,R10
        lea     R10,[R10*4][R10]
        jmp     short L20
====

First thing that immediately pops out is the code is a lot shorter.
Second thing that jumps out is it looks like ldc makes better use of
the registers. Indeed, the shortness looks to be thanks to the
registers eliminating a lot of movs.

So I'm pretty sure the difference is caused by dmd not using the new
registers in x64. The other differences look trivial to my eyes.
Aug 02 2011
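(A compact way to see the effect described above is to give the
compiler more simultaneously live values than the classic eight x86
general-purpose registers can hold. The micro-test below is a
hypothetical editorial sketch, not code from this thread - a backend
that enregisters into r8-r15 can keep every value in a register,
while one restricted to the 32-bit register set has to spill to the
stack.)

====
import std.stdio;

// Hypothetical spill test: nine values (eight inputs plus acc) stay
// live across every iteration. With only the classic 8 GP registers
// some of them must spill to the stack; with r8-r15 they all fit.
long spillTest(long a, long b, long c, long d,
               long e, long f, long g, long h) {
    long acc = 0;
    foreach (i; 0 .. 1_000_000) {
        acc += a*b + c*d + e*f + g*h + i;
    }
    return acc;
}

void main() {
    writeln(spillTest(1, 2, 3, 4, 5, 6, 7, 8));
}
====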
On 8/2/2011 12:49 PM, Adam D. Ruppe wrote:
> So I'm pretty sure the difference is caused by dmd not using the new
> registers in x64. The other differences look trivial to my eyes.

dmd does use all the registers on the x64, but it seems to not be
enregistering here. I'll have a look see.
Aug 02 2011
Adam D. Ruppe:
> Here's the program. It's based on one of the Python ones.

The D code is about 2.8 times slower than the Haskell version, and it
has a bug, shown here:

====
import std.stdio, std.bigint;

void main() {
    int x = 100;
    writefln("%010d", x);
    BigInt bx = x;
    writefln("%010d", bx);
}
====

Output:

0000000100
100

----------------------------

The Haskell code I've used:

====
-- Compile with: ghc --make -O3 -XBangPatterns -rtsopts pidigits_hs.hs
import System

pidgits n = 0 % (0 # (1,0,1)) where
 i%ds
  | i >= n = []
  | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j%t
  where k = i+10; j = min n k
        (h,t) | k > n = (take (n`mod`10) ds ++ replicate (k-n) " ",[])
              | True = splitAt 10 ds
 j#s | n>a || r+n>=d = k#t
     | True = show q : k#(n*10,(a-(q*d))*10,d)
  where k = j+1; t@(n,a,d)=k&s; (q,r)=(n*3+a)`divMod`d
 j&(n,a,d) = (n*j,(a+n*2)*y,d*y) where y=(j*2+1)

main = putStr.pidgits.read.head =<< getArgs
====

Bye,
bearophile
Aug 02 2011
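(Until that formatting bug is fixed, the padding can be applied to
the string form instead. A minimal workaround sketch - editorial, and
it assumes std.conv.to!string accepts a BigInt:)

====
import std.bigint, std.conv, std.stdio, std.string;

void main() {
    BigInt bx = 100;
    // "%010d" is ignored for BigInt, so format the value as a
    // string and zero-pad it manually.
    writeln(rightJustify(to!string(bx), 10, '0')); // prints 0000000100
}
====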
On 02.08.2011 22:35, bearophile <bearophileHUGS lycos.com> wrote:
> i%ds
>  | i >= n = []
>  | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j%t
> [...]
> main = putStr.pidgits.read.head =<< getArgs

Is this Indonesian cast to ASCII? :p
Aug 02 2011
Marco Leise:
> Is this Indonesian cast to ASCII? :p

I agree it's very bad looking, it isn't idiomatic Haskell code. But
it contains nothing too strange (and the algorithm is the same used
in the D code). This is formatted a bit better, but I don't fully
understand it yet:

====
import System (getArgs)

pidgits n = 0 % (0 # (1, 0, 1))
  where
    i % ds
      | i >= n = []
      | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j % t
      where
        k = i + 10
        j = min n k
        (h, t)
          | k > n = (take (n `mod` 10) ds ++ replicate (k - n) " ", [])
          | True = splitAt 10 ds
    j # s
      | n > a || r + n >= d = k # t
      | True = show q : k # (n * 10, (a - q * d) * 10, d)
      where
        k = j + 1
        t@(n, a, d) = k & s
        (q, r) = (n * 3 + a) `divMod` d
    j & (n, a, d) = (n * j, (a + n * 2) * y, d * y)
      where y = j * 2 + 1

main = putStr . pidgits . read . head =<< getArgs
====

The Shootout site (where I have copied that code) ranks programs by
their performance and their compactness (using a low-performance
compressor...), so there you see Haskell (and other language)
programs that are sometimes too compact and often use clever tricks
to increase their performance. In normal Haskell code you don't find
those tricks (this specific program seems not to use strange tricks,
but on the Haskell Wiki page about this problem
(http://www.haskell.org/haskellwiki/Shootout/Pidigits ) you see
several programs that are both longer and slower than this one).

The first working implementation of a C program is probably long and
fast enough, while the first working implementation of a Haskell
program is often short but not so fast. Usually there are ways to
speed up the Haskell code. My experience with Haskell is limited, so
usually when I write some Haskell my head hurts a bit :-)

The higher level nature of Python allows me to implement working
algorithms that are more complex, so sometimes the code ends up being
faster than C code, where you often avoid (in a first implementation)
overly complex algorithms for fear of hard-to-find bugs, or
implementations that just take too long to write. Haskell in theory
allows you to implement complex algorithms in a short space, and
safely. In practice I think you need a lot of brain to do this.
Haskell sometimes looks like a puzzle language to me (maybe I just
need more self-training in functional programming).

Bye,
bearophile
Aug 02 2011
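(For readers lining the two programs up: both implement the same
spigot recurrence, which can be read directly off the D loop posted
earlier. The summary below is an editorial paraphrase, with n, a, d
on the right-hand sides denoting the pre-update values:)

====
% State (n, a, d) with counter k; each iteration applies:
\[
n' = n\,k, \qquad a' = (a + 2n)(2k+1), \qquad d' = d\,(2k+1)
\]
% A candidate digit is extracted only when it is proven stable:
\[
t = \left\lfloor \frac{3n + a}{d} \right\rfloor, \qquad
\text{emit } t \text{ iff } \bigl((3n + a) \bmod d\bigr) + n < d
\]
% after which the state is rescaled:
\[
a \leftarrow 10\,(a - d\,t), \qquad n \leftarrow 10\,n
\]
====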
On 8/2/2011 12:49 PM, Adam D. Ruppe wrote:
> So I'm pretty sure the difference is caused by dmd not using the new
> registers in x64. The other differences look trivial to my eyes.

When I compile it, it uses the registers:

====
L2E:    inc     R11
        lea     R9D,[00h][RSI*2]
        mov     R9,R9
        mov     RCX,R11
        imul    RCX,RSI
        mov     RSI,RCX
        add     RDI,R9
        add     R8,2
        mov     RDX,R8
        imul    RDX,RDI
        mov     RDI,RDX
        mov     R10,R8
        imul    R10,RBX
        mov     RBX,R10
        cmp     RDI,RSI
        jl      L2E
        lea     RAX,[RCX*2][RCX]
        add     RAX,RDX
        mov     -8[RBP],RAX
        cqo
        idiv    R10
        mov     R9,RAX
        mov     R9,R9
        mov     RAX,-8[RBP]
        cqo
        idiv    R10
        mov     R12,RDX
        mov     R12,R12
        add     R12,RSI
        cmp     RBX,R12
        jle     L2E
        lea     R14,[R14*4][R14]
        add     R14,R14
        add     R14,R9
        mov     R14,R14
        inc     R13
        mov     RAX,R13
        mov     RCX,0Ah
        cqo
        idiv    RCX
        test    RDX,RDX
        jne     LBD
        xor     R14,R14
LBD:    cmp     R13,02710h
        jge     LE3
        mov     RDX,RBX
        imul    RDX,R9
        sub     RDI,RDX
        imul    R10D,RDI,0Ah
        mov     RDI,R10
        imul    R12D,RSI,0Ah
        mov     RSI,R12
        jmp     L2E
====

All I did with your example was replace BigInt with long.
Aug 02 2011
Walter Bright wrote:
> All I did with your example was replace BigInt with long.

hmm.... this is my error, but might be a bug too.

Take that same program and add some inline asm to it.

====
void main() {
    asm { nop; }
    [... the rest is identical ...]
}
====

Now compile it and check the output. With the asm, I get the output I
posted. If I cut it out, I get what you posted.

My error here is when I did the obj2asm the first time, I added an
instruction inline so I could confirm quickly that I was in the right
place in the file. (I cut that out later but forgot to rerun obj2asm.)
Aug 02 2011
Ok.. I'm pretty sure that's a bug I discovered the other day in the
initialization code of asm blocks. I've already got a fix for it and
will be sending a pull request shortly. The asm semantic code calls
the 32-bit initialization code of the backend unconditionally, which
is just wrong.

On Tue, 2 Aug 2011, Adam D. Ruppe wrote:
> hmm.... this is my error, but might be a bug too.
>
> Take that same program and add some inline asm to it. [...]
Aug 02 2011
https://github.com/D-Programming-Language/dmd/pull/287

Before pulling this, though, the current win32 compilation failure
should be fixed to avoid compounding problems:

https://github.com/D-Programming-Language/dmd/pull/288

Later,
Brad

On Tue, 2 Aug 2011, Brad Roberts wrote:
> Ok.. I'm pretty sure that's a bug I discovered the other day in the
> initialization code of asm blocks. I've already got a fix for it and
> will be sending a pull request shortly. [...]
Aug 02 2011
On 02.08.2011 22:38, Walter Bright <newshound2 digitalmars.com> wrote:
> L2E:    inc     R11
>         lea     R9D,[00h][RSI*2]
>         mov     R9,R9
...
>         mov     R9,RAX
>         mov     R9,R9
...
>         mov     R12,RDX
>         mov     R12,R12
...
>         lea     R14,[R14*4][R14]
>         add     R14,R14
>         add     R14,R9
>         mov     R14,R14
...

Any reason for all those mov x,x 's?
Aug 02 2011
On 8/2/2011 3:23 PM, Trass3r wrote:
> Any reason for all those mov x,x 's?

No. They'll get removed shortly.

I see three problems with dmd's codegen here:

1. those redundant moves
2. failing to merge a couple divides
3. replacing a mul with an add/lea

I'll see about taking care of them. (2) is the most likely culprit on
the speed.
Aug 02 2011
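(To spell out (2): in the dmd listing above, (n*3 + a) / d and
(n*3 + a) % d are computed with two separate cqo/idiv pairs, even
though a single x86 idiv already leaves the quotient in RAX and the
remainder in RDX. A minimal editorial illustration in D:)

====
import std.stdio;

void main() {
    long n = 7, a = 5, d = 3;
    // Quotient and remainder of the same operands: one idiv can
    // produce both (quotient in RAX, remainder in RDX), so a
    // backend that merges them emits a single divide, not two.
    long t = (n*3 + a) / d;
    long u = (n*3 + a) % d;
    writeln(t, " ", u);
}
====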
On Tue, Aug 2, 2011 at 7:08 AM, Adam D. Ruppe <destructionator gmail.com> wrote:
> Basically, I think the more compilers we have for D the better. gdc
> is good. ldc is good. And so is dmd. We shouldn't abandon any of
> them.

For the record, I'm fine with the current arrangement and just
playing devil's advocate here:

So far, the only disadvantages you've outlined are that GDC takes a
while to build and that the build processes for GDC and LDC are
inconvenient. LDC took about 3 minutes on a Linux VM on my laptop,
and since it has proper incremental build support through CMake, I
don't really see that qualifying as a disadvantage. The only people
that really need to regularly build compilers are the folks that work
on them, and that's why we have incremental builds.

Now, DMD does have speed on its side. It doesn't have debugging
support (you have to jump through hoops on Windows, and Linux is just
a joke), binary and object file compatibility (even GDC has more
going for it on Windows than DMD does), platform compatibility
(outside x86 and x86_64), name recognition (I'm a college student,
and people look at me funny when I mention Digital Mars), shared
library support, or acceptance in the Linux world.

The reason I use GDC for pretty much all my development is that it
has all those things, and the reason I think it's worth playing
devil's advocate and really considering the current situation is that
GDC and LDC get all this for free by wiring up the DMD frontend to a
different backend.

The current state of affairs is certainly maintainable, but I think
it's worth some thought as to whether it would be better in the long
run if we started officially supporting a more accepted backend. My
example would be Go, which got all sorts of notice when gccgo became
important enough to get into the GCC codebase.

I'm not saying DMD is terrible, because it isn't. I'm just saying
that there are a lot of benefits to be had by developing a more
mature compiler on top of GCC or LLVM, and that we should consider
whether that's a goal we should be working more towards.
Aug 02 2011
On 02.08.2011 05:38, Adam D. Ruppe <destructionator gmail.com> wrote:
> I was waiting over an hour just for gcc+gdc to compile! In the time
> it takes for gcc's configure script to run, you can make clean,
> build dmd, druntime and phobos.

Make sure you disable bootstrapping. Compiling gdc works pleasantly
fast for me. Try compiling it on Windows, that's what I call slow.
Aug 02 2011
The post says they did "dmd -O". They did not mention "-inline
-noboundscheck -release". There may be extra flags that are required.

Walter Bright Wrote:
> http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
>
> Anyone care to examine the assembler output and figure out why?
Aug 02 2011
On 8/2/2011 5:00 AM, Jason House wrote:
> The post says they did "dmd -O". They did not mention "-inline
> -noboundscheck -release". There may be extra flags that are required.

Often when I see benchmark results like this, I wait to see what the
actual problem is before jumping to conclusions. I have a lot of
experience with this :-)

The results could be any of:

1. wrong flags used (especially by inexperienced users)

2. the benchmark isn't measuring what it purports to be (an example
might be it is actually measuring printf or malloc speed, not the
generated code)

3. the benchmark is optimized for one particular compiler/language by
someone very familiar with that compiler/language and it exploits a
particular quirk of it

4. the compiler is hand optimized for a specific benchmark, and the
great results disappear if anything in the source code changes (yes,
this is dirty, and I've seen it done by big name compilers)

5. the different benchmarks are run on different computers

6. the memory layout could wind up arbitrarily different for the
different compilers/languages, resulting in different performance due
to memory caching etc.
Aug 02 2011
On Aug 2, 11 20:00, Jason House wrote:
> The post says they did "dmd -O". They did not mention "-inline
> -noboundscheck -release". There may be extra flags that are required.

Let dmd have an '-O9999' flag as a synonym of '-O -inline
-noboundscheck -release' so people won't miss the extra flags in
benchmarks. [/joke]
Aug 02 2011
On the flags: I did use them, but didn't write it all out and tried to make them irrelevant (by avoiding functions and arrays). But, if the same ones are passed to each compiler, it shouldn't matter anyway... the idea is to get an apples to apples comparison between the two D implementations, not to chase after a number itself.
Aug 02 2011
== Quote from KennyTM~ (kennytm gmail.com)'s article
> Let dmd have an '-O9999' flag as a synonym of '-O -inline
> -noboundscheck -release' so people won't miss the extra flags in
> benchmarks. [/joke]

-Ofast sounds better. ;)
Aug 02 2011
On Tue, Aug 2, 2011 at 1:31 PM, Iain Buclaw <ibuclaw ubuntu.com> wrote:
> -Ofast sounds better. ;)

-O9001 will make the Redditors happy.
Aug 02 2011
On 02.08.2011 22:36, Andrew Wiley wrote:
> On Tue, Aug 2, 2011 at 1:31 PM, Iain Buclaw <ibuclaw ubuntu.com> wrote:
>> -Ofast sounds better. ;)
>
> -O9001 will make the Redditors happy.

How about replacing -w with -9001?
http://en.wikipedia.org/wiki/ISO_9001#Contents_of_ISO_9001
Aug 02 2011
On 02/08/2011 00:40, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
>
> Anyone care to examine the assembler output and figure out why?

I was talking to David Nadlinger the other day, and there was some
sort of codegen bug causing things to massively outperform dmd and
clang with equivalent code - it's possible this is the cause, I don't
know without looking though. He may be able to shed some light on it.

--
Robert
http://octarineparrot.com/
Aug 02 2011
On 8/2/11 7:34 PM, Robert Clipsham wrote:
> I was talking to David Nadlinger the other day, and there was some
> sort of codegen bug causing things to massively outperform dmd and
> clang with equivalent code - it's possible this is the cause, I
> don't know without looking though. He may be able to shed some
> light on it.

Nope, this turned out to be a bug in my program, where some memory
chunk used as test input data was prematurely garbage collected (that
only surfaced with aggressive compiler optimizations, which is why I
suspected a compiler bug).

David
Aug 02 2011
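(The failure mode David describes - data kept alive only through a
pointer the collector cannot see, so it gets freed once the optimizer
drops the last visible reference - can be guarded against explicitly.
A minimal editorial sketch, with a made-up buffer name, not David's
actual code:)

====
import core.memory : GC;

void main() {
    // Hypothetical test-input buffer that will later be reachable
    // only through a raw pointer the GC cannot trace.
    ubyte[] input = new ubyte[](1024);

    // Pin it: addRoot keeps the allocation alive even if aggressive
    // optimization removes every visible reference to 'input'.
    GC.addRoot(input.ptr);
    scope(exit) GC.removeRoot(input.ptr);

    // ... hand input.ptr to external code or stash it opaquely ...
}
====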