digitalmars.D.ldc - Another performance problem

reply "bearophile" <bearophileHUGS lycos.com> writes:
I have found another case where the code compiled with LDC2 is 
slower than the same code compiled with dmd. This time the 
performance difference seems very large. The D code (I compile it 
on 32 bit Windows):

http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

(It's the third D entry).

The dmd compile runs in about 1.22 seconds on my PC. The ldc2 
compile is very slow.

I compile using:

dmd -O -release -inline -noboundscheck self_referential_sequence3.d

ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
+ strip

Bye,
bearophile
Dec 06 2013
next sibling parent reply David Nadlinger <code klickverbot.at> writes:
On Sat, Dec 7, 2013 at 2:25 AM, bearophile <bearophileHUGS lycos.com> wrote:
 http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version
Looking at the IR generated on Linux, I can't see any obvious big performance issues (such as e.g. GC allocations where there shouldn't be any), and indeed the code runs in < 1 s on my machine (can't test on Windows right now).

I did, however, find a rather severe bug: We emit the "__gshared static" globals in the MemoryPool struct as thread-local, which is a) a big correctness problem, and b) might cause substantial slowdown due to the additional overhead incurred when accessing them.

David
Dec 08 2013
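
For readers unfamiliar with the distinction discussed above, here is a minimal sketch of how a thread-local static differs from a __gshared one in D. It is not taken from the Rosetta Code entry: the names Counters, bump and observe are hypothetical, and the example only illustrates the intended language semantics, not the miscompilation being reported.

```d
import core.thread : Thread;

// Hypothetical struct for illustration; not the MemoryPool from the thread.
struct Counters
{
    static int perThread;       // plain static: each thread gets its own copy
    __gshared int processWide;  // __gshared: a single copy for the whole process
}

void bump()
{
    ++Counters.perThread;
    ++Counters.processWide;
}

// Returns [perThread, processWide] as seen from the calling thread.
int[2] observe()
{
    // Reset the calling thread's copy and the shared copy.
    Counters.perThread = 0;
    Counters.processWide = 0;

    bump();                           // calling thread: both counters become 1

    auto worker = new Thread(&bump);  // worker bumps its own perThread copy
    worker.start();                   // and the single shared processWide
    worker.join();

    return [Counters.perThread, Counters.processWide];
}

void main()
{
    import std.stdio : writeln;
    writeln(observe());
}
```

With the intended semantics, the calling thread sees perThread == 1 and processWide == 2. If a compiler treated the __gshared variable as thread-local, the second value would stay at 1, which is the kind of correctness problem described in the post above. Thread-local access can also cost more than access to a plain global, which is the possible slowdown mentioned there.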
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David Nadlinger:

 I did, however, find a rather severe bug: We emit the "__gshared
 static" globals in the MemoryPool struct as thread-local, which is a)
 a big correctness problem, and b) might cause substantial slowdown due
 to the additional overhead incurred when accessing them.
With DMD, omitting the "__gshared" annotation usually doesn't cause a significant slowdown of the code.
 Looking at the IR generated on Linux, I can't see any obvious big
 performance issues (such as e.g. GC allocations where there shouldn't
 be any), and indeed the code runs in < 1 s on my machine (can't test
 on Windows right now).
DMD: http://codepad.org/8e6RCzlz
LDC2: http://codepad.org/r2eYIOKg

Bye,
bearophile
Dec 08 2013
prev sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Sunday, 8 December 2013 at 14:03:18 UTC, David Nadlinger wrote:
 I did, however, find a rather severe bug: We emit the "__gshared
 static" globals in the MemoryPool struct as thread-local, which is a)
 a big correctness problem, and b) might cause substantial slowdown due
 to the additional overhead incurred when accessing them.
Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419

You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

David
Dec 23 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
David Nadlinger:

 Turns out that this is actually a DMD issue: 
 http://d.puremagic.com/issues/show_bug.cgi?id=4419

 You might want to watch out for this trap in the future (or 
 annoy people to fix it in the frontend, of course).
I don't fully understand. I didn't know about issue 4419; it looks like an ugly bug. The same code from Rosetta Code was compiled with both ldc2 and dmd, so if there's a front-end bug it should hit both compilers. Also, the performance difference I have seen here is so large (20+?) that I don't think thread-local bugs could be enough to explain it.

The latest version of the code I put on Rosetta Code can't be compiled with the LDC2 I have because it relies on a recent bug fix (it uses the .ptr of a zero-length field, which until now was always null), but the wiki keeps all the older versions of the page. So I have used the previous version of the D code, with and without swapping static and __gshared. The code compiled with dmd has exactly the same performance as before, and the ldc2 code is still as slow as before in both cases.

So I think issue 4419 is not the cause of this problem, unless there's something I still don't understand.

Bye,
bearophile
Dec 23 2013
parent "David Nadlinger" <code klickverbot.at> writes:
On Tuesday, 24 December 2013 at 07:39:14 UTC, bearophile wrote:
 I don't fully understand.
I wasn't trying to imply that this is the reason for the slowdown you observe, just that my initial claim that this was a severe LDC bug was wrong.

As for the actual issue, I couldn't reproduce it yet…

David
Dec 24 2013
prev sibling parent reply "Kozzi" <kozzi11 gmail.com> writes:
On Saturday, 7 December 2013 at 01:25:30 UTC, bearophile wrote:
 I have found another case where the code compiled with LDC2 is 
 slower than the same code compiled with dmd. This time the 
 performance difference seems very large. The D code (I compile 
 it on 32 bit Windows):

 http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

 (It's the third D entry).

 The dmd compile runs in about 1.22 seconds on my PC. The ldc2 
 compile is very slow.

 I compile using:

 dmd -O -release -inline -noboundscheck self_referential_sequence3.d

 ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
 + strip

 Bye,
 bearophile
In my case ldmd is faster than dmd
Dec 08 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Kozzi:

 In my case ldmd is faster than dmd
Which operating system and compiler versions did you use?

Bye,
bearophile
Dec 08 2013
parent "Daniel Kozak" <kozzi11 gmail.com> writes:
On Monday, 9 December 2013 at 00:11:37 UTC, bearophile wrote:
 Kozzi:

 In my case ldmd is faster than dmd
 Which operating system and compiler versions did you use?

 Bye,
 bearophile

Arch Linux:
LDC - the LLVM D compiler (0.12.1): based on DMD v2.063.2 and LLVM 3.3
DMD - DMD64 D Compiler v2.064
Dec 10 2013