digitalmars.D.ldc - Another performance problem
- bearophile (17/17) Dec 06 2013 I have found another case where the code compiled with LDC2 is
- David Nadlinger (10/11) Dec 08 2013 Looking at the IR generated on Linux, I can't see any obvious big
- bearophile (9/22) Dec 08 2013 Usually not adding the "__gshared" annotation with DMD doesn't
- David Nadlinger (6/12) Dec 23 2013 Turns out that this is actually a DMD issue:
- bearophile (21/25) Dec 23 2013 I don't fully understand.
- David Nadlinger (6/7) Dec 24 2013 I wasn't trying to imply that this is the reason for the slowdown
- Kozzi (2/19) Dec 08 2013 In my case ldmd is faster than dmd
- bearophile (4/5) Dec 08 2013 What is your operating system and compiler versions used?
- Daniel Kozak (5/10) Dec 10 2013 Archlinux:
I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large.

The D code (I compile it on 32 bit Windows):
http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version
(It's the third D entry.)

The dmd-compiled program runs in about 1.22 seconds on my PC. The ldc2-compiled program is very slow.

I compile using:

    dmd -O -release -inline -noboundscheck self_referential_sequence3.d
    ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d + strip

Bye,
bearophile
Dec 06 2013
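When comparing run times like the one quoted above, it can help to time the hot part of the program in-process, so that process start-up and I/O are not mixed into the figure. The following is only a minimal sketch, assuming a druntime recent enough to provide core.time.MonoTime; the helper name timeIt and the toy workload are hypothetical and are not taken from the Rosetta Code entry.

```d
import core.time : Duration, MonoTime;

/// Hypothetical helper: runs `work` once and returns the elapsed
/// wall-clock time measured with the monotonic clock.
Duration timeIt(scope void delegate() work)
{
    immutable start = MonoTime.currTime;
    work();
    return MonoTime.currTime - start;
}

void main()
{
    import std.stdio : writeln;

    ulong sum;
    // Toy workload standing in for the real computation.
    auto elapsed = timeIt({
        foreach (i; 0 .. 10_000_000)
            sum += i;
    });
    writeln("sum = ", sum, ", elapsed = ", elapsed);
}
```

Building this once with each compiler, using the same flags as above, gives two figures that are directly comparable on the same machine.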
On Sat, Dec 7, 2013 at 2:25 AM, bearophile <bearophileHUGS lycos.com> wrote:
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

Looking at the IR generated on Linux, I can't see any obvious big performance issues (such as e.g. GC allocations where there shouldn't be any), and indeed the code runs in < 1 s on my machine (can't test on Windows right now).

I did, however, find a rather severe bug: We emit the "__gshared static" globals in the MemoryPool struct as thread-local, which is a) a big correctness problem, and b) might cause substantial slowdown due to the additional overhead incurred when accessing them.

David
Dec 08 2013
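For readers unfamiliar with the distinction in the post above: in D2 a plain static variable is thread-local, while __gshared asks for a single process-wide instance. The sketch below is only an illustration under that assumption; the names Pool, Seen and readFromNewThread are hypothetical and are not the MemoryPool code from the Rosetta Code entry. On a compiler that emits "__gshared static" struct members as thread-local, as described above, the second value would come back as 0 instead of 1.

```d
import core.thread : Thread;

struct Pool
{
    static int perThread;             // thread-local: each thread has its own copy
    __gshared static int processWide; // intended: one instance for the whole process
}

/// What a freshly started thread observes in the two variables.
struct Seen
{
    int perThread;
    int processWide;
}

/// Hypothetical helper: sets both variables to 1 in the calling thread,
/// then reads them back from a new thread.
Seen readFromNewThread()
{
    Pool.perThread = 1;
    Pool.processWide = 1;

    Seen seen;
    auto t = new Thread({
        seen.perThread = Pool.perThread;     // new thread's own copy, still int.init
        seen.processWide = Pool.processWide; // shared instance written by the caller
    });
    t.start();
    t.join();
    return seen;
}

void main()
{
    import std.stdio : writeln;

    auto seen = readFromNewThread();
    writeln("perThread seen by new thread:   ", seen.perThread);
    writeln("processWide seen by new thread: ", seen.processWide);
}
```

Thread-local accesses typically go through an extra indirection, which is the additional overhead the post refers to; whether that matters in practice depends on the platform and on how hot the access is.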
David Nadlinger:

> I did, however, find a rather severe bug: We emit the "__gshared static" globals in the MemoryPool struct as thread-local, which is a) a big correctness problem, and b) might cause substantial slowdown due to the additional overhead incurred when accessing them.

Usually not adding the "__gshared" annotation with DMD doesn't cause a significant slowdown of the code.

> Looking at the IR generated on Linux, I can't see any obvious big performance issues (such as e.g. GC allocations where there shouldn't be any), and indeed the code runs in < 1 s on my machine (can't test on Windows right now).

DMD: http://codepad.org/8e6RCzlz
LDC2: http://codepad.org/r2eYIOKg

Bye,
bearophile
Dec 08 2013
On Sunday, 8 December 2013 at 14:03:18 UTC, David Nadlinger wrote:
> I did, however, find a rather severe bug: We emit the "__gshared static" globals in the MemoryPool struct as thread-local, which is a) a big correctness problem, and b) might cause substantial slowdown due to the additional overhead incurred when accessing them.

Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419

You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

David
Dec 23 2013
David Nadlinger:

> Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419 You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

I don't fully understand. I didn't know about issue 4419; it looks like an ugly bug. The same code from Rosetta Code was compiled with both ldc2 and dmd, so if there's a front-end bug it should hit both compilers. Also, the performance difference I have seen here is so large (20+?) that I don't think thread-local bugs could be enough to explain it.

The latest version of the code I put on Rosetta Code can't be compiled with the LDC2 I have because it relies on a recent bug fix (it uses the .ptr of a zero-length field, which until now was always null), but the wiki keeps all the older versions of the page. So I have used the previous version of the D code, with and without swapping static and __gshared. The code compiled with dmd has exactly the same performance as before, and the ldc2 code is still as slow as before in both cases.

So I think issue 4419 is not the cause of this problem, unless there's something I still don't understand.

Bye,
bearophile
Dec 23 2013
On Tuesday, 24 December 2013 at 07:39:14 UTC, bearophile wrote:
> I don't fully understand.

I wasn't trying to imply that this is the reason for the slowdown you observe, just that my initial response of that being a severe LDC bug was wrong.

As for the actual issue, I couldn't reproduce it yet…

David
Dec 24 2013
On Saturday, 7 December 2013 at 01:25:30 UTC, bearophile wrote:
> I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large. The D code (I compile it on 32 bit Windows): http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version (It's the third D entry). The dmd compile runs in about 1.22 seconds on my PC. The ldc2 compile is very slow. I compile using:
> dmd -O -release -inline -noboundscheck self_referential_sequence3.d
> ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d + strip
> Bye, bearophile

In my case ldmd is faster than dmd.
Dec 08 2013
Kozzi:

> In my case ldmd is faster than dmd

What operating system and compiler versions did you use?

Bye,
bearophile
Dec 08 2013
On Monday, 9 December 2013 at 00:11:37 UTC, bearophile wrote:
> Kozzi:
>> In my case ldmd is faster than dmd
> What is your operating system and compiler versions used? Bye, bearophile

Archlinux:
LDC - the LLVM D compiler (0.12.1): based on DMD v2.063.2 and LLVM 3.3
DMD - DMD64 D Compiler v2.064
Dec 10 2013