digitalmars.D.ldc - Another performance problem

"bearophile" <bearophileHUGS lycos.com> writes:

I have found another case where the code compiled with LDC2 is 
slower than the same code compiled with dmd. This time the 
performance difference seems very large. The D code (I compile it 
on 32 bit Windows):

http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

(It's the third D entry).

The dmd-compiled binary runs in about 1.22 seconds on my PC. The 
ldc2-compiled binary is very slow.

I compile using:

dmd -O -release -inline -noboundscheck self_referential_sequence3.d

ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
+
strip

Bye,
bearophile

Dec 06 2013

David Nadlinger <code klickverbot.at> writes:

On Sat, Dec 7, 2013 at 2:25 AM, bearophile <bearophileHUGS lycos.com> wrote:
 http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

Looking at the IR generated on Linux, I can't see any obvious big
performance issues (such as e.g. GC allocations where there shouldn't
be any), and indeed the code runs in < 1 s on my machine (can't test
on Windows right now).

I did, however, find a rather severe bug: We emit the "__gshared
static" globals in the MemoryPool struct as thread-local, which is a)
a big correctness problem, and b) might cause substantial slowdown due
to the additional overhead incurred when accessing them.
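
For readers unfamiliar with the distinction, here is a minimal sketch (not the Rosetta Code program itself): in D2 a plain `static` member gets one copy per thread, while a `__gshared` member is a single process-wide copy. The `Counters` struct and the `runDemo` function are hypothetical names used only for illustration.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical struct, standing in for something like MemoryPool.
struct Counters
{
    static int perThread;             // thread-local: each thread has its own copy
    static __gshared int processWide; // one copy shared by every thread
}

// Returns [perThread, processWide] as seen by the calling thread
// after a second thread has incremented both.
int[2] runDemo()
{
    Counters.perThread = 1;
    Counters.processWide = 1;

    auto worker = new Thread({
        Counters.perThread += 10;   // touches only the worker's own copy
        Counters.processWide += 10; // touches the single shared copy
    });
    worker.start();
    worker.join();

    return [Counters.perThread, Counters.processWide];
}

void main()
{
    auto seen = runDemo();
    writeln(seen[0]); // prints 1
    writeln(seen[1]); // prints 11
}
```

Thread-local accesses typically go through the platform's TLS mechanism, which is the additional overhead referred to above; how much that costs in practice depends on the platform.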

David

Dec 08 2013

"bearophile" <bearophileHUGS lycos.com> writes:

David Nadlinger:

 I did, however, find a rather severe bug: We emit the "__gshared
 static" globals in the MemoryPool struct as thread-local, which is a)
 a big correctness problem, and b) might cause substantial slowdown due
 to the additional overhead incurred when accessing them.

With DMD, omitting the "__gshared" annotation usually doesn't 
cause a significant slowdown of the code.



 Looking at the IR generated on Linux, I can't see any obvious big
 performance issues (such as e.g. GC allocations where there shouldn't
 be any), and indeed the code runs in < 1 s on my machine (can't test
 on Windows right now).

DMD:
http://codepad.org/8e6RCzlz

LDC2:
http://codepad.org/r2eYIOKg

Bye,
bearophile

Dec 08 2013

"David Nadlinger" <code klickverbot.at> writes:

On Sunday, 8 December 2013 at 14:03:18 UTC, David Nadlinger wrote:
 I did, however, find a rather severe bug: We emit the "__gshared
 static" globals in the MemoryPool struct as thread-local, which is a)
 a big correctness problem, and b) might cause substantial slowdown due
 to the additional overhead incurred when accessing them.

Turns out that this is actually a DMD issue: 
http://d.puremagic.com/issues/show_bug.cgi?id=4419

You might want to watch out for this trap in the future (or annoy 
people to fix it in the frontend, of course).

David

Dec 23 2013

"bearophile" <bearophileHUGS lycos.com> writes:

David Nadlinger:

 Turns out that this is actually a DMD issue: 
 http://d.puremagic.com/issues/show_bug.cgi?id=4419

 You might want to watch out for this trap in the future (or 
 annoy people to fix it in the frontend, of course).

I don't fully understand.

I didn't know about issue 4419, it looks like an ugly bug.

The same code from Rosetta Code was compiled with both ldc2 
and dmd, so if there's a front-end bug it should hit both 
compilers.

Also, the performance difference I have seen here is so large 
(20+ times?) that I don't think thread-local bugs could be enough 
to explain it.

The latest version of the code I put on Rosetta Code can't be 
compiled with the LDC2 I have because it relies on a recent bug fix 
(it uses the .ptr of a zero-length field, which until now was 
always null), but the wiki keeps all the older versions of 
the page. So I have used the previous version of the D code, 
with and without swapping static and __gshared. The code 
compiled with dmd has exactly the same performance as before, and 
the ldc2 code is still as slow as before in both cases.

So I think Issue 4419 is not the cause of this problem, unless 
there's something I still don't understand.

Bye,
bearophile

Dec 23 2013

"David Nadlinger" <code klickverbot.at> writes:

On Tuesday, 24 December 2013 at 07:39:14 UTC, bearophile wrote:
 I don't fully understand.

I wasn't trying to imply that this is the reason for the slowdown 
you observe, just that my initial claim that it was a severe 
LDC bug was wrong.

As for the actual issue, I haven't been able to reproduce it yet…

David

Dec 24 2013

"Kozzi" <kozzi11 gmail.com> writes:

On Saturday, 7 December 2013 at 01:25:30 UTC, bearophile wrote:
 I have found another case where the code compiled with LDC2 is 
 slower than the same code compiled with dmd. This time the 
 performance difference seems very large. The D code (I compile 
 it on 32 bit Windows):

 http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

 (It's the third D entry).

 The dmd compile runs in about 1.22 seconds on my PC. The ldc2 
 compile is very slow.

 I compile using:

 dmd -O -release -inline -noboundscheck 
 self_referential_sequence3.d

 ldmd2 -O -release -inline -noboundscheck 
 self_referential_sequence3.d
 +
 strip

 Bye,
 bearophile

In my case ldmd is faster than dmd

Dec 08 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Kozzi:

 In my case ldmd is faster than dmd

What operating system and compiler versions are you using?

Bye,
bearophile

Dec 08 2013

"Daniel Kozak" <kozzi11 gmail.com> writes:

On Monday, 9 December 2013 at 00:11:37 UTC, bearophile wrote:
 Kozzi:

 In my case ldmd is faster than dmd

 What is your operating system and compiler versions used?

 Bye,
 bearophile

Archlinux:
LDC - the LLVM D compiler (0.12.1):
   based on DMD v2.063.2 and LLVM 3.3

DMD - DMD64 D Compiler v2.064

Dec 10 2013
