digitalmars.D - Perlin noise benchmark speed

Nick Treleaven <ntrel-public yahoo.co.uk> writes:

Hi,
A Perlin noise benchmark was quoted in this reddit thread:

http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

It apparently shows the 3 main D compilers producing slower code than 
Go, Rust, gcc, clang, Nimrod:

https://github.com/nsf/pnoise#readme

I initially wondered about std.random, but got this response:

"Yeah, but std.random is not used in that benchmark, it just initializes 
256 random vectors and permutates 256 sequential integers. What spins in 
a loop is just plain FP math and array read/writes. I'm sure it can be 
done faster, maybe D compilers are bad at automatic inlining or something. "

Obviously this is only one person's benchmark, but I wondered if people 
would like to check their code and suggest reasons for the speed deficit.

Jun 20 2014

Nick Treleaven <ntrel-public yahoo.co.uk> writes:

On 20/06/2014 13:32, Nick Treleaven wrote:
 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

Also, it does appear to be using the correct compiler flags (at least 
for dmd):
https://github.com/nsf/pnoise/blob/master/compile.bash

Jun 20 2014

"David Nadlinger" <code klickverbot.at> writes:

On Friday, 20 June 2014 at 12:34:55 UTC, Nick Treleaven wrote:
 On 20/06/2014 13:32, Nick Treleaven wrote:
 It apparently shows the 3 main D compilers producing slower 
 code than
 Go, Rust, gcc, clang, Nimrod:

 Also, it does appear to be using the correct compiler flags (at 
 least for dmd):
 https://github.com/nsf/pnoise/blob/master/compile.bash

-release is missing, although that probably isn't playing a big 
role here.

Another minor issue is that Noise2DContext isn't final, making 
the calls to get virtual.

This shouldn't cause such a big difference, though. Hopefully 
somebody can investigate this more closely.

David

Jun 20 2014

"MrSmith" <mrsmith33 yandex.ru> writes:

On Friday, 20 June 2014 at 12:56:46 UTC, David Nadlinger wrote:
 On Friday, 20 June 2014 at 12:34:55 UTC, Nick Treleaven wrote:
 On 20/06/2014 13:32, Nick Treleaven wrote:
 It apparently shows the 3 main D compilers producing slower 
 code than
 Go, Rust, gcc, clang, Nimrod:

 Also, it does appear to be using the correct compiler flags 
 (at least for dmd):
 https://github.com/nsf/pnoise/blob/master/compile.bash

 -release is missing, although that probably isn't playing a big 
 role here.

 Another minor issue is that Noise2DContext isn't final, making 
 the calls to get virtual.

 This shouldn't cause such a big difference, though. Hopefully 
 somebody can investigate this more closely.

 David

struct can be used instead of class
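
For readers following along, a minimal sketch of the three options mentioned so far. The types below are hypothetical stand-ins, not the benchmark's actual Noise2DContext, and the method body is a placeholder. The point is only that methods of a plain class are virtual by default in D, while methods of a final class or of a struct can be called directly.

```d
import std.stdio : writeln;

// Hypothetical stand-in: methods of a non-final class are virtual by default.
class ClassContext
{
    float get(float x, float y) { return x + y; }
}

// Hypothetical stand-in: a final class cannot be subclassed,
// so its methods cannot be overridden.
final class FinalContext
{
    float get(float x, float y) { return x + y; }
}

// Hypothetical stand-in: struct methods are never virtual.
struct StructContext
{
    float get(float x, float y) { return x + y; }
}

void main()
{
    auto c = new ClassContext;
    auto f = new FinalContext;
    StructContext s;
    writeln(c.get(1, 2), " ", f.get(1, 2), " ", s.get(1, 2));
}
```

Whether this matters for the benchmark's timings is a separate question; the sketch only shows the language-level difference.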

Jun 20 2014

Robert Schadek via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 06/20/2014 02:56 PM, David Nadlinger via Digitalmars-d wrote:
 On Friday, 20 June 2014 at 12:34:55 UTC, Nick Treleaven wrote:
 On 20/06/2014 13:32, Nick Treleaven wrote:
 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

 Also, it does appear to be using the correct compiler flags (at least
 for dmd):
 https://github.com/nsf/pnoise/blob/master/compile.bash

 -release is missing, although that probably isn't playing a big role
 here.

 Another minor issue is that Noise2DContext isn't final, making the
 calls to get virtual.

 This shouldn't cause such a big difference, though. Hopefully somebody can
 investigate this more closely.

 David

I converted Noise2DContext into a struct; I'm going to add some more to my patch.

Jun 20 2014

Robert Schadek via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 06/20/2014 02:34 PM, Nick Treleaven via Digitalmars-d wrote:
 On 20/06/2014 13:32, Nick Treleaven wrote:
 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

 Also, it does appear to be using the correct compiler flags (at least
 for dmd):
 https://github.com/nsf/pnoise/blob/master/compile.bash

I added some final pure @safe stuff

Jun 20 2014

"David Nadlinger" <code klickverbot.at> writes:

On Friday, 20 June 2014 at 13:20:16 UTC, Robert Schadek via 
Digitalmars-d wrote:
 I added some final pure @safe stuff

Thanks. As a general comment, I'd be careful with suggesting the 
use of pure/@safe/… for performance improvements in 
microbenchmarks. While it is certainly good D style to use them 
wherever possible, it might lead people less familiar with D to 
believe that fast D code needs a lot of annotations.

David

Jun 20 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 20.06.2014 14:32, schrieb Nick Treleaven:
 Hi,
 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

 https://github.com/nsf/pnoise#readme

 I initially wondered about std.random, but got this response:

 "Yeah, but std.random is not used in that benchmark, it just initializes
 256 random vectors and permutates 256 sequential integers. What spins in
 a loop is just plain FP math and array read/writes. I'm sure it can be
 done faster, maybe D compilers are bad at automatic inlining or something. "

 Obviously this is only one person's benchmark, but I wondered if people
 would like to check their code and suggest reasons for the speed deficit.

The performance of write, printf, etc. is benchmarked as well, so it's 
not clear whether pnoise is super-fast but write is super-slow, etc.

Jun 20 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 20.06.2014 15:14, schrieb dennis luehring:
 Am 20.06.2014 14:32, schrieb Nick Treleaven:
 Hi,
 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

 https://github.com/nsf/pnoise#readme

 I initially wondered about std.random, but got this response:

 "Yeah, but std.random is not used in that benchmark, it just initializes
 256 random vectors and permutates 256 sequential integers. What spins in
 a loop is just plain FP math and array read/writes. I'm sure it can be
 done faster, maybe D compilers are bad at automatic inlining or something. "

 Obviously this is only one person's benchmark, but I wondered if people
 would like to check their code and suggest reasons for the speed deficit.

 write, printf etc. performance is benchmarked also - so not clear
 if pnoise is super-fast but write is super-slow etc...

Using perf with 10 is maybe too small to give a good average result,
and runtime startup etc. is measured as well, so it's not clear what is slower.

These benchmarks should be separated into 3 parts:

runtime startup
pure pnoise
result output (needed only once for verification; returning a dummy output 
would be a better fit for testing the pnoise speed)

Are array bounds checks active?
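
A rough sketch of that separation, timing only the computation and printing afterwards. It assumes a current Phobos with std.datetime.stopwatch; the kernel function, the grid size, and the iteration count are hypothetical placeholders, not the real pnoise code.

```d
import core.stdc.math : floorf;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical stand-in for the noise kernel; not the real pnoise code.
float kernel(float x, float y)
{
    return floorf(x) * 0.5f + floorf(y) * 0.25f;
}

void main()
{
    enum size = 256;
    auto pixels = new float[size * size];

    // Start the clock after runtime startup and allocation.
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. 100)
        foreach (y; 0 .. size)
            foreach (x; 0 .. size)
                pixels[y * size + x] = kernel(x * 0.1f, y * 0.1f);
    sw.stop();

    // Output happens once, outside the timed region. Printing one value
    // also keeps the result observably used.
    writefln("compute: %s ms, last pixel: %s",
             sw.peek.total!"msecs", pixels[$ - 1]);
}
```

Runtime startup could then be estimated separately, for example by timing an empty main.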

Jun 20 2014

"Mattcoder" <fromtheotherside mail.com> writes:

On Friday, 20 June 2014 at 13:14:04 UTC, dennis luehring wrote:
 write, printf etc. performance is benchmarked also - so not 
 clear
 if pnoise is super-fast but write is super-slow etc...

Indeed, and on Windows (at least 8), the size of the command window 
(CMD) affects the result drastically. For example, running this 
test with the console maximized takes 2.58s, while the same test 
in a small window takes 2.11s!

Matheus.

Jun 20 2014

"David Nadlinger" <code klickverbot.at> writes:

On Friday, 20 June 2014 at 13:46:26 UTC, Mattcoder wrote:
 On Friday, 20 June 2014 at 13:14:04 UTC, dennis luehring wrote:
 write, printf etc. performance is benchmarked also - so not 
 clear
 if pnoise is super-fast but write is super-slow etc...

 Indeed and using Windows (At least 8), the size of 
 command-window (CMD) interferes in the result drastically... 
 for example: running this test with console maximized will 
 take: 2.58s while the same test but in small window: 2.11s!

Before I wrote the above, I briefly ran the benchmark on my local 
(OS X) machine, and verified that the bulk of the time is indeed 
spent in the noise calculation loop (with stdout piped into 
/dev/null). Still, the LDC-compiled code is only about half as 
fast as the Clang-compiled version, and there is no good reason 
why it should be.

My new guess is a difference in inlining heuristics (note also 
that the Rust version uses inlining hints). The big difference 
between GCC and Clang might be a hint that the performance drop 
is caused by a rather minute difference in optimizer tuning.

Thus, we really need somebody to sit down with a 
profiler/disassembler and figure out what is going on.

David

Jun 20 2014

Ary Borenszweig <ary esperanto.org.ar> writes:

On 6/20/14, 9:32 AM, Nick Treleaven wrote:
 Hi,
 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr


 It apparently shows the 3 main D compilers producing slower code than
 Go, Rust, gcc, clang, Nimrod:

 https://github.com/nsf/pnoise#readme

 I initially wondered about std.random, but got this response:

 "Yeah, but std.random is not used in that benchmark, it just initializes
 256 random vectors and permutates 256 sequential integers. What spins in
 a loop is just plain FP math and array read/writes. I'm sure it can be
 done faster, maybe D compilers are bad at automatic inlining or
 something. "

 Obviously this is only one person's benchmark, but I wondered if people
 would like to check their code and suggest reasons for the speed deficit.

I just tried it with ldc and it's faster (faster than Go, slower than 
Nimrod). But this is still slower than other languages. And other languages 
keep the array bounds check on...

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nick Treleaven:

 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

This should be compiled with LDC2, it's more idiomatic and a 
little faster than the original D version:
http://dpaste.dzfl.pl/8d2ff04b62d3

I have already seen that if I inline Noise2DContext.get in the 
main manually the program gets faster (but not yet fast enough).

Bye,
bearophile

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

 http://dpaste.dzfl.pl/8d2ff04b62d3

Sorry for the awful tabs.

Bye,
bearophile

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

So this is the best version so far:

http://dpaste.dzfl.pl/8dae9b359f27

I don't show the version with the manually inlined function.

(I have also seen that GCC generates slightly faster code on my 
CPU if I don't use SSE registers.)

Bye,
bearophile

Jun 20 2014

"Mattcoder" <fromtheotherside mail.com> writes:

On Friday, 20 June 2014 at 16:02:56 UTC, bearophile wrote:
 So this is the best so far version:

 http://dpaste.dzfl.pl/8dae9b359f27

Just one note, with the latest version of DMD:

dmd -O -noboundscheck -inline -release pnoise.d
pnoise.d(42): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'
pnoise.d(43): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'

Matheus.

Jun 20 2014

"Mattcoder" <fromtheotherside mail.com> writes:

On Friday, 20 June 2014 at 18:29:35 UTC, Mattcoder wrote:
 On Friday, 20 June 2014 at 16:02:56 UTC, bearophile wrote:
 So this is the best so far version:

 http://dpaste.dzfl.pl/8dae9b359f27

 Just one note, with the last version of DMD:

 dmd -O -noboundscheck -inline -release pnoise.d
 pnoise.d(42): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'
 pnoise.d(43): Error: pure function 'pnoise.Noise2DContext.getGradients' cannot call impure function 'core.stdc.math.floor'

 Matheus.

Sorry, I forgot this:

Besides the error above, which I'm working around for now with:

immutable float x0f = cast(int)x; //x.floor;
immutable float y0f = cast(int)y; //y.floor;

(just to get it to compile), your version here is twice as fast as 
the original one.

Matheus.

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Mattcoder:

 Beside the error above, which for now I'm using:

 immutable float x0f = cast(int)x; //x.floor;
 immutable float y0f = cast(int)y; //y.floor;

 Just to compile,

If you remove the calls to floor, you are avoiding the main 
problem that needs fixing.

Bye,
bearophile

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Mattcoder:

 Just one note, with the last version of DMD:

Yes, I know; at the top of the file I have specified that it's 
for ldc2.

Bye,
bearophile

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

If I add this import in Noise2DContext.getGradients the run-time 
decreases a lot (I am now just two times slower than gcc with 
-Ofast):

import core.stdc.math: floor;

Bye,
bearophile
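
For anyone unfamiliar with the syntax, a minimal sketch of a function-scoped selective import. The function name floorPart is hypothetical and is not the benchmark's getGradients. Inside the function body, floor refers to the C library function from core.stdc.math, which takes and returns a double.

```d
import std.stdio : writeln;

// Hypothetical helper, only to illustrate the scoped import.
float floorPart(float x)
{
    // Selective import local to this function: here `floor` is the
    // C library's floor(double) from core.stdc.math.
    import core.stdc.math : floor;
    return cast(float) floor(x);
}

void main()
{
    writeln(floorPart(3.7f));
    writeln(floorPart(-0.5f));
}
```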

Jun 20 2014

"whassup" <Whasss yahoo.com> writes:

  GO BEAROPHILE YOU CAN DO IT

On Friday, 20 June 2014 at 15:24:38 UTC, bearophile wrote:
 If I add this import in Noise2DContext.getGradients the 
 run-time decreases a lot (I am now just two times slower than 
 gcc with -Ofast):

 import core.stdc.math: floor;

 Bye,
 bearophile

Jun 20 2014

"JR" <zorael gmail.com> writes:

On Friday, 20 June 2014 at 15:24:38 UTC, bearophile wrote:
 If I add this import in Noise2DContext.getGradients the 
 run-time decreases a lot (I am now just two times slower than 
 gcc with -Ofast):

 import core.stdc.math: floor;

 Bye,
 bearophile

Was just about to post that if I cheat and replace usage of 
floor(x) with cast(float)cast(int)x, ldc2 is almost down to gcc 
speeds (119.6ms average over 100 full executions vs gcc 102.7ms).

It stood out in the callgraph. Because profiling before 
optimizing.

Jun 20 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 20.06.2014 17:09, schrieb bearophile:
 Nick Treleaven:

 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

 This should be compiled with LDC2, it's more idiomatic and a
 little faster than the original D version:
 http://dpaste.dzfl.pl/8d2ff04b62d3

 I have already seen that if I inline Noise2DContext.get in the
 main manually the program gets faster (but not yet fast enough).

 Bye,
 bearophile

It does not make sense to "optimize" this example more and more; it 
should be fast with the original version (except for the missing finals on 
the virtuals).

Jun 20 2014

"Mattcoder" <fromtheotherside mail.com> writes:

On Friday, 20 June 2014 at 18:32:22 UTC, dennis luehring wrote:
 it does not makes sense to "optmized" this example more and 
 more - it should be fast with the original version (except the 
 missing finals on the virtuals)

Oh please, let him continue, I'm really learning a lot with these 
optimizations.

Matheus.

Jun 20 2014

"bearophile" <bearophileHUGS lycos.com> writes:

dennis luehring:

 it does not makes sense to "optmized" this example more and 
 more - it should be fast with the original version

But the original code is not fast. So someone has to find what's 
broken. I have shown one of the broken parts that needs fixing 
(floor on ldc2).

Also, the original code is not written in a fully idiomatic way, 
partly because today the "lazy" way to write D code is unfortunately 
not always the best/right way (for example, you have to add a ton of 
immutable/const and annotations, because immutability is not the 
default), so a code fix is good.

Bye,
bearophile

Jun 20 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 20.06.2014 22:44, schrieb bearophile:
 dennis luehring:

 it does not makes sense to "optmized" this example more and
 more - it should be fast with the original version

 But the original code is not fast. So someone has to find what's
 broken. I have shown part of the broken parts to fix (floor on
 ldc2).

 Also, the original code is not written in a fully idiomatic way,
 also because unfortunately today the "lazy" way to write D code
 is not always the best/right way (example: you have to add ton of
 immutable/const, and annotations, because immutability is not the
 default), so a code fix is good.

 Bye,
 bearophile

As long as you find out it's a library thing.

The C version is the fastest without any annotations or immutable/const, 
so what's the problem with D here? It can't (shouldn't) be that one needs 
to work on/change such simple code that much to reach C speed.

Jun 20 2014

"David Nadlinger" <code klickverbot.at> writes:

On Saturday, 21 June 2014 at 05:00:25 UTC, dennis luehring wrote:
 Am 20.06.2014 22:44, schrieb bearophile:
 dennis luehring:

 it does not makes sense to "optmized" this example more and
 more - it should be fast with the original version

 But the original code is not fast. So someone has to find 
 what's
 broken. I have shown part of the broken parts to fix (floor on
 ldc2).

 Also, the original code is not written in a fully idiomatic 
 way,
 also because unfortunately today the "lazy" way to write D code
 is not always the best/right way (example: you have to add ton 
 of
 immutable/const, and annotations, because immutability is not 
 the
 default), so a code fix is good.

 Bye,
 bearophile

 as long as you find out its a library thing

 the c version is without any annotations and immutable/const 
 the fastest - so whats the problem with D here, it 
 can't(shouln't) be that one needs to work/change that much on 
 such simple code to reach c speed

bearophile's work is very valuable regardless of what the cause 
is, as it provides a pretty decent hint of what could be improved 
for anybody investigating the issue.

This is not to say that we wouldn't need to fix our compilers (in 
end user terms, i.e. compiler + standard library) to make those 
examples fast – zero-cost abstractions are one of the main 
strengths of D.

David

Jun 21 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nick Treleaven:

 A Perlin noise benchmark was quoted in this reddit thread:

And a simple benchmark for D ranges/parallelism:

http://www.reddit.com/r/programming/comments/28mub4/clash_of_the_lambdas_comparing_lambda_performance/

Bye,
bearophile

Jun 20 2014

"weaselcat" <weaselcat gmail.com> writes:

On Friday, 20 June 2014 at 12:32:39 UTC, Nick Treleaven wrote:
 Hi,
 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

 It apparently shows the 3 main D compilers producing slower 
 code than Go, Rust, gcc, clang, Nimrod:

 https://github.com/nsf/pnoise#readme

 I initially wondered about std.random, but got this response:

 "Yeah, but std.random is not used in that benchmark, it just 
 initializes 256 random vectors and permutates 256 sequential 
 integers. What spins in a loop is just plain FP math and array 
 read/writes. I'm sure it can be done faster, maybe D compilers 
 are bad at automatic inlining or something. "

 Obviously this is only one person's benchmark, but I wondered 
 if people would like to check their code and suggest reasons 
 for the speed deficit.

I saw this thread when searching for something on the site; it's been 
a few months since anyone posted.

I fixed the D flags, and gdc is now about 15% faster than the second 
fastest in the benchmark (C, gcc), which obviously puts D in first.
Some notes:

LDC is missing _tons_ of inlining opportunities, which kills it in 
comparison to GDC. I think GDC inlined pretty much everything. 
LDC is about 50% slower.

Also, AFAICT there's no fast-math switch for LDC (enabling this 
for GDC might actually be compromising it, though : ) )

I think LDC turns the floor in std.math into the same as the stdc 
one, but GDC does not. std.math.floor is still abysmally slow; I 
thought it was because it was still using reals, but that does not 
seem to be the case. GDC slows to a crawl (10-20x slower) if you 
replace the stdc floor with the one in std.math (just remove the 
alias).

I thought this might be interesting to someone (i.e., LDC/GDC folks 
or phobos math folks).

bye.

Mar 23 2015

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

I'd suspect stdc.math to be SSE3/SSE4 optimised assembly, whereas std.math
uses a very generic (works on almost every float format) implementation
that is at least 'pure'.

Iain.
On 24 Mar 2015 00:30, "weaselcat via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:

 On Friday, 20 June 2014 at 12:32:39 UTC, Nick Treleaven wrote:

 Hi,
 A Perlin noise benchmark was quoted in this reddit thread:

 http://www.reddit.com/r/rust/comments/289enx/c0de517e_where_is_my_c_replacement/cibn6sr

 It apparently shows the 3 main D compilers producing slower code than Go,
 Rust, gcc, clang, Nimrod:

 https://github.com/nsf/pnoise#readme

 I initially wondered about std.random, but got this response:

 "Yeah, but std.random is not used in that benchmark, it just initializes
 256 random vectors and permutates 256 sequential integers. What spins in a
 loop is just plain FP math and array read/writes. I'm sure it can be done
 faster, maybe D compilers are bad at automatic inlining or something. "

 Obviously this is only one person's benchmark, but I wondered if people
 would like to check their code and suggest reasons for the speed deficit.

 I saw this thread when searching for something on the site, been a few
 months since anyone posted-

 I fixed the D flags, gdc is now about 15% faster than the second fastest
 in the benchmark(C - gcc) which obviously puts D in first.
 some notes:

 LDC is missing _tons_ of inline opportunities, killing it in comparison to
 GDC. I think GDC inlined pretty much everything. LDC is about 50% slower.

 Also, AFAICT there's no fast-math switch for LDC(enabling this for GDC
 might actually be compromising it though : ) )

 I think LDC turns the floor in std.math into the same as the stdc one, but
 GDC does not. std.math.floor is still abysmally slow, I thought it was
 because it was still using reals but that does not seem to be the case. GDC
 slows to a crawl(10-20x slower) if you replace the stdc floor with the one
 in std.math(just remove the alias)

 I thought this might be interesting to someone(i.e, LDC/GDC folks or
 phobos math folks)

 bye.

Mar 23 2015
