digitalmars.D - Updated D Benchmarks
- Robert Clipsham (23/23) Mar 14 2009 Hi all,
- bearophile (14/21) Mar 15 2009 The purpose of the reference is to see how far are the D implementations...
- Robert Clipsham (29/49) Mar 15 2009 None of the ones that I'm currently using are. This is just an arbitrary...
- bearophile (33/43) Mar 15 2009 Robert Clipsham:
- Robert Clipsham (15/67) Mar 15 2009 I've got a better idea. That page is automatically generated from an xml...
- bearophile (7/10) Mar 15 2009 I don't like XML; a small txt table is so easy to process with three lin...
- Robert Clipsham (5/7) Mar 15 2009 If you would like to provide me with a script to convert the xml file to...
- naryl (13/21) Mar 15 2009 I think this will suffice:
- bearophile (18/20) Mar 15 2009 A Python version a little more resilient to changes in that file:
- bearophile (12/12) Mar 15 2009 Sorry, assuming a "tidy XML file" is silly. Better:
- bearophile (5/5) Mar 15 2009 Robert Clipsham, eventually your site may become like this page (it may ...
- Robert Clipsham (5/12) Mar 16 2009 This looks like the way to go for the benchmarks. When I've added all
- bearophile (13/17) Mar 16 2009 I have sent you an email with several benchmarks.
Hi all,

After reading through your comments from my last post, I have implemented most of the changes you have requested:

* Added compile times
* Added memory and virtual memory usage
* Added final executable size
* Tests are now run 4 times, and readings are the minimum taken from the last three runs only (or the maximum in the case of memory usage)
* Graphs now line up properly
* More detailed compiler information is given
* Benchmarks have been tweaked to last longer

The only request I believe I've missed (correct me if I'm wrong!) is a C or C++ reference. This was planned for inclusion, but I couldn't come to a conclusion on which compiler to include and whether to use C or C++. Before you suggest having multiple references: I would rather have only one reference, otherwise it becomes a general language benchmark rather than a D one.

The benchmarks can be found at http://dbench.octarineparrot.com/ and the updated source code at http://hg.octarineparrot.com/dbench/file/tip .

Again, if you have any comments or ideas for improvements let me know! If you can come to a conclusion on which C/C++ compiler to use as a reference, I will rerun the benchmarks with it included.
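The run-aggregation rule described above (four runs, discard the warm-up run, take the minimum time and the maximum memory of the remaining three) can be sketched like this. This is a minimal sketch in modern Python, not the actual dbench code; the sample numbers are made up:

```python
# Sketch of the aggregation rule: run each benchmark 4 times, drop the
# first (warm-up) run, then keep min(time) and max(memory) of the rest.

def aggregate(samples):
    """samples: list of (seconds, kilobytes) tuples, one per run."""
    timed = samples[1:]  # discard the first run entirely
    best_time = min(t for t, _ in timed)
    peak_mem = max(m for _, m in timed)
    return best_time, peak_mem

if __name__ == "__main__":
    runs = [(1.40, 5100), (1.21, 5000), (1.25, 5050), (1.22, 5010)]
    print(aggregate(runs))  # -> (1.21, 5050)
```

Taking the minimum time filters out scheduler noise, while taking the maximum memory reports the true peak footprint.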
Mar 14 2009
Robert Clipsham, the pages are indeed improved a lot. Thank you for your work.

> with all benchmarks limited to 256mb memory usage

Some benchmarks on the Shootout site will probably need more than 256 MB of RAM.

> The only request I believe I've missed (correct me if I'm wrong!) is a C or C++ reference. This was planned for inclusion, but I couldn't come to a conclusion on which compiler to include and whether to use C or C++. Before you suggest having multiple references, I would rather only have one reference otherwise it becomes more general language benchmarks rather than D.

The purpose of the reference is to see how far the D implementations are from a "good enough" compilation. In most cases this means you can time a C or C++ version (in some benchmarks other languages are faster than C/C++, but for the moment we can ignore this). So my suggestion is just to take a look at the Shootout site, where you take your code from (the D implementations aren't present anymore, but they are kept elsewhere if you need them), and use the C version whenever it's the faster of the C and C++ ones, and the C++ version in the other cases (I give preference to the C versions because they are generally simpler).

A few notes:
- I may also offer you a few more benchmarks not present on the Shootout site.
- I suggest you add more benchmarks.
- Did you strip the executables produced by ldc and gdc? (Whether you did or not, add a note that says so.)
- What is the trouble with nbody and gdc?
- When you give a URL in an email or post, I suggest not putting a full stop "." at the end, otherwise the person reading the post may have to delete it manually from the URL later. If you really want to add a full stop or a closing parenthesis, put a space before it, like this: (http://www.digitalmars.com/webnews/ ).
- The compilation times are too small, so those values are probably noisy. So you may add another value: compile all the programs and take the total time required (this isn't the same as the sum of the single compilation times). Otherwise you may need a different benchmark, a much longer D program that you can compile with all three compilers.
- From your results it seems ldc needs more memory to run the programs. The LDC team may want to take a look at this.

Bye,
bearophile
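The total-compile-time idea above can be sketched as follows. This is a hypothetical sketch in modern Python: the compiler command lines would come from the benchmark suite, so here the "compilers" are stand-in invocations of the Python interpreter itself to keep the sketch runnable anywhere:

```python
# Time one full build of every benchmark and report the total wall time,
# rather than summing the individual (noisy) per-file timings.
import subprocess
import sys
import time

def total_compile_time(commands):
    """commands: list of argv lists; returns total elapsed seconds."""
    start = time.time()
    for cmd in commands:
        subprocess.check_call(cmd)  # raises if any compile fails
    return time.time() - start

if __name__ == "__main__":
    # Stand-in "compiler" commands (hypothetical placeholders).
    cmds = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print("total: %.2fs" % total_compile_time(cmds))
```

One long timing of the whole build is less sensitive to timer resolution and scheduling noise than many sub-second per-program timings.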
Mar 15 2009
bearophile wrote:
> Robert Clipsham, the pages are indeed improved a lot. Thank you for your work.

Thanks, I'm glad you approve!

> Some benchmarks on the Shootout site will probably need more than 256 MB of RAM.

None of the ones that I'm currently using are. This is just an arbitrary limit I have put in place so my server doesn't run out of memory.

> The purpose of the reference is to see how far the D implementations are from a "good enough" compilation. [...] use the C version whenever it's the faster of the C and C++ ones, and the C++ version in the other cases.

So you suggest I choose whichever performs best out of C or C++? What compiler would you recommend? Before, I was leaning towards C++ (not sure on a compiler), purely because it has a more similar feature set to D.

> I may also offer you a few more benchmarks not present on the Shootout site.

Thanks! I'd love to add more benchmarks, 6 doesn't give a great overview. Someone has already sent me a few which I plan to include; most of them seem to be x86-32 specific though, so I've asked that they be updated before I include them.

> I suggest you add more benchmarks.

My thoughts exactly!

> Did you strip the executables produced by ldc and gdc?

No, I did not strip them. I think I might add another page: one with executable sizes, another with stripped executable sizes.

> What is the trouble with nbody and gdc?

I can't remember off the top of my head; I seem to recall it was a linking error though. I did try to debug it when the benchmarks were originally run, but I didn't manage to get anywhere with it.

> When you give a URL in an email or post, I suggest not putting a full stop "." at the end [...]

I generally do, it was 4am when I posted though ;P

> The compilation times are too small, so those values are probably noisy. So you may add another value: compile all the programs and take the total time required [...]

I don't see a problem with this. Even if the values are noisy, it still shows that the compilation times are tiny! I quite like your idea of summing the total compile time, though I believe it will probably confuse some people and lead them to think D takes a long time to compile. I should note that compile times are unlikely to be accurate in any case, as each program is only compiled once.

> From your results it seems ldc needs more memory to run the programs. The LDC team may take a look at this.

There doesn't seem to be that much difference, but I'm sure they'd be happy to look into it. One of the ideas of the benchmarks is to add a bit of competition and see if we can get D compilers even faster! :D I think once we have a reference from a C or C++ compiler this will become even more true (provided D isn't faster already).
Mar 15 2009
Robert Clipsham:

I see you have put all the graphs on one page. This is probably better. When you have 10-20 benchmarks you may need thinner bars. You can add the raw timings, formatted as an ASCII table, a bit like this (don't use an HTML table): http://zi.fi/shootout/rawresults.txt
There's no strict need for a separate file; a <pre>...</pre> section in the page is enough too. It's useful for automatic processing of your data, for example with a small script. It's so useful that you yourself could generate your HTML page from such a table of numbers with a small Python script.

> None of the ones that I'm currently using are.

I know, but here you can see C++ benchmarks that use 300+ MB: http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=gpp&lang2=gpp&box=1

> So you suggest I choose whichever performs best out of C or C++?

Yep, it gives a more reliable reference. But if you don't like this suggestion, do as you like; using only C++ is acceptable to me too.

> What compiler would you recommend?

GCC or LLVM-GCC seems fine. They aren't equal, as you may have seen from my benchmarks. GCC is probably better, more developed and more widespread.

> Thanks! I'd love to add more benchmarks, 6 doesn't give a great overview.

OK, I can probably find you 5-10 more small benchmarks. I think a private email is better for this (or I'll put a zip somewhere and give you a URL).

> No, I did not strip them. I think I might add another page, one with executable sizes, another with stripped executable sizes.

Stripped-only versions are enough too.

> I can't remember off the top of my head; I seem to recall it was a linking error though. I did try to debug it when the benchmarks were originally run, but I didn't manage to get anywhere with it.

Such trouble can probably be fixed.

> There doesn't seem to be that much difference,

This is a small Python script with data scraped manually from your page (this is why having a raw table is useful):

data = """ldc 0.69 dmd 0.63 gdc 0.63
ldc 30.7 dmd 30.64 gdc 30.65
ldc 140.24 dmd 120.61 gdc 120.62
ldc 16.68 dmd 16.62 gdc 16.63
ldc 0.95 dmd 1.52 gdc 0.87
"""

data = data.replace("ldc", "").replace("dmd", "").replace("gdc", "").splitlines()
data = [map(float, line.split()) for line in data]
results = [int(round(sum(line))) for line in zip(*data)]
for comp_time in zip("ldc dmd gdc".split(), results):
    print "%s: %d MB" % comp_time

Its output:

ldc: 189 MB
dmd: 170 MB
gdc: 169 MB

To me it seems there's some difference.

Bye,
bearophile
Mar 15 2009
bearophile wrote:
> You can add the raw timings, formatted as an ASCII table [...] It's useful for automatic processing of your data, for example with a small script.

I've got a better idea. That page is automatically generated from an xml file, so I'll just make that available instead.

> I know, but here you can see C++ benchmarks that use 300+ MB: http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=gpp&lang2=gpp&box=1

I would probably have to exclude tests that use that much memory; there isn't enough RAM in my server to go much higher than 256mb in the benchmarks (without taking out all the services running on it first).

> Yep, it gives a more reliable reference. But if you don't like this suggestion, do as you like; using only C++ is acceptable to me too.

I'll probably go all C++, we'll see what other people want though.

> GCC or LLVM-GCC seems fine. [...]

I'll probably go with GCC then. Again, we'll see what anyone else thinks first.

> OK, I can probably find you 5-10 more small benchmarks. I think a private email is better for this (or I'll put a zip somewhere and give you a URL).

That'd be great! Thanks.

> Stripped-only versions are enough too.

But if I go with both then I've got more data up there for not much more effort :P

> Such trouble can probably be fixed.

Probably. I'll look into it again before the next time I run the benchmarks.

> This is a small Python script with data scraped manually from your page [...] Its output: ldc: 189 MB, dmd: 170 MB, gdc: 169 MB. To me it seems there's some difference.

OK, it's more difference than I saw with a quick glance... You proved me wrong!
Mar 15 2009
Robert Clipsham:

> I've got a better idea. That page is automatically generated from an xml file, so I'll just make that available instead.

I don't like XML; a small txt table is so easy to process with three lines of Python... :-) (JSON is fine too.)

> I would probably have to exclude tests that use that much memory, there isn't enough RAM in my server to go much higher than 256mb in the benchmarks (without taking out all the services running on it first).

Your timings may be quite noisy then. You may need more re-runs and/or much longer timings.

> That'd be great! Thanks.

OK. I see you accept both ways.

Bye,
bearophile
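The "three lines of Python" claim is roughly true for a whitespace-separated text table like the rawresults.txt linked earlier. A sketch (in modern Python; the column names are hypothetical, not the actual file's layout):

```python
# Parse a plain-text table: first line is the header, remaining lines are
# whitespace-separated values. Column names here are made-up examples.
table = """name compiler time
nbody ldc 1.50
nbody dmd 1.62"""

lines = table.splitlines()
header = lines[0].split()
rows = [dict(zip(header, line.split())) for line in lines[1:]]
print(rows[0]["compiler"])  # -> ldc
```

Compare this with the XML-parsing scripts further down the thread; the text-table version needs no parser library at all.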
Mar 15 2009
bearophile wrote:
> I don't like XML; a small txt table is so easy to process with three lines of Python... :-) (JSON is fine too.)

If you would like to provide me with a script to convert the xml file to a text table, I'll happily run it and make it available to you. As it is, I'm too lazy to write such a thing myself. The xml file will be up in about 5 minutes at http://dbench.octarineparrot.com/results.xml .
Mar 15 2009
Robert Clipsham Wrote:
> If you would like to provide me with a script to convert the xml file to a text table, I'll happily run it and make it available to you. [...] The xml file will be up in about 5 minutes at http://dbench.octarineparrot.com/results.xml .

I think this will suffice:

$ sed 's/<[^>]*>//g; /^$/d' < data | sed 'N; N; N; N; N; N; s/\n/ /g'

It'll strip XML tags, drop blank lines, and remove all but every 7th line feed. You'll get the following output: http://stashbox.org/448426/out.txt

You can use AWK to process it. For example:

$ awk '/dmd/ {dmd += $5;} /gdc/ {gdc += $5;} /ldc/ {ldc += $5;} END {print "DMD: " dmd/1024 "M\nGDC: " gdc/1024 "M\nLDC: " ldc/1024 "M"}' < out.txt

Outputs:

DMD: 170.672M
GDC: 169.398M
LDC: 189.977M
Mar 15 2009
naryl:
> I think this will suffice:
> $ sed 's/<[^>]*>//g; /^$/d' < data | sed 'N; N; N; N; N; N; s/\n/ /g'

A Python version a little more resilient to changes in that file:

from xml.dom.minidom import parse

results1 = parse("results.xml").getElementsByTagName("results")
results = results1[0].getElementsByTagName("result")

for node in results[0].childNodes:
    if node.nodeType != node.TEXT_NODE:
        print node.localName,
print

for result in results:
    for node in result.childNodes:
        if node.nodeType != node.TEXT_NODE:
            print node.firstChild.data,
    print

Bye,
bearophile
Mar 15 2009
Sorry, assuming a "tidy XML file" is silly. Better:

from xml.dom.minidom import parse

r = parse("results.xml").getElementsByTagName("results")
results = r[0].getElementsByTagName("result")
fields = [n.localName for n in results[0].childNodes if n.nodeType != n.TEXT_NODE]
print " ".join(fields)
for r in results:
    print " ".join(r.getElementsByTagName(f)[0].firstChild.data for f in fields)

Bye,
bearophile
Mar 15 2009
Robert Clipsham, eventually your site may become like this page (it may be slow to load; you may need to try again later): http://sbcl.boinkor.net/bench/
It's also useful to see how performance evolves across versions, like a companion to bugzilla, to spot performance bugs.

Bye,
bearophile
Mar 15 2009
bearophile wrote:
> Robert Clipsham, eventually your site may become like this page: http://sbcl.boinkor.net/bench/
> It's also useful to see how performance evolves across versions, like a companion to bugzilla, to spot performance bugs.

This looks like the way to go for the benchmarks. When I've added all the tests people have sent me and added a reference C++ result, I will put the results file under revision control with the rest of the source code and set up a script to generate graphs showing results over time.
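Turning revision-controlled results files into an over-time series could look roughly like this. This is a sketch only: the element names (result/name/compiler/time) are assumptions about the results.xml schema, not its actual layout:

```python
# Sketch: collect per-benchmark, per-compiler timings across a series of
# revision-controlled results.xml snapshots, ready for a plotting script.
# Element names below are hypothetical, not the real dbench schema.
from xml.dom.minidom import parseString

def text_of(node, tag):
    """Text content of the first child element named `tag`."""
    return node.getElementsByTagName(tag)[0].firstChild.data

def series(snapshots):
    """snapshots: list of (revision, xml_text) pairs; returns
    {(benchmark, compiler): [(revision, seconds), ...]}"""
    out = {}
    for rev, text in snapshots:
        doc = parseString(text)
        for r in doc.getElementsByTagName("result"):
            key = (text_of(r, "name"), text_of(r, "compiler"))
            out.setdefault(key, []).append((rev, float(text_of(r, "time"))))
    return out

if __name__ == "__main__":
    snap = ("<results><result><name>nbody</name>"
            "<compiler>ldc</compiler><time>1.5</time></result></results>")
    print(series([("r1", snap)]))
```

Each key's list is already in revision order, so a plotting tool only has to draw one line per (benchmark, compiler) pair.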
Mar 16 2009
Robert Clipsham:
> This looks like the way to go for the benchmarks. When I've added all the tests people have sent me and added a reference C++ result, I will put the results file under revision control with the rest of the source code and set up a script to generate graphs showing results over time.

I have sent you an email with several benchmarks.

In the meantime I have seen this tiny C++ global illumination renderer in 99 lines: http://kevinbeason.com/smallpt/
There are other scenes available too: http://kevinbeason.com/smallpt/extraScenes.txt
It's not meant to be fast, so surely there are ways to write much faster C++ code. But it's short enough, and the results are nice enough (even if slow), that it may be translated to D for comparison.

To run it with my MinGW I had to add:

inline double erand48() { return rand() / (double)RAND_MAX; }

and replace erand48(...) with erand48(). The MinGW I use is based on GCC 4.3.1, but I have not used OpenMP; in this code it essentially divides the running time by the number of available cores.

Bye,
bearophile
Mar 16 2009