digitalmars.D - Comparing Parallelization in HPC with D, Chapel, and Go
- anon (1/1) Nov 21 2014 https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_S...
- bearophile (7/8) Nov 21 2014 Thank you for the link, it's very uncommon to see papers that use
- Kapps (5/13) Nov 21 2014 The flags make it likely that DMD was used (-O -inline -release).
- bearophile (4/6) Nov 21 2014 But I use ldmd2 all the time with those arguments :-)
- Russel Winder via Digitalmars-d (28/45) Nov 22 2014 Sorry, I must have missed this thread earlier, hopefully I am not
- Sean Kelly (4/4) Nov 22 2014 Yes, I'd be curious to see the code. I also suspect that the
- Ziad Hatahet via Digitalmars-d (3/4) Nov 23 2014 Keep us posted!
- Russel Winder via Digitalmars-d (10/18) Nov 24 2014 Author replied. He is issuing source code on a bilateral pseudo-NDA. I
- ixid (9/26) Nov 24 2014 Whenever there is a benchmark like this the D community outlines
- Russel Winder via Digitalmars-d (17/25) Nov 25 2014 The author is currently having a vacation. He has though sent me the
- Sparsh Mittal (6/6) Dec 10 2014 I am author of the paper "A Study of Successive Over-relaxation
- bearophile (9/15) Dec 10 2014 What compiler, compiler version, and compilation arguments did
- Sparsh Mittal (13/13) Dec 12 2014 Thanks for your interest. The users are welcome to make
- Marco Leise (10/12) Nov 21 2014 Did they upload the source code and input data somewhere?
- Horse (4/5) Nov 21 2014 Here is another where they compare Chapel, Go, Cilk and TBB.
- Andrei Amatuni (5/5) Nov 23 2014 This prompted me to google for recent academic papers on D, which
- Ziad Hatahet via Digitalmars-d (4/6) Nov 24 2014 Not even remotely rigorous. One has to wonder about the quality of the
- Craig Dillabaugh (5/11) Nov 24 2014 My main take away from that paper was that C is much slower than
- Russel Winder via Digitalmars-d (13/17) Nov 24 2014 On Mon, 2014-11-24 at 11:53 +0000, Craig Dillabaugh via Digitalmars-d
- Nemanja Boric (3/19) Nov 24 2014 :-)
anon:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Thank you for the link, it's very uncommon to see papers that use D. But where's the D/Go/Chapel source code? What's the compiler/version used? (When you do floating point benchmarks there's a huge difference between LDC2 and DMD.)

Bye,
bearophile
Nov 21 2014
On Friday, 21 November 2014 at 21:53:00 UTC, bearophile wrote:
> anon:
>> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages
>
> Thank you for the link, it's very uncommon to see papers that use D. But where's the D/Go/Chapel source code? What's the compiler/version used? (When you do floating point benchmarks there's a huge difference between LDC2 and DMD.)

The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.
Nov 21 2014
Kapps:
> The flags make it likely that DMD was used (-O -inline -release).

But I use ldmd2 all the time with those arguments :-)

Bye,
bearophile
Nov 21 2014
On Fri, 2014-11-21 at 22:57 +0000, Kapps via Digitalmars-d wrote:
> The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.

Sorry, I must have missed this thread earlier, hopefully I am not late ;-)

From a quick scan there appears to be no mention of how many cores were on the test machine. Maybe there were only 4? Hopefully they were using ldc2 and not dmd. I suspect they were using gc and not gccgo. The words used about the implementations imply there could be much better realizations of their algorithms in all three languages. Without actual code, though, there is very little to be said.

I believe it should be a requirement of academic, and indeed non-academic, publishing of any work involving timings that the code be made available. Without the code there is no reproducibility, and reproducibility is a cornerstone of the scientific method.

On the upside, using Chapel, D and Go shows forward thinking. I wonder about X10 and C++. Not to mention Rust, Java, Groovy, and Python.

I have emailed the author.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Nov 22 2014
Yes, I'd be curious to see the code. I also suspect that the functionality in core may not be sufficiently advertised. At one point he mentions using yieldForce to simulate a barrier, which suggests he wasn't aware of core.sync.barrier.
Nov 22 2014
On Sat, Nov 22, 2014 at 7:17 AM, Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> I have emailed the author.

Keep us posted!
Nov 23 2014
On Sun, 2014-11-23 at 13:09 -0800, Ziad Hatahet via Digitalmars-d wrote:
> Keep us posted!

Author replied. He is issuing source code under a bilateral pseudo-NDA. I will read it to ensure no hidden problems later this evening, and then reply. Most likely affirmative…

--
Russel.
Nov 24 2014
On Friday, 21 November 2014 at 22:57:44 UTC, Kapps wrote:
> The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.

Whenever there is a benchmark like this, the D community outlines a number of speedups ranging from the obvious to the arcane. Our house needs to be in order, such that the obvious choice is at least competitive with the speed claims made for D. DMD in particular, while not optimisation focused, should improve its floating-point speed and avoid surprising 80-bit floating-point behaviours, or at least try to be surprising in a manner more in line with what users of other languages are used to.
Nov 24 2014
On Sun, 2014-11-23 at 13:09 -0800, Ziad Hatahet via Digitalmars-d wrote:
> Keep us posted!

The author is currently on vacation. He has, though, sent me the codes. I shall review them and report back to him, not publicly at this stage. When back from vacation, his intention is to set the codes up for public availability, and hence wider review and debate. At that point the D community (and, I hope, the Go community) at large will be able to chip in with constructive suggestions.

--
Russel.
Nov 25 2014
I am the author of the paper "A Study of Successive Over-relaxation Method Parallelization Over Modern HPC Languages". The code has been made available for academic use at https://www.academia.edu/9709444/Source_code_of_Parallel_and_Serial_Red-Black_SOR_Implementation_in_Chapel_D_and_Go_Languages Questions and comments can be sent to my email address (although note that use of the software does not imply support).
Dec 10 2014
Sparsh Mittal:
> I am author of the paper "A Study of Successive Over-relaxation Method Parallelization Over Modern HPC Languages". The code has been made available for academic use at https://www.academia.edu/9709444/Source_code_of_Parallel_and_Serial_Red-Black_SOR_Implementation_in_Chapel_D_and_Go_Languages

What compiler, compiler version, and compilation arguments did you use for the D code? (For this kind of benchmark, DMD is the wrong compiler to use.)

I have improved the serial version of the D code and made it more idiomatic:
http://dpaste.dzfl.pl/a6743f2eceda

Bye,
bearophile
Dec 10 2014
Thanks for your interest. Users are welcome to make improvements to the code and use it in their research. Chapel, D and Go are all relatively new languages, and certainly many optimizations are possible with them. As shown in the paper, I compiled the D code with "-inline -O -release". I ran the experiments when I was at Iowa State. We had departmental servers (http://it.engineering.iastate.edu/remote/) and I ran the experiments on those with 24 cores (note that this link is updated very frequently to show which servers are online). I have since moved from there and no longer have access to those computers. I am sorry that I don't exactly remember/know the answers to the other questions.
Dec 12 2014
On Fri, 21 Nov 2014 21:29:09 +0000, "anon" <anonymous gmail.com> wrote:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Did they upload the source code and input data somewhere? It looks like Chapel and D scale badly with the number of threads, while Go makes excellent use of CPU cores and, while executing more slowly, beats the other two at >= 8 threads. Then again, they could have had much higher speed if they had used a GPU-driven approach.

--
Marco
Nov 21 2014
On Friday, 21 November 2014 at 21:29:10 UTC, anon wrote:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Here is another where they compare Chapel, Go, Cilk and TBB:
http://arxiv.org/pdf/1302.2837.pdf
Conclusion: TBB is the best.
Nov 21 2014
This prompted me to google for recent academic papers on D, which led me to this: http://research.ijcaonline.org/volume104/number7/pxc3898921.pdf not exactly the most rigorous research, but it's pretty favorable...
Nov 23 2014
On Sun, Nov 23, 2014 at 7:48 PM, Andrei Amatuni via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> not exactly the most rigorous research, but it's pretty favorable...

Not even remotely rigorous. One has to wonder about the quality of the conference into which this paper was accepted.
Nov 24 2014
On Monday, 24 November 2014 at 03:48:27 UTC, Andrei Amatuni wrote:
> This prompted me to google for recent academic papers on D, which led me to this:
> http://research.ijcaonline.org/volume104/number7/pxc3898921.pdf
> not exactly the most rigorous research, but it's pretty favorable...

My main takeaway from that paper was that C is much slower than Java :o) Based on those results it likely would have been trounced by Python or Ruby too.
Nov 24 2014
On Mon, 2014-11-24 at 11:53 +0000, Craig Dillabaugh via Digitalmars-d wrote:
[…]
> My main take away from that paper was that C is much slower than Java :o)

This can happen!

> Based on those results it likely would have been trounced by Python or Ruby too.

I don't know about Ruby, but Python can now be more or less as fast as C and C++. I am not joking on this one; even my π-by-quadrature codes can show Python running computational loops as fast.

--
Russel.
Nov 24 2014
On Monday, 24 November 2014 at 11:53:08 UTC, Craig Dillabaugh wrote:
> My main take away from that paper was that C is much slower than Java :o) Based on those results it likely would have been trounced by Python or Ruby too.

"Compilers and interpreters used: Turbo C++ IDE" :-)
Nov 24 2014