digitalmars.D - Comparing Parallelization in HPC with D, Chapel, and Go
- anon (1/1) Nov 21 2014 https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_S...
- bearophile (7/8) Nov 21 2014 Thank you for the link, it's very uncommon to see papers that use
- Kapps (5/13) Nov 21 2014 The flags make it likely that DMD was used (-O -inline -release).
- bearophile (4/6) Nov 21 2014 But I use ldmd2 all the time with those arguments :-)
- Russel Winder via Digitalmars-d (28/45) Nov 22 2014 Sorry, I must have missed this thread earlier, hopefully I am not
- Sean Kelly (4/4) Nov 22 2014 Yes, I'd be curious to see the code. I also suspect that the
- Ziad Hatahet via Digitalmars-d (3/4) Nov 23 2014 Keep us posted!
- Russel Winder via Digitalmars-d (10/18) Nov 24 2014 Author replied. He is issuing source code on a bilateral pseudo-NDA. I
- ixid (9/26) Nov 24 2014 Whenever there is a benchmark like this the D community outlines
- Russel Winder via Digitalmars-d (17/25) Nov 25 2014 The author is currently having a vacation. He has though sent me the
- Sparsh Mittal (6/6) Dec 10 2014 I am author of the paper "A Study of Successive Over-relaxation
- bearophile (9/15) Dec 10 2014 What compiler, compiler version, and compilation arguments did
- Sparsh Mittal (13/13) Dec 12 2014 Thanks for your interest. The users are welcome to make
- Marco Leise (10/12) Nov 21 2014 Did they upload the source code and input data somewhere?
- Horse (4/5) Nov 21 2014 Here is another where they compare Chapel, Go, Cilk and TBB.
- Andrei Amatuni (5/5) Nov 23 2014 This prompted me to google for recent academic papers on D, which
- Ziad Hatahet via Digitalmars-d (4/6) Nov 24 2014 Not even remotely rigorous. One has to wonder about the quality of the
- Craig Dillabaugh (5/11) Nov 24 2014 My main take away from that paper was that C is much slower than
- Russel Winder via Digitalmars-d (13/17) Nov 24 2014 On Mon, 2014-11-24 at 11:53 +0000, Craig Dillabaugh via Digitalmars-d
- Nemanja Boric (3/19) Nov 24 2014 :-)
anon:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Thank you for the link, it's very uncommon to see papers that use D. But where's the D/Go/Chapel source code? What's the compiler/version used? (When you do floating point benchmarks there's a huge difference between LDC2 and DMD.)

Bye,
bearophile
Nov 21 2014
On Friday, 21 November 2014 at 21:53:00 UTC, bearophile wrote:
> anon:
>> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages
>
> Thank you for the link, it's very uncommon to see papers that use D. But where's the D/Go/Chapel source code? What's the compiler/version used? (When you do floating point benchmarks there's a huge difference between LDC2 and DMD.)

The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.
Nov 21 2014
Kapps:
> The flags make it likely that DMD was used (-O -inline -release).

But I use ldmd2 all the time with those arguments :-)

Bye,
bearophile
Nov 21 2014
On Fri, 2014-11-21 at 22:57 +0000, Kapps via Digitalmars-d wrote:
> The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.

Sorry, I must have missed this thread earlier, hopefully I am not late ;-)

From a quick scan there appears to be no mention of how many cores were on the test machine. Maybe there were only 4? Hopefully they were using ldc2 and not dmd. I suspect they were using gc and not gccgo. The words used about the implementations imply there could be much better realizations of their algorithms in all three languages. Without actual code, though, there is very little to be said.

I believe it should be a requirement of academic, and indeed non-academic, publishing of any work involving timings that the code be made available. Without the code there is no reproducibility, and reproducibility is a cornerstone of the scientific method.

On the upside, using Chapel, D and Go shows forward thinking. I wonder about X10 and C++. Not to mention Rust, Java, Groovy, and Python.

I have emailed the author.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Nov 22 2014
Yes, I'd be curious to see the code. I also suspect that the functionality in core may not be sufficiently advertised. At one point he mentions using yieldForce to simulate a barrier, which suggests he wasn't aware of core.sync.barrier.
Nov 22 2014
On Sat, Nov 22, 2014 at 7:17 AM, Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> I have emailed the author.

Keep us posted!
Nov 23 2014
On Sun, 2014-11-23 at 13:09 -0800, Ziad Hatahet via Digitalmars-d wrote:
> Keep us posted!

Author replied. He is issuing source code under a bilateral pseudo-NDA. I will read it to ensure no hidden problems later this evening, and then reply. Most likely affirmative…

--
Russel.
Nov 24 2014
On Friday, 21 November 2014 at 22:57:44 UTC, Kapps wrote:
> The flags make it likely that DMD was used (-O -inline -release). IIRC there were some problems with DMD that made it not perform too well in these types of benchmarks that use std.parallelism. Results would likely have been noticeably better with GDC or LDC.

Whenever there is a benchmark like this, the D community outlines a number of speedups ranging from the obvious to the arcane. Our house needs to be in order, such that the obvious choice is at least competitive with the speed claims made for D. DMD in particular, while not optimisation focused, should improve its floating-point speed and avoid surprising 80-bit floating-point behaviours, or at least try to be surprising in a manner more in line with what users of other languages are used to.
Nov 24 2014
On Sun, 2014-11-23 at 13:09 -0800, Ziad Hatahet via Digitalmars-d wrote:
> Keep us posted!

The author is currently on vacation. He has, though, sent me the codes. I shall review them and report back to him, not publicly at this stage. When back from vacation, his intention is to set the codes up for public availability, and hence wider review and debate. At that point the D community (and, I hope, the Go community) at large will be able to chip in with constructive suggestions.

--
Russel.
Nov 25 2014
I am the author of the paper "A Study of Successive Over-relaxation Method Parallelization Over Modern HPC Languages". The code has been made available for academic use at https://www.academia.edu/9709444/Source_code_of_Parallel_and_Serial_Red-Black_SOR_Implementation_in_Chapel_D_and_Go_Languages Questions and comments can be sent to my email address (although note that use of the software does not imply support).
Dec 10 2014
Sparsh Mittal:
> I am author of the paper "A Study of Successive Over-relaxation Method Parallelization Over Modern HPC Languages". The code has been made available for academic use at https://www.academia.edu/9709444/Source_code_of_Parallel_and_Serial_Red-Black_SOR_Implementation_in_Chapel_D_and_Go_Languages

What compiler, compiler version, and compilation arguments did you use for the D code? (For this kind of benchmark, DMD is the wrong compiler to use.)

I have improved the serial version of the D code and made it more idiomatic:
http://dpaste.dzfl.pl/a6743f2eceda

Bye,
bearophile
Dec 10 2014
Thanks for your interest. Users are welcome to make improvements to the code and use it in their research. Chapel, D and Go are all relatively new languages, and certainly many optimizations are possible with them. As shown in the paper, I compiled the D code with "-inline -O -release". I ran the experiments when I was at Iowa State. We had departmental servers (http://it.engineering.iastate.edu/remote/) and I ran the experiments on those with 24 cores (note that this link is updated very frequently to show which servers are online). I have since moved from there and no longer have access to those computers. I am sorry that I don't exactly remember/know the answers to the other questions.
Dec 12 2014
On Fri, 21 Nov 2014 21:29:09 +0000, "anon" <anonymous gmail.com> wrote:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Did they upload the source code and input data somewhere? It looks like Chapel and D scale badly with the number of threads, while Go makes excellent use of CPU cores and, while executing more slowly, beats the other two at >= 8 threads. Then again, they could have had much higher speed if they had used a GPU-driven approach.

--
Marco
Nov 21 2014
On Friday, 21 November 2014 at 21:29:10 UTC, anon wrote:
> https://www.academia.edu/3982638/A_Study_of_Successive_Over-relaxation_SOR_Method_Parallelization_Over_Modern_HPC_Languages

Here is another where they compare Chapel, Go, Cilk and TBB:
http://arxiv.org/pdf/1302.2837.pdf
Conclusion: TBB is the best.
Nov 21 2014
This prompted me to google for recent academic papers on D, which led me to this: http://research.ijcaonline.org/volume104/number7/pxc3898921.pdf not exactly the most rigorous research, but it's pretty favorable...
Nov 23 2014
On Sun, Nov 23, 2014 at 7:48 PM, Andrei Amatuni via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> not exactly the most rigorous research, but it's pretty favorable...

Not even remotely rigorous. One has to wonder about the quality of the conference into which this paper was accepted.
Nov 24 2014
On Monday, 24 November 2014 at 03:48:27 UTC, Andrei Amatuni wrote:
> This prompted me to google for recent academic papers on D, which led me to this:
> http://research.ijcaonline.org/volume104/number7/pxc3898921.pdf
> not exactly the most rigorous research, but it's pretty favorable...

My main takeaway from that paper was that C is much slower than Java :o) Based on those results it likely would have been trounced by Python or Ruby too.
Nov 24 2014
On Mon, 2014-11-24 at 11:53 +0000, Craig Dillabaugh via Digitalmars-d wrote:
[…]
> My main take away from that paper was that C is much slower than Java :o)

This can happen!

> Based on those results it likely would have been trounced by Python or Ruby too.

I don't know about Ruby, but Python can now be more or less as fast as C and C++. I am not joking on this one; even my π-by-quadrature codes can show Python running computational loops as fast.

--
Russel.
Nov 24 2014
On Monday, 24 November 2014 at 11:53:08 UTC, Craig Dillabaugh wrote:
> My main take away from that paper was that C is much slower than Java :o) Based on those results it likely would have been trounced by Python or Ruby too.

"Compilers and interpreters used: Turbo C++ IDE" :-)
Nov 24 2014