digitalmars.D.learn - Phobos threads performance
- bearophile (21/21) Jul 20 2008 I have taken a look at the Chameneos-redux multithread benchmarks, the e...
- The Anh Tran (6/37) Jul 21 2008 Don't use win to write Alioth game. :(
- The Anh Tran (6/6) Jul 21 2008 Clarification: the accepted threadring.d is written in win. I let 503
- bearophile (16/21) Jul 21 2008 I don't fully agree. I think a portable enough language has to allow you...
- The Anh Tran (8/19) Jul 21 2008 I come from Visual C++ world. And just learn D in about 10 days. So i
- bearophile (8/12) Jul 21 2008 The Shootout site allows more than one version for each benchmark, if the...
- The Anh Tran (1/21) Jul 21 2008
- bearophile (7/8) Jul 21 2008 With that you have lost cross-OS compatibility :-]
- bearophile (4/4) Jul 22 2008 I have cleaned your code and I have submitted it, but I don't know yet i...
- The Anh Tran (67/67) Jul 21 2008 This is my newest threadring.d for the threadring game:
I have taken a look at the Chameneos-redux multithread benchmark; the explanations are at the bottom of this page:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=chameneosredux&lang=all
(I think I have created a Psyco version almost 2X faster than the Python one.)

This is the working D + Phobos implementation:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=chameneosredux&lang=dlang&id=0

On my Windows PC with N = 1_000_000 that D version runs in about 10 seconds. My CPU has two cores, but the CPU usage is about 70-75% (while both the Java and C++ versions push the cores to 100%).

This is a C++ version that I think looks very close to the D version (I think they are both translations of the Java version):
https://alioth.debian.org/tracker/download.php/30402/411646/310955/2682/chame.cpp

To run it on Windows I have used:
ftp://sources.redhat.com/pub/pthreads-win32/prebuilt-dll-2-8-0-release/

I added these files to MinGW:
pthread.h
sched.h
semaphore.h
libpthreadGC2.a

and compiled the code with:
g++ -O3 -s -mthreads chame.cpp -o chame -lpthreadGC2

Still with N = 1_000_000, this C++ code runs in about 1.13 seconds.

Do you know why the C++ version is so much faster, and why the D version doesn't fully use both cores?

Bye,
bearophile
Jul 20 2008
Don't use Windows to write Alioth game entries. :(
WaitForSingleObject is _much_ slower than pthread_mutex_lock.

I'm still changing things here and there in chame.d; I hope it'll get better.

The D version allocates memory during the meeting loop. I omitted that allocation in the C++ version.
Jul 21 2008
Clarification: the accepted threadring.d was written on Windows. I let the 503 threads roam free. On my 2200 MHz Pentium M, N = 10,000,000 takes only ~10 s, but on their P4 the result is 330 s. If I switch to mutexes, it gets slower on Windows but much faster on Linux. They haven't accepted my new solution yet.
Jul 21 2008
The Anh Tran:
> Don't use win to write Alioth game. :(

I don't fully agree. I think a portable enough language has to allow you to compile a program on different operating systems and get the same results. D wants to be a fairly portable language, so this is one more test for the language itself. Java generally lets me do that, and so does Python. But as you may have seen, this time I found that the Psyco version may give different results (though maybe the error is mine somewhere), so I may give up on that. Your D version works correctly on Windows too.

> WaitForSingleObject is _much_ slower than pthread_mutex_lock
> I'm still changing here & there in chame.d
> Hope it'll better.

So how much faster is your Tango version?

> D version allocate mem during the meeting loop. I omitted that alloc in C++ ver.

Can't you avoid the same allocation in D?

Some notes:
- Even if most people in this D newsgroup ignore the Shootout site, lots of people look at it when they want to choose what language to use, so developing fast programs for that site is important advertising. The Haskell people have understood this very well; you can see it from the amount of work put into those benchmarks, and they have even changed their language to improve results in some of them: http://www.haskell.org/haskellwiki/Great_language_shootout
- Many times I have found the Shootout site useful for learning pieces of the syntax of other languages, so I think it has a big pedagogical purpose too: it shows real, non-trivial algorithms implemented very efficiently in many different languages. So you have to write your code well, because many people will learn from it.
- Very often you can find performance problems in your language by looking at how it performs compared to other languages. Here, for example, threading in Phobos seems several times slower than the C++ version, which has been posted in the meantime: http://shootout.alioth.debian.org/gp4/benchmark.php?test=chameneosredux&lang=gpp&id=0 As you can see, the C++ version takes 16.7 s, while your D version needs 41 s.

Bye,
bearophile
Jul 21 2008
I come from the Visual C++ world and have only been learning D for about 10 days, so I've screwed up my D code many times :/ Perhaps I'll post here first for peer comments.

bearophile wrote:
> The Anh Tran:
>> Don't use win to write Alioth game. :(
> I don't fully agree. I think a portable enough language has to allow you to compile the program on different operating systems and give you the same results. ...

I completely agree with you. But sometimes the same code gives a big surprise, e.g. the threadring game.

> Can't you avoid the same allocation with D?

Yes :). The D version was targeting code beauty, not speed. :|

> As you can see the C++ version takes 16.7 s, while your D version needs 41 s.

I have no idea. They differ only by 2 memory allocation calls.
Jul 21 2008
The Anh Tran:
> But sometime the same code gives big surprise. Ie: threadring game.

These surprises can be useful for you to learn from, and for the designers of D to debug the libraries (I'm assuming such libraries have to work on both operating systems).

>> Can't you avoid the same allocation with D?
> Yes :). The D ver was targeting code beauty, not speed. :|

The Shootout site allows more than one version of each benchmark if their purposes differ. For example, here you can see two D versions:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&lang=all
One is mine; its purpose is to have short, high-level code. The other is for speed. The main purpose of the site is to compare speed, but it compares memory usage and code complexity too.

> I have no idea. They only differ by 2 mem alloc calls.

Maybe the thread module of Phobos has some problems. But I suggest you write a version without the memory allocations, to see how it performs. D's garbage collector may be at fault here too.

Bye and thank you,
bearophile
Jul 21 2008
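The suggestion to write a version without the allocations in the meeting loop can be sketched as follows. This is a minimal example assuming a modern D2 compiler (the @nogc attribute did not exist in the D1/Phobos of 2008); Meeting, meetAllocating and meetReusing are hypothetical names, since the chameneos code itself is not shown in this thread. The idea is to allocate the per-meeting object once, before the loop, and let @nogc make the compiler reject any GC allocation that sneaks back into the loop body.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the record created when two creatures meet.
final class Meeting
{
    int first;
    int second;
}

// Allocating pattern: one GC allocation per meeting. D's default collector
// stops all registered threads while it collects, so allocations in a hot
// loop can stall every thread, not just this one.
long meetAllocating(int n)
{
    long sum = 0;
    foreach (i; 0 .. n)
    {
        auto m = new Meeting;
        m.first = i;
        m.second = i + 1;
        sum += m.first + m.second;
    }
    return sum;
}

// Same work, reusing one object allocated by the caller. @nogc makes the
// compiler reject any GC allocation inside this function.
long meetReusing(Meeting m, int n) @nogc nothrow
{
    long sum = 0;
    foreach (i; 0 .. n)
    {
        m.first = i;
        m.second = i + 1;
        sum += m.first + m.second;
    }
    return sum;
}

void main()
{
    enum n = 1_000_000;
    auto reused = new Meeting;   // the only allocation, made before the loop
    writeln(meetAllocating(n) == meetReusing(reused, n));
}
```

To see what the allocations cost on a given machine, each call can be timed, for example with StopWatch from std.datetime.stopwatch. Because the collector pauses all threads, the difference may well be larger in a multithreaded program than in this single-threaded sketch.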
bearophile wrote:
> Maybe the thread module of Phobos has some problems. But I suggest you to write a version without the memory allocations, to see how it performs. The garbage collector of D may be the fault too here.
Jul 21 2008
The Anh Tran:
> ...
> import std.c.linux.pthread;

With that you have lost cross-OS compatibility :-] (I can't run it at the moment). Anyway, you could reformat your code a bit, test that it works correctly, and submit it to the Shootout to see if they like it.

Bye,
bearophile
Jul 21 2008
I have cleaned up your code and submitted it, but I don't know yet whether it works correctly on their PC:
https://alioth.debian.org/tracker/download.php/30402/411646/310968/2693/chamene2.d

Bye,
bearophile
Jul 22 2008
This is my newest threadring.d for the threadring game:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=threadring&lang=all
I'd appreciate comments from D experts. Many thanks.

```d
module ThreadRing;

import std.stdio : writefln;
import std.conv : toInt;
import std.c.linux.pthread;
import std.c.stdlib : exit;

const uint NUM_THREADS = 503;
const uint STACK_SIZE = 16*1024;

// Remaining hand-offs; only the thread currently holding the token touches it.
int token = -1;

extern (C)
{
    // static array, local data, should be better for L2 cache
    pthread_mutex_t[NUM_THREADS] mutex;

    // again, local data is better for P4 small L2 cache
    char[STACK_SIZE][NUM_THREADS] stacks;

    // Each thread unlocks the *next* thread's mutex, i.e. one it does not own.
    // POSIX leaves that undefined for default mutexes; it works with glibc,
    // which does not check ownership on unlock.
    void* thread_func( void *num )
    {
        int thisnode = cast(int) cast(size_t) num;
        int nextnode = ( thisnode + 1 ) % NUM_THREADS;

        while (true)
        {
            // Block until the previous thread passes the token to us.
            pthread_mutex_lock( &(mutex[ thisnode ]) );

            if ( token > 0 ) // branch prediction as taken
            {
                token--;
                pthread_mutex_unlock( &(mutex[ nextnode ]) );
            }
            else
            {
                // Token exhausted: print this thread's 1-based id and quit.
                writefln( thisnode + 1 );
                exit(0);
            }
        }
        return null;
    }
}

int main(string[] args)
{
    // Number of hand-offs from the command line; 1000 is the test case.
    token = (args.length > 1) ? toInt(args[1]) : 1000;

    pthread_t cthread;
    pthread_attr_t stack_attr;
    pthread_attr_init(&stack_attr);

    for (int i = 0; i < NUM_THREADS; i++)
    {
        // Every mutex starts locked, so every thread blocks until released.
        pthread_mutex_init( &(mutex[ i ]), null);
        pthread_mutex_lock( &(mutex[ i ]) );

        // manual set stack space & stack size for each thread
        // stack space is allocated closely together
        pthread_attr_setstack( &stack_attr, &(stacks[i]), STACK_SIZE );
        pthread_create( &cthread, &stack_attr, &thread_func, cast(void*)i );
    }

    // start game
    pthread_mutex_unlock( &(mutex[0]) );

    // wait for result; the winning thread calls exit(0), so this never returns
    pthread_join( cthread, null );
    return 0;
}
```
Jul 21 2008
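About the mutex hand-off in the threadring code above: each thread unlocks the next thread's mutex, which it never locked. POSIX leaves unlocking a mutex owned by another thread undefined for default mutexes, so the trick relies on glibc not checking ownership, and std.c.linux.pthread ties it to Linux. A counting semaphore, which any thread may post, expresses the same token passing portably. The sketch below assumes a modern D2 compiler with druntime's core.thread and core.sync.semaphore (not the D1 Phobos used in this thread); Ring, Node and threadRing are hypothetical names, and it uses default thread stacks instead of the hand-placed 16 KB ones.

```d
import core.sync.semaphore : Semaphore;
import core.thread : Thread;
import std.conv : to;
import std.stdio : writeln;

enum size_t numThreads = 503;

// State for one game. Only the current token holder touches the plain
// fields; the semaphore hand-off orders those accesses between threads.
final class Ring
{
    Semaphore[] sems;   // sems[i] is posted when thread i receives the token
    Semaphore done;     // posted once by the thread that finds token == 0
    int token;
    size_t winner;      // 1-based id of the final token holder
    bool finished;

    this(int hops)
    {
        token = hops;
        done = new Semaphore(0);
        sems = new Semaphore[numThreads];
        foreach (ref s; sems)
            s = new Semaphore(0);   // every thread starts blocked
    }
}

final class Node
{
    Ring ring;
    size_t id;   // 0-based position in the ring

    this(Ring ring, size_t id) { this.ring = ring; this.id = id; }

    void run()
    {
        while (true)
        {
            ring.sems[id].wait();
            if (ring.finished)
                return;                 // woken only so the thread can exit
            if (ring.token > 0)
            {
                --ring.token;
                ring.sems[(id + 1) % numThreads].notify();   // pass it on
            }
            else
            {
                ring.winner = id + 1;
                ring.finished = true;
                ring.done.notify();
                return;
            }
        }
    }
}

// Passes the token `hops` times and returns the id of the thread holding it
// when it reaches zero.
size_t threadRing(int hops)
{
    auto ring = new Ring(hops);
    auto threads = new Thread[numThreads];
    foreach (i; 0 .. numThreads)
    {
        auto node = new Node(ring, i);
        threads[i] = new Thread(&node.run);
        threads[i].start();
    }
    ring.sems[0].notify();   // hand the token to thread 1
    ring.done.wait();        // wait for the winner
    foreach (s; ring.sems)
        s.notify();          // wake everyone so they see `finished`
    foreach (t; threads)
        t.join();
    return ring.winner;
}

void main(string[] args)
{
    int hops = args.length > 1 ? to!int(args[1]) : 1000;
    writeln(threadRing(hops));
}
```

With 503 threads the token ends at thread (N mod 503) + 1, so N = 1000 prints 498. Unlike the pthread version, the losing threads are woken and joined instead of the process ending through exit(0) in a worker.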