digitalmars.D.learn - std.parallelism amap not scaling?
- safety0ff (31/31) Oct 07 2013
Hello, I tried converting a C++/Go ray tracing benchmark [1] to D [2]. I tried using std.parallelism's amap to parallelize it, but it does not scale the way I expected. Running the program with different numbers of threads in the thread pool, I got these results (Core i7 Sandy Bridge, 4 cores + HT):

Threads                   1        2        3        4        5        6        7        8
Real time (s)         34.14   26.894   21.293   20.184   19.998   25.977    34.15   39.404
User time (s)         62.84   65.182   65.895   70.851   78.521  111.012  157.448  173.074
System time (s)        0.27    0.562    1.276    1.596    2.178    4.008    6.588    8.652
System calls         155808   224084   291634   403496   404161   360065   360065   258661
System call errors    39643    80245    99000   147487   155605   142922   142922   108454

I got these measurements using the latest DMD/druntime/Phobos, compiled with "-O -release -inline -noboundscheck". I used time and strace -c to measure, e.g.:

    time ./main -h=256 -w=256 -t=7 > /dev/null
    strace -c ./main -h=256 -w=256 -t=7 > /dev/null

What I also noticed in the task manager is that no matter what I did, I could not get the CPU utilization anywhere close to 99% (unlike the C++ program in [1]).

My interpretation of these results is that std.parallelism.amap does significant communication between threads, which causes issues with scaling.

[1] https://github.com/kid0m4n/rays
[2] https://github.com/Safety0ff/rays/tree/master/drays
Oct 07 2013
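For readers following along, here is a minimal sketch of the kind of setup the post above describes: an explicit TaskPool with a chosen number of worker threads, and amap applied over the image rows. It is not the code from [2]; traceRow and render are hypothetical names, and the per-row work is a dummy computation standing in for the ray tracer.

```d
import std.parallelism : TaskPool;
import std.range : iota;

// Hypothetical stand-in for tracing one image row (not the code from [2]).
double traceRow(int y)
{
    double acc = 0;
    foreach (x; 0 .. 1000)
        acc += (x ^ y) & 0xFF;
    return acc;
}

// Evaluate every row on a pool created with `nThreads` worker threads.
double[] render(int rows, size_t nThreads)
{
    auto pool = new TaskPool(nThreads);
    scope (exit) pool.finish(true); // wait for the workers to terminate
    // amap evaluates traceRow eagerly for each element and returns an array.
    return pool.amap!traceRow(iota(rows));
}

void main()
{
    import std.stdio : writeln;
    writeln(render(256, 4).length);
}
```

amap also accepts an optional work unit size after the range, which controls how many elements each task handles; whether tuning it would change the numbers above is not something this sketch can tell.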
OK, well, I rewrote the std.parallelism amap version to spawn and join threads manually, and the results are similar, except for notably fewer system calls (specifically, fewer futex calls).
Oct 07 2013
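A rough sketch of what a spawn/join version could look like, using core.thread directly. This is an assumption about the general shape only, not the poster's rewrite: renderRow, makeWorker and renderAll are hypothetical names, and rows are assigned to threads in a simple strided pattern.

```d
import core.thread : Thread;

// Hypothetical stand-in for rendering one row.
double renderRow(size_t y)
{
    double acc = 0;
    foreach (x; 0 .. 1000)
        acc += (x ^ y) & 0xFF;
    return acc;
}

// Build a thread that fills rows first, first + stride, first + 2*stride, ...
// A helper function is used so each delegate captures its own `first`.
Thread makeWorker(double[] result, size_t first, size_t stride)
{
    return new Thread({
        for (size_t y = first; y < result.length; y += stride)
            result[y] = renderRow(y);
    });
}

double[] renderAll(size_t rows, size_t nThreads)
{
    auto result = new double[rows];
    Thread[] threads;
    foreach (t; 0 .. nThreads)
        threads ~= makeWorker(result, t, nThreads);
    foreach (th; threads) th.start();
    foreach (th; threads) th.join(); // wait for every worker to finish
    return result;
}

void main()
{
    import std.stdio : writeln;
    writeln(renderAll(256, 4).length);
}
```

Each thread writes to a disjoint set of indices, so no locking is needed around the result array.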
I think I've found the culprit: memory management / GC. Disabling the GC caused the program to eat up all my memory. I'll have to look into this later.
Oct 07 2013
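For anyone wanting to repeat the experiment, one way to suppress automatic collections is core.memory's GC.disable, sketched below. The churn function and its allocation pattern are made up for illustration. With collections disabled, garbage simply accumulates, which matches the memory growth described in the post above.

```d
import core.memory : GC;

// Hypothetical allocation-heavy loop, run with automatic collections suppressed.
size_t churn(size_t iterations)
{
    GC.disable();             // no automatic collections from here on
    scope (exit) GC.enable(); // restore normal behaviour on the way out
    size_t total;
    foreach (i; 0 .. iterations)
    {
        auto tmp = new int[16]; // becomes garbage that stays uncollected
        total += tmp.length;
    }
    return total;
}

void main()
{
    import std.stdio : writeln;
    writeln(churn(1_000));
}
```

Note that the runtime may still collect if it decides it has to, for example when it is about to run out of memory.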
On Monday, 7 October 2013 at 21:13:53 UTC, safety0ff wrote:
> I think I've found the culprit: memory management / GC. Disabling the GC caused the program to eat up all my memory. I'll have to look into this later.

From what I've gathered from http://forum.dlang.org/thread/dbeliopehpsncrckdfal, your use of enum makes it copy (and allocate) those variables on each access. Quoth Dmitry Olshansky in that thread (with its slightly different context):

> And the answer is - don't use ENUM with ctRegex. The problem is that ctRegex returns you a pack of datastructures (=arrays). Using them with enum makes it behave as if you pasted them as array literals and these do allocate each time.

Merely replacing all occurrences of enum with immutable seems to make a world of difference. I benched your main.d a bit on this laptop (also i7, so 4 real cores + HT): http://dpaste.dzfl.pl/a4ecc84f4

Note that inlining slows it down. I didn't verify its output, but if those numbers are true then ldmd2 -O -release -noboundscheck is a beast.
Oct 08 2013
On Tuesday, 8 October 2013 at 10:54:08 UTC, JR wrote:
> On Monday, 7 October 2013 at 21:13:53 UTC, safety0ff wrote:
>> I think I've found the culprit: memory management / GC. Disabling the GC caused the program to eat up all my memory. I'll have to look into this later.
>
> From what I've gathered from http://forum.dlang.org/thread/dbeliopehpsncrckdfal, your use of enum makes it copy (and allocate) those variables on each access. Quoth Dmitry Olshansky in that thread (with its slightly different context):
>
>> And the answer is - don't use ENUM with ctRegex. The problem is that ctRegex returns you a pack of datastructures (=arrays). Using them with enum makes it behave as if you pasted them as array literals and these do allocate each time.
>
> Merely replacing all occurrences of enum with immutable seems to make a world of difference. I benched your main.d a bit on this laptop (also i7, so 4 real cores + HT): http://dpaste.dzfl.pl/a4ecc84f4
>
> Note that inlining slows it down. I didn't verify its output, but if those numbers are true then ldmd2 -O -release -noboundscheck is a beast.

Thank you for responding! I went ahead and stubbed out the GC as per http://forum.dlang.org/thread/fbjeivugntvudgopyfll and ended up coming to the same thread/conclusion. Enum creating hidden allocations is evil. :(
Oct 08 2013
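To make the conclusion of the thread concrete, here is a small sketch of the enum versus immutable difference for arrays. The table names and contents are invented for illustration and are not taken from main.d. An enum array is a manifest constant, so each use behaves like a pasted array literal and allocates. An immutable array is a single object that every use refers to.

```d
// Manifest constant: every use is treated as a fresh array literal.
enum int[] enumTable = [1, 2, 3, 4];
// One immutable array stored once; uses are slices of the same storage.
immutable int[] immTable = [1, 2, 3, 4];

// Allocates a new array on each call.
const(int)[] fromEnum() { return enumTable; }
// Returns a slice of the same storage on each call.
const(int)[] fromImmutable() { return immTable; }

void main()
{
    import std.stdio : writeln;
    // Compare the addresses returned by two separate calls.
    writeln(fromEnum().ptr is fromEnum().ptr);
    writeln(fromImmutable().ptr is fromImmutable().ptr);
}
```

In a hot loop running on several threads, those per-access allocations all go through the GC, which would be consistent with the poor scaling reported at the top of the thread.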