digitalmars.D.learn - std.parallelism amap not scaling?


"safety0ff" <safety0ff.dev gmail.com> writes:

Hello,
I tried converting a C++/Go ray tracing benchmark [1] to D [2].

I tried using std.parallelism's amap to implement parallelism, 
but it does not seem to scale the way I expected.

By running the program with different numbers of threads in the 
thread pool, I got these results (Core i7 Sandy Bridge, 4 cores 
+ HT):

Threads                   1       2       3       4       5       6       7       8
Real time (s)         34.14  26.894  21.293  20.184  19.998  25.977   34.15  39.404
User time (s)         62.84  65.182  65.895  70.851  78.521 111.012 157.448 173.074
System time (s)        0.27   0.562   1.276   1.596   2.178   4.008   6.588   8.652
System calls         155808  224084  291634  403496  404161  360065  360065  258661
System call errors    39643   80245   99000  147487  155605  142922  142922  108454

I got these measurements using the latest DMD/druntime/Phobos, 
compiled with "-O -release -inline -noboundscheck".

I used time and strace -c to measure, e.g.:
time ./main -h=256 -w=256 -t=7 > /dev/null
strace -c ./main -h=256 -w=256 -t=7 > /dev/null

I also noticed in the task manager that, no matter what I did, I 
could not get CPU utilization anywhere close to 99% (unlike the 
C++ program in [1]).

My interpretation of these results is that std.parallelism's amap 
does significant communication between threads, which hurts 
scaling.
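A minimal sketch of the kind of amap setup being discussed, assuming the work can be expressed as one function call per image row (`shadeRow` and `renderRows` are hypothetical names, not the drays code). It uses a dedicated `TaskPool` so the worker count can be varied, and passes an explicit work unit size: std.parallelism hands out work in chunks of that many elements, so a larger value should mean fewer synchronisation points between threads, at the cost of coarser load balancing.

```d
import std.parallelism : TaskPool;
import std.range : iota;

// Hypothetical per-row work standing in for the ray tracer's inner loop;
// it only does integer arithmetic, so it never touches the GC.
uint shadeRow(size_t y)
{
    uint acc = 0;
    foreach (x; 0 .. 256)
        acc += cast(uint)((x * 31 + y * 17) & 0xFF);
    return acc;
}

// Shades `height` rows on a dedicated pool of `nWorkers` worker threads.
// amap splits the index range into chunks of `workUnitSize` rows; larger
// chunks mean fewer synchronisation points but coarser load balancing.
uint[] renderRows(size_t height, size_t nWorkers, size_t workUnitSize)
{
    auto pool = new TaskPool(nWorkers);
    scope (exit) pool.finish(true); // let workers drain, then stop them
    return pool.amap!shadeRow(iota(height), workUnitSize);
}

void main()
{
    import std.stdio : writeln;

    // Same image, different pool sizes: only the wall-clock time should change.
    foreach (n; 1 .. 5)
        writeln(n, " worker(s): ", renderRows(256, n, 16).length, " rows");
}
```

If the hot function itself allocates, changing the pool size or work unit size will not help much, since all threads then contend on the GC.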

[1] https://github.com/kid0m4n/rays
[2] https://github.com/Safety0ff/rays/tree/master/drays

Oct 07 2013

"safety0ff" <safety0ff.dev gmail.com> writes:

OK, I rewrote the amap-based version to spawn and join threads 
directly, and the results are similar, except for notably fewer 
system calls (specifically, fewer futex calls).
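A sketch of what a spawn/join variant might look like, assuming the image is split into contiguous bands of rows with one `core.thread.Thread` per band (`shadeRow`, `startWorker` and `renderWithThreads` are hypothetical names). Each thread writes only to its own slice of the output, so no locking is needed and the only synchronisation is the final `join`, which is consistent with this version making fewer futex calls than the pool-based one.

```d
import core.thread : Thread;
import std.algorithm : min;

// Hypothetical per-row work standing in for the ray tracer's inner loop.
uint shadeRow(size_t y)
{
    uint acc = 0;
    foreach (x; 0 .. 256)
        acc += cast(uint)((x * 31 + y * 17) & 0xFF);
    return acc;
}

// Starts a thread that fills `band`, whose first element is row `firstRow`.
// A separate function gives each delegate its own captured frame (capturing
// loop variables directly has historically shared one frame per loop).
Thread startWorker(uint[] band, size_t firstRow)
{
    auto t = new Thread({
        foreach (i, ref px; band)
            px = shadeRow(firstRow + i);
    });
    t.start();
    return t;
}

// Splits `height` rows into `nThreads` contiguous bands, runs one thread
// per band, then joins them all before returning the finished image.
uint[] renderWithThreads(size_t height, size_t nThreads)
{
    assert(nThreads > 0);
    auto img = new uint[height];
    immutable bandSize = (height + nThreads - 1) / nThreads;
    Thread[] workers;
    for (size_t lo = 0; lo < height; lo += bandSize)
        workers ~= startWorker(img[lo .. min(lo + bandSize, height)], lo);
    foreach (w; workers)
        w.join(); // rethrows any exception thrown inside the worker
    return img;
}

void main()
{
    auto img = renderWithThreads(256, 4);
    assert(img.length == 256);
}
```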

Oct 07 2013

"safety0ff" <safety0ff.dev gmail.com> writes:

I think I've found the culprit: memory management / GC. Disabling 
the GC caused the program to eat up all my memory.

I'll have to look into this later.
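For reference, a minimal sketch of the experiment described above: turning off automatic collections around a section of code with `core.memory.GC`. `withoutCollections` is a hypothetical helper, not a druntime function. It re-enables the GC and collects afterwards, because while collections are off every hidden allocation stays on the heap, which is consistent with the program eating all available memory on a full run.

```d
import core.memory : GC;

// Runs `work` with automatic collections disabled, then re-enables the GC
// and runs one full collection. Nothing allocated inside `work` is reclaimed
// until then, so this only suits bounded runs (e.g. one small frame); with
// per-pixel hidden allocations the heap just keeps growing.
void withoutCollections(scope void delegate() work)
{
    GC.disable();
    scope (exit)
    {
        GC.enable();
        GC.collect();
    }
    work();
}

void main()
{
    import std.stdio : writeln;

    int[] last;
    withoutCollections({
        // Hypothetical hot loop with one small allocation per iteration.
        foreach (i; 0 .. 100_000)
            last = [i, i + 1, i + 2];
    });
    writeln(last); // [99999, 100000, 100001]
}
```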

Oct 07 2013

"JR" <zorael gmail.com> writes:

On Monday, 7 October 2013 at 21:13:53 UTC, safety0ff wrote:
> I think I've found the culprit: Memory managment / GC, 
> disabling the GC caused the program to eat up all my memory.
>
> I'll have to look into this later.

From what I've gathered from 
http://forum.dlang.org/thread/dbeliopehpsncrckdfal@forum.dlang.org, 
your use of enum makes it copy (and allocate) those variables on 
each access.

Quoth Dmitry Olshansky in that thread (in a slightly different 
context):
> And the answer is - don't use ENUM with ctRegex.
> The problem is that ctRegex returns you a pack of 
> datastructures (=arrays).
> Using them with enum makes it behave as if you pasted them as 
> array literals and these do allocate each time.

Merely replacing all occurrences of enum with immutable seems to 
make a world of difference. I benchmarked your main.d a bit on 
this laptop (also an i7, so 4 real cores + HT): 
http://dpaste.dzfl.pl/a4ecc84f4
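A minimal illustration of the `enum` versus `immutable` difference, using a hypothetical lookup table (`artEnum`, `artImm`, `countBits`, `bitsImm` and `bitsEnum` are made-up names, not taken from drays). Per the explanation quoted above, each use of an array `enum` behaves like pasting the array literal in place, so it can allocate a fresh GC array every time; a module-level `immutable` array is stored once in static data. Marking a function `@nogc` is one way to have the compiler flag such hidden allocations.

```d
import core.bitop : popcnt;

// Manifest constant: every use acts like the literal [0x1F, ...] written
// at that spot, so materialising it as a slice allocates on the GC heap.
enum int[] artEnum = [0x1F, 0x04, 0x04, 0x1F];

// Module-level immutable data: built at compile time, stored once, and
// reading it never allocates.
immutable int[] artImm = [0x1F, 0x04, 0x04, 0x1F];

// Counts the set bits across all rows of a table.
int countBits(const(int)[] rows) @safe pure nothrow @nogc
{
    int total = 0;
    foreach (r; rows)
        total += popcnt(cast(uint) r);
    return total;
}

// Compiles under @nogc because artImm is not re-created here.
int bitsImm() @safe nothrow @nogc
{
    return countBits(artImm);
}

// Left without @nogc: the array literal that artEnum expands to is
// expected to be rejected there as a possible GC allocation.
int bitsEnum() @safe nothrow
{
    return countBits(artEnum);
}

void main()
{
    import std.stdio : writeln;

    writeln(bitsImm(), " ", bitsEnum()); // 12 12
}
```

Inside a function body, `static immutable` should give the same single-copy behaviour; a plain local `immutable` array initialised from a literal may still be built on every call.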

Note that inlining slows it down. I didn't verify its output, but 
if those numbers are true then ldmd2 -O -release -noboundscheck 
is a beast.

Oct 08 2013

"safety0ff" <safety0ff.dev gmail.com> writes:

Thank you for responding!
I went ahead and stubbed out the GC as described in 
http://forum.dlang.org/thread/fbjeivugntvudgopyfll@forum.dlang.org 
and ended up coming to the same thread/conclusion.

Enum creating hidden allocations is evil. :(

Oct 08 2013
