www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Using tasks without GC?

reply Chris Katko <ckatko gmail.com> writes:
When I program, it's usually videogame ideas. That implies a 
soft, real-time requirement. In general, that requires the mantra 
"allocations are evil, use object pools whenever possible." 
[storing data in static arrays and 'deleting' is usually just 
marking an entry as is_deleted=true and re-using "dead" ones.]

I'm looking through D's parallelism module and the docs state, 
up-front:

  >Creates a Task on the GC heap that calls an alias.

The modern, scalable way to make a parallel game engine uses 
tasks. (as opposed to functional decomposition where 1 thread is 
networking 1 thread is physics, etc.) That requires making LOTS 
of tasks (_per frame_!) and dispatching them. And a 60 FPS 
frametime is... 16 ms or less.

So my question is: Has anyone done any analysis over how 
"dangerous" it is to use GC'd tasks for _small_ tasks (in terms 
of milliseconds)? Is the GC going to fire off all the time and 
send jitter off the charts? Because while freeze-the-world for 20 
milliseconds would probably be unnoticable for many business 
apps, it would completely break a videogame.

I wonder how difficult it would be to either modify the existing 
parallel task codebase (or, write my own?) to use static pools 
instead. Allocate once an array of "MAX_NUM_TASKS" tasks (eating 
the minor memory hit) to prevent touching any allocation. [Even 
if it wasn't GC, allocating every frame in say, C++, is 
dangerous. malloc/new is slow and subject to fragmentation and 
permissions checks.]

Any advice, thoughts? Thanks,
--Chris Katko
Jan 03 2020
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Jan 04, 2020 at 12:51:15AM +0000, Chris Katko via Digitalmars-d-learn
wrote:
[...]
 I'm looking through D's parallelism module and the docs state,
 up-front:
 
  >Creates a Task on the GC heap that calls an alias.
 
 The modern, scalable way to make a parallel game engine uses tasks.
 (as opposed to functional decomposition where 1 thread is networking 1
 thread is physics, etc.) That requires making LOTS of tasks (_per
 frame_!) and dispatching them. And a 60 FPS frametime is... 16 ms or
 less.
[...] I don't know the inner workings of std.parallelism well enough to answer your question directly, but couldn't you just spawn a bunch of workers that can be assigned tasks? So you only allocate up-front when first spawning the workers, but during the main loop you have a fixed number of workers that can receive and execute tasks. So you don't have to allocate at all after the initial up-front allocation. Generally, doing a lot of small GC allocations is bad for performance. I would avoid doing that, and use a fixed number of workers that you create, say at the beginning of every level or something, and thereafter you never deallocate them, just leave them idle if there are no tasks, otherwise have some dispatcher to assign tasks to them. T -- Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy
Jan 03 2020
prev sibling next sibling parent Elronnd <elronnd elronnd.net> writes:
You can control when the gc runs.  So if you know the allocations 
are small enough that they won't OOM, then you can say 
GC.disable, and it straight up won't run at all.  But you can 
manually run a collection cycle (during a loading screen or 
whatever) with GC.collect.  See 
http://dpldocs.info/experimental-docs/core.memory.GC.html
Jan 03 2020
prev sibling parent reply dwdv <dwdv posteo.de> writes:
 Creates a Task on the GC heap that calls an alias.
If possible, there's also scopedTask, which allocates on the stack: https://dlang.org/phobos/std_parallelism.html#.scopedTask
 So my question is: Has anyone done any analysis over how "dangerous" it is to
use GC'd tasks for 
 _small_ tasks (in terms of milliseconds)?
Nothing major, but https://github.com/mratsim/weave/tree/master/ enchmarks/fibonacci puts quite a bit of pressure on various implementations. You might want to profile ./pfib 40: import std; ulong fib(uint n) { if (n < 2) return n; auto x = scopedTask!fib(n-1); // vs. Task!fib(n-1); taskPool.put(x); auto y = fib(n-2); return x.yieldForce + y; // {yield,spin,work}Force } void main(string[] args) { enforce(args.length == 2, "Usage: fib <n-th fibonacci number requested>"); auto n = args[1].to!uint; // defaultPoolThreads(totalCPUs); writefln!"fib(%d) = %d"(n, fib(n)); } At least D isn't locking up beyond 12 tasks; looking at you, stdlib-nim. :)
Jan 04 2020
parent Chris Katko <ckatko gmail.com> writes:
Thanks everyone, looks like i'll have to benchmark myself (which 
is fine) but I'm always afraid because I know "proper 
benchmarking is hard. (TM)"

Feel free to throw any other side advice in. I'm looking to get a 
broad perspective on this.

Straight up shutting off the garbage collector in exchange for 
memory is an interesting concept. But I wonder how quickly it 
will get eaten up. ALSO, if I do shut it off, when i turn it on 
is it going to take much longer? Does the GC take linear / 
quadratically more time based on the N of "items need to be 
freed"?

The thing is, I'm looking to parallelize a dedicated server for 
running MASSIVE numbers of objects (10000's or 100000's) at high 
tick rate. Worse, for my particular "game mode", the "rounds" can 
last hours. So while a slow crawl of RAM is okay, it won't be 
okay if it hits 30 GB in an hour. So that's going to be another 
"it works IF it works [in our particular game/application]" as 
opposed to "this definitely will/won't work". That's more 
"benchmarking and see" scenarios, which I'm trying to avoid as 
much as possible. You normally don't want to willfully START a 
project with a design where the only way to know if it works... 
is to bench.

There's also an option of periodically firing off (without ending 
the game) say, once every hour and tell everyone to just "live 
with it" because it's (I HOPE!) not going to be a 15/30/60 second 
delay. In my particular application, that still wouldn't be that 
disruptive. (Unless turning GC off hits max ram every couple 
minutes. Then again nobody is going to be okay with that.)

On Saturday, 4 January 2020 at 11:30:53 UTC, dwdv wrote:
 Creates a Task on the GC heap that calls an alias.
If possible, there's also scopedTask, which allocates on the stack: https://dlang.org/phobos/std_parallelism.html#.scopedTask
 So my question is: Has anyone done any analysis over how 
 "dangerous" it is to use GC'd tasks for _small_ tasks (in 
 terms of milliseconds)?
Nothing major, but https://github.com/mratsim/weave/tree/master/benchmarks/fibonacci puts quite a bit of pressure on various implementations. You might want to profile ./pfib 40: import std; ulong fib(uint n) { if (n < 2) return n; auto x = scopedTask!fib(n-1); // vs. Task!fib(n-1); taskPool.put(x); auto y = fib(n-2); return x.yieldForce + y; // {yield,spin,work}Force } void main(string[] args) { enforce(args.length == 2, "Usage: fib <n-th fibonacci number requested>"); auto n = args[1].to!uint; // defaultPoolThreads(totalCPUs); writefln!"fib(%d) = %d"(n, fib(n)); } At least D isn't locking up beyond 12 tasks; looking at you, stdlib-nim. :)
Jan 05 2020