digitalmars.D - Thread GC non "stop-the-world"
- Oscar Martin (66/66) Sep 22 2014 The cost of using the current GC in D, although beneficial for
- Rikki Cattermole (39/106) Sep 22 2014 Short, I dislike pretty much all changes to __gshared/shared.
- Oscar Martin (16/21) Sep 23 2014 Yeah, these changes break many things, and so are not suitable
- "Marc Schütz" <schuetzm gmx.net> (4/26) Sep 24 2014 There can also be a shared _and_ a local GC at the same time, and
- Oscar Martin (14/44) Sep 24 2014 Yes, a shared GC should be a possibility, but how you avoid the
- Wyatt (10/28) Sep 25 2014 This thread reminds me again of a paper I read a few months ago
- Oscar Martin (2/33) Sep 25 2014 An interesting paper. Thank you very much
- Sean Kelly (4/7) Sep 25 2014 Pretty much for reasons of being able to call C functions and
- Kagamin (5/13) Nov 21 2014 BTW, C usually accepts data only for reading, and writes mostly
- Sean Kelly (5/20) Nov 21 2014 "usually" isn't sufficient if you're trying to make a GC that
- Kagamin (4/4) Nov 21 2014 I believe I have never seen such C function. What can it do to
- Kagamin (5/16) Sep 23 2014 A use case, which comes to mind: a game saves progress to the
- "Marc Schütz" <schuetzm gmx.net> (3/22) Sep 23 2014 This can be done without sharing. Of course, a uniqueness concept
- Kagamin (2/2) Sep 23 2014 The question is how thread-local GC will account for data passed
- "Marc Schütz" <schuetzm gmx.net> (2/4) Sep 23 2014 std.concurrency.send() could notify the GC.
- Kagamin (1/1) Sep 23 2014 And what GC does? Pins the allocated blocks for another thread?
- "Marc Schütz" <schuetzm gmx.net> (5/6) Sep 23 2014 Assuming there is one thread-local GC per thread, it transfers
- Oscar Martin (7/13) Sep 23 2014 Yes. A mechanism for transfer of responsibility and pins would be
- "Marc Schütz" <schuetzm gmx.net> (7/23) Sep 24 2014 Physically moving the objects is not necessary, it only needs to
- David Nadlinger (10/12) Sep 23 2014 I was briefly discussing this with Andrei at (I think) DConf
- Sean Kelly (4/10) Sep 23 2014 ... and casting to invariant, since invariant is implicitly
- Oscar Martin (8/20) Sep 23 2014 Yes, it could be a palliative measure, and yes, it require
- Kagamin (12/15) Sep 24 2014 Yes, that sounds expensive. A real example from my work: client
- Sean Kelly (4/20) Sep 24 2014 Large allocations are the easy case, as the allocation lives in
- Kagamin (7/9) Sep 25 2014 Dataset is not a contiguous object. It's like an in-memory
- Oscar Martin (5/21) Sep 24 2014 Yes, that's the problem I see with the shared GC. But I think
- Kagamin (5/10) Sep 25 2014 You might want to look at Nimrod. AFAIK, it uses thread-local GC
- deadalnix (4/6) Sep 23 2014 I don't think you clearly understand what thread local means.
- Rainer Schuetze (32/75) Sep 28 2014 https://github.com/rainers/druntime/gcx_precise2
The cost of using the current GC in D, although beneficial for many types of programs, is unaffordable for programs such as games that must perform repetitive tasks at short, regular intervals. The fact that a GC.malloc/realloc on any thread can trigger a collection that stops ALL threads of the program for a variable time rules it out. Forum conversations such as "RFC: reference Counted Throwable" and "Escaping the Tyranny of the GC: std.rcstring, first blood", as well as the @nogc attribute, show that this is increasingly perceived as a problem.

Besides the ever-recurring "reference counting", many people propose improving the current GC implementation. Rainer Schuetze developed a concurrent GC for Windows:

http://rainers.github.io/visuald/druntime/concurrentgc.html

With some (or a lot of) work and a little help from the compiler (currently it indicates with a flag whether a class/structure contains pointers/references to other classes/structures; it could extend this support to indicate which fields are pointers/references), we could implement a semi-incremental-generational-copying conservative GC like:

http://www.hboehm.info/gc/
or
http://www.ravenbrook.com/project/mps/

Being incremental, they try to minimize the "stop-the-world" phase. But even with an advanced GC, as programs become more complex and use more memory, pause times also increase. See for example (I know it's not the normal case, but in a few years ...)

http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

(*) What if:
- "__gshared" variables are forbidden from holding references/pointers to objects allocated by the GC (if the compiler can help enforce this, perfect; if not, the developer has to know what he is doing)
- "shared" types are not allocated by the GC (they could be reference counted, manually released, or ...)
- "immutable" types are no longer implicitly "shared"

In short, memory accessible from multiple threads is not managed by the GC.

With these restrictions each thread would have its own "I_Allocator", whose default implementation would be an incremental-generational-semi-conservative-copying GC, with no interference with any of the other program threads (it would be responsible only for the memory reserved for that thread). Other implementations of "I_Allocator" could be based on Andrei's allocators. With "setThreadAllocator" (similar to the current gc_setProxy) you could switch between the different implementations as needed. Threads with critical time requirements could work with an implementation of "I_Allocator" not based on the GC. It would be possible to simulate scoped classes:

    {
        setThreadAllocator(I_Allocator_pseudo_stack);
        scope(exit) {
            I_Allocator_pseudo_stack.deleteAll();
            setThreadAllocator(I_Allocator_gc);
        }
        auto obj = new MyClass();
        ...
        // Destructors are called and memory is released
    }

Obviously the changes (*) break compatibility with existing code, and therefore may not be appropriate for D2. Also, these are general ideas; surely these changes lead to other problems. But the point I want to convey is that, in my opinion, while those problems are solvable, a language for "system programming" is incompatible with shared data managed by a GC.

Thoughts?
Sep 22 2014
In short, I dislike pretty much all changes to __gshared/shared. Breaks too many things. At least with Cmsed (I'm evil here), where I use __gshared essentially as a read-only variable that is modifiable at startup (modifying needs synchronized, reading doesn't).

I have already suggested in earlier threads something similar to what you're suggesting with regard to setting the allocator, except:
The memory manager is in a stack. The default is the GC, e.g. the current one.
The compiler knows which pointers escape. They can be passed to pure functions, however.

    with(myAllocator) { // myAllocator.opWithIn
        ...//allocate
    } // myAllocator.opWithCanFree
      // myAllocator.opWithOut

    class MyAllocator : Allocator {
        override void opWithIn(string func = __FUNCTION__, int line = __LINE__) {
            GC.pushAllocator(this);
        }

        override void opWithCanFree(void** freeablePointers) {
            //...
        }

        override void opWithOut(string func = __FUNCTION__, int line = __LINE__) {
            GC.popAllocator();
        }

        void* alloc(size_t amount) {
            return ...;
        }

        void free(void*) {
            //...
        }
    }

You may have something about thread allocators though. Humm, druntime would already need changes, so maybe.

Ehh, this really needs a DIP instead of me whining. If I do it, ETA December.
Sep 22 2014
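The allocator stack proposed above could be prototyped as a plain library before any compiler or druntime support exists. What follows is a minimal sketch under stated assumptions, not the proposed design: `Allocator`, `MallocAllocator`, `pushAllocator`, `popAllocator` and `currentAllocator` are hypothetical names, nothing here hooks into `new` or the real GC, and the release method is called `release` rather than `free` only to avoid shadowing the C function.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical allocator interface; not part of druntime.
interface Allocator
{
    void* alloc(size_t size);
    void release(void* p);
}

/// Wraps the C heap so the blocks themselves never touch the GC.
final class MallocAllocator : Allocator
{
    size_t live; // number of blocks handed out and not yet released

    void* alloc(size_t size)
    {
        auto p = malloc(size);
        if (p !is null)
            ++live;
        return p;
    }

    void release(void* p)
    {
        if (p is null)
            return;
        free(p);
        --live;
    }
}

// Module-level variables are thread-local by default in D,
// so every thread gets its own independent allocator stack.
Allocator[] allocatorStack;

void pushAllocator(Allocator a)
{
    allocatorStack ~= a;
}

void popAllocator()
{
    assert(allocatorStack.length > 0, "allocator stack underflow");
    allocatorStack = allocatorStack[0 .. $ - 1];
}

Allocator currentAllocator()
{
    assert(allocatorStack.length > 0, "no allocator installed");
    return allocatorStack[$ - 1];
}

void main()
{
    auto base = new MallocAllocator;
    auto scratch = new MallocAllocator;

    pushAllocator(base);
    {
        pushAllocator(scratch);
        scope (exit) popAllocator(); // restore the previous allocator on scope exit

        auto p = currentAllocator().alloc(64);
        assert(scratch.live == 1 && base.live == 0);
        currentAllocator().release(p);
    }
    // Back to the outer allocator once the inner scope has ended.
    assert(cast(Object) currentAllocator() is base);
    popAllocator();
}
```

Because the stack is thread-local, pushing an allocator in one thread has no effect on any other thread, which is the property the per-thread allocator idea relies on.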
On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole wrote:
> In short, I dislike pretty much all changes to __gshared/shared. Breaks too many things. At least with Cmsed (I'm evil here), where I use __gshared essentially as a read-only variable that is modifiable at startup (modifying needs synchronized, reading doesn't).

Yeah, these changes break many things, and so are not suitable for D2. My intention was only to point out how expensive it is for the GC to deal with shared memory.

Thinking about it a little more: what if each thread can have its own GC, but by default all use the current GC (this would require minimal changes to druntime)? "__gshared", "shared" and "immutable" continue as now, which does not break anything. If I as a programmer take care of managing (either manually or through reference counting) all of the shared memory ("__gshared", "shared" or "immutable") that can be referenced from multiple threads, I could replace the global GC in my program with an individual thread GC.

I'll try to implement a GC optimized for a single thread and try that solution.
Sep 23 2014
On Tuesday, 23 September 2014 at 18:39:09 UTC, Oscar Martin wrote:
> Thinking about it a little more: what if each thread can have its own GC, but by default all use the current GC (this would require minimal changes to druntime)?

There can also be a shared _and_ a local GC at the same time, and a thread could opt out of the shared GC (or choose not to opt in by not allocating from the shared heap).
Sep 24 2014
On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz wrote:
> There can also be a shared _and_ a local GC at the same time, and a thread could opt out of the shared GC (or choose not to opt in by not allocating from the shared heap).

Yes, a shared GC should be a possibility, but how do you avoid the "stop-the-world" phase for that GC? Obviously this pause can be minimized by performing most of the work outside that phase, but after seeing other people's tests of advanced GCs (Java, .NET) on the internet, I do not think it's enough for some programs. But hey, I guess it's enough to cover the greatest number of cases.

My goal is to start by implementing the thread GC. Then I will test performance and pauses (my program has to handle audio every 10 ms), and then I might dare to implement the shared GC, which is obviously more complex if it is to minimize pauses. We'll see what the outcome is.
Sep 24 2014
On Wednesday, 24 September 2014 at 20:15:52 UTC, Oscar Martin wrote:
> Yes, a shared GC should be a possibility, but how do you avoid the "stop-the-world" phase for that GC?

This thread reminds me again of a paper I read a few months ago with a clever way of dealing with the sharing problem while maintaining performance:

https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf

The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall correctly that those have been vetoed several times for complexity.

-Wyatt
Sep 25 2014
On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
> This thread reminds me again of a paper I read a few months ago with a clever way of dealing with the sharing problem while maintaining performance:
> https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf

An interesting paper. Thank you very much.
Sep 25 2014
On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
> The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall correctly that those have been vetoed several times for complexity.

Pretty much for reasons of being able to call C functions and inline asm code. Memory barriers may still be possible in these scenarios, but they would be extremely expensive.
Sep 25 2014
On Thursday, 25 September 2014 at 21:59:15 UTC, Sean Kelly wrote:
> Pretty much for reasons of being able to call C functions and inline asm code. Memory barriers may still be possible in these scenarios, but they would be extremely expensive.

BTW, C usually accepts data only for reading, and writes mostly strings and buffers: plain data without pointers. In both cases it doesn't need to notify the GC (as far as I understand write barriers).
Nov 21 2014
On Friday, 21 November 2014 at 10:24:09 UTC, Kagamin wrote:
> BTW, C usually accepts data only for reading, and writes mostly strings and buffers: plain data without pointers. In both cases it doesn't need to notify the GC (as far as I understand write barriers).

"usually" isn't sufficient if you're trying to make a GC that doesn't collect live data. It's possible that we could do something around calls to extern (C) functions that accept a type containing pointers, but I'd have to give this some thought.
Nov 21 2014
I believe I have never seen such a C function. What can it do to screw up managed memory? That usually requires allocating new memory from the GC and assigning it to an old object, but a (true) C function is very unlikely to allocate memory from the GC.
Nov 21 2014
On Tuesday, 23 September 2014 at 00:15:51 UTC, Oscar Martin wrote:
> http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector
>
> (*) What if:
> - "__gshared" variables are forbidden from holding references/pointers to objects allocated by the GC (if the compiler can help enforce this, perfect; if not, the developer has to know what he is doing)
> - "shared" types are not allocated by the GC (they could be reference counted, manually released, or ...)
> - "immutable" types are no longer implicitly "shared"
>
> In short, memory accessible from multiple threads is not managed by the GC.

A use case that comes to mind: a game saves progress to the server; the main thread prepares the data to be saved (a relatively lightweight operation) and hands it over to another thread, which saves the data in the background. How would you do it?
Sep 23 2014
On Tuesday, 23 September 2014 at 09:04:48 UTC, Kagamin wrote:
> A use case that comes to mind: a game saves progress to the server; the main thread prepares the data to be saved (a relatively lightweight operation) and hands it over to another thread, which saves the data in the background. How would you do it?

This can be done without sharing. Of course, a uniqueness concept would be needed.
Sep 23 2014
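The save-game hand-off described above can be approximated today with message passing from std.concurrency. This is a minimal sketch, assuming the payload is a flat byte buffer and that the sender really holds the only reference to it; `saver` and `handOff` are hypothetical names, the "save" is a stand-in that only reports the payload size, and `assumeUnique` is an unchecked promise by the programmer rather than a compiler-verified uniqueness concept.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;
import std.exception : assumeUnique;

/// Background worker: waits for one payload, "saves" it, then reports back.
void saver(Tid owner)
{
    auto blob = receiveOnly!(immutable(ubyte)[])();
    // Stand-in for the real network write: just report how much was received.
    send(owner, blob.length);
}

/// Hands `buffer` over to a freshly spawned worker and waits for its reply.
/// After the call, `buffer` is null: the sender has given up its reference.
size_t handOff(ref ubyte[] buffer)
{
    auto worker = spawn(&saver, thisTid);

    // assumeUnique converts to immutable and nulls the original slice.
    // It does NOT verify that no other reference exists; that is the
    // caller's responsibility in this sketch.
    immutable(ubyte)[] frozen = assumeUnique(buffer);
    send(worker, frozen);

    return receiveOnly!size_t();
}

void main()
{
    auto progress = new ubyte[](16);
    foreach (i, ref b; progress)
        b = cast(ubyte) i;

    assert(handOff(progress) == 16);
    assert(progress is null);
}
```

Note that the buffer still lives in the single global GC heap here; the sketch only shows the hand-off at the language level, not how a thread-local collector would learn about it.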
The question is how thread-local GC will account for data passed to another thread.
Sep 23 2014
On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
> The question is how thread-local GC will account for data passed to another thread.

std.concurrency.send() could notify the GC.
Sep 23 2014
And what does the GC do? Pin the allocated blocks for another thread?
Sep 23 2014
On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
> And what does the GC do? Pin the allocated blocks for another thread?

Assuming there is one thread-local GC per thread, it transfers responsibility for the allocated data from the sender to the receiver. This means the old GC doesn't need to scan it any more, but the new one does.
Sep 23 2014
On Tuesday, 23 September 2014 at 15:28:30 UTC, Marc Schütz wrote:
> Assuming there is one thread-local GC per thread, it transfers responsibility for the allocated data from the sender to the receiver. This means the old GC doesn't need to scan it any more, but the new one does.

Yes. A mechanism for transferring responsibility and for pinning would be needed. Basically, we have to keep in mind that a thread GC only looks for roots in its own stack/registers and managed memory, and may move managed objects during a collection, so a reference used in another thread may become invalid for that other thread at any time.
Sep 23 2014
On Tuesday, 23 September 2014 at 18:53:04 UTC, Oscar Martin wrote:
> Yes. A mechanism for transferring responsibility and for pinning would be needed. Basically, we have to keep in mind that a thread GC only looks for roots in its own stack/registers and managed memory, and may move managed objects during a collection, so a reference used in another thread may become invalid for that other thread at any time.

Physically moving the objects is not necessary; it only needs to "move" the responsibility. With some work, it might even be possible to move the objects' metadata from the old to the new heap, so that each GC would only need to access its own heap during a scan, which avoids synchronization with the other threads.
Sep 24 2014
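The idea of moving responsibility rather than objects can be modeled as pure bookkeeping. This is a toy sketch only: `LocalHeap`, `adopt`, `owns` and `transferTo` are hypothetical names, the "heap" is just a set of block addresses, and a real collector would also need the transfer itself to be synchronized between the two threads involved.

```d
/// Toy model of a per-thread heap: it only records which blocks
/// it is responsible for scanning and freeing.
struct LocalHeap
{
    bool[void*] owned; // set of block addresses owned by this heap

    void adopt(void* block)
    {
        owned[block] = true;
    }

    bool owns(void* block)
    {
        return (block in owned) !is null;
    }

    /// Hands responsibility for `block` to `receiver`.
    /// The block itself is neither copied nor moved.
    void transferTo(ref LocalHeap receiver, void* block)
    {
        assert(owns(block), "cannot transfer a block this heap does not own");
        owned.remove(block);
        receiver.adopt(block);
    }
}

void main()
{
    LocalHeap sender, receiver;

    auto payload = new int[](4);
    sender.adopt(payload.ptr);

    sender.transferTo(receiver, payload.ptr);

    // The address is unchanged; only the responsible heap differs.
    assert(!sender.owns(payload.ptr));
    assert(receiver.owns(payload.ptr));
}
```

After the transfer the sender's collector would no longer scan or free the block, which is the property described above; the sketch says nothing about references the sender may still hold, which is where a uniqueness or ownership concept would come in.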
On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
> The question is how thread-local GC will account for data passed to another thread.

I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared. Assigning types with indirections to a __gshared variable might also trigger this, unless we can find a better design.

IIRC, Andrei dismissed this as impractical due to the overhead and need for precise scanning. I still like to think that it would be worth it, though, even if I can't spare the time for looking into an implementation right now.

David
Sep 23 2014
On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared.

... and casting to invariant, since invariant is implicitly shared (insert cuss-words here).
Sep 23 2014
On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared. Assigning types with indirections to a __gshared variable might also trigger this, unless we can find a better design.
>
> IIRC, Andrei dismissed this as impractical due to the overhead and need for precise scanning.

Yes, it could be a palliative measure, and yes, it requires precise scanning. I do not think that is easy to implement for the stack. And in any case, I believe the problem is having multiple references to the same object from different threads, which forces you to "stop-the-world". That problem still exists.
Sep 23 2014
On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger wrote:
> I was briefly discussing this with Andrei at (I think) DConf 2013. I suggested moving data to a separate global GC heap on casting stuff to shared.

Yes, that sounds expensive. A real example from my work: a client receives a big dataset (~1GB) from the server in a background thread, builds and checks constraints and indexes (which is sort of expensive too; RBTree) and hands it over to the main thread. And the client machine is not powerful enough for frequent marshaling of such a big dataset; handling it at all is enough of a problem. If you copied it twice, you have a 3GB working set, and the GC needs roughly a 2x reserve, raising memory requirements to 6GB; without dup, requirements are 1-2GB. Also, when you trigger a collection while copying to the shared GC, what does it do? Stop the world again?
Sep 24 2014
On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
> Yes, that sounds expensive. A real example from my work: a client receives a big dataset (~1GB) from the server in a background thread, builds and checks constraints and indexes (which is sort of expensive too; RBTree) and hands it over to the main thread.

Large allocations are the easy case, as the allocation lives in its own pool and you can just move the entire pool. Copying objects is the tricky part...
Sep 24 2014
On Wednesday, 24 September 2014 at 14:36:13 UTC, Sean Kelly wrote:
> Large allocations are the easy case, as the allocation lives in its own pool and you can just move the entire pool.

A dataset is not a contiguous object. It's like an in-memory database: tables can be added to or removed from it, rows can be added to or removed from tables, and fields in rows can be set to various values. In the end a dataset is a collection of a big number of relatively small objects; big datasets are collections of big numbers of objects.
Sep 25 2014
On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
> If you copied it twice, you have a 3GB working set, and the GC needs roughly a 2x reserve, raising memory requirements to 6GB; without dup, requirements are 1-2GB. Also, when you trigger a collection while copying to the shared GC, what does it do? Stop the world again?

Yes, that's the problem I see with the shared GC. But I think cases like this should be solved "easily" with a mechanism for transferring responsibility between thread GCs. The truly problematic cases are shared objects with roots in several threads.
Sep 24 2014
On Wednesday, 24 September 2014 at 20:24:01 UTC, Oscar Martin wrote:
> Yes, that's the problem I see with the shared GC. But I think cases like this should be solved "easily" with a mechanism for transferring responsibility between thread GCs. The truly problematic cases are shared objects with roots in several threads.

You might want to look at Nimrod. AFAIK, it uses a thread-local GC, and thread groups are planned to be introduced; as I understand it, a shared GC will stop only the threads in one group and not those in other groups.
Sep 25 2014
On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
> The question is how thread-local GC will account for data passed to another thread.

I don't think you clearly understand what thread local means.

Also, there is a reason why I'm beating the drum to get an ownership type qualifier. So you can transfer ownership.
Sep 23 2014
On 23.09.2014 02:15, Oscar Martin wrote:
> With some (or a lot of) work and a little help from the compiler (currently it indicates with a flag whether a class/structure contains pointers/references to other classes/structures; it could extend this support to indicate which fields are pointers/references)

https://github.com/rainers/druntime/gcx_precise2

> we could implement a semi-incremental-generational-copying conservative GC like:
>
> http://www.hboehm.info/gc/
> or
> http://www.ravenbrook.com/project/mps/
>
> Being incremental, they try to minimize the "stop-the-world" phase. But even with an advanced GC, as programs become more complex and use more memory, pause times also increase. See for example (I know it's not the normal case, but in a few years ...)

As others have already mentioned, incremental GCs need read/write barriers. There is currently resistance to implementing these in the compiler; the alternative in the library is using page protection, but this is very coarse.

"semi-incremental-generational-copying" is probably asking too much in one step ;-)

> http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector
>
> (*) What if:
> - "__gshared" variables are forbidden from holding references/pointers to objects allocated by the GC (if the compiler can help enforce this, perfect; if not, the developer has to know what he is doing)
> - "shared" types are not allocated by the GC (they could be reference counted, manually released, or ...)

shared objects will eventually contain references to other objects that you don't want to handle manually (e.g. string). That means you will have to add the memory range of the shared object to some GC for scanning. Back to square one...

> - "immutable" types are no longer implicitly "shared"
>
> In short, memory accessible from multiple threads is not managed by the GC.

Is the compiler meant to help via the type system? I don't think this works, as AFAIK the recommended way to work with shared objects is to cast away shared after synchronizing on some mutex:

    class C
    {
        void doWork() { /*...*/ }

        void doSharedWork() shared
        {
            synchronized(someMutex)
            {
                C self = cast(C)this;
                self.doWork();
            }
        }
    }

Maybe I missed other patterns to use shared (apart from atomics on primitive types). Are there any?

> With these restrictions each thread would have its own "I_Allocator", whose default implementation would be an incremental-generational-semi-conservative-copying GC, with no interference with any of the other program threads (it would be responsible only for the memory reserved for that thread). Other implementations of "I_Allocator" could be based on Andrei's allocators. With "setThreadAllocator" (similar to the current gc_setProxy) you could switch between the different implementations as needed. Threads with critical time requirements could work with an implementation of "I_Allocator" not based on the GC. It would be possible to simulate scoped classes:
>
>     {
>         setThreadAllocator(I_Allocator_pseudo_stack);
>         scope(exit) {
>             I_Allocator_pseudo_stack.deleteAll();
>             setThreadAllocator(I_Allocator_gc);
>         }
>         auto obj = new MyClass();
>         ...
>         // Destructors are called and memory is released
>     }

There is a DIP by Walter with similar functionality: http://wiki.dlang.org/DIP46
Sep 28 2014
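The cast-away-shared pattern mentioned above can be exercised in a small complete program. This is a sketch under stated assumptions: `Counter`, `increment`, `incrementShared` and `valueShared` are hypothetical names, and the object's own monitor (`synchronized (this)`) stands in for the `someMutex` of the original snippet. It illustrates why the type system alone cannot tell which heap the object belongs to: inside the lock, the very same object is treated as thread-local.

```d
import core.thread : Thread;

class Counter
{
    private int count;

    // Unshared implementation: assumes the caller has exclusive access.
    void increment()
    {
        ++count;
    }

    // Shared entry point: take the lock, then cast away shared.
    void incrementShared() shared
    {
        synchronized (this)
        {
            auto self = cast(Counter) this; // valid only while the lock is held
            self.increment();
        }
    }

    int valueShared() shared
    {
        synchronized (this)
        {
            return (cast(Counter) this).count;
        }
    }
}

void main()
{
    auto counter = new shared Counter;

    // Four threads, each incrementing the same shared object 1000 times.
    auto threads = new Thread[](4);
    foreach (ref t; threads)
    {
        t = new Thread({
            foreach (_; 0 .. 1000)
                counter.incrementShared();
        });
        t.start();
    }
    foreach (t; threads)
        t.join();

    assert(counter.valueShared() == 4000);
}
```

Any scheme that routes allocations by `shared` versus unshared static type has to cope with this cast, since `self` refers to memory that other threads can reach.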