digitalmars.D - Thread GC non "stop-the-world"

Oscar Martin (66/66) Sep 22 2014 The cost of using the current GC in D, although beneficial for

Rikki Cattermole (39/106) Sep 22 2014 Short, I dislike pretty much all changes to __gshared/shared.

Oscar Martin (16/21) Sep 23 2014 Yeah, these changes break many things, and so are not suitable

Marc Schütz (4/26) Sep 24 2014 There can also be a shared _and_ a local GC at the same time, and

Oscar Martin (14/44) Sep 24 2014 Yes, a shared GC should be a possibility, but how you avoid the

Wyatt (10/28) Sep 25 2014 This thread reminds me again of a paper I read a few months ago

Oscar Martin (2/33) Sep 25 2014 An interesting paper. Thank you very much
Sean Kelly (4/7) Sep 25 2014 Pretty much for reasons of being able to call C functions and

Kagamin (5/13) Nov 21 2014 BTW, C usually accepts data only for reading, and writes mostly

Sean Kelly (5/20) Nov 21 2014 "usually" isn't sufficient if you're trying to make a GC that

Kagamin (4/4) Nov 21 2014 I believe I have never seen such C function. What can it do to

Kagamin (5/16) Sep 23 2014 A use case, which comes to mind: a game saves progress to the

Marc Schütz (3/22) Sep 23 2014 This can be done without sharing. Of course, a uniqueness concept

Kagamin (2/2) Sep 23 2014 The question is how thread-local GC will account for data passed

Marc Schütz (2/4) Sep 23 2014 std.concurrency.send() could notify the GC.

Kagamin (1/1) Sep 23 2014 And what GC does? Pins the allocated blocks for another thread?

Marc Schütz (5/6) Sep 23 2014 Assuming there is one thread-local GC per thread, it transfers

Oscar Martin (7/13) Sep 23 2014 Yes. A mechanism for transfer of responsibility and pins would be

Marc Schütz (7/23) Sep 24 2014 Physically moving the objects is not necessary, it only needs to

David Nadlinger (10/12) Sep 23 2014 I was briefly discussing this with Andrei at (I think) DConf

Sean Kelly (4/10) Sep 23 2014 ... and casting to invariant, since invariant is implicitly
Oscar Martin (8/20) Sep 23 2014 Yes, it could be a palliative measure, and yes, it require
Kagamin (12/15) Sep 24 2014 Yes, that sounds expensive. A real example from my work: client

Sean Kelly (4/20) Sep 24 2014 Large allocations are the easy case, as the allocation lives in

Kagamin (7/9) Sep 25 2014 Dataset is not a contiguous object. It's like an in-memory

Oscar Martin (5/21) Sep 24 2014 Yes, that's the problem I see with the shared GC. But I think

Kagamin (5/10) Sep 25 2014 You might want to look at Nimrod. AFAIK, it uses thread-local GC

deadalnix (4/6) Sep 23 2014 I don't think you clearly understand what thread local means.

Rainer Schuetze (32/75) Sep 28 2014 https://github.com/rainers/druntime/gcx_precise2

"Oscar Martin" <omarmed gmail.com> writes:

The current GC in D, although beneficial for many types of 
programs, has a cost that is unaffordable for programs such as 
games that must perform repetitive tasks at short, regular 
intervals. The fact that a GC.malloc/realloc on any thread can 
trigger a collection that stops ALL threads of the program for a 
variable time rules it out. Forum conversations such as "RFC: 
reference Counted Throwable" and "Escaping the Tyranny of the 
GC: std.rcstring, first blood", as well as the @nogc attribute, 
show that this is increasingly perceived as a problem.
Besides the ever-recurring "reference counting", many people 
propose improving the current GC implementation. Rainer 
Schuetze developed a concurrent GC for Windows:

    http://rainers.github.io/visuald/druntime/concurrentgc.html

With some (or a lot of) work and a little help from the compiler 
(currently it indicates with a flag whether a class/structure 
contains pointers/references to other classes/structures; it 
could extend this support to indicate which fields are 
pointers/references) we could implement a semi-incremental, 
generational, copying, conservative GC like:

    http://www.hboehm.info/gc/
or
    http://www.ravenbrook.com/project/mps/

Being incremental, they try to minimize the "stop-the-world" 
phase. But even with an advanced GC, as programs become more 
complex and use more memory, pause times also increase. See for 
example (I know it's not a typical case, but in a few years ...)

    http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

(*) What if:
- It is forbidden for "__gshared" variables to hold 
references/pointers to objects allocated by the GC (if the 
compiler can help enforce this prohibition, perfect; if not, the 
developer has to know what he is doing)
- "shared" types are not allocated by the GC (they could be 
reference counted or manually released or ...)
- "immutable" types are no longer implicitly "shared"

In short, the memory accessible from multiple threads is not 
managed by the GC.

With these restrictions each thread would have its own 
"I_Allocator", whose default implementation would be an 
incremental-generational-semi-conservative-copying GC, with no 
interference with any of the other program threads (it would be 
responsible only for the memory reserved for that thread). Other 
implementations of "I_Allocator" could be based on Andrei's 
allocators. With "setThreadAllocator" (similar to the current 
gc_setProxy) you could switch between the different 
implementations as needed. Threads with critical time 
requirements could work with an implementation of "I_Allocator" 
not based on the GC. It would be possible to simulate scoped 
classes:

{
	setThreadAllocator(I_Allocator_pseudo_stack);
	scope(exit) {
		I_Allocator_pseudo_stack.deleteAll();
		setThreadAllocator(I_Allocator_gc);
	}
	auto obj = new MyClass();
	...
	// On scope exit, destructors are called and memory is released
}
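
A minimal sketch of how such a per-thread allocator switch might look in today's D, under stated assumptions: `IAllocator`, `RegionAllocator`, `setThreadAllocator` and `threadAllocate` are hypothetical names invented for illustration and are not part of druntime. The sketch relies on the fact that module-level variables in D are thread-local by default, so each thread gets its own "current allocator" slot. It only hands out raw memory; constructing objects in that memory and running their destructors in `deallocateAll` is left out.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical per-thread allocator interface (the "I_Allocator" of the post).
interface IAllocator
{
    void[] allocate(size_t size);
    void deallocateAll();
}

/// Bump-pointer region: allocating advances an offset, releasing is one reset.
final class RegionAllocator : IAllocator
{
    private ubyte* base;
    private size_t capacity;
    private size_t used;

    this(size_t capacity)
    {
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "out of memory");
        this.capacity = capacity;
    }

    void[] allocate(size_t size)
    {
        // Round up so that every block starts on a 16-byte boundary.
        immutable rounded = (size + 15) & ~cast(size_t) 15;
        if (rounded < size || rounded > capacity - used)
            return null; // arithmetic overflow or region exhausted
        void[] block = base[used .. used + size];
        used += rounded;
        return block;
    }

    void deallocateAll() { used = 0; }

    size_t bytesUsed() const { return used; }

    /// Return the backing buffer to the C heap.
    void dispose()
    {
        free(base);
        base = null;
        capacity = 0;
        used = 0;
    }
}

/// Thread-local by default: one slot per thread, no locking needed.
private IAllocator currentAllocator;

/// Install `next` for the calling thread and return the previous allocator.
IAllocator setThreadAllocator(IAllocator next)
{
    auto previous = currentAllocator;
    currentAllocator = next;
    return previous;
}

/// Allocate from the thread's allocator, falling back to the GC heap.
void[] threadAllocate(size_t size)
{
    if (currentAllocator is null)
        return new ubyte[](size);
    return currentAllocator.allocate(size);
}
```

A caller would pair `auto previous = setThreadAllocator(region);` with `scope (exit) setThreadAllocator(previous);`, mirroring the pseudo-code above.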

Obviously the changes (*) break compatibility with existing code, 
and therefore may not be appropriate for D2. Also, these are 
general ideas, and surely these changes lead to other problems. 
But the point I want to convey is that, in my opinion, while 
these problems are solvable, a language for "system programming" 
is incompatible with shared data managed by a GC.

Thoughts?

Sep 22 2014

"Rikki Cattermole" <alphaglosined gmail.com> writes:

On Tuesday, 23 September 2014 at 00:15:51 UTC, Oscar Martin wrote:
 The cost of using the current GC in D, although beneficial for 
 many types of programs, is unaffordable for programs such as 
 games, etc... that need to perform repetitive tasks every short 
 periods of time. The fact that a GC.malloc/realloc on any 
 thread can trigger a memory collection that stop ALL threads of 
 the program for a variable time prevents it. Conversations in 
 the forum as "RFC: reference Counted Throwable", "Escaping the 
 Tyranny of the GC: std.rcstring, first blood" and the  nogc 
 attribute show that this is increasingly perceived as a problem.
 Besides the ever-recurring "reference counting", many people 
 propose to improve the current implementation of GC. Rainer 
 Schuetze developed a concurrent GC in Windows:

    http://rainers.github.io/visuald/druntime/concurrentgc.html

 With some/a lot of work and a little help compiler (currently 
 it indicates by a flag if a class/structure contains 
 pointers/references to other classes/structures, it could 
 increase this support to indicate which fields are 
 pointers/references) we could implement a 
 semi-incremental-generational-copying GC-conservative like:

    http://www.hboehm.info/gc/
 or
    http://www.ravenbrook.com/project/mps/

 Being incremental, they try to minimize the "stop-the-world" 
 phase. But even with an advanced GC, as programs become more 
 complex and use more memory, pause time also increases. See for 
 example (I know it's not normal case, but in a few years ...)

    
 http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

 (*) What if:
 - It is forbidden for "__gshared" have references/pointers to 
 objects allocated by the GC (if the compiler can help with this 
 prohibition, perfect, if not the developer have to know what he 
 is doing)
 - "shared" types are not allocated by the GC (they could be 
 reference counted or manually released or ...)
 - "immutable" types are no longer implicitly "shared"

 In short, the memory accessible from multiple threads is not 
 managed by the GC.

 With these restrictions each thread would have its 
 "I_Allocator", whose default implementation would be an 
 incremental-generational-semi-conservative-copying GC, with no 
 inteference with any of the other program threads (it should be 
 responsible only for the memory reserved for that thread). 
 Other implementations of "I_Allocator" could be based on 
 Andrei's allocators. With "setThreadAllocator" (similar to 
 current gc_setProxy) you could switch between the different 
 implementations if you need. Threads with critical time 
 requirements could work with an implementation of "I_Allocator" 
 not based on the GC. It would be possible simulate scoped 
 classes:

 {
 	setThreadAllocator(I_Allocator_pseudo_stack)
 	scope(exit) {
 		I_Allocator_pseudo_stack.deleteAll();
 		setThreadAllocator(I_Allocator_gc);
 	}
 	auto obj = MyClass();
 	...
 	// Destructor are called and memory released
 }

 Obviously changes (*) break compatibility with existing code, 
 and therefore maybe they are not appropriate for D2. Also these 
 are general ideas, sure these changes lead to other problems. 
 But the point I want to convey is that in my opinion, while 
 these problems are solvable, a language for "system 
 programming" is incompatible with shared data managed by a GC

 Thoughts?

In short, I dislike pretty much all changes to __gshared/shared. 
They break too many things.
At least they would with Cmsed (I'm evil here), where I use 
__gshared essentially as a read-only variable that is modifiable 
only during startup (modifying needs synchronized, reading 
doesn't).

I have already suggested in earlier threads something similar to 
what you're suggesting with regard to setting the allocator, 
except:

The memory managers are kept in a stack. The default is the GC, 
e.g. the current one.
The compiler knows which pointers escape. They can still be 
passed to pure functions, however.

with(myAllocator) { // myAllocator.opWithIn
   ...//allocate
} // myAllocator.opWithCanFree
// myAllocator.opWithOut

class MyAllocator : Allocator {
   override void opWithIn(string func = __FUNCTION__, int line = __LINE__) {
     GC.pushAllocator(this);
   }

   override void opWithCanFree(void** freeablePointers) {
     //...
   }

   override void opWithOut(string func = __FUNCTION__, int line = __LINE__) {
     GC.popAllocator();
   }

   void* alloc(size_t amount) {
     return ...;
   }

   void free(void*) {
      //...
   }
}

You may have something about thread allocators though. Humm 
druntime would already need changes so maybe.

Ehh this really needs a DIP instead of me whining. If I do it, 
ETA December.

Sep 22 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole 
wrote:
 Short, I dislike pretty much all changes to __gshared/shared. 
 Breaks too many things.
 Atleast with Cmsed, (I'm evil here) where I use __gshared 
 essentially as a read only variable but modifiable when 
 starting up (to modify need synchronized, to read doesn't).

Yeah, these changes break many things, and so are not suitable 
for D2. My intention was only to point out how expensive it is 
for the GC to deal with shared memory.

Thinking about it a little more: what if each thread could have 
its own GC, but by default all threads use the current GC (this 
would require minimal changes to druntime)? "__gshared", "shared" 
and "immutable" continue as now, which does not break anything. 
If I as a programmer take care of managing (either manually or 
through reference counting) all of the shared memory 
("__gshared", "shared" or "immutable") that can be referenced 
from multiple threads, I could replace the global GC in my 
program with an individual thread GC.

I'll try to implement a GC optimized for a single thread and try 
that solution.

Sep 23 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 23 September 2014 at 18:39:09 UTC, Oscar Martin wrote:
 On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki Cattermole 
 wrote:
 Short, I dislike pretty much all changes to __gshared/shared. 
 Breaks too many things.
 Atleast with Cmsed, (I'm evil here) where I use __gshared 
 essentially as a read only variable but modifiable when 
 starting up (to modify need synchronized, to read doesn't).

 Yeah, these changes break many things, and so are not suitable 
 for D2. My intention was only to point out how expensive is for 
 the GC to deal with shared memory.

 Come to think a little more: what if each thread can have its 
 own GC, but by default all use the current GC (this would 
 require minimal changes to druntime). "__gshared", "shared" and 
 "immutable", continue as now, which does not break anything. If 
 I as a programmer take care of managing (either manually or 
 through reference counting) all of the shared memory 
 ("__gshared", "shared" or "immutable") that can be referenced 
 from multiple threads, I could replace in my program the global 
 GC by a indiviual thread GC

 I'll try to implement a GC optimized for a thread and try that 
 solution

There can also be a shared _and_ a local GC at the same time, and 
a thread could opt out of the shared GC (or choose not to opt in 
by not allocating from the shared heap).

Sep 24 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz 
wrote:
 On Tuesday, 23 September 2014 at 18:39:09 UTC, Oscar Martin 
 wrote:
 On Tuesday, 23 September 2014 at 01:58:50 UTC, Rikki 
 Cattermole wrote:
 Short, I dislike pretty much all changes to __gshared/shared. 
 Breaks too many things.
 Atleast with Cmsed, (I'm evil here) where I use __gshared 
 essentially as a read only variable but modifiable when 
 starting up (to modify need synchronized, to read doesn't).

 Yeah, these changes break many things, and so are not suitable 
 for D2. My intention was only to point out how expensive is 
 for the GC to deal with shared memory.

 Come to think a little more: what if each thread can have its 
 own GC, but by default all use the current GC (this would 
 require minimal changes to druntime). "__gshared", "shared" 
 and "immutable", continue as now, which does not break 
 anything. If I as a programmer take care of managing (either 
 manually or through reference counting) all of the shared 
 memory ("__gshared", "shared" or "immutable") that can be 
 referenced from multiple threads, I could replace in my 
 program the global GC by a indiviual thread GC

 I'll try to implement a GC optimized for a thread and try that 
 solution

 There can also be a shared _and_ a local GC at the same time, 
 and a thread could opt from the shared GC (or choose not to opt 
 in by not allocating from the shared heap).

Yes, a shared GC should be a possibility, but how do you avoid 
the "stop-the-world" phase for that GC?

Obviously this pause can be minimized by performing most of the 
work outside that phase, but after seeing other people's tests of 
advanced GCs (Java, .NET) on the internet, I do not think it's 
enough for some programs.

But hey, I guess it's enough to cover the greatest number of 
cases. My goal is to start by implementing the thread GC. Then I 
will test performance and pauses (my program requires managing 
audio every 10 ms), and then I might dare to implement the 
shared GC, which is obviously more complex if the pauses are to 
be minimized. We'll see what the outcome is.

Sep 24 2014

"Wyatt" <wyatt.epp gmail.com> writes:

On Wednesday, 24 September 2014 at 20:15:52 UTC, Oscar Martin 
wrote:
 On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz 
 wrote:
 There can also be a shared _and_ a local GC at the same time, 
 and a thread could opt from the shared GC (or choose not to 
 opt in by not allocating from the shared heap).

 Yes, a shared GC should be a possibility, but how you avoid the 
 "stop-the-world" phase for that GC?

 Obviously this pause can be minimized by performing the most 
 work out of that phase, but after seeing the test of other 
 people on internet about advanced GCs (java, .net) I do not 
 think it's enough for some programs

 But hey, I guess it's enough to cover the greatest number of 
 cases. My goal is to start implementing the thread GC. Then I 
 will do testing of performance and pauses (my program requires 
 managing audio every 10 ms) and then I might dare to implement 
 the shared GC, which is obviously more complex if desired to 
 minimize the pauses. We'll see what the outcome

This thread reminds me again of a paper I read a few months ago 
with a clever way of dealing with the sharing problem while 
maintaining performance: 
https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf

The caveat for D being this design requires read and write 
barriers and I'm pretty sure I recall correctly that those have 
been vetoed several times for complexity.

-Wyatt

Sep 25 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
 On Wednesday, 24 September 2014 at 20:15:52 UTC, Oscar Martin 
 wrote:
 On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz 
 wrote:
 There can also be a shared _and_ a local GC at the same time, 
 and a thread could opt from the shared GC (or choose not to 
 opt in by not allocating from the shared heap).

 Yes, a shared GC should be a possibility, but how you avoid 
 the "stop-the-world" phase for that GC?

 Obviously this pause can be minimized by performing the most 
 work out of that phase, but after seeing the test of other 
 people on internet about advanced GCs (java, .net) I do not 
 think it's enough for some programs

 But hey, I guess it's enough to cover the greatest number of 
 cases. My goal is to start implementing the thread GC. Then I 
 will do testing of performance and pauses (my program requires 
 managing audio every 10 ms) and then I might dare to implement 
 the shared GC, which is obviously more complex if desired to 
 minimize the pauses. We'll see what the outcome

 This thread reminds me again of a paper I read a few months ago 
 with a clever way of dealing with the sharing problem while 
 maintaining performance: 
 https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf

 The caveat for D being this design requires read and write 
 barriers and I'm pretty sure I recall correctly that those have 
 been vetoed several times for complexity.

 -Wyatt

An interesting paper. Thank you very much

Sep 25 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
 The caveat for D being this design requires read and write 
 barriers and I'm pretty sure I recall correctly that those have 
 been vetoed several times for complexity.

Pretty much for reasons of being able to call C functions and 
inline asm code.  Memory barriers may still be possible in these 
scenarios, but they would be extremely expensive.
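
To make the concern concrete, here is a toy software write barrier in D. It is only a sketch under stated assumptions: `Barriered` and `dirtySlots` are hypothetical names, the D compiler emits nothing like this, and the barrier itself appends to a GC array purely for brevity. Every pointer store that goes through `opAssign` is recorded; a store that writes the slot directly, as C or inline asm code would, is invisible to the collector.

```d
/// Hypothetical remembered set: addresses of pointer slots that were written.
/// Module-level variables are thread-local by default in D.
size_t[] dirtySlots;

/// A pointer slot whose assignments go through a write barrier.
struct Barriered(T)
{
    private T* ptr;

    void opAssign(T* newValue)
    {
        dirtySlots ~= cast(size_t) &ptr; // the barrier: remember this slot
        ptr = newValue;
    }

    inout(T)* get() inout { return ptr; }
}

/// Overwrite the slot the way foreign code would: a plain store, no barrier.
void rawStore(T)(ref Barriered!T slot, T* newValue)
{
    *cast(T**) &slot = newValue;
}
```

After `slot = &x;` the remembered set has one entry; after `rawStore(slot, &y);` the slot points at `y` but the set is unchanged, which is the kind of missed update that would let a barrier-based collector free live data.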

Sep 25 2014

"Kagamin" <spam here.lot> writes:

On Thursday, 25 September 2014 at 21:59:15 UTC, Sean Kelly wrote:
 On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
 The caveat for D being this design requires read and write 
 barriers and I'm pretty sure I recall correctly that those 
 have been vetoed several times for complexity.

 Pretty much for reasons of being able to call C functions and 
 inline asm code.  Memory barriers may still be possible in 
 these scenarios, but they would be extremely expensive.

BTW, C usually accepts data only for reading, and writes mostly 
strings and buffers - plain data without pointers. In both cases 
it doesn't need to notify GC (as far as I understand write 
barriers).

Nov 21 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Friday, 21 November 2014 at 10:24:09 UTC, Kagamin wrote:
 On Thursday, 25 September 2014 at 21:59:15 UTC, Sean Kelly 
 wrote:
 On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
 The caveat for D being this design requires read and write 
 barriers and I'm pretty sure I recall correctly that those 
 have been vetoed several times for complexity.

 Pretty much for reasons of being able to call C functions and 
 inline asm code.  Memory barriers may still be possible in 
 these scenarios, but they would be extremely expensive.

 BTW, C usually accepts data only for reading, and writes mostly 
 strings and buffers - plain data without pointers. In both 
 cases it doesn't need to notify GC (as far as I understand 
 write barriers).

"usually" isn't sufficient if you're trying to make a GC that 
doesn't collect live data.  It's possible that we could do 
something around calls to extern (C) functions that accept a type 
containing pointers, but I'd have to give this some thought.

Nov 21 2014

"Kagamin" <spam here.lot> writes:

I believe I have never seen such a C function. What can it do to 
screw up managed memory? That usually requires allocating new 
memory from the GC and assigning it to an old object, but a 
(true) C function is very unlikely to allocate memory from the 
GC.

Nov 21 2014

"Kagamin" <spam here.lot> writes:

On Tuesday, 23 September 2014 at 00:15:51 UTC, Oscar Martin wrote:
 http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

 (*) What if:
 - It is forbidden for "__gshared" have references/pointers to 
 objects allocated by the GC (if the compiler can help with this 
 prohibition, perfect, if not the developer have to know what he 
 is doing)
 - "shared" types are not allocated by the GC (they could be 
 reference counted or manually released or ...)
 - "immutable" types are no longer implicitly "shared"

 In short, the memory accessible from multiple threads is not 
 managed by the GC.

A use case, which comes to mind: a game saves progress to the 
server, the main thread prepares data to be saved (a relatively 
lightweight operation) and hands it over to another thread, which 
saves the data in background. How would you do it?

Sep 23 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 23 September 2014 at 09:04:48 UTC, Kagamin wrote:
 On Tuesday, 23 September 2014 at 00:15:51 UTC, Oscar Martin 
 wrote:
 http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

 (*) What if:
 - It is forbidden for "__gshared" have references/pointers to 
 objects allocated by the GC (if the compiler can help with 
 this prohibition, perfect, if not the developer have to know 
 what he is doing)
 - "shared" types are not allocated by the GC (they could be 
 reference counted or manually released or ...)
 - "immutable" types are no longer implicitly "shared"

 In short, the memory accessible from multiple threads is not 
 managed by the GC.

 A use case, which comes to mind: a game saves progress to the 
 server, the main thread prepares data to be saved (a relatively 
 lightweight operation) and hands it over to another thread, 
 which saves the data in background. How would you do it?

This can be done without sharing. Of course, a uniqueness concept 
would be needed.
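
For comparison, a sketch of how the save-game handoff can be approximated in current D without `shared`, using `std.concurrency` message passing. The function names `saver` and `saveInBackground` are made up for illustration. Note that `assumeUnique` is only a programmer promise that no other mutable reference exists; it is not the compiler-checked uniqueness concept mentioned above, and the data still lives in the one global GC heap.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.exception : assumeUnique;

/// Background worker: receives the snapshot and reports how many bytes it got.
void saver(Tid owner)
{
    auto snapshot = receiveOnly!(immutable(ubyte)[])();
    // A real program would write `snapshot` to the server here.
    owner.send(snapshot.length);
}

/// Main-thread side: prepare the data, give it away, wait for the result.
size_t saveInBackground()
{
    auto buffer = new ubyte[](1024);
    foreach (i, ref b; buffer)
        b = cast(ubyte) i;

    // Freeze the buffer; assumeUnique also nulls the mutable reference,
    // so this thread can no longer modify what it has handed over.
    immutable(ubyte)[] snapshot = assumeUnique(buffer);
    assert(buffer is null);

    auto worker = spawn(&saver, thisTid);
    worker.send(snapshot);
    return receiveOnly!size_t();
}
```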

Sep 23 2014

"Kagamin" <spam here.lot> writes:

The question is how thread-local GC will account for data passed 
to another thread.

Sep 23 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
 The question is how thread-local GC will account for data 
 passed to another thread.

std.concurrency.send() could notify the GC.

Sep 23 2014

"Kagamin" <spam here.lot> writes:

And what does the GC do? Pin the allocated blocks for another 
thread?

Sep 23 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
 And what GC does? Pins the allocated blocks for another thread?

Assuming there is one thread-local GC per thread, it transfers 
responsibility of the allocated data from the sender to the 
receiver. This means, the old GC doesn't need to scan it any 
more, but the new one does.

Sep 23 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Tuesday, 23 September 2014 at 15:28:30 UTC, Marc Schütz wrote:
 On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
 And what GC does? Pins the allocated blocks for another thread?

 Assuming there is one thread-local GC per thread, it transfers 
 responsibility of the allocated data from the sender to the 
 receiver. This means, the old GC doesn't need to scan it any 
 more, but the new one does.

Yes. A mechanism for transfer of responsibility and pins would be 
needed.

Basically we have to keep in mind that a thread GC looks for 
roots only on its own stack/registers and managed memory, and 
may move managed objects during a collection, so a reference 
used in another thread may become invalid for that other thread 
at any time.

Sep 23 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 23 September 2014 at 18:53:04 UTC, Oscar Martin wrote:
 On Tuesday, 23 September 2014 at 15:28:30 UTC, Marc Schütz 
 wrote:
 On Tuesday, 23 September 2014 at 15:23:16 UTC, Kagamin wrote:
 And what GC does? Pins the allocated blocks for another 
 thread?

 Assuming there is one thread-local GC per thread, it transfers 
 responsibility of the allocated data from the sender to the 
 receiver. This means, the old GC doesn't need to scan it any 
 more, but the new one does.

 Yes. A mechanism for transfer of responsibility and pins would 
 be needed.

 Basically we have to think that a thread GC just look for roots 
 on his stack/registers and managed memory, and may move the 
 managed objects in a collection, so a reference used in another 
 thread may become invalid for that other thread anytime

Physically moving the objects is not necessary, it only needs to 
"move" the responsibility. With some work, it might even be 
possible to move the objects' metadata from the old to the new 
heap, so that each GC would only need to access its own heap 
during a scan, which avoids synchronization with the other 
threads.
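
A toy model of that idea, assuming a hypothetical `LocalHeap` class that stands in for one thread's GC heap and tracks only per-block metadata. Handing a block over moves its bookkeeping entry from the sender to the receiver while the address stays the same. This is a single-threaded illustration; a real implementation would have to synchronize the handoff itself and deal with references the sender still holds.

```d
import core.stdc.stdlib : free, malloc;

/// Per-block bookkeeping a collector would keep (hypothetical, minimal).
struct BlockInfo
{
    size_t size;
}

/// Stand-in for one thread's heap: it only records which blocks it owns.
final class LocalHeap
{
    private BlockInfo[size_t] blocks; // keyed by block address

    void* allocate(size_t size)
    {
        void* p = malloc(size);
        assert(p !is null, "out of memory");
        blocks[cast(size_t) p] = BlockInfo(size);
        return p;
    }

    bool owns(const(void)* p) const
    {
        return (cast(size_t) p in blocks) !is null;
    }

    size_t blockCount() const { return blocks.length; }

    /// Move responsibility for `p` to `receiver`; the block itself stays put.
    void transferTo(LocalHeap receiver, void* p)
    {
        immutable key = cast(size_t) p;
        auto found = key in blocks;
        assert(found !is null, "block is not owned by this heap");
        BlockInfo info = *found;
        blocks.remove(key);
        receiver.blocks[key] = info;
    }

    /// Free a block, but only if this heap is responsible for it.
    void release(void* p)
    {
        if (blocks.remove(cast(size_t) p))
            free(p);
    }
}
```

After `sender.transferTo(receiver, p)`, a scan of `sender` no longer sees the block and only `receiver` may release it.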

Sep 24 2014

"David Nadlinger" <code klickverbot.at> writes:

On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
 The question is how thread-local GC will account for data 
 passed to another thread.

I was briefly discussing this with Andrei at (I think) DConf 
2013. I suggested moving data to a separate global GC heap on 
casting stuff to shared. Assigning types with indirections to a 
__gshared variable might also trigger this, unless we can find a 
better design. IIRC, Andrei dismissed this as impractical due to 
the overhead and need for precise scanning. I still like to think 
that it would be worth it, though, even if I can't spare the time 
for looking into an implementation right now.

David

Sep 23 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger 
wrote:
 On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
 The question is how thread-local GC will account for data 
 passed to another thread.

 I was briefly discussing this with Andrei at (I think) DConf 
 2013. I suggested moving data to a separate global GC heap on 
 casting stuff to shared.

... and casting to invariant, since invariant is implicitly 
shared (insert cuss-words here).

Sep 23 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger 
wrote:
 On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
 The question is how thread-local GC will account for data 
 passed to another thread.

 I was briefly discussing this with Andrei at (I think) DConf 
 2013. I suggested moving data to a separate global GC heap on 
 casting stuff to shared. Assigning types with indirections to a 
 __gshared variable might also trigger this, unless we can find 
 a better design. IIRC, Andrei dismissed this as impractical due 
 to the overhead and need for precise scanning. I still like to 
 think that it would be worth it, though, even if I can't spare 
 the time for looking into an implementation right now.

 David

Yes, it could be a palliative measure, and yes, it requires 
precise scanning. I do not think that is easy to implement for 
the stack.

And in any case I believe the problem is having multiple 
references to the same object from different threads, which 
forces you to "stop-the-world". That problem still exists.

Sep 23 2014

"Kagamin" <spam here.lot> writes:

On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger 
wrote:
 I was briefly discussing this with Andrei at (I think) DConf 
 2013. I suggested moving data to a separate global GC heap on 
 casting stuff to shared.

Yes, that sounds expensive. A real example from my work: the 
client receives a big dataset (~1GB) from the server in a 
background thread, builds and checks constraints and indexes 
(which is fairly expensive too; RBTree) and hands it over to the 
main thread. The client machine is not powerful enough for 
frequent marshaling of such a big dataset; handling it at all is 
enough of a problem. If you copied it twice, you would have a 
3GB working set, and the GC needs roughly a 2x reserve, raising 
memory requirements to 6GB; without the dup, requirements are 
1-2GB. Also, when you trigger a collection while copying to the 
shared GC, what does it do, stop the world again?

Sep 24 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
 On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger 
 wrote:
 I was briefly discussing this with Andrei at (I think) DConf 
 2013. I suggested moving data to a separate global GC heap on 
 casting stuff to shared.

 Yes, that sounds expensive. A real example from my work: client 
 receives big dataset (~1GB) from server in a background thread, 
 builds and checks constraints and indexes (which is sort of 
 expensive too; RBTree) and hands it over to the main thread. 
 And client machine is not quite powerful for frequent 
 marshaling of such big dataset, handling it at all is enough of 
 a problem. If you copied it twice, you have 3GB working set, 
 and GC needs somewhat 2x reserve, raising memory requirements 
 to 6GB, without dup requirements are 1-2GB. Also when you 
 trigger collection during copying to shared GC, what it does, 
 stops the world again?

Large allocations are the easy case, as the allocation lives in 
its own pool and you can just move the entire pool.  Copying 
objects is the tricky part...

Sep 24 2014

"Kagamin" <spam here.lot> writes:

On Wednesday, 24 September 2014 at 14:36:13 UTC, Sean Kelly wrote:
 Large allocations are the easy case, as the allocation lives in 
 its own pool and you can just move the entire pool.

A dataset is not a contiguous object. It's like an in-memory 
database: tables can be added to or removed from it, rows can be 
added to or removed from tables, and fields in rows can be set to 
various values. In the end a dataset is a collection of a large 
number of relatively small objects; big datasets are collections 
of very large numbers of objects.

Sep 25 2014

"Oscar Martin" <omarmed gmail.com> writes:

On Wednesday, 24 September 2014 at 11:59:52 UTC, Kagamin wrote:
 On Tuesday, 23 September 2014 at 16:47:09 UTC, David Nadlinger 
 wrote:
 I was briefly discussing this with Andrei at (I think) DConf 
 2013. I suggested moving data to a separate global GC heap on 
 casting stuff to shared.

 Yes, that sounds expensive. A real example from my work: client 
 receives big dataset (~1GB) from server in a background thread, 
 builds and checks constraints and indexes (which is sort of 
 expensive too; RBTree) and hands it over to the main thread. 
 And client machine is not quite powerful for frequent 
 marshaling of such big dataset, handling it at all is enough of 
 a problem. If you copied it twice, you have 3GB working set, 
 and GC needs somewhat 2x reserve, raising memory requirements 
 to 6GB, without dup requirements are 1-2GB. Also when you 
 trigger collection during copying to shared GC, what it does, 
 stops the world again?

Yes, that's the problem I see with the shared GC. But I think 
cases like this should be solved "easily" with a mechanism for 
transfer of responsibility between thread GCs. The truly 
problematic cases are shared objects with roots in various threads.

Sep 24 2014

"Kagamin" <spam here.lot> writes:

On Wednesday, 24 September 2014 at 20:24:01 UTC, Oscar Martin 
wrote:
 Yes, that's the problem I see with the shared GC. But I think 
 cases like this should be solved "easily" with a mechanism for 
 transfer of responsibility between thread GCs. The truly 
 problematic cases are shared objects with roots in various 
 threads

You might want to look at Nimrod. AFAIK, it uses thread-local GC 
and thread groups are planned to be introduced; as I understand, 
shared GC will stop only threads in a group and not other groups.

Sep 25 2014

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 23 September 2014 at 10:38:29 UTC, Kagamin wrote:
 The question is how thread-local GC will account for data 
 passed to another thread.

I don't think you clearly understand what thread local means.

Also, there is a reason why I'm beating the drum to get an
ownership type qualifier: so you can transfer ownership.
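
For illustration only, here is roughly what a hand-over looks like with the library as it is today. This is a sketch, not the proposed qualifier: it assumes std.concurrency message passing plus std.exception.assumeUnique as a stand-in for a real ownership type, and the names worker and buildAndHandOver are made up for the example.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.exception : assumeUnique;

// Hypothetical worker: builds data privately, then gives it away.
void worker()
{
    auto buf = new int[](4);
    foreach (i, ref x; buf)
        x = cast(int) i * 10;

    // assumeUnique is a promise by the programmer (not checked by the
    // compiler) that no other mutable reference to buf exists. It nulls
    // out buf and returns the same memory typed as immutable.
    immutable(int)[] result = assumeUnique(buf);
    send(ownerTid, result);
}

// Hypothetical helper: spawn the worker and wait for its result.
immutable(int)[] buildAndHandOver()
{
    spawn(&worker);
    return receiveOnly!(immutable(int)[])();
}

void main()
{
    auto data = buildAndHandOver();
    assert(data == [0, 10, 20, 30]);
}
```

Nothing is copied here, but nothing tells the GC that the block changed hands either, which is the gap an ownership qualifier would be meant to close.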

Sep 23 2014

Rainer Schuetze <r.sagitario gmx.de> writes:

On 23.09.2014 02:15, Oscar Martin wrote:
 With some/a lot of work and a little help from the compiler (currently it
 indicates by a flag if a class/structure contains pointers/references to
 other classes/structures, it could increase this support to indicate
 which fields are pointers/references)

https://github.com/rainers/druntime/gcx_precise2

we could implement a
 semi-incremental-generational-copying GC-conservative like:

     http://www.hboehm.info/gc/
 or
     http://www.ravenbrook.com/project/mps/

 Being incremental, they try to minimize the "stop-the-world" phase. But
 even with an advanced GC, as programs become more complex and use more
 memory, pause time also increases. See for example (I know it's not
 normal case, but in a few years ...)

As others have already mentioned, incremental GCs need read/write 
barriers. There is currently resistance to implementing these in the 
compiler; the alternative in the library is using page protection, but 
this is very coarse.

"semi-incremental-generational-copying" is probably asking too much in 
one step ;-)

 http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector

 (*) What if:
 - It is forbidden for "__gshared" have references/pointers to objects
 allocated by the GC (if the compiler can help with this prohibition,
 perfect, if not the developer have to know what he is doing)
 - "shared" types are not allocated by the GC (they could be reference
 counted or manually released or ...)

shared objects will eventually contain references to other objects that 
you don't want to handle manually (e.g. string). That means you will 
have to add the memory range of the shared object to some GC for 
scanning. Back to square one...

 - "immutable" types are no longer implicitly "shared"

 In short, the memory accessible from multiple threads is not managed by
 the GC.

Is the compiler meant to help via the type system? I don't think this 
works as AFAIK the recommended way to work with shared objects is to 
cast away shared after synchronizing on some mutex:

class C
{
	void doWork() { /*...*/ }

	void doSharedWork() shared
	{
		synchronized(someMutex)
		{
			C self = cast(C)this;
			self.doWork();
		}
	}
}

Maybe I missed other patterns to use shared (apart from atomics on 
primitive types). Are there any?
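
For reference, the "atomics on primitive types" case mentioned above needs no cast at all. A minimal sketch, assuming a module-level shared counter; the names counter, bump and runCounters are invented for the example:

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

shared int counter; // hypothetical shared primitive

// Each thread increments the counter 1000 times using atomic
// read-modify-write operations, so no mutex is needed.
void bump()
{
    foreach (_; 0 .. 1000)
        atomicOp!"+="(counter, 1);
}

// Hypothetical driver: run two threads and report the final value.
int runCounters()
{
    atomicStore(counter, 0);
    auto a = new Thread(&bump);
    auto b = new Thread(&bump);
    a.start();
    b.start();
    a.join();
    b.join();
    return atomicLoad(counter);
}

void main()
{
    assert(runCounters() == 2000);
}
```

This only covers single values; anything larger still ends up behind a mutex and a cast, as in the class above.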

 With these restrictions each thread would have its "I_Allocator", whose
 default implementation would be an
 incremental-generational-semi-conservative-copying GC, with no
 interference with any of the other program threads (it should be
 responsible only for the memory reserved for that thread). Other
 implementations of "I_Allocator" could be based on Andrei's allocators.
 With "setThreadAllocator" (similar to current gc_setProxy) you could
 switch between the different implementations if you need. Threads with
 critical time requirements could work with an implementation of
 "I_Allocator" not based on the GC. It would be possible simulate scoped
 classes:

 {
      setThreadAllocator(I_Allocator_pseudo_stack)
      scope(exit) {
          I_Allocator_pseudo_stack.deleteAll();
          setThreadAllocator(I_Allocator_gc);
      }
      auto obj = MyClass();
      ...
      // Destructor are called and memory released
 }

There is a DIP by Walter with similar functionality: 
http://wiki.dlang.org/DIP46
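
To make the quoted idea a bit more concrete, here is a library-only sketch of swapping a per-thread allocator inside a scope. Everything in it is an assumption for illustration: I_Allocator and setThreadAllocator follow the names in the quoted proposal but do not exist in druntime, RegionAllocator is a made-up bump allocator, and nothing here hooks into `new` or runs destructors.

```d
// Hypothetical interface, named after the quoted proposal.
interface I_Allocator
{
    void[] allocate(size_t n);
    void deleteAll();
}

// Hypothetical bump allocator standing in for the "pseudo stack".
final class RegionAllocator : I_Allocator
{
    private ubyte[] buf;
    private size_t used;

    this(size_t capacity) { buf = new ubyte[](capacity); }

    void[] allocate(size_t n)
    {
        // Round the start offset up to a 16-byte boundary.
        immutable start = (used + 15) & ~cast(size_t) 15;
        if (start + n > buf.length)
            return null; // out of space
        used = start + n;
        return buf[start .. used];
    }

    void deleteAll() { used = 0; } // release everything at once

    size_t bytesUsed() const { return used; }
}

// Module-level variables are thread-local by default in D,
// so every thread gets its own current allocator.
I_Allocator threadAllocator;

I_Allocator setThreadAllocator(I_Allocator a)
{
    auto old = threadAllocator;
    threadAllocator = a;
    return old;
}

// Allocates 100 bytes inside a scope and returns how many bytes the
// region held before the scope exited and released them.
size_t scopedDemo(RegionAllocator region)
{
    size_t inside;
    {
        auto previous = setThreadAllocator(region);
        scope (exit)
        {
            region.deleteAll();
            setThreadAllocator(previous);
        }
        auto mem = threadAllocator.allocate(100);
        assert(mem.length == 100);
        inside = region.bytesUsed;
    }
    return inside;
}

void main()
{
    auto region = new RegionAllocator(1024);
    assert(scopedDemo(region) == 100);
    assert(region.bytesUsed == 0);
}
```

The scope(exit) block restores the previous allocator even if the body throws, which is the behaviour the quoted pseudocode relies on.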

Sep 28 2014
