
digitalmars.D - Memory management and local GC?

reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
I wonder, what is keeping D from making full use of shared?

It seems to me that if you required everything externally 
"referential" reachable from non-thread-local globals to be typed 
as shared, then the compiler could assume that everything 
allocated that is not shared should go on the local GC.

If we then add reference counting for shared types, and make 
non-thread-local class instances ref counted then we no longer 
need to lock threads during GC collection. Then you require all 
shared resource handlers to be reference counted class objects.

You can still hand out local GC objects to other threads if you 
manually pin them first; this can be done with a "pin-counter", or 
perhaps better named a "borrow count".

Since a thread-local GC will have fewer pointers to scan through, 
collection would also be faster.

Are there some practical considerations I haven't thought about?

Caveat: you have to deal with more shared protected objects.

Solution: you add an isolated pointer type that automatically 
allows member access as long as the object has not actually been 
shared yet. So when you allocate a new shared object you receive 
it as an isolated object in the global shared memory pool.
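
Just to make the idea concrete, here is a minimal sketch of the isolated hand-off, assuming a hypothetical Isolated wrapper and moveToShared operation (neither exists in D today, and the pool bookkeeping is omitted):

struct Isolated(T)
{
    private T* payload;            // allocated in the shared pool, but not
                                   // yet reachable by any other thread

    // member access is fine while the object is still unshared
    ref T get() { return *payload; }

    // one-way move: afterwards the object is only reachable as shared(T)*
    shared(T)* moveToShared()
    {
        auto p = cast(shared(T)*) payload;
        payload = null;
        return p;
    }
}

The type system would have to guarantee that get() is unusable after moveToShared(); the sketch does not show that part.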
Oct 31 2020
next sibling parent reply Kagamin <spam here.lot> writes:
Say, allocate a dictionary, network connection or XML document 
and put it in shared context. How do you deal with them?
Oct 31 2020
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 31 October 2020 at 19:05:43 UTC, Kagamin wrote:
 Say, allocate a dictionary, network connection or XML document 
 and put it in shared context. How do you deal with them?
Ok, so we assume safe code.

When you allocate, you have access to the object as an "isolated shared", which taints all references obtained through it, so you cannot store references to its internals. BUT, you can configure it. After you are done with the isolated configuration/usage you do a move operation to transfer it to a shared pointer (effectively encapsulating it). From then on you need to obtain access to it through the implementation of the shared protocol for the object (not implemented as safe).

So for a dictionary where you want to store references to local objects, you would need a dictionary written especially for shared access as you would need to have a borrow-pointer to keep it pinned in the "foreign" GC-pool. So it is up to the dictionary implementation to prevent multiple threads from accessing the same object at the same time.

That makes it possible to set up a shared cache and use it in safe code, but the object accesses have to be limited somewhat. We need to think of shared globals as facades in this model.

A network connection would probably best be implemented as a struct that you embed as a private field in a shared class object. Or maybe you are thinking about something else?

I expect there to be pitfalls, so more feedback is good. :-)
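
To make the dictionary case concrete: a minimal sketch of a purpose-built shared dictionary, using an ordinary mutex as a stand-in for the borrow/pin machinery described above (SharedDict is a made-up name, and values are copied in and out so no internal references escape):

import core.sync.mutex : Mutex;

final class SharedDict(K, V)
{
    private V[K] table;
    private Mutex mtx;

    this() shared { mtx = new shared Mutex; }

    void put(K key, V value) shared
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        (*cast(V[K]*) &table)[key] = value;   // strip shared under the lock
    }

    bool tryGet(K key, out V value) shared
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        if (auto p = key in *cast(V[K]*) &table)
        {
            value = *p;
            return true;
        }
        return false;
    }
}

// usage: auto cache = new shared SharedDict!(string, int); cache.put("answer", 42);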
Oct 31 2020
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 31 October 2020 at 19:31:23 UTC, Ola Fosheim Grøstad 
wrote:
 for shared access as you would need to have a borrow-pointer to 
 keep it pinned in the "foreign" GC-pool. So it is up to the
The pinning has to be a trusted operation though, to prevent multiple threads from accessing the GC-local object. If you want to do everything in safe code you could use the same "isolated" mechanisms to allow pinning in safe code.

Borrowed pointers remain scope-limited when other threads access them; since the pointer is tagged as borrowed, the type system can ensure that it isn't retained.

Another option would be for the dictionary to allow access through a callback that scope-restricts the object.
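
The callback variant could look roughly like this (Pinned and withBorrow are made-up names; the scope delegate is what keeps the borrowed reference from being retained):

import core.atomic : atomicOp;

struct Pinned(T)
{
    private T* obj;                      // lives in the owning thread's GC pool
    private shared size_t borrowCount;   // the "borrow count" from above

    void withBorrow(scope void delegate(ref T) dg)
    {
        atomicOp!"+="(borrowCount, 1);               // pin
        scope (exit) atomicOp!"-="(borrowCount, 1);  // unpin
        dg(*obj);   // the borrowed reference is only valid inside the delegate
    }
}

The owning thread's GC would have to treat a non-zero borrowCount as a pin; that part is not shown.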
Oct 31 2020
prev sibling parent reply Kagamin <spam here.lot> writes:
The size and complexity of implementing first-class shared 
support in all types is what keeps it from happening.
Nov 01 2020
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 1 November 2020 at 09:52:30 UTC, Kagamin wrote:
 The size and complexity of implementing first-class shared 
 support in all types is what keeps it from happening.
Shared should just be a simple type-wrapper that prevents access to internals unless you use specific methods marked as thread-safe for safe code. Requiring atomic access to members does not uphold the invariants of the object, so that is the wrong solution.

If one adds some kind of "accessible-shared" taint after the caller has obtained read or write access, then you can use the type system to prevent disasters. One probably should have "readable-unshared" and "writeable-unshared" to distinguish between read-lock and write-lock protection. Those could then be mapped to, say, scope parameters, so that unshared references cannot be retained.
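
A rough illustration of that read/write split, using druntime's ReadWriteMutex for the locking (Guarded, readWith and writeWith are made-up names, and the shared plumbing is left out for brevity):

import core.sync.rwmutex : ReadWriteMutex;

final class Guarded(T)
{
    private T value;
    private ReadWriteMutex rw;

    this() { rw = new ReadWriteMutex; }

    // "readable-unshared": many readers, scope-limited const view
    void readWith(scope void delegate(ref const T) dg)
    {
        rw.reader.lock();
        scope (exit) rw.reader.unlock();
        dg(value);
    }

    // "writeable-unshared": exclusive, scope-limited mutable view
    void writeWith(scope void delegate(ref T) dg)
    {
        rw.writer.lock();
        scope (exit) rw.writer.unlock();
        dg(value);
    }
}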
Nov 01 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/31/20 2:53 PM, Ola Fosheim Grøstad wrote:
 I wonder, what is keeping D from making full use of shared?
 
 It seems to me that if you required everything externally "referential" 
 reachable from non-thread-local globals to be typed as shared, then the 
 compiler could assume that everything allocated that is not shared 
 should go on the local GC.
 
 If we then add reference counting for shared types, and make 
 non-thread-local class instances ref counted then we no longer need to 
 lock threads during GC collection. Then you require all shared resource 
 handlers to be reference counted class objects.
What about cycles in shared data?

Typically, when people talk about a "thread local GC", I point out that this doesn't help, because a thread-local GC can point at the shared GC, which means that you still have to stop the world to scan the shared GC.

But your idea is interesting in that there is no GC for shared data. If it could be made to work, it might be a nice upgrade. With a good reference counting system, you can also designate different memory management systems for different threads.
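
To make the reference-counting half concrete, a minimal sketch of an atomically counted shared handle (names and layout are assumptions; cycles are exactly what such a scheme cannot reclaim, hence the question above):

import core.atomic : atomicOp;
import core.lifetime : emplace;
import core.stdc.stdlib : malloc, free;

struct SharedRef(T)
{
    private static struct Block { size_t refs; T value; }
    private shared(Block)* block;

    this(Args...)(Args args)
    {
        auto b = cast(Block*) malloc(Block.sizeof);
        b.refs = 1;
        emplace(&b.value, args);
        block = cast(shared(Block)*) b;
    }

    this(this)   // copy: bump the count atomically
    {
        if (block) atomicOp!"+="(block.refs, 1);
    }

    ~this()      // last owner frees; no stop-the-world scan of shared data
    {
        if (block && atomicOp!"-="(block.refs, 1) == 0)
        {
            destroy(*cast(Block*) block);   // run the payload's destructor
            free(cast(void*) block);
        }
    }

    ref shared(T) get() { return block.value; }
}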
 
 You can still hand out local GC objects to other threads if you manually 
 pin them first; this can be done with a "pin-counter", or perhaps better 
 named a "borrow count".
Hm... this means that they now become shared. How does that work? Handing unshared references to other threads is going to be a problem.

What is the problem with allocating them shared in the first place? Ideally, you will NEVER transfer thread-local data to a shared context. An exception might be immutable data. It also might make sense to move the data to the shared heap before sharing.
 Caveat: you have to deal with more shared protected objects.
 
 Solution: you add an isolated pointer type that automatically allows 
 member access as long as the object has not actually been shared yet. So 
 when you allocate a new shared object you receive it as an isolated 
 object in the global shared memory pool.
 
This is part of the -preview=nosharedaccess switch -- you need to provide mechanisms on how to actually use shared data. Such a wrapper can be written with this in mind.

-Steve
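
For instance, with -preview=nosharedaccess a small wrapper along these lines forces every access through explicit atomics (the Atomic name here is just for illustration, it is not a library type):

import core.atomic : atomicLoad, atomicOp, atomicStore;

struct Atomic(T)
{
    private T payload;

    T load() shared const { return atomicLoad(payload); }
    void store(T value) shared { atomicStore(payload, value); }
    T opOpAssign(string op)(T rhs) shared { return atomicOp!(op ~ "=")(payload, rhs); }
}

shared Atomic!int hits;

void record()
{
    hits += 1;               // atomic read-modify-write via the wrapper
}

int snapshot()
{
    return hits.load();      // a bare read of the payload is rejected by the preview switch
}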
Nov 02 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 2 November 2020 at 12:28:34 UTC, Steven Schveighoffer 
wrote:
 What is the problem with allocating them shared in the first 
 place? Ideally, you will NEVER transfer thread-local data to a 
 shared context.
How would you distinguish between global and local GC-allocated data in the language? Many times you need GC-allocated data that can be used globally, so we would need new D keywords like "localnew" or "globalnew".

Then, since we are talking about how GC operations on GC-allocated memory should be disallowed between threads, what mechanisms does D have in order to prevent direct access to that data? Only disallowing GC operations is just going half way as I see it.

In general these multithreaded allocators usually use "stages", meaning that they have several memory regions to play with, hoping that it will reduce mutex locking. Usually there is an upper limit on how many stages you want before there isn't any performance advantage, and tracking stages requires resources itself. If D would open up such a stage for every thread, there would be a new stage for every thread with no upper limit. This would require extra metadata as well as a new region. I guess that implementation-wise this would be complicated and also require even more memory than it does today.
Nov 02 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/2/20 8:23 AM, IGotD- wrote:
 On Monday, 2 November 2020 at 12:28:34 UTC, Steven Schveighoffer wrote:
 What is the problem with allocating them shared in the first place? 
 Ideally, you will NEVER transfer thread-local data to a shared context.
 How would you distinguish between global and local GC-allocated data in 
 the language? Many times you need GC-allocated data that can be used 
 globally, so we would need new D keywords like "localnew" or "globalnew".
It's possible to allocate without using `new`. Something like:

    GlobalPool.allocate!T(ctor_args) // returns shared(T)*
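
A minimal sketch of what such a helper might look like; GlobalPool is hypothetical, and this version just wraps the regular GC `new` and adds the qualifier rather than using a separate shared pool:

struct GlobalPool
{
    static shared(T)* allocate(T, Args...)(Args args)
        if (is(T == struct))
    {
        return cast(shared(T)*) new T(args);
    }
}

// usage
struct Config { int workers; bool verbose; }

void setup()
{
    shared(Config)* cfg = GlobalPool.allocate!Config(8, true);
    // ... publish cfg through whatever shared facade is in use
}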
 Then, since we are talking about how GC operations on GC-allocated memory 
 should be disallowed between threads, what mechanisms does D have in order 
 to prevent direct access to that data? Only disallowing GC operations is 
 just going half way as I see it.
D has shared qualifier, which indicates whether any other thread has access to it.

This is kind of the lynch pin of all these schemes. Without that enforced properly, you can't build anything.

But with it enforced properly, you have options.

There are still problems. Like immutable is implicitly shared. Or you can have a type which contains both shared and unshared members, where is that allocated?
 
 In general these multithreaded allocators usually use "stages", meaning 
 that they have several memory regions to play with, hoping that it will 
 reduce mutex locking. Usually there is an upper limit on how many stages 
 you want before there isn't any performance advantage, and tracking 
 stages requires resources itself. If D would open up such a stage for 
 every thread, there would be a new stage for every thread with no upper 
 limit. This would require extra metadata as well as a new region. I guess 
 that implementation-wise this would be complicated and also require even 
 more memory than it does today.
 
You don't necessarily need to assign regions to threads. You could potentially use one region and assign different pools to different threads. If the GC doesn't need to scan shared data at all, then it's a matter of skipping the pools that aren't interesting to your local GC.

As long as a pool never gets moved to another thread, you can avoid most locking.

-Steve
Nov 02 2020
next sibling parent IGotD- <nise nise.com> writes:
On Monday, 2 November 2020 at 14:44:08 UTC, Steven Schveighoffer 
wrote:
 You don't necessarily need to assign regions to threads. You 
 could potentially use one region and assign different pools to 
 different threads. If the GC doesn't need to scan shared data 
 at all, then it's a matter of skipping the pools that aren't 
 interesting to your local GC.

 As long as a pool never gets moved to another thread, you can 
 avoid most locking.
Can't you spread the allocations out just like contemporary C-lib mallocs do in order to reduce locking? This way the GC can work just as before for the programmer. I would go for automatic solutions as far as possible without involving the programmer or changing the language.
Nov 02 2020
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 2 November 2020 at 14:44:08 UTC, Steven Schveighoffer 
wrote:
 D has shared qualifier, which indicates whether any other 
 thread has access to it.

 This is kind of the lynch pin of all these schemes. Without 
 that enforced properly, you can't build anything.

 But with it enforced properly, you have options.

 There are still problems. Like immutable is implicitly shared. 
 Or you can have a type which contains both shared and unshared 
 members, where is that allocated?
There is something called separation logic, which might be interesting to look at if you are interested in this topic:

https://en.wikipedia.org/wiki/Separation_logic
Nov 02 2020
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 2 November 2020 at 12:28:34 UTC, Steven Schveighoffer 
wrote:
 What about cycles in shared data?
I am making the assumption that global shared facades (caches etc.) are written by library authors who know what they are doing, so they would use weak pointers where necessary.

However, your idea of using the current GC infrastructure for sanitization could be helpful! So in development builds you could detect such flaws at runtime during testing.
 But your idea is interesting in that there is no GC for shared 
 data. If it could be made to work, it might be a nice upgrade. 
 With a good reference counting system, you can also designate 
 different memory management systems for different threads.
I am assuming that most safe-code programmers would be discouraged from creating non-TLS globals and would typically not design their own shared facades (caches, databases etc.), so those global "hubs" would ideally be part of a framework/library with an underlying strategic model for parallelism.
 You can still hand out local GC objects to other threads if 
 you manually pin them first; this can be done with a 
 "pin-counter", or perhaps better named a "borrow count".
 Hm... this means that they now become shared. How does that 
 work? Handing unshared references to other threads is going to 
 be a problem.
Not if they are isolated.

For instance, you might have a framework for scraping websites. A thread hands a thread-local request object to the framework, and when the framework has fetched the data the origin thread receives the request object back with the data. You could even let the framework allocate local GC data for the thread, if the GC is written with a special pool for that purpose.

Think of it like this: a safe thread is like an actor, but there are advanced global services at your disposal that can provide for you using the "building materials" most useful to you (like thread-local GC memory).
 What is the problem with allocating them shared in the first 
 place? Ideally, you will NEVER transfer thread-local data to a 
 shared context.
I don't know if that is true. Think for instance of data-science computations. The scientist only wants to write code in a scripty way in his safe thread and receive useful stuff from the massively parallel framework without having to deal with shared himself. That would be more welcoming to newbies, I think.

If you could just tell them:

1. never put things in globals unless they are immutable (lookup tables).
2. don't deal with shared until you become more experienced.
3. use these ready-made worker-frameworks for parallel computing (sketched below).
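
D already ships one such ready-made worker framework in std.parallelism; a data-science style loop stays in plain code and the author never writes `shared`:

import std.algorithm.iteration : sum;
import std.math : sqrt;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto results = new double[](10_000);

    // the task pool splits the index range across worker threads;
    // each element is written by exactly one worker, so there are no data races
    foreach (i; parallel(iota(results.length)))
        results[i] = sqrt(cast(double) i);

    writeln("total: ", results.sum);
}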
 This is part of the -preview=nosharedaccess switch -- you need 
 to provide mechanisms on how to actually use shared data.

 Such a wrapper can be written with this in mind.
Interesting!
Nov 02 2020