digitalmars.D - Thread local and memory allocation
- deadalnix (30/30) Oct 03 2011 D's uses thread local storage for most of its data. And it's a good thin...
- Sean Kelly (34/51) Oct 03 2011 has no way to handle it in the future as things are specified.
- bearophile (6/7) Oct 03 2011 (I am ignorant still about such issues, so I usually keep myself quiet a...
- deadalnix (10/13) Oct 03 2011 Yes, I was thinking is such a thing. Each thread has a local heap and
- Jason House (2/3) Oct 03 2011 Why not run the collection for a single thread in the thread being colle...
- Sean Kelly (12/17) Oct 03 2011 is that when the GC collects memory, the thread that finalizes =
- Jason House (3/11) Oct 03 2011 The world can't be stopped when finalizers run or the app can deadlock. ...
- Walter Bright (4/6) Oct 03 2011 It is a great idea, and it has been discussed before. The difficulties a...
- Sean Kelly (11/16) Oct 03 2011 are when thread local allocated data gets shared with other threads, =
- Walter Bright (6/13) Oct 03 2011 Right. The current language allows no way to determine in advance if an
- deadalnix (13/31) Oct 04 2011 Do you mean manage the memory that way :
- Walter Bright (4/13) Oct 04 2011 Yes.
- deadalnix (6/8) Oct 04 2011 That is explicitly said to be unsafe on D's website. As long as a
- Walter Bright (9/16) Oct 04 2011 Unsafe doesn't mean "undefined behavior". It just means the compiler can...
- deadalnix (19/27) Oct 05 2011 This looks like more a flaw in the type system or lack of tools to deal
- Walter Bright (4/17) Oct 05 2011 Both your solutions can work, but they can be highly error prone as they...
- bearophile (4/5) Oct 04 2011 Casts are often the points where type systems fail :-)
- Andrew Wiley (13/35) Oct 04 2011 Assuming we have to make a call to the GC when an object toggles its
- Robert Jacques (2/45) Oct 04 2011 It's entirely possible to simply allocate the memory for the object from...
- Andrew Wiley (5/56) Oct 04 2011 When an object is created and later cast to shared, the compiler
- Robert Jacques (2/61) Oct 04 2011 But that not the scenario being discussed. In fact, having a dangling re...
- Andrew Wiley (5/64) Oct 04 2011 If you meant that the *user* should be responsible for making sure
- Robert Jacques (2/68) Oct 04 2011 I would phrase it as a shift D's memory model towards NUMA. By the way, ...
- Sean Kelly (8/68) Oct 05 2011 Maybe the correct approach is simply to try and eliminate the mutex prot...
- Robert Jacques (7/12) Oct 03 2011 I've been a proponent of thread-local garbage collection, so naturally I...
- deadalnix (5/35) Oct 04 2011 The GC switch you suggest doesn't take into account all cases. you
- Robert Jacques (2/43) Oct 04 2011 Well, our current GC behaves as a shared GC.
D uses thread-local storage for most of its data, and that is a good thing. However, the allocation mechanism isn't aware of it and, as things are currently specified, has no way to take it into account in the future. Thanks to the type system, shared memory never holds a pointer to thread-local data, which is something the GC could use to its advantage.

Good practice is to minimize the use of shared data, so this design choice penalizes good design, which is, IMO, not D's philosophy.

The advantages of handling this at the memory-management level are the following:

- Swap friendliness. The data of a given thread can be grouped in blocks, so an idle thread can be swapped out without a huge performance penalty. Anyone who has used Chrome and Firefox with lots of tabs on a machine with limited memory knows what I'm talking about: Firefox uses less memory than Chrome, but its performance is terrible anyway, because Chrome's memory layout is more cache friendly (the memory of different tabs isn't mixed together).
- Efficiency in heavily multithreaded applications such as servers. The more threads run in the program, the more costly a stop-the-world GC is. Since good design implies keeping each thread's data as separate as possible, a thread-local collection could be triggered at any time without stopping the other threads.

Even if those improvements are not implemented yet, and won't be anytime soon, it is kind of sad that the current interface doesn't allow for them.

What I suggest is to add a SHARED flag to BlkAttr and store it as an attribute of the block. Later modifications could be made according to this flag. The attribute shouldn't be modifiable afterwards.

What do you think? Is it worth working on? If so, how can I help?
Oct 03 2011
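A minimal sketch of what the SHARED flag proposed in the opening post might look like from the caller's side. Everything here is an assumption made for illustration: `SHARED_HYPOTHETICAL` is not a member of druntime's `GC.BlkAttr`, and `Pool`, `Block` and `allocate` are made-up names. The real GC is only used to obtain the memory; the pool choice is merely recorded.

```d
import core.memory : GC;

// Hypothetical bit: NOT part of core.memory's GC.BlkAttr.
// Chosen only so that it does not collide with the real flags.
enum uint SHARED_HYPOTHETICAL = 1u << 31;

enum Pool { threadLocal, sharedPool }

struct Block
{
    void* ptr;
    Pool pool; // decided once at allocation time, never changed afterwards
}

// Illustrative wrapper: pick a pool from the flag, then defer to the real GC.
Block allocate(size_t size, uint attrs)
{
    immutable pool = (attrs & SHARED_HYPOTHETICAL) ? Pool.sharedPool
                                                   : Pool.threadLocal;
    // Strip the made-up bit before handing the attributes to the real GC.
    void* p = GC.malloc(size, attrs & ~SHARED_HYPOTHETICAL);
    return Block(p, pool);
}

void main()
{
    auto a = allocate(64, GC.BlkAttr.NO_SCAN);
    auto b = allocate(64, GC.BlkAttr.NO_SCAN | SHARED_HYPOTHETICAL);
    assert(a.pool == Pool.threadLocal);
    assert(b.pool == Pool.sharedPool);
}
```

The point of the sketch is only that the decision is taken at allocation time and stored with the block, which is what makes the later cast-to-immutable case raised in the thread awkward.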
On Oct 3, 2011, at 12:48 PM, deadalnix wrote:
> What I suggest is to add a SHARED flag to BlkAttr and store it as an attribute of the block. Later modifications could be made according to this flag. The attribute shouldn't be modifiable afterwards.

There's another important issue that hasn't yet been addressed, which is that when the GC collects memory, the thread that finalizes non-shared data should be the one that created it. So that SHARED flag should really be a thread-id of some sort. Alternately, each thread could allocate from its own pool, with shared allocations coming from a common pool. This would allow the lock granularity to be reduced and in some cases eliminated.

I'd like to move to CDGC as an intermediate step, and that will need some testing and polish. That would allow for precise collections if the compiler support is added. Then the thread-local finalization has to be tackled one way or another. I'd favor per-thread heaps but am open to suggestions and/or help.
Oct 03 2011
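The per-thread pool idea in the post above can be sketched roughly as follows. This is not how druntime's GC is written; `BumpPool`, `localPool`, `commonPool`, `allocLocal` and `allocShared` are hypothetical names, and a fixed-size bump allocator stands in for a real heap. It relies on the fact that module-level variables in D are thread-local unless marked `shared` or `__gshared`.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Trivial bump allocator over a fixed buffer; illustrative only.
struct BumpPool
{
    ubyte[] buffer;
    size_t used;

    void[] allocate(size_t n)
    {
        if (used + n > buffer.length)
            return null; // pool exhausted
        auto p = buffer[used .. used + n];
        used += n;
        return p;
    }
}

// Thread-local by default: every thread gets its own pool, so no lock is needed.
BumpPool localPool;

// One common pool for shared allocations, guarded by a mutex.
__gshared BumpPool commonPool;
__gshared Mutex commonLock;

shared static this() // runs once per process
{
    commonPool.buffer = new ubyte[](4096);
    commonLock = new Mutex;
}

static this() // runs once per thread
{
    localPool.buffer = new ubyte[](4096);
}

void[] allocLocal(size_t n) { return localPool.allocate(n); }

void[] allocShared(size_t n)
{
    commonLock.lock();
    scope (exit) commonLock.unlock();
    return commonPool.allocate(n);
}

void main()
{
    auto a = allocLocal(16);
    auto t = new Thread({
        auto b = allocLocal(16);
        assert(localPool.used == 16); // this thread's own pool
        auto s = allocShared(8);
        assert(s.length == 8);
    });
    t.start();
    t.join();
    assert(localPool.used == 16); // the main thread's pool was not touched
    assert(commonPool.used == 8);
}
```

Only `allocShared` takes the lock, which is the reduction in lock granularity the post describes.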
Sean Kelly:
> I'd favor per-thread heaps but am open to suggestions and/or help.

(I am still ignorant about such issues, so I usually keep quiet about them. Please forgive me if I am saying stupid things.)

Memory today is organized like a tree: the larger memories are slower, and the farther away a memory is, the more costly it is to move data across it and to keep coherent two pieces of data that are supposed to be the "same" data. If I have a CPU with 4 cores, each core has hyper-threading, and each pair of cores has its own L2 cache, then I think it is better for the code to use 2 heaps (one heap per L2 cache). If future CPUs have even more cores, with totally independent local memory, then each such memory has to correspond to a different heap.

Bye,
bearophile
Oct 03 2011
Yes, I was thinking of such a thing. Each thread has a local heap, and there is a common shared heap too, with the shared data in it. In such a case the flag is sufficient, because the GC could then use it to allocate from the thread-local heap instead of the shared one.

This is consistent with the swap friendliness I was talking about, and it can reduce the need for synchronization when allocating memory (a lock is only taken when the GC has no memory left in its pool for the given thread). And yes, it solves the question of which thread runs the finalizers.

On 03/10/2011 22:54, Sean Kelly wrote:
> Alternately, each thread could allocate from its own pool, with shared allocations coming from a common pool. This would allow the lock granularity to be reduced and in some cases eliminated.
Oct 03 2011
Sean Kelly Wrote:
> There's another important issue that hasn't yet been addressed, which is that when the GC collects memory, the thread that finalizes non-shared data should be the one that created it.

Why not run the collection for a single thread in the thread being collected? It's a simple way to force where the finalizer runs. It's a big step up from stop-the-world collections, but still requires pauses.
Oct 03 2011
On Oct 3, 2011, at 3:27 PM, Jason House wrote:
> Why not run the collection for a single thread in the thread being collected? It's a simple way to force where the finalizer runs. It's a big step up from stop-the-world collections, but still requires pauses.

The world can't be stopped when finalizers run or the app can deadlock. So the only correct behavior is to have the creator of a TLS block be the one to finalize it.
Oct 03 2011
Sean Kelly Wrote:
> The world can't be stopped when finalizers run or the app can deadlock. So the only correct behavior is to have the creator of a TLS block be the one to finalize it.

If only one thread is stopped, how will a deadlock occur? If it's a deadlock due to new allocations, doesn't the current GC already handle that?
Oct 03 2011
On 10/3/2011 12:48 PM, deadalnix wrote:
> What do you think? Is it worth working on? If so, how can I help?

It is a great idea, and it has been discussed before. The difficulties arise when thread-local allocated data gets shared with other threads, for instance immutable data, which is implicitly shareable.
Oct 03 2011
On Oct 3, 2011, at 3:55 PM, Walter Bright wrote:
> It is a great idea, and it has been discussed before. The difficulties are when thread local allocated data gets shared with other threads, like for instance immutable data that is implicitly shareable.

Immutable data would have to be allocated on the shared heap as well, which means the contention for the shared heap may actually be fairly significant. But the alternatives are all too complex (migrating immutable data from local pools to a common pool when a thread terminates, etc). There's also the problem of transferring knowledge of whether something is immutable into the allocation routine. As things stand, I don't believe that type info is available.
Oct 03 2011
On 10/3/2011 4:20 PM, Sean Kelly wrote:
> There's also the problem of transferring knowledge of whether something is immutable into the allocation routine. As things stand, I don't believe that type info is available.

Right. The current language allows no way to determine in advance if an allocation will be eventually made immutable (or shared) or not.

However, if the gc used thread local pools to do the allocation from (not the collection), the gc would go faster because it wouldn't need locking to allocate from those pools. This change can happen without any language or compiler changes.
Oct 03 2011
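A rough sketch of the allocation-only scheme described just above: one shared heap, from which each thread carves a private chunk under a lock and then bump-allocates inside that chunk without locking. All names (`sharedHeap`, `chunk`, `refills`, `allocate`) and sizes are hypothetical, and a real collector would still have to stop all threads to collect.

```d
import core.sync.mutex : Mutex;

enum chunkSize = 1024;

__gshared ubyte[] sharedHeap; // one heap for the whole process
__gshared size_t sharedUsed;
__gshared Mutex heapLock;

shared static this()
{
    sharedHeap = new ubyte[](16 * chunkSize);
    heapLock = new Mutex;
}

// Thread-local by default: the part of the shared heap this thread owns.
ubyte[] chunk;
size_t refills; // how often this thread had to take the lock

void[] allocate(size_t n)
{
    assert(n <= chunkSize);
    if (chunk.length < n)
    {
        // Slow path: lock only to carve a fresh chunk out of the shared heap.
        // The unused tail of the old chunk is simply abandoned in this sketch.
        heapLock.lock();
        scope (exit) heapLock.unlock();
        assert(sharedUsed + chunkSize <= sharedHeap.length, "heap exhausted");
        chunk = sharedHeap[sharedUsed .. sharedUsed + chunkSize];
        sharedUsed += chunkSize;
        ++refills;
    }
    // Fast path: bump inside the thread's own chunk, no lock needed.
    auto p = chunk[0 .. n];
    chunk = chunk[n .. $];
    return p;
}

void main()
{
    foreach (i; 0 .. 100)
        allocate(10);
    assert(refills == 1); // 1000 bytes fit in one 1024-byte chunk
    allocate(100);
    assert(refills == 2); // only 24 bytes were left, so a second chunk was taken
}
```

Because every block still lives in the one shared heap, nothing has to move when a block is later cast to immutable or shared, which is why this variant needs no language or compiler support.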
On 04/10/2011 02:15, Walter Bright wrote:
> However, if the gc used thread local pools to do the allocation from (not the collection), the gc would go faster because it wouldn't need locking to allocate from those pools. This change can happen without any language or compiler changes.

Do you mean managing the memory this way: shared heap -> TL pool within the shared heap -> allocation in the thread from its TL pool, followed by a complete GC collect?

This is a good solution to reduce contention on allocation, but it is a very different thing from what I was initially talking about.

*******

Back to the point:

Since any data can hold a pointer to immutable data, but not the other way around, it is also valid to have a flag for it in the allocation interface.

What is the issue with the compiler here?
Oct 04 2011
On 10/4/2011 1:22 AM, deadalnix wrote:
> Do you mean managing the memory this way: shared heap -> TL pool within the shared heap -> allocation in the thread from its TL pool, followed by a complete GC collect?

Yes.

> This is a good solution to reduce contention on allocation, but it is a very different thing from what I was initially talking about.

Yes.

> Back to the point: since any data can hold a pointer to immutable data, but not the other way around, it is also valid to have a flag for it in the allocation interface. What is the issue with the compiler here?

Allocate an object, then cast it to immutable, and pass it to another thread.
Oct 04 2011
On 04/10/2011 10:52, Walter Bright wrote:
> Allocate an object, then cast it to immutable, and pass it to another thread.

That is explicitly said to be unsafe on D's website. As long as a reference exists in the creating thread, this should work, but if those references disappear, you'll end up with memory corruption.

This is what the type system is made for, isn't it? And if you decide to do funky stuff that bypasses it, unsafe things can happen.
Oct 04 2011
On 10/4/2011 2:32 AM, deadalnix wrote:
> That is explicitly said to be unsafe on D's website. As long as a reference exists in the creating thread, this should work, but if those references disappear, you'll end up with memory corruption.

Unsafe doesn't mean "undefined behavior". It just means the compiler cannot guarantee that you did it correctly. If you do it correctly, it still should work. On the other hand, "undefined behavior" cannot be done correctly.

With casts to immutable, it is perfectly correct if you, the user, ensure that there are no other mutable references to the same data. It's just that the compiler itself cannot make this guarantee, hence it's "unsafe".

Casting from immutable to mutable, on the other hand, is "undefined behavior" because neither the compiler nor you, the user, can guarantee it will work.
Oct 04 2011
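The case under discussion (allocate, cast to immutable, pass to another thread) might look like the sketch below, using `std.concurrency`. The `worker` function and the buffer contents are made up for the example. The cast is the "unsafe but correct if done right" operation described above: nothing checks that the mutable reference is really gone.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

void worker(Tid owner)
{
    // Receives data that was allocated by a different thread.
    auto data = receiveOnly!(immutable(int)[])();
    owner.send(data.length);
}

void main()
{
    int[] buf = new int[](4); // allocated by this thread as ordinary mutable data
    buf[] = 7;

    // The compiler cannot verify this cast. It is correct only if no mutable
    // reference to the same memory is used from here on.
    immutable(int)[] frozen = cast(immutable(int)[]) buf;
    buf = null; // drop the only mutable reference

    auto tid = spawn(&worker, thisTid);
    tid.send(frozen); // immutable data is implicitly shareable
    assert(receiveOnly!size_t() == 4);
}
```

At the point of `new int[](4)` the allocator has no way of knowing that the block will end up being read by another thread, which is the difficulty with deciding the heap at allocation time. Phobos also offers `std.exception.assumeUnique` as a named wrapper for this kind of cast.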
On 04/10/2011 20:30, Walter Bright wrote:
> With casts to immutable, it is perfectly correct if you, the user, ensure that there are no other mutable references to the same data. It's just that the compiler itself cannot make this guarantee, hence it's "unsafe". Casting from immutable to mutable, on the other hand, is "undefined behavior" because neither the compiler nor you, the user, can guarantee it will work.

This looks more like a flaw in the type system, or a lack of tools to deal with the type system, than a real allocation issue.

I see two solutions to deal with this:

Something allocated on a thread-local heap can be seen from other threads, and this is safe as long as a reference is kept in the allocating thread. So, if you cast something thread-local and mutable to immutable, you have to ensure yourself that you will not modify it. In addition, you need to ensure that you keep a reference to that object in the allocating thread; otherwise, you'll see it collected. A shared object cast to immutable shouldn't experience any issue.

The other approach is to provide a way to explicitly tell the compiler that this will be cast to shared/immutable at some point and should be allocated on the corresponding heap.

These two solutions are not mutually exclusive and can both be implemented.

Maybe I'm wrong, but it doesn't seem that the issue is that big. Anyway, these things have a big impact, so they should be thought over several times.
Oct 05 2011
On 10/5/2011 6:25 AM, deadalnix wrote:
> I see two solutions to deal with this: Something allocated on a thread-local heap can be seen from other threads, and this is safe as long as a reference is kept in the allocating thread. The other approach is to provide a way to explicitly tell the compiler that this will be cast to shared/immutable at some point and should be allocated on the corresponding heap.

Both your solutions can work, but they can be highly error prone as they rely on the programmer getting the details right, and there's little means to verify that they did get them right.
Oct 05 2011
deadalnix:
> This is what the type system is made for, isn't it?

Casts are often the points where type systems fail :-)

Bye,
bearophile
Oct 04 2011
On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:
> Allocate an object, then cast it to immutable, and pass it to another thread.

Assuming we have to make a call to the GC when an object toggles its immutable/shared state, it seems like this whole approach would basically murder anyone doing message passing with ownership changes, because the workflow tends to be: create an object -> cast to shared -> send to another thread -> cast away shared -> do work -> cast to shared...

On the other hand, I guess the counterargument is that locking an uncontended lock is on the order of two instructions (or so I'm told), so casting away shared probably isn't ever necessary. It just seems somewhat counterintuitive that casting to and from shared would be slower than unnecessarily locking the object.
Oct 04 2011
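The ownership-passing workflow described above (create, cast to shared, send, cast away shared, work, cast to shared again) can be sketched with `std.concurrency` as follows. `Job` and `worker` are hypothetical names; the casts are unchecked, so the code is only correct because each side gives up its reference after sending.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

class Job
{
    int value;
}

void worker(Tid owner)
{
    auto incoming = receiveOnly!(shared Job)();
    auto job = cast(Job) incoming; // cast away shared: this thread now owns it
    job.value += 1;                // do the work without any locking
    owner.send(cast(shared) job);  // cast to shared again and hand it back
}

void main()
{
    auto job = new Job; // created as ordinary thread-local data
    job.value = 41;

    auto tid = spawn(&worker, thisTid);
    tid.send(cast(shared) job); // cast to shared, send to the other thread
    job = null;                 // the sender gives up its reference

    auto back = cast(Job) receiveOnly!(shared Job)();
    assert(back.value == 42);
}
```

If every one of these casts required a call into the GC to move or re-tag the block, each hop in this loop would pay for it, which is the cost the post is worried about.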
On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
> Assuming we have to make a call to the GC when an object toggles its immutable/shared state, it seems like this whole approach would basically murder anyone doing message passing with ownership changes, because the workflow tends to be create an object -> cast to shared -> send to another thread -> cast away shared -> do work -> cast to shared...

It's entirely possible to simply allocate the memory for the object from the shared heap to start with. Then no more calls to the GC are needed.
Oct 04 2011
On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
> It's entirely possible to simply allocate the memory for the object from the shared heap to start with. Then no more calls to the GC are needed.

When an object is created and later cast to shared, the compiler *can't* know that it should allocate from the shared heap because the cast may not be anywhere near where the object was created. The same problem goes for immutable.
Oct 04 2011
On Tue, 04 Oct 2011 23:55:19 -0400, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
> When an object is created and later cast to shared, the compiler *can't* know that it should allocate from the shared heap because the cast may not be anywhere near where the object was created. The same problem goes for immutable.

But that's not the scenario being discussed. In fact, having a dangling reference, and therefore having an object mutate under you, is just as dangerous as having the GC re-use a memory block. And honestly, as a GC clear or re-use is likely to segfault early and often, it's a very detectable bug. Besides, anyone attempting to do this is going to be actively managing references and/or making this a library implementation detail. In fact, they're probably going to use unique!T, which would always allocate from the correct heap.
Oct 04 2011
On Tue, Oct 4, 2011 at 10:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
> When an object is created and later cast to shared, the compiler *can't* know that it should allocate from the shared heap because the cast may not be anywhere near where the object was created.

If you meant that the *user* should be responsible for making sure it's allocated on the shared heap, then yes, that's possible, but you're putting GC implementation details into the type system. That may or may not be a good thing.
Oct 04 2011
On Tue, 04 Oct 2011 23:56:52 -0400, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
> If you meant that the *user* should be responsible for making sure it's allocated on the shared heap, then yes, that's possible, but you're putting GC implementation details into the type system. That may or may not be a good thing.

I would phrase it as a shift of D's memory model towards NUMA. By the way, GPGPU is here to stay and it's NUMA. HPC software is cache aware, which is NUMA. And all high-end server systems are NUMA aware, to say nothing of cluster/fabric computing.
Oct 04 2011
Maybe the correct approach is simply to try to eliminate the mutex protecting GC operations so allocations can be performed concurrently by multiple threads?

Sent from my iPhone

On 10/4/2011 1:22 AM, deadalnix wrote:
Do you mean managing the memory this way: shared heap -> TL pool within the shared heap -> allocation in thread from TL pool? And a complete GC collect.

On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright <newshound2 digitalmars.com> wrote:
Yes.

deadalnix wrote:
This is a good solution to reduce contention on allocation, but a very different thing from what I was initially talking about.

Walter Bright wrote:
Yes.

deadalnix wrote:
Back to the point: considering you can have pointers to immutable from any dataset, but not the other way around, it is also valid to get a flag for it in the allocation interface. What is the issue with the compiler here?

Walter Bright wrote:
Allocate an object, then cast it to immutable, and pass it to another thread.

On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
Assuming we have to make a call to the GC when an object toggles its immutable/shared state, it seems like this whole approach would basically murder anyone doing message passing with ownership changes, because the workflow tends to be: create an object -> cast to shared -> send to another thread -> cast away shared -> do work -> cast to shared... On the other hand, I guess the counterargument is that locking an uncontended lock is on the order of two instructions (or so I'm told), so casting away shared probably isn't ever necessary. It just seems somewhat counterintuitive that casting to and from shared would be slower than unnecessarily locking the object.

On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
It's entirely possible to simply allocate the memory for the object from the shared heap to start with. Then no more calls to the GC are needed.

On Oct 4, 2011, at 8:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
When an object is created and later cast to shared, the compiler *can't* know that it should allocate from the shared heap because the cast may not be anywhere near where the object was created. The same problem goes for immutable.
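As a rough illustration of the "shared heap -> TL pool within the shared heap -> allocation in thread from TL pool" scheme quoted above, the following D sketch carves fixed-size chunks out of one shared buffer and bump-allocates from a thread-local chunk. All names (`sharedHeap`, `refill`, `tlAlloc`, the chunk sizes) are assumptions for the example and have nothing to do with how druntime's GC is actually implemented. It does not remove the lock; it only confines it to the comparatively rare refill path, and it says nothing about collection.

```d
import std.stdio : writeln;

enum size_t chunkSize = 64 * 1024; // size of one thread-local pool (assumed)
enum size_t chunkCount = 16;       // chunks in the toy shared heap (assumed)

// Shared by all threads; only touched inside the synchronized block below.
__gshared ubyte[] sharedHeap;
__gshared size_t sharedOffset;

// Module-level variables are thread-local by default in D, so every thread
// gets its own pool and its own bump offset.
ubyte[] localPool;
size_t localOffset;

shared static this()
{
    sharedHeap = new ubyte[](chunkSize * chunkCount);
}

// Slow path: take the lock and hand out the next unused chunk, or null.
ubyte[] refill()
{
    ubyte[] chunk;
    synchronized
    {
        if (sharedOffset + chunkSize <= sharedHeap.length)
        {
            chunk = sharedHeap[sharedOffset .. sharedOffset + chunkSize];
            sharedOffset += chunkSize;
        }
    }
    return chunk;
}

// Fast path: bump allocation from the calling thread's pool, no locking.
ubyte[] tlAlloc(size_t size)
{
    size = (size + 15) & ~cast(size_t) 15; // round up to 16 bytes
    if (size > chunkSize)
        return null; // oversized requests are out of scope for this sketch
    if (localOffset + size > localPool.length)
    {
        localPool = refill();
        localOffset = 0;
        if (localPool is null)
            return null;
    }
    auto block = localPool[localOffset .. localOffset + size];
    localOffset += size;
    return block;
}

void main()
{
    auto a = tlAlloc(24);
    auto b = tlAlloc(8);
    writeln(a.length, " ", b.length); // 32 16
}
```

Because the remainder of a chunk is simply dropped on refill, this trades some wasted memory for fewer lock acquisitions.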
Oct 05 2011
On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:
[snip]
What I suggest is to add a flag SHARED in BlkAttr and store it as an attribute of the block. Later modifications could be made according to this flag. This attribute shouldn't be modifiable later on. What do you think? Is it something worth working on? If it is, how can I help?

I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) There are some GCs specifically tailored for immutable data, so I'd probably wish to add separate SHARED and IMMUTABLE flags.

On the con side, the major issue with thread-local GCs is that currently we don't have good ways of building shared and immutable data. This leads to people building data with mutable structures and casting at the end. Now, the issue with shared is mostly a quality-of-implementation issue. However, building immutable data structures efficiently requires a unique (aka mobile) storage type, which we'll probably get at the same time D gets an ownership type system. That is to say, no time in the foreseeable future.

That said, there are mitigating factors. First, by far the most common example of the build & cast pattern involves string/array building, a task appender addresses in spades. Second, std.allocators could be used to determine which heap to allocate from. Third, we could opt to be able to switch the GC from thread-local to shared mode and vice versa; the idea being that inside an object building routine, all allocations would be cast to immutable/shared and thus the local heap should be bypassed.

As for how you can help, I'd suggest building a thread-local GC, following the design of the recent discussion on std.allocators, if you're up to it.
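For the string-building case mentioned above, a small D sketch of what "appender addresses" looks like in practice: `std.array.appender!string` accumulates characters and hands back a `string` (that is, `immutable(char)[]`) without any user-written cast. The function name `joinValues` is made up for the example.

```d
import std.array : appender;
import std.conv : to;
import std.stdio : writeln;

// Builds an immutable string incrementally; no cast appears in user code.
string joinValues(const int[] values)
{
    auto app = appender!string();
    foreach (i, v; values)
    {
        if (i > 0)
            app.put(", ");
        app.put(to!string(v));
    }
    return app.data; // already typed as string
}

void main()
{
    writeln(joinValues([1, 2, 3])); // 1, 2, 3
}
```

This covers arrays and strings only; arbitrary immutable object graphs still run into the build-and-cast problem described in the post.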
Oct 03 2011
On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:
[snip]
What I suggest is to add a flag SHARED in BlkAttr and store it as an attribute of the block. Later modifications could be made according to this flag. This attribute shouldn't be modifiable later on. What do you think? Is it something worth working on? If it is, how can I help?

On 04/10/2011 08:02, Robert Jacques wrote:
I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) There are some GCs specifically tailored for immutable data, so I'd probably wish to add separate SHARED and IMMUTABLE flags.

On the con side, the major issue with thread-local GCs is that currently we don't have good ways of building shared and immutable data. This leads to people building data with mutable structures and casting at the end. Now, the issue with shared is mostly a quality-of-implementation issue. However, building immutable data structures efficiently requires a unique (aka mobile) storage type, which we'll probably get at the same time D gets an ownership type system. That is to say, no time in the foreseeable future.

That said, there are mitigating factors. First, by far the most common example of the build & cast pattern involves string/array building, a task appender addresses in spades. Second, std.allocators could be used to determine which heap to allocate from. Third, we could opt to be able to switch the GC from thread-local to shared mode and vice versa; the idea being that inside an object building routine, all allocations would be cast to immutable/shared and thus the local heap should be bypassed.

As for how you can help, I'd suggest building a thread-local GC, following the design of the recent discussion on std.allocators, if you're up to it.

The GC switch you suggest doesn't take all cases into account. You cannot get something working without a shared GC. The thing is that shared data must be collected with a shared collection cycle, but most data isn't shared.
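A loose D sketch of the SHARED-flag idea, to show where such a bit would be consulted. `GC.malloc`, `GC.getAttr` and `GC.BlkAttr.NO_SCAN` are real `core.memory` API; everything else is hypothetical. In particular, `PROPOSED_SHARED` is not a member of `BlkAttr`, and the sketch keeps it in a side record (`TaggedBlock`) instead of passing it to the GC, since the real GC knows nothing about it.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical bit standing in for the proposed SHARED attribute.
// It is NOT part of core.memory.GC.BlkAttr and is never given to the GC.
enum uint PROPOSED_SHARED = 1u << 30;

// Side record pairing a block with its real attributes plus the proposed bit.
struct TaggedBlock
{
    void* ptr;
    uint attr;
}

// The flag is decided once, at allocation time, and never changed afterwards,
// matching the "shouldn't be modifiable later on" requirement in the proposal.
TaggedBlock allocTagged(size_t size, bool forSharing)
{
    void* p = GC.malloc(size, GC.BlkAttr.NO_SCAN); // real GC call
    uint attr = GC.getAttr(p);                     // real attribute bits
    if (forSharing)
        attr |= PROPOSED_SHARED;
    return TaggedBlock(p, attr);
}

// A thread-local cycle could skip blocks for which this returns true and
// leave them to a stop-the-world shared cycle.
bool needsSharedCycle(const TaggedBlock b)
{
    return (b.attr & PROPOSED_SHARED) != 0;
}

void main()
{
    auto local = allocTagged(64, false);
    auto shared_ = allocTagged(64, true);
    writeln(needsSharedCycle(local), " ", needsSharedCycle(shared_)); // false true
}
```

The sketch sidesteps the hard part raised in the thread: a block allocated without the flag and later cast to shared or immutable would be misclassified.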
Oct 04 2011
On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:
[snip]
What I suggest is to add a flag SHARED in BlkAttr and store it as an attribute of the block. Later modifications could be made according to this flag. This attribute shouldn't be modifiable later on. What do you think? Is it something worth working on? If it is, how can I help?

On 04/10/2011 08:02, Robert Jacques wrote:
I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) There are some GCs specifically tailored for immutable data, so I'd probably wish to add separate SHARED and IMMUTABLE flags.

On the con side, the major issue with thread-local GCs is that currently we don't have good ways of building shared and immutable data. This leads to people building data with mutable structures and casting at the end. Now, the issue with shared is mostly a quality-of-implementation issue. However, building immutable data structures efficiently requires a unique (aka mobile) storage type, which we'll probably get at the same time D gets an ownership type system. That is to say, no time in the foreseeable future.

That said, there are mitigating factors. First, by far the most common example of the build & cast pattern involves string/array building, a task appender addresses in spades. Second, std.allocators could be used to determine which heap to allocate from. Third, we could opt to be able to switch the GC from thread-local to shared mode and vice versa; the idea being that inside an object building routine, all allocations would be cast to immutable/shared and thus the local heap should be bypassed.

As for how you can help, I'd suggest building a thread-local GC, following the design of the recent discussion on std.allocators, if you're up to it.

On Tue, 04 Oct 2011 17:50:03 -0400, deadalnix <deadalnix gmail.com> wrote:
The GC switch you suggest doesn't take all cases into account. You cannot get something working without a shared GC. The thing is that shared data must be collected with a shared collection cycle, but most data isn't shared.

Well, our current GC behaves as a shared GC.
Oct 04 2011