digitalmars.D - Thread local and memory allocation

deadalnix (30/30) Oct 03 2011 D's uses thread local storage for most of its data. And it's a good thin...

Sean Kelly (34/51) Oct 03 2011 has no way to handle it in the future as things are specified.

bearophile (6/7) Oct 03 2011 (I am ignorant still about such issues, so I usually keep myself quiet a...
deadalnix (10/13) Oct 03 2011 Yes, I was thinking is such a thing. Each thread has a local heap and
Jason House (2/3) Oct 03 2011 Why not run the collection for a single thread in the thread being colle...

Sean Kelly (12/17) Oct 03 2011 is that when the GC collects memory, the thread that finalizes =

Jason House (3/11) Oct 03 2011 The world can't be stopped when finalizers run or the app can deadlock. ...

Walter Bright (4/6) Oct 03 2011 It is a great idea, and it has been discussed before. The difficulties a...

Sean Kelly (11/16) Oct 03 2011 are when thread local allocated data gets shared with other threads, =

Walter Bright (6/13) Oct 03 2011 Right. The current language allows no way to determine in advance if an

deadalnix (13/31) Oct 04 2011 Do you mean manage the memory that way :

Walter Bright (4/13) Oct 04 2011 Yes.

deadalnix (6/8) Oct 04 2011 That is explicitly said to be unsafe on D's website. As long as a

Walter Bright (9/16) Oct 04 2011 Unsafe doesn't mean "undefined behavior". It just means the compiler can...

deadalnix (19/27) Oct 05 2011 This looks like more a flaw in the type system or lack of tools to deal

Walter Bright (4/17) Oct 05 2011 Both your solutions can work, but they can be highly error prone as they...

bearophile (4/5) Oct 04 2011 Casts are often the points where type systems fail :-)

Andrew Wiley (13/35) Oct 04 2011 Assuming we have to make a call to the GC when an object toggles its

Robert Jacques (2/45) Oct 04 2011 It's entirely possible to simply allocate the memory for the object from...

Andrew Wiley (5/56) Oct 04 2011 When an object is created and later cast to shared, the compiler

Robert Jacques (2/61) Oct 04 2011 But that not the scenario being discussed. In fact, having a dangling re...

Andrew Wiley (5/64) Oct 04 2011 If you meant that the *user* should be responsible for making sure

Robert Jacques (2/68) Oct 04 2011 I would phrase it as a shift D's memory model towards NUMA. By the way, ...

Sean Kelly (8/68) Oct 05 2011 Maybe the correct approach is simply to try and eliminate the mutex prot...

Robert Jacques (7/12) Oct 03 2011 I've been a proponent of thread-local garbage collection, so naturally I...

deadalnix (5/35) Oct 04 2011 The GC switch you suggest doesn't take into account all cases. you

Robert Jacques (2/43) Oct 04 2011 Well, our current GC behaves as a shared GC.

deadalnix <deadalnix gmail.com> writes:

D uses thread-local storage for most of its data. And that's a good thing.

However, the allocation mechanism isn't aware of it. In addition, it has 
no way to handle it in the future as things are currently specified.

Thanks to the type system, you can't have any pointer in shared memory to 
thread-local data, so this is something the GC could use to its own 
advantage.

Since good practice is to minimize the use of shared data as much as 
possible, this design choice makes things worse for good designs, which 
is, IMO, not D's philosophy.

The advantages of handling this at the memory management level are the 
following:
- Swap friendliness. Data of a given thread can be located together in 
blocks, so an idle thread can be swapped out easily without a huge 
performance penalty. Anyone who has used Chrome and Firefox with lots of 
tabs on a machine with limited memory knows what I'm talking about: 
Firefox uses less memory than Chrome, but its performance is terrible 
anyway, because Chrome's memory layout is more cache friendly (the tabs' 
memory isn't mixed together).
- Efficiency in heavily multithreaded applications like servers: the 
more threads run in the program, the more costly a stop-the-world GC is. 
Since good design implies separating data between threads as much as 
possible, a thread-local collection could be triggered at any time 
without stopping the other threads.

Even if those improvements are not implemented yet, and won't be anytime 
soon, it's kind of sad that the current interface doesn't allow for them.

What I suggest is to add a SHARED flag to BlkAttr and store it as an 
attribute of the block. Later modifications to the GC could then be based 
on this flag. This attribute shouldn't be modifiable later on.

What do you think? Is it something worth working on? If it is, how 
can I help?
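
To make the proposal concrete, here is a rough toy model in Python (not D, and not code from this thread). It assumes a hypothetical SHARED attribute that is fixed when the block is allocated and that decides whether the block lands in a shared list or in the calling thread's private list. The names ToyHeap, Block and the flag values are invented for illustration and do not match the real runtime.

```python
import threading
from enum import IntFlag


class BlkAttr(IntFlag):
    """Toy stand-in for GC block attributes; values are illustrative only."""
    NONE = 0
    FINALIZE = 1
    NO_SCAN = 2
    SHARED = 0x100  # hypothetical flag from the proposal, not a real attribute


class Block:
    """A toy heap block whose attributes are fixed at allocation time."""

    def __init__(self, size, attr):
        self.size = size
        self._attr = attr

    @property
    def attr(self):
        # Read-only: there is no setter, so SHARED cannot be toggled later.
        return self._attr


class ToyHeap:
    """Routes each allocation to the shared list or to the caller's own list."""

    def __init__(self):
        self.shared_blocks = []
        self._shared_lock = threading.Lock()
        self._local = threading.local()

    def local_blocks(self):
        # Each thread lazily gets its own private list of blocks.
        if not hasattr(self._local, "blocks"):
            self._local.blocks = []
        return self._local.blocks

    def malloc(self, size, attr=BlkAttr.NONE):
        block = Block(size, attr)
        if attr & BlkAttr.SHARED:
            # Shared data can be reached from any thread, so guard the list.
            with self._shared_lock:
                self.shared_blocks.append(block)
        else:
            # Thread-local data: only the calling thread touches this list.
            self.local_blocks().append(block)
        return block
```

With this layout, a collector could in principle walk one thread's list without looking at the others, which is the property the proposal is after. The model says nothing about how casts to shared or immutable would be handled.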

Oct 03 2011

Sean Kelly <sean invisibleduck.org> writes:

There's another important issue that hasn't yet been addressed, which is
that when the GC collects memory, the thread that finalizes non-shared
data should be the one that created it.  So that SHARED flag should
really be a thread-id of some sort.  Alternately, each thread could
allocate from its own pool, with shared allocations coming from a common
pool.  This would allow the lock granularity to be reduced and in some
cases eliminated.

I'd like to move to CDGC as an intermediate step, and that will need
some testing and polish.  That would allow for precise collections if
the compiler support is added.  Then the thread-local finalization has
to be tackled one way or another.  I'd favor per-thread heaps but am
open to suggestions and/or help.
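
The finalization constraint can be sketched as a toy model in Python (an illustration only, not how druntime works). It assumes each dead thread-local block carries the id of the thread that created it, and that the collector hands the finalizer to that thread instead of running it itself. FinalizerQueues and its method names are hypothetical.

```python
import threading
from collections import defaultdict, deque


class FinalizerQueues:
    """Toy model: a dead thread-local block is finalized by its creator thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = defaultdict(deque)  # owner thread id -> finalizers

    def schedule(self, owner_id, finalizer):
        # Called by whichever thread runs the collector.
        # The collector only queues the finalizer, it never runs it.
        with self._lock:
            self._pending[owner_id].append(finalizer)

    def drain_current_thread(self):
        # Called by each thread at a safe point, e.g. before its next allocation.
        me = threading.get_ident()
        with self._lock:
            todo = list(self._pending.pop(me, ()))
        for finalizer in todo:
            finalizer()  # runs outside the lock, on the owning thread
        return len(todo)
```

A per-thread heap gives the same guarantee more directly, since the owning thread is implied by the heap the block lives in and no per-block thread id is needed.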

Oct 03 2011

bearophile <bearophileHUGS lycos.com> writes:

Sean Kelly:

 I'd favor per-thread heaps but am open to suggestions and/or help.

(I am still ignorant about such issues, so I usually keep quiet about
them. Please forgive me if I am saying stupid things.)
Memory today is organized like a tree: the larger memories are slower, and
the farther away the memory is, the more costly it is to move data across and
to keep coherence between two pieces of data that are supposed to be the
"same" data.
If I have a CPU with 4 cores, each core has hyper-threading, and each pair of
cores has its own L2 cache, then I think it is better for your code to use 2
heaps (one heap for each L2 cache). If future CPUs have even more cores, with
totally independent local memory, then each such memory has to correspond to a
different heap.

Bye,
bearophile

Oct 03 2011

deadalnix <deadalnix gmail.com> writes:

Yes, I was thinking of such a thing. Each thread has a local heap and 
you have a common shared heap too, with shared data in it.

In such a case, the flag is sufficient, because the GC could then use it 
to trigger a thread-local heap allocation instead of a shared one.

This is consistent with the swap friendliness I was talking about and 
can reduce the need for synchronization when allocating memory (a lock 
will only be taken if the GC doesn't have any memory left in its pool for 
the given thread).

And it solves the question of which thread runs finalization, yes.

Oct 03 2011

Jason House <jason.james.house gmail.com> writes:

Sean Kelly Wrote:
 There's another important issue that hasn't yet been addressed, which is that
when the GC collects memory, the thread that finalizes non-shared data should
be the one that created it.  So that SHARED flag should really be a thread-id
of some sort.  Alternately, each thread could allocate from its own pool, with
shared allocations coming from a common pool.  This would allow the lock
granularity to be reduced and in some cases eliminated.


Why not run the collection for a single thread in the thread being collected?
It's a simple way to force where the finalizer runs. It's a big step up from
stop-the-world collections, but still requires pauses.

Oct 03 2011

Sean Kelly <sean invisibleduck.org> writes:

On Oct 3, 2011, at 3:27 PM, Jason House wrote:

 Why not run the collection for a single thread in the thread being
 collected? It's a simple way to force where the finalizer runs. It's a
 big step up from stop-the world collections, but still requires pauses.

The world can't be stopped when finalizers run or the app can deadlock.
So the only correct behavior is to have the creator of a TLS block be
the one to finalize it.

Oct 03 2011

Jason House <jason.james.house gmail.com> writes:

Sean Kelly Wrote:

The world can't be stopped when finalizers run or the app can deadlock.  So the
only correct behavior is to have the creator of a TLS block be the one to
finalize it.

If only one thread is stopped, how will a deadlock occur? If it's a deadlock
due to new allocations, doesn't the current GC already handle that?

Oct 03 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/3/2011 12:48 PM, deadalnix wrote:
 What do you think? Is it something worth working on? If it is, how can I
 help?

It is a great idea, and it has been discussed before. The difficulties are when 
thread local allocated data gets shared with other threads, like for instance 
immutable data that is implicitly shareable.

Oct 03 2011

Sean Kelly <sean invisibleduck.org> writes:

On Oct 3, 2011, at 3:55 PM, Walter Bright wrote:

 It is a great idea, and it has been discussed before. The difficulties
 are when thread local allocated data gets shared with other threads,
 like for instance immutable data that is implicitly shareable.

Immutable data would have to be allocated on the shared heap as well,
which means the contention for the shared heap may actually be fairly
significant.  But the alternatives are all too complex (migrating
immutable data from local pools to a common pool when a thread
terminates, etc).  There's also the problem of transferring knowledge of
whether something is immutable into the allocation routine.  As things
stand, I don't believe that type info is available.

Oct 03 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/3/2011 4:20 PM, Sean Kelly wrote:
 Immutable data would have to be allocated on the shared heap as well, which
 means the contention for the shared heap may actually be fairly significant.
 But the alternatives are all too complex (migrating immutable data from local
 pools to a common pool when a thread terminates, etc).  There's also the
 problem of transferring knowledge of whether something is immutable into the
 allocation routine.  As things stand, I don't believe that type info is
 available.

Right. The current language allows no way to determine in advance if an 
allocation will be eventually made immutable (or shared) or not.

However, if the gc used thread local pools to do the allocation from (not the 
collection), the gc would go faster because it wouldn't need locking to 
allocate from those pools. This change can happen without any language or 
compiler changes.
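
As a rough sketch of that idea (a toy model in Python, not the actual GC), each thread can carve allocations out of a private chunk and take the shared heap lock only when that chunk runs out. PooledAllocator, chunk_size and the integer "addresses" are assumptions made for illustration. Collection is not modeled at all; in this scheme it would remain global.

```python
import threading


class PooledAllocator:
    """Toy allocator: each thread carves blocks out of a private chunk.

    The shared heap lock is taken only when a thread's chunk is exhausted.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self._lock = threading.Lock()
        self._next_chunk_base = 0  # bump pointer for the shared heap
        self.refills = 0           # how many times the lock was taken
        self._tls = threading.local()

    def _refill(self):
        with self._lock:           # the only synchronized step
            base = self._next_chunk_base
            self._next_chunk_base += self.chunk_size
            self.refills += 1
        # The unused tail of the previous chunk is simply abandoned here.
        self._tls.cursor = base
        self._tls.limit = base + self.chunk_size

    def malloc(self, size):
        if size > self.chunk_size:
            raise ValueError("toy allocator: size exceeds chunk size")
        tls = self._tls
        if not hasattr(tls, "cursor") or tls.cursor + size > tls.limit:
            self._refill()
        addr = tls.cursor
        tls.cursor += size         # fast path: touches thread-private state only
        return addr
```

In this model the number of lock acquisitions grows with the number of chunks handed out rather than with the number of allocations, which is where the speedup would come from.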

Oct 03 2011

deadalnix <deadalnix gmail.com> writes:

On 04/10/2011 02:15, Walter Bright wrote:
 Right. The current language allows no way to determine in advance if an
 allocation will be eventually made immutable (or shared) or not.

 However, if the gc used thread local pools to do the allocation from
 (not the collection), the gc would go faster because it wouldn't need
 locking to allocate from those pools. This change can happen without any
 language or compiler changes.

Do you mean managing the memory this way:
Shared heap -> TL pool within the shared heap -> allocation in the thread 
from its TL pool.

And a complete (global) GC collection.

This is a good solution to reduce contention on allocation. But it is a very 
different thing from what I was initially talking about.

*******

Back to the point,

Considering that you can have pointers to immutable data from any dataset, 
but not the other way around, it would also be valid to have a flag for it 
in the allocation interface.

What is the issue with the compiler here?

Oct 04 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/4/2011 1:22 AM, deadalnix wrote:
 Do you mean manage the memory that way :
 Shared heap -> TL pool within the shared heap -> allocation in thread from
 TL pool.

 And complete GC collect.

Yes.

 This is a good solution to reduce contention on allocation. But a very
 different thing than I was initially talking about.

Yes.

 Back to the point,

 Considering you have pointer to immutable from any dataset, but not the other
 way around, this is also valid to get a flag for it in the allocation
 interface.

 What is the issue with the compiler here ?

Allocate an object, then cast it to immutable, and pass it to another thread.

Oct 04 2011

deadalnix <deadalnix gmail.com> writes:

On 04/10/2011 10:52, Walter Bright wrote:
 Allocate an object, then cast it to immutable, and pass it to another
 thread.

That is explicitly said to be unsafe on D's website. As long as a 
reference exists in the creating thread, this should work, but if those 
references disappear, you'll end up with memory corruption.

This is what the type system is made for, isn't it? And if you decide to 
do funky stuff bypassing it, unsafe things can happen.

Oct 04 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/4/2011 2:32 AM, deadalnix wrote:
 On 04/10/2011 10:52, Walter Bright wrote:
 Allocate an object, then cast it to immutable, and pass it to another
 thread.

 That is explicitly said to be unsafe on D's website. As long as a reference
 exists in the creating thread, this should work, but if those references
 disappear, you'll end up with memory corruption.

Unsafe doesn't mean "undefined behavior". It just means the compiler cannot 
guarantee that you did it correctly. If you do it correctly, it still should 
work.

On the other hand, "undefined behavior" cannot be done correctly.

With casts to immutable, it is perfectly correct if you, the user, ensure that 
there are no other mutable references to the same data. It's just that the 
compiler itself cannot make this guarantee, hence it's "unsafe".

Casting from immutable to mutable, on the other hand, is "undefined behavior" 
because neither the compiler nor you, the user, can guarantee it will work.

Oct 04 2011

deadalnix <deadalnix gmail.com> writes:

On 04/10/2011 20:30, Walter Bright wrote:
 On 10/4/2011 2:32 AM, deadalnix wrote:
 With casts to immutable, it is perfectly correct if you, the user,
 ensure that there are no other mutable references to the same data. It's
 just that the compiler itself cannot make this guarantee, hence it's
 "unsafe".

 Casting from immutable to mutable, on the other hand, is "undefined
 behavior" because neither the compiler nor you, the user, can guarantee
 it will work.

This looks more like a flaw in the type system, or a lack of tools to deal 
with the type system, than a real allocation issue.

I see two solutions to deal with this:

Something allocated on a thread-local heap can be seen from other 
threads, and this is safe as long as a reference is kept in the 
allocating thread.

So, if you cast something thread-local and mutable to immutable, you have 
to ensure yourself that you will not modify it. Plus, you need to ensure 
that you keep a reference to that object in the allocating thread; 
otherwise, you'll see it collected.

Something shared that is cast to immutable shouldn't experience any issue.

The other approach is to provide a way to explicitly tell the compiler 
that this will be cast to shared/immutable at some point and should be 
allocated on the corresponding heap.

Those two solutions are not mutually exclusive and can both be implemented. 
Maybe I'm wrong, but it doesn't seem that the issue is that big.

Anyway, those things have a big impact, so they should be thought over 
several times.

Oct 05 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/5/2011 6:25 AM, deadalnix wrote:
 I see two solutions to deal with this:

 Something allocated on a thread-local heap can be seen from other threads, and
 this is safe as long as a reference is kept in the allocating thread.

 So, if you cast something thread-local and mutable to immutable, you have to
 ensure yourself that you will not modify it. Plus, you need to ensure that you
 keep a reference to that object in the allocating thread; otherwise, you'll
 see it collected.

 Something shared that is cast to immutable shouldn't experience any issue.

 The other approach is to provide a way to explicitly tell the compiler that
 this will be cast to shared/immutable at some point and should be allocated on
 the corresponding heap.

 Those two solutions are not mutually exclusive and can both be implemented.
 Maybe I'm wrong, but it doesn't seem that the issue is that big.

Both your solutions can work, but they can be highly error prone as they rely 
on the programmer getting the details right, and there's little means to verify 
that they did get them right.

Oct 05 2011

bearophile <bearophileHUGS lycos.com> writes:

deadalnix:

 This is what the type system is made for, isn't it?

Casts are often the points where type systems fail :-)

Bye,
bearophile

Oct 04 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright
<newshound2 digitalmars.com> wrote:
 Allocate an object, then cast it to immutable, and pass it to another
 thread.

Assuming we have to make a call to the GC when an object toggles its
immutable/shared state, it seems like this whole approach would
basically murder anyone doing message passing with ownership changes,
because the workflow tends to be create an object -> cast to shared ->
send to another thread -> cast away shared -> do work -> cast to
shared...
On the other hand, I guess the counterargument is that locking an
uncontended lock is on the order of two instructions (or so I'm told),
so casting away shared probably isn't ever necessary. It just seems
somewhat counterintuitive that casting to and from shared would be
slower than unnecessarily locking the object.

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 Assuming we have to make a call to the GC when an object toggles its
 immutable/shared state, it seems like this whole approach would
 basically murder anyone doing message passing with ownership changes,
 because the workflow tends to be create an object -> cast to shared ->
 send to another thread -> cast away shared -> do work -> cast to
 shared...
 On the other hand, I guess the counterargument is that locking an
 uncontended lock is on the order of two instructions (or so I'm told),
 so casting away shared probably isn't ever necessary. It just seems
 somewhat counterintuitive that casting to and from shared would be
 slower than unnecessarily locking the object.

It's entirely possible to simply allocate the memory for the object from the
shared heap to start with. Then no more calls to the GC are needed.

Oct 04 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
 It's entirely possible to simply allocate the memory for the object from the
 shared heap to start with. Then no more calls to the GC are needed.

When an object is created and later cast to shared, the compiler
*can't* know that it should allocate from the shared heap because the
cast may not be anywhere near where the object was created. The same
problem goes for immutable.

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 23:55:19 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

But that's not the scenario being discussed. In fact, having a dangling
reference, and therefore having an object mutate under you, is just as
dangerous as having the GC re-use a memory block. And honestly, as a GC clear
or re-use is likely to segfault early and often, it's a very detectable bug.
Besides, anyone attempting to do this is going to be actively managing
references and/or making this a library implementation detail. In fact, they're
probably going to use unique!T, which would always allocate from the
correct heap.

Oct 04 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Tue, Oct 4, 2011 at 10:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

If you meant that the *user* should be responsible for making sure
it's allocated on the shared heap, then yes, that's possible, but
you're putting GC implementation details into the type system. That
may or may not be a good thing.

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 23:56:52 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 On Tue, Oct 4, 2011 at 10:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
 On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
 On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
 wrote:

 On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright
 <newshound2 digitalmars.com> wrote:
 On 10/4/2011 1:22 AM, deadalnix wrote:
 Do you mean manage the memory that way :
 Shared heap -> TL pool within the shared heap -> allocation in thread
 from
 TL pool.

 And complete GC collect.

 Yes.


 This is a good solution to reduce contention on allocation. But a very
 different
 thing than I was initially talking about.

 Yes.


 Back to the point,

 Considering you have pointer to immutable from any dataset, but not the
 other
 way around, this is also valid to get a flag for it in the allocation
 interface.

 What is the issue with the compiler here ?

 Allocate an object, then cast it to immutable, and pass it to another
 thread.

 Assuming we have to make a call to the GC when an object toggles its
 immutable/shared state, it seems like this whole approach would
 basically murder anyone doing message passing with ownership changes,
 because the workflow tends to be create an object -> cast to shared ->
 send to another thread -> cast away shared -> do work -> cast to
 shared...
 On the other hand, I guess the counterargument is that locking an
 uncontended lock is on the order of two instructions (or so I'm told),
 so casting away shared probably isn't ever necessary. It just seems
 somewhat counterintuitive that casting to and from shared would be
 slower than unnecessarily locking the object.

 It's entirely possible to simply allocate the memory for the object from the
 shared heap to start with. Then no more calls to the GC are needed.

 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

 If you meant that the *user* should be responsible for making sure
 it's allocated on the shared heap, then yes, that's possible, but
 you're putting GC implementation details into the type system. That
 may or may not be a good thing.

I would phrase it as shifting D's memory model towards NUMA. By the way, GPGPU
is here to stay and it's NUMA. HPC software is cache-aware, which is NUMA. And
all high-end server systems are NUMA-aware, to say nothing of cluster/fabric
computing.

Oct 04 2011

Sean Kelly <sean invisibleduck.org> writes:

Maybe the correct approach is simply to try and eliminate the mutex protecting
GC operations so allocations can be performed concurrently by multiple
threads?
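
As a rough illustration of allocating without a mutex, here is a sketch of a lock-free bump allocator over a fixed arena. This is not how druntime's GC works and it performs no collection. The names `arena`, `nextOffset` and `bumpAlloc` are made up for the example; only `core.atomic` and `core.thread` are real.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

enum size_t arenaSize = 1 << 20;      // 1 MiB, arbitrary for the sketch
__gshared ubyte[arenaSize] arena;     // one arena visible to all threads
shared size_t nextOffset;             // next free byte, updated atomically

// Reserves `size` bytes (rounded up to 16) with a single atomic add.
// Returns null when the arena is exhausted; the offset is not rolled back.
void* bumpAlloc(size_t size)
{
    size = (size + 15) & ~cast(size_t) 15;
    immutable end = atomicOp!"+="(nextOffset, size);   // yields the new value
    if (end > arenaSize)
        return null;
    return &arena[end - size];
}

void main()
{
    immutable start = atomicLoad(nextOffset);

    auto threads = new Thread[](4);
    foreach (ref t; threads)
    {
        t = new Thread({ foreach (_; 0 .. 1000) bumpAlloc(24); });
        t.start();
    }
    foreach (t; threads)
        t.join();

    // Every 24-byte request was rounded up to 32 bytes.
    assert(atomicLoad(nextOffset) - start == 4 * 1000 * 32);
}
```

The hard part for a real GC is not the allocation fast path shown here but keeping free lists and collection state consistent without the lock.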


On Oct 4, 2011, at 8:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:

 On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
 On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
 wrote:

 On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 10/4/2011 1:22 AM, deadalnix wrote:

 Do you mean manage the memory that way :
 Shared heap -> TL pool within the shared heap -> allocation in thread
 from
 TL pool.

 And complete GC collect.

 Yes.


 This is a good solution to reduce contention on allocation. But a very
 different
 thing than I was initially talking about.

 Yes.


 Back to the point,

 Considering you have pointer to immutable from any dataset, but not the
 other
 way around, this is also valid to get a flag for it in the allocation
 interface.

 What is the issue with the compiler here ?

 Allocate an object, then cast it to immutable, and pass it to another
 thread.

 Assuming we have to make a call to the GC when an object toggles its
 immutable/shared state, it seems like this whole approach would
 basically murder anyone doing message passing with ownership changes,
 because the workflow tends to be create an object -> cast to shared ->
 send to another thread -> cast away shared -> do work -> cast to
 shared...
 On the other hand, I guess the counterargument is that locking an
 uncontended lock is on the order of two instructions (or so I'm told),
 so casting away shared probably isn't ever necessary. It just seems
 somewhat counterintuitive that casting to and from shared would be
 slower than unnecessarily locking the object.

 It's entirely possible to simply allocate the memory for the object from the
 shared heap to start with. Then no more calls to the GC are needed.

 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

Oct 05 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

[snip]

 What I suggest is to add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something worth working on ? If it is, how
 can I help ?

I've been a proponent of thread-local garbage collection, so naturally I think
it's a good idea :)

There are some GCs specifically tailored for immutable data, so I'd probably
wish to add separate SHARED and IMMUTABLE flags.
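
For reference, this is roughly how block attributes are passed to the GC today, next to what the proposed flags might look like. `ProposedAttr` and its values are purely hypothetical and are not part of druntime. `GC.malloc`, `GC.getAttr` and `GC.BlkAttr.NO_SCAN` are the existing `core.memory` API.

```d
import core.memory : GC;

// Hypothetical flags illustrating the proposal. These do NOT exist in
// GC.BlkAttr; the bit values are arbitrary and only chosen not to clash.
enum ProposedAttr : uint
{
    SHARED    = 0x1000_0000,
    IMMUTABLE = 0x2000_0000,
}

static assert((ProposedAttr.SHARED & GC.BlkAttr.NO_SCAN) == 0);
static assert((ProposedAttr.SHARED & ProposedAttr.IMMUTABLE) == 0);

void main()
{
    // Existing API: attributes are supplied at allocation time...
    void* p = GC.malloc(64, GC.BlkAttr.NO_SCAN);
    assert(p !is null);

    // ...and can be queried later from the block itself.
    assert(GC.getAttr(p) & GC.BlkAttr.NO_SCAN);

    // Under the proposal one would write something like
    //     GC.malloc(64, ProposedAttr.SHARED);
    // so the collector knows which heap and which collection cycle the
    // block belongs to. The current GC defines no such bit.
}
```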

On the con side, the major issue with thread-local GCs is that currently we
don't have good ways of building shared and immutable data. This leads to
people building data with mutable structures and casting at the end. Now, the
issue with shared is mostly a quality-of-implementation issue. However,
building immutable data structures efficiently requires a unique (aka mobile)
storage type, which we'll probably get at the same time D gets an ownership
type system. That is to say, no time in the foreseeable future. That said,
there are mitigating factors. First, by far the most common example of the
build & cast pattern involves string/array building, a task appender addresses
in spades. Second, std.allocators could be used to determine which heap to
allocate from. Third, we could opt to be able to switch the GC from thread-local
to shared mode and vice versa; the idea being that inside an object building
routine, all allocations would be cast to immutable/shared and thus the
local heap should be bypassed.
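
A small sketch of the build & cast pattern and the appender alternative mentioned above. The function names are invented for the example; `appender` and `assumeUnique` are the real Phobos functions.

```d
import std.array : appender;
import std.exception : assumeUnique;

// Build & cast: fill a mutable buffer, then freeze it at the end.
// assumeUnique performs the cast and nulls the mutable reference.
string greetManual(string[] names)
{
    char[] buf;
    foreach (n; names)
    {
        buf ~= "hello ";
        buf ~= n;
        buf ~= '\n';
    }
    return assumeUnique(buf);
}

// Appender: the result is already a string, so user code needs no cast.
string greetAppender(string[] names)
{
    auto buf = appender!string();
    foreach (n; names)
    {
        buf.put("hello ");
        buf.put(n);
        buf.put('\n');
    }
    return buf.data;
}

void main()
{
    auto names = ["Walter", "Sean"];
    assert(greetManual(names) == "hello Walter\nhello Sean\n");
    assert(greetAppender(names) == greetManual(names));
}
```

In both versions the buffer is allocated before anything tells the GC the result is immutable, which is exactly where a thread-local heap would need either a hint at allocation time or a migration at the cast.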

As for how you can help, I'd suggest building a thread-local GC, following the
design of the recent discussion on std.allocators, if you're up to it.

Oct 03 2011

deadalnix <deadalnix gmail.com> writes:

On 04/10/2011 08:02, Robert Jacques wrote:
 On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

 [snip]

 What I suggest is to add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something worth working on ? If it is, how
 can I help ?

 I've been a proponent of thread-local garbage collection, so naturally I
 think it's a good idea :)

 There are some GCs specifically tailored for immutable data, so I'd
 probably wish to add separate SHARED and IMMUTABLE flags.

 On the con side, the major issue with thread-local GCs is that currently
 we don't have good ways of building shared and immutable data. This
 leads to people building data with mutable structures and casting at the
 end. Now, the issue with shared is mostly a quality-of-implementation
 issue. However, building immutable data structures efficiently requires
 a unique (aka mobile) storage type, which we'll probably get at the
 same time D gets an ownership type system. That is to say, no time in
 the foreseeable future. That said, there are mitigating factors.
 First, by far the most common example of the build & cast pattern
 involves string/array building, a task appender addresses in spades.
 Second, std.allocators could be used to determine which heap to allocate
 from. Third, we could opt to be able to switch the GC from thread-local
 to shared mode and vice versa; the idea being that inside an object
 building routine, all allocations would be cast to immutable/shared
 and thus the local heap should be bypassed.

 As for how you can help, I'd suggest building a thread-local GC,
 following the design of the recent discussion on std.allocators, if
 you're up to it.

The GC switch you suggest doesn't take all cases into account. You
cannot get something working without a shared GC. The thing is that shared
data must be collected with a shared collection cycle.

But most data isn't shared.

Oct 04 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 04 Oct 2011 17:50:03 -0400, deadalnix <deadalnix gmail.com> wrote:
 On 04/10/2011 08:02, Robert Jacques wrote:
 On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

 [snip]

 What I suggest is to add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something worth working on ? If it is, how
 can I help ?

 I've been a proponent of thread-local garbage collection, so naturally I
 think it's a good idea :)

 There are some GCs specifically tailored for immutable data, so I'd
 probably wish to add separate SHARED and IMMUTABLE flags.

 On the con side, the major issue with thread-local GCs is that currently
 we don't have good ways of building shared and immutable data. This
 leads to people building data with mutable structures and casting at the
 end. Now, the issue with shared is mostly a quality-of-implementation
 issue. However, building immutable data structures efficiently requires
 a unique (aka mobile) storage type, which we'll probably get at the
 same time D gets an ownership type system. That is to say, no time in
 the foreseeable future. That said, there are mitigating factors.
 First, by far the most common example of the build & cast pattern
 involves string/array building, a task appender addresses in spades.
 Second, std.allocators could be used to determine which heap to allocate
 from. Third, we could opt to be able to switch the GC from thread-local
 to shared mode and vice versa; the idea being that inside an object
 building routine, all allocations would be cast to immutable/shared
 and thus the local heap should be bypassed.

 As for how you can help, I'd suggest building a thread-local GC,
 following the design of the recent discussion on std.allocators, if
 you're up to it.

 The GC switch you suggest doesn't take all cases into account. You
 cannot get something working without a shared GC. The thing is that shared
 data must be collected with a shared collection cycle.

 But most data isn't shared.

Well, our current GC behaves as a shared GC.

Oct 04 2011
