digitalmars.D - A separate GC idea - multiple D GCs
- Chris Katko (42/42) Jan 21 2022 So I was going to ask this related question in my other thread
- rikki cattermole (15/21) Jan 21 2022 Indeed we have thought about this.
- Ola Fosheim Grøstad (13/23) Jan 22 2022 Yes, this is better, although you would need to change the
- rikki cattermole (7/10) Jan 22 2022 This isn't what I had in mind.
- Ola Fosheim Grøstad (10/12) Jan 22 2022 I don't quite see how you plan to make this work without tracking
- rikki cattermole (20/20) Jan 22 2022 That's not the direction I'm thinking. Its not about freeing memory. Its...
- Ola Fosheim Grøstad (21/32) Jan 22 2022 I believe the number of roots is irrelevant, what you need is to
- rikki cattermole (10/23) Jan 22 2022 Yeah those are roots. They still have to be added into the global heap
- Ola Fosheim Grøstad (4/6) Jan 22 2022 Well, but I would never use it. Forking is crazy expensive.
- Ola Fosheim Grøstad (21/27) Jan 22 2022 Here are some downsides of a forking collector:
- rikki cattermole (4/10) Jan 22 2022 Agreed. There is no one size fits all solution.
- Elronnd (8/17) Jan 22 2022 Probably likely to evict some stuff, but I don't see why you
- Ola Fosheim Grøstad (14/22) Jan 22 2022 If you change the TLB, then affected address ranges should in
- Elronnd (3/6) Jan 22 2022 I assume you would only lose TLB, not cache.
- Ola Fosheim Grøstad (6/12) Jan 23 2022 I wouldn't make any assumptions, what I get from a quick Google
- max haughton (4/16) Jan 23 2022 L1 cache is often virtually addressed but physically tagged so
- IGotD- (15/22) Jan 22 2022 Does C# have thread pinned GC memory? I haven't seen anything
- Adam Ruppe (13/20) Jan 21 2022 The big problem is that data isn't thread local (including TLS);
- Tejas (3/10) Jan 21 2022 Is this intended behaviour? Or was it accidental? Is it atleast
- H. S. Teoh (28/40) Jan 21 2022 Immutable data is shared across threads by default. So any immutable
- Adam D Ruppe (3/6) Jan 21 2022 No.
- Ali Çehreli (13/15) Jan 21 2022 I ended up with that design by accident: My D library had to spawn
- Walter Bright (3/8) Jan 21 2022 Yes. The trouble is what happens when a pointer in one pool is cast to a...
- H. S. Teoh (13/22) Jan 21 2022 It almost makes one want to tag pointer types at compile-time as
- Tejas (3/17) Jan 21 2022 Isn't going global from thread local done via `cast(shared)`
- Sebastiaan Koppe (10/19) Jan 22 2022 Instead of segregating GC by execution contexts like threads or
- rikki cattermole (4/6) Jan 22 2022 For my proposal, I may call it a fiber, but in reality it can just as
- Ola Fosheim Grøstad (4/8) Jan 22 2022 The term «task» is quite generic and covers fibers as well as
So I was going to ask this related question in my other thread but I now see it's hit over 30 replies (!) and the discussion is interesting but a separate topic.

So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event. Basically, you cordon off your memory use like you would with say a memory pool, so that when a GC freezes it only freezes that section. Smaller, independent sections mean less freeze time and the rest of the threads keep running on multiple core CPUs.

There are obviously some issues:

1 - Is the D GC designed to allow something like this? Can it be prevented from walking into areas it shouldn't? I know you can malloc memory the GC "shouldn't" touch if you never make pointers into it?

2 - It will definitely be more complex, and D can be too complex and finicky already. (SEGFAULT TIME.) So patterns to prevent that will be needed. "Am I reading in my thread, or thread 37?"

3 - Any boundary where threads "touch" will either be a nightmare or at least need some sort of mechanism to cross and synchronize across said boundary. And a frozen thread, synchronized, will freeze any reads by non-frozen threads. [Though one might be able to schedule garbage collections only when massive reads/writes aren't currently needed.]

I think the simple, easiest way to try this would be to just spawn multiple processes [each having their own D GC collector] and somehow share memory between them. But I have no idea if it's easy to work around the inherent "multiple process = context switch" overhead making it actually slower.

As for the "why?": I'm not sure if there are huge benefits, mild benefits, or benefits in only niche scenarios (like games, which is the kind I play with).
But I'm just curious about the prospect. Because while an "iterative GC" requires... an entire new GC, splitting off multiple existing D garbage collectors into their own fields could maybe work. [And once again, without benchmarks, a single D GC may be more than fast enough for my needs.] But I'm curious and like to know my options, suggestions, and people's general thoughts on it. Thanks, have a great day!
Jan 21 2022
On 22/01/2022 2:56 AM, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Indeed we have thought about this. What I want is a fiber-aware GC, and that implicitly means thread-local too. But I'm not sure it would be any use to add the hooks to the existing GC; it would need to be properly designed to take advantage of it.Because while an "iterative GC" requires... an entire new GC.The existing GC has absolutely horrible code. I tried a while back to get it to support snapshotting (Windows-specific concurrency for GCs) and I couldn't find where it even did the scanning for pointers... yeah. https://github.com/dlang/druntime/blob/master/src/core/internal/gc/impl/conservative/gc.d After a quick look it does look like it has been improved somewhat with more comments since then. What we need is a full reimplementation of the GC that is easy to dig into. After that, forking, precise, and generational should all be pretty straightforward to implement.
Jan 21 2022
On Friday, 21 January 2022 at 14:09:18 UTC, rikki cattermole wrote:On 22/01/2022 2:56 AM, Chris Katko wrote:Yes, this is better, although you would need to change the language semantics so that non-shared means local to a fiber/actor as the fiber/actor can move from one thread to another. You also need a per-thread runtime that can collect from sleeping fibers/actors and in the worst case signal other thread runtimes to collect theirs. Otherwise you would run out of memory when memory is available... In general I'd say you would want to have support for many different memory managers in the same executable as different fibers/actors may have different memory management needs.So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Indeed we have thought about this. What I want is a fiber aware GC and that implicitly means thread-local too.
Jan 22 2022
On 22/01/2022 10:24 PM, Ola Fosheim Grøstad wrote:Yes, this is better, although you would need to change the language semantics so that non-shared means local to a fiber/actor as the fiber/actor can move from one thread to another.This isn't what I had in mind. By telling the GC about fibers, it should be able to take advantage of this information and limit the time it takes to do stop the world scanning. If it knows memory is live, it doesn't need to scan for it. If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.
Jan 22 2022
On Saturday, 22 January 2022 at 10:05:07 UTC, rikki cattermole wrote:If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.I don't quite see how you plan to make this work without tracking dirty objects, which has a performance/storage penalty. Fiber1 has memory chunk A Fiber2 has memory chunks A, B You confirm that sleeping Fiber1 does not have access to B Fiber2 sets A.next = B At this point, you assume wrongly that Fiber1 does not have access to B?
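The race described above can be made concrete in a short D sketch (the `Node` type is hypothetical, invented purely for illustration; the two "fibers" are simulated in one thread, since only the ordering of operations matters here):

```d
// Illustration of the staleness problem: a per-fiber liveness
// conclusion is invalidated by a mutation in another context.
class Node { Node next; }

void main()
{
    auto a = new Node;   // chunk A, reachable from Fiber1 and Fiber2
    auto b = new Node;   // chunk B, reachable only from Fiber2

    // A scan of Fiber1's roots at this point concludes that
    // Fiber1 cannot reach B...
    a.next = b;          // ...then Fiber2 links B into A...

    // ...and now B IS reachable from Fiber1 through A, so the
    // earlier conclusion is stale unless mutations are tracked.
    assert(a.next is b);
}
```

This is exactly the case write barriers (or dirty-object tracking) exist to catch.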
Jan 22 2022
That's not the direction I'm thinking. It's not about freeing memory. It's about deferment. The best way to make your code faster is to simply do less work. So to make global memory scanning faster, we either have to decrease the number of roots (likely to end with segfaults) or decrease the memory ranges to scan. And since we have a context to associate with memory that is allocated, and with it roots, we have a pretty good idea of whether that memory is still alive during the execution of said context. Think about the context of a web server/service. You have some global heap memory: router, plugins, adminy type stuff, that sort of thing. Most of that sort of memory is either reused or sticks around till the end of the process. But then there is request-specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically, as that is where most of your free-able memory will originate from. To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan those dead fibers' memory ranges and it won't require stopping the world.
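The deferment idea above can be sketched in D. Note that `RequestRegion` and its methods are invented names for illustration, not a real druntime API; the sketch only shows the bookkeeping shape (ranges tracked per context, promoted to the global scan set only on escape):

```d
// Hypothetical per-request region: allocations are tracked locally
// and NOT registered with the global heap scan unless they escape.
struct RequestRegion
{
    void[][] ranges;   // memory this request allocated

    void* alloc(size_t n)
    {
        auto block = new void[](n); // GC-backed here for brevity
        ranges ~= block;            // tracked locally only
        return block.ptr;
    }

    // Only a pointer that escapes the request would need its range
    // promoted into the set a full global scan has to walk
    // (e.g. via something like GC.addRange in a real design).
    void escape(size_t i) { /* promote ranges[i] globally */ }
}

void main()
{
    RequestRegion req;
    auto p = req.alloc(64);
    assert(p !is null);
    assert(req.ranges.length == 1);
    // request ends with no escapes: the whole region is dropped
    // without the global collector ever scanning these ranges
}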
Jan 22 2022
On Saturday, 22 January 2022 at 10:38:36 UTC, rikki cattermole wrote:So to make global memory scanning faster, we either have to decrease the number of roots (likely to end with segfaults) or to decrease the memory ranges to scan for.I believe the number of roots is irrelevant; what you need is heap separation where you ensure through the type system (or the programmer) that the pointed-to object is live. So, if you let a fiber have its own heap of this kind, then you can reduce the scanning? But you need to either improve the type system or put the burden on the programmer in C++-style. Either way, I believe distinguishing between ownership and borrows can improve GC performance, not only unique_ptr style ownership.But then there is request specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically as that is where most of your free-able memory will originate from.But you need to scan those memory ranges if they can contain pointers that can reach GC-memory?To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan for those dead fibers memory ranges and it won't require stopping the world.You want to require taking a snapshot-copy of the pointer-containing heap? That will make the collector much less useful, I think.
Jan 22 2022
On 22/01/2022 11:59 PM, Ola Fosheim Grøstad wrote:Yeah those are roots. They still have to be added into the global heap to be scanned. The goal is to decrease the number of memory ranges that need to be scanned in a full scan: Cost = Roots * Ranges. Get Ranges down (until a range actually does need to be scanned) and the cost drops; that's the goal of what I am suggesting!But then there is request specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically as that is where most of your free-able memory will originate from.But you need to scan those memory ranges if they can contain pointers that can reach GC-memory?We already have forking, and that is super useful for getting user threads to not stop. We are only missing snapshotting for Windows now.To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan for those dead fibers memory ranges and it won't require stopping the world.You want to require taking a snapshot-copy of the pointer-containing heap? That will make the collector much less useful, I think.
Jan 22 2022
On Saturday, 22 January 2022 at 11:06:24 UTC, rikki cattermole wrote:We already have forking, and that is super useful for getting user threads to not stop.Well, but I would never use it. Forking is crazy expensive. Messes with TLB and therefore caches IIRC.
Jan 22 2022
On Saturday, 22 January 2022 at 11:27:39 UTC, Ola Fosheim Grøstad wrote:On Saturday, 22 January 2022 at 11:06:24 UTC, rikki cattermole wrote:Here are some downsides of a forking collector:

1. messes with the TLB, wipes all caches completely (AFAIK).
2. will hog extra threads, so if your program is using all cores, you will see a penalty
3. requires reduced pointer-mutation activity in the main program, so the forking collector has to saturate the data bus in order to complete quickly, which is bad for the main process
4. requires carefulness in configuring OS resource handling
5. if you actually are out of memory, or close to it, then there is no way for you to fork, so it will fail and the process will instead be killed
6. makes it more difficult to coordinate with real-time threads (throttling/backpressure)
7. not really portable, platform dependent

A forking collector is just not suitable for system-level programming; it is very much a solution for a high-level programming language running on hardware with lots of headroom. If you are going high level, you might as well introduce write barriers for pointer mutation.We already have forking, and that is super useful for getting user threads to not stop.Well, but I would never use it. Forking is crazy expensive. Messes with TLB and therefore caches IIRC.
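For readers unfamiliar with how a forking collector gets its snapshot at all: after fork(), the child sees a copy-on-write image of the parent's memory as it was at fork time, so it can mark from that image while the parent keeps mutating. A minimal POSIX-only D sketch (the "scan" is just reading one word, for illustration; no real marking happens):

```d
// Demonstrates the COW snapshot a forking collector relies on:
// the child reads the pre-fork value even though the parent
// overwrites it concurrently.
import core.sys.posix.unistd : fork, _exit;
import core.sys.posix.sys.wait : waitpid;
import core.stdc.stdio : printf;

__gshared int heapWord = 42; // stands in for some heap data

void main()
{
    auto pid = fork();
    if (pid == 0)
    {
        // Child: its address space is a snapshot taken at fork
        // time. A real collector would mark from the roots here.
        printf("child sees %d\n", heapWord);
        _exit(0);
    }
    heapWord = 7;          // parent mutates; COW gives the child
    waitpid(pid, null, 0); // its own unmodified page regardless
}
```

The downsides listed above are the price of this trick: the COW faults, page copies, and TLB traffic are exactly what makes the fork "crazy expensive" on a busy heap.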
Jan 22 2022
On 23/01/2022 1:39 AM, Ola Fosheim Grøstad wrote:A forking collector is just not suitable for system level programming, it is very much a solution for a high level programming language running on hardware with lots of headroom. If you are going high level, you might as well introduce write barriers for pointer mutation.Agreed. There is no one-size-fits-all solution. Having more GCs, including ones that require write barriers, as options would be a great thing!
Jan 22 2022
On Saturday, 22 January 2022 at 12:39:44 UTC, Ola Fosheim Grøstad wrote:Here are some downsides of a forking collector: 1. messes with TLB, wipes all caches completely (AFAIK).Probably likely to evict some stuff, but I don't see why you would lose everything.2. will hog extra threads, thus if your program is using all cores, you will see a penaltyIt probably isn't using all threads, though.5. if you actually are out of memory, or close to it, then there is no way for you to fork, so it will fail and the process will instead be killedYou can fall back to regular gc if fork fails (and I think the fork gc does this).If you are going high level, you might as well introduce write barriers for pointer mutation.I agree with your other critiques, and I agree with this.
Jan 22 2022
On Saturday, 22 January 2022 at 16:45:02 UTC, Elronnd wrote:On Saturday, 22 January 2022 at 12:39:44 UTC, Ola Fosheim Grøstad wrote:If you change the TLB, then affected address ranges should in general be flushed, although maybe this is too pessimistic in the case of a fork. I don't know the details of what different CPU/MMU hardware implementations require and how various OSes deal with this. But both the fork itself and the COW page copying that occurs when the process writes to memory pages after the fork hurt performance.Here are some downsides of a forking collector: 1. messes with TLB, wipes all caches completely (AFAIK).Probably likely to evict some stuff, but I don't see why you would lose everything.You can fall back to regular gc if fork fails (and I think the fork gc does this).Good point, but if you succeed with the fork in a low memory situation then you are at risk of being killed by the Linux OOM killer. Maybe only the GC collector will be killed since it is a child process, or how does this work? So what happens then, will D detect this failure and switch to a cheaper GC collection process?
Jan 22 2022
On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 22 2022
On Saturday, 22 January 2022 at 21:45:36 UTC, Elronnd wrote:On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:I wouldn't make any assumptions, what I get from a quick Google search fits with what I've read on this topic before: on ARM a TLB flush implies a cache flush, and the Linux documentation https://tldp.org/LDP/khg/HyperNews/get/memory/flush.html describes the pattern: flush cache, modify, flush TLB.If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 23 2022
On Sunday, 23 January 2022 at 12:30:03 UTC, Ola Fosheim Grøstad wrote:On Saturday, 22 January 2022 at 21:45:36 UTC, Elronnd wrote:L1 cache is often virtually addressed but physically tagged so that makes sense.On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:I wouldn't make any assumptions, what I get from a quick Google search fits with what I've read on this topic before: on ARM a TLB flush implies a cache flush, and [tldp.org](https://tldp.org/LDP/khg/HyperNews/get/memory/flush.html) describes the pattern for Linux as: flush cache, modify, flush TLB.If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 23 2022
On Saturday, 22 January 2022 at 10:05:07 UTC, rikki cattermole wrote:This isn't what I had in mind. By telling the GC about fibers, it should be able to take advantage of this information and limit the time it takes to do stop the world scanning. If it knows memory is live, it doesn't need to scan for it. If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.Does C# have thread pinned GC memory? I haven't seen anything like that, other than that you need to "move" the memory to other threads in that language, but I don't really know what is going on under the hood there. Not having to do that is one of the reasons D is nice to use: manually moving memory around will have an effect on how the language looks and its ease of use. Another thing is that thread-pinned memory doesn't work well with thread pools, as any thread in a pool might tamper with the memory. In general, tracing GC doesn't scale well with large amounts of memory. I would rather go with something similar to ORC in Nim in order to reduce the tracing.
Jan 22 2022
On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy. You can do it for special cases with fork like you said, or with unregistering threads like Guillaume doesI think the simple, easiest way to try this would be to just spawn multiple processes [each having their own D GC collector] and somehow share memory between them. But I have no idea if it's easy to work around the inherent "multiple process = context switch" overhead making it actually slower.I actually do exactly this with my web server, but it is easy there since web requests are supposed to be independent anyway. re context switches btw, processes and threads both have them. they aren't that different on the low level, it is just a matter of how much of the memory space is shared. default shared = thread, default unshared = process.
Jan 21 2022
On Friday, 21 January 2022 at 14:37:04 UTC, Adam Ruppe wrote:On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:Is this intended behaviour? Or was it accidental? Is it at least guaranteed that only one thread can hold a mutable reference?So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy.
Jan 21 2022
On Fri, Jan 21, 2022 at 03:16:32PM +0000, Tejas via Digitalmars-d wrote:On Friday, 21 January 2022 at 14:37:04 UTC, Adam Ruppe wrote:Immutable data is shared across threads by default. So any immutable data would need to be collected by a global GC. However, you don't always know if some data is going to end up being immutable when you allocate, e.g.:

    int[] createSomeData() pure {
        return [ 1, 2, 3 ]; // mutable when we allocate
    }

    void main() {
        // Implicitly convert from mutable to immutable because
        // createSomeData() is pure and the data is unique.
        immutable(int)[] constants = createSomeData();
        shareMyData(constants);

        // Now who should collect, per-thread GC or global GC?
        constants = null;
    }

So assume we allocate the initial array on the per-thread heap. Then it gets implicitly cast to immutable because it was unique, and shared with another thread. Now who should collect, the allocating thread's GC or the GC of the thread the data was shared with? Neither is correct: you need a global GC. At the very least, you need to synchronize across individual threads' GCs, which at least partially negates the benefit of a per-thread GC. This is just one example of several that show why it's a hard problem in the current language.

T -- One Word to write them all, One Access to find them, One Excel to count them all, And thus to Windows bind them. -- Mike Champion

On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:Is this intended behaviour? Or was it accidental? Is it at least guaranteed that only one thread can hold a mutable reference?So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy.
Jan 21 2022
On Friday, 21 January 2022 at 15:16:32 UTC, Tejas wrote:Is this intended behaviour?Yes.Is it at least guaranteed that only one thread can hold a mutable reference?No.
Jan 21 2022
On 1/21/22 05:56, Chris Katko wrote:some sort of "multiple pool [same process/thread] garbage collection"?I ended up with that design by accident: My D library had to spawn multiple D processes instead of multiple threads. This was a workaround to my inability to execute initialization functions of D shared libraries that were dynamically loaded by foreign runtimes (Python and C++ were in the complex picture). In the end, I realized that my library was using multiple D runtimes on those multiple D processes. :) I've never gotten to profiling whether my necessary inter-process communication was hurting performance. Even if it did, the now-unstopped worlds might be better in the end. I even thought about making a DConf presentation about the findings but it never happened. :) Ali
Jan 21 2022
On 1/21/2022 5:56 AM, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Yes. The trouble is what happens when a pointer in one pool is cast to a pointer in another pool.
Jan 21 2022
On Fri, Jan 21, 2022 at 02:43:31PM -0800, Walter Bright via Digitalmars-d wrote:On 1/21/2022 5:56 AM, Chris Katko wrote:It almost makes one want to tag pointer types at compile-time as thread-local or global. Casting from thread-local to global would emit a call to some druntime hook to note the transfer (which, presumably, should only occur rarely). Stuff with only global references will be collected by the global GC, which can be scheduled to run less frequently (or disabled if you never do such casts). But yeah, this is a slippery slope on the slide down towards Rust... :-P It seems we just can't get any farther from where we are without starting to need managed pointer types. T -- Who told you to swim in Crocodile Lake without life insurance??So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Yes. The trouble is what happens when a pointer in one pool is cast to a pointer in another pool.
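The tagging idea above can be sketched in user code today, without compiler support. Everything below is invented for illustration (`Global`, `toGlobal`, and the hook comment are not druntime APIs); it only shows what an explicit, greppable transfer point would look like:

```d
// Hedged sketch of compile-time thread-local vs. global tagging.
// A wrapper type marks pointers known to be visible across threads.
struct Global(T)
{
    T* ptr; // a reference the global GC would have to track
}

// Hypothetical transfer function: a real implementation would call
// a druntime hook here to note that a thread-local allocation has
// escaped, so the global (less frequent) collector owns it now.
Global!T toGlobal(T)(T* local)
{
    return Global!T(local);
}

void main()
{
    int* p = new int; // starts life as thread-local
    *p = 5;
    auto g = toGlobal(p); // the explicit, auditable escape point
    assert(*g.ptr == 5);
}
```

The point is that the cast becomes a single rare call site the runtime can account for, rather than something the GC must discover by scanning.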
Jan 21 2022
On Friday, 21 January 2022 at 22:56:46 UTC, H. S. Teoh wrote:On Fri, Jan 21, 2022 at 02:43:31PM -0800, Walter Bright via Digitalmars-d wrote:Isn't going global from thread local done via `cast(shared)` though? Is that not enough to notify the compiler?[...]It almost makes one want to tag pointer types at compile-time as thread-local or global. Casting from thread-local to global would emit a call to some druntime hook to note the transfer (which, presumably, should only occur rarely). Stuff with only global references will be collected by the global GC, which can be scheduled to run less frequently (or disabled if you never do such casts). But yeah, this is a slippery slope on the slide down towards Rust... :-P It seems we just can't get any farther from where we are without starting to need managed pointer types. T
Jan 21 2022
On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:So I was going to ask this related question in my other thread but I now see it's hit over 30 replies (!) and the discussion is interesting but a separate topic. So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Instead of segregating GC by execution contexts like threads or fibers, I think it makes more sense to separate it by task instead. When making the most of multiple cores you try to limit synchronizations to task boundaries anyway. So a task itself basically runs isolated. Any memory it burns through can basically be dropped when the task is done. Of course the devil is in the details, but I think it is more promising than segregating by execution context.
Jan 22 2022
On 23/01/2022 4:14 AM, Sebastiaan Koppe wrote:Instead of segregating GC by execution contexts like threads or fibers, I think it makes more sense to separate it by task instead.For my proposal, I may call it a fiber, but in reality it can just as easily be a task. The only difference is one has a dedicated stack, the other does not.
Jan 22 2022
On Saturday, 22 January 2022 at 15:32:27 UTC, rikki cattermole wrote:For my proposal, I may call it a fiber, but in reality it can just as easily be a task. The only difference is one has a dedicated stack, the other does not.The term «task» is quite generic and covers fibers as well as other computational units, with or without a stack.
Jan 22 2022