digitalmars.D - Thought on limiting scope of GC


Jerry <jlquinn optonline.net> writes:

Hi all,

I just had the following thought on limiting the gc in regions.  I don't
know if this would address some of Manu's concerns, but here goes:

My thought is to have something like the following:

GC.track();
auto obj = allocateStuff();
GC.cleanup(obj);

The idea here is that track() tells GC to explicitly track all objects
created from that point until the cleanup call.  The cleanup() call
tells gc to limit its collection to those objects allocated since the
track() call.  The obj parameter tells gc to consider obj live.

This way, you can avoid tracking everything that may get created, but
you can limit how much work gets done.

Comments? Slams?

Jerry

Feb 13 2014
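
To make the proposal concrete, here is a toy model in D of what track() and cleanup() would do. It is only a sketch: ToyHeap, its integer object ids and the alloc helper are made up for illustration, and core.memory.GC has no track or cleanup functions. cleanup accepts any number of live objects, frees every tracked object not reachable from them, and never looks at objects allocated before track().

```d
// Toy model only: ToyHeap, alloc, track and cleanup are hypothetical names,
// not part of core.memory.GC.
struct ToyHeap
{
    static struct Obj
    {
        size_t[] refs;      // ids of the objects this object points to
        bool live = true;   // false once the toy collector has freed it
    }

    Obj[] objs;             // an object's id is its index in this array
    bool tracking;          // true between track() and cleanup()
    size_t[] tracked;       // ids allocated since the last track()

    // Allocate an object holding references to the given ids.
    size_t alloc(size_t[] refs...)
    {
        objs ~= Obj(refs.dup);
        immutable id = objs.length - 1;
        if (tracking) tracked ~= id;
        return id;
    }

    // Start recording allocations.
    void track() { tracking = true; tracked = null; }

    // Free every tracked object not reachable from `roots`.
    // Objects allocated before track() are never visited.
    // Returns the number of objects freed.
    size_t cleanup(size_t[] roots...)
    {
        bool[size_t] inRegion;
        foreach (id; tracked) inRegion[id] = true;

        bool[size_t] marked;
        size_t[] work = roots.dup;
        while (work.length)
        {
            immutable id = work[$ - 1];
            work = work[0 .. $ - 1];
            if (id !in inRegion || id in marked) continue; // skip the old heap
            marked[id] = true;
            work ~= objs[id].refs;
        }

        size_t freed;
        foreach (id; tracked)
            if (id !in marked) { objs[id].live = false; ++freed; }

        tracking = false;
        tracked = null;
        return freed;
    }
}
```

For example, with one object allocated before track() and three after it (a temporary, a leaf, and obj pointing to the leaf), cleanup(obj) in this model frees only the temporary.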

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/13/14, 8:41 PM, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.  I don't
 know if this would address some of Manu's concerns, but here goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

Yah, it's a classic (with the names "track" -> "mark" and "cleanup" -> 
"sweep"). Allocators support that already, and installing a global GC 
should do as well.

Andrei

Feb 13 2014

Jerry <jlquinn optonline.net> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/13/14, 8:41 PM, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.  I don't
 know if this would address some of Manu's concerns, but here goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

 Yah, it's a classic (with the names "track" -> "mark" and "cleanup" -> 
 "sweep"). Allocators support that already, and installing a global GC should
 do as well.

I don't follow the global GC comment.  Let's say you're using global GC
in general but want to control more tightly what it's doing at a
particular region of the code.

Mark looks at all things that have been allocated and possibly live.
Track says keep track of objects allocated after the track call, and
cleanup only looks at those objects that were recently allocated,
ignoring the rest of the heap.

If you're saying that allocators will provide the means of doing this,
then that's fine.

Feb 14 2014

"Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:

On Friday, 14 February 2014 at 11:28:11 UTC, Jerry wrote:
 Track says keep track of objects allocated after the track 
 call, and
 cleanup only looks at those objects that were recently 
 allocated,
 ignoring the rest of the heap.

Track cannot make sure that no reference escapes, therefore 
cleaning up an object could be a huge error. This would however 
make sense e.g. inside pure functions.

Feb 14 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/14/14, 3:28 AM, Jerry wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
 Yah, it's a classic (with the names "track" -> "mark" and "cleanup" ->
 "sweep"). Allocators support that already, and installing a global GC should
 do as well.

 I don't follow the global GC comment.  Let's say you're using global GC
 in general but want to control more tightly what it's doing at a
 particular region of the code.

 Mark looks at all things that have been allocated and possibly live.

Oh, I think mark/sweep in the "mark/sweep idiom" is different from the 
"mark & sweep garbage collector". I looked for evidence that the 
idiom exists under that name, but apparently I was wrong.

Anyhow, I guess track/cleanup is less confusing.

 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

I'm thinking of something like:

MyAllocator alloc = ...;
alloc.installGlobally();
...
alloc.deallocateAll();
alloc.uninstallGlobally();



Andrei

Feb 14 2014
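
As far as I can tell, installGlobally() and uninstallGlobally() above are pseudocode rather than existing library calls. The region idea itself is easy to sketch: hand out memory from one buffer and release all of it in a single step. BumpRegion below is a hypothetical, minimal illustration, assuming a caller-supplied buffer and 16-byte alignment. Phobos later gained std.experimental.allocator, whose region allocators offer a similar deallocateAll().

```d
// Hypothetical minimal region ("bump") allocator for illustration only;
// it is not the allocator API referred to in the post above.
struct BumpRegion
{
    private ubyte[] buffer;   // backing storage supplied by the caller
    private size_t used;      // bytes handed out so far, padding included

    this(ubyte[] backing) { buffer = backing; }

    // Return n bytes aligned to 16, or null if n is 0 or the region is full.
    void[] allocate(size_t n)
    {
        enum size_t mask = 15;
        immutable base = cast(size_t) buffer.ptr;
        // Round the next free address up to a multiple of 16.
        immutable start = ((base + used + mask) & ~mask) - base;
        if (n == 0 || start > buffer.length || n > buffer.length - start)
            return null;
        used = start + n;
        return buffer[start .. used];
    }

    // Release every allocation at once; no per-object bookkeeping.
    bool deallocateAll() { used = 0; return true; }

    size_t bytesUsed() const { return used; }
}
```

Nothing allocated from such a region can outlive deallocateAll(), so any result that must survive has to be copied elsewhere first.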

Jerry <jlquinn optonline.net> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/14/14, 3:28 AM, Jerry wrote:
 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

 I'm thinking of something like:

 MyAllocator alloc = ...;
 alloc.installGlobally();
 ...
 alloc.deallocateAll();
 alloc.uninstallGlobally();

The difference is that I'd like the ability for some objects to live
after the region ends.  I.e. it's reducing the scope of the GC, not
temporarily replacing it with a completely separate heap.

Feb 14 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/14/14, 8:26 AM, Jerry wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/14/14, 3:28 AM, Jerry wrote:
 Track says keep track of objects allocated after the track call, and
 cleanup only looks at those objects that were recently allocated,
 ignoring the rest of the heap.

 If you're saying that allocators will provide the means of doing this,
 then that's fine.

 I'm thinking of something like:

 MyAllocator alloc = ...;
 alloc.installGlobally();
 ...
 alloc.deallocateAll();
 alloc.uninstallGlobally();

 The difference is that I'd like the ability for some objects to live
 after the region ends.  I.e. it's reducing the scope of the GC, not
 temporarily replacing it with a completely separate heap.

Then I guess you'd need to use two allocators.

Andrei

Feb 14 2014

"thedeemon" <dlang thedeemon.com> writes:

On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

What if allocateStuff() writes the address of a newly allocated 
object into a field of an old object that existed before GC.track()? 
You can't scan only the objects created after GC.track(); that 
might leave dangling references in the "old generation".

Feb 13 2014
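
The problem described above can be reproduced on a tiny object graph. The sketch below is hypothetical (regionSweep, danglingEdges and the integer ids are invented here): it computes which objects a region-only collection would free, then lists the references from surviving objects that now point at freed memory.

```d
import std.algorithm.searching : canFind;

// Hypothetical model: objects are ids, and edges[i] = [from, to] means
// that object `from` stores a pointer to object `to`.

// Ids a region-only collection would free: everything in `region` that is
// not reachable from `roots` through edges that stay inside the region.
size_t[] regionSweep(const size_t[] region, const size_t[] roots,
                     const size_t[2][] edges)
{
    size_t[] marked;
    foreach (r; roots)
        if (region.canFind(r) && !marked.canFind(r)) marked ~= r;

    // Propagate marks; `marked` grows while we iterate over it.
    for (size_t i = 0; i < marked.length; ++i)
        foreach (e; edges)
            if (e[0] == marked[i] && region.canFind(e[1])
                && !marked.canFind(e[1]))
                marked ~= e[1];

    size_t[] freed;
    foreach (id; region)
        if (!marked.canFind(id)) freed ~= id;
    return freed;
}

// Edges from a surviving object to a freed one, i.e. dangling references.
size_t[2][] danglingEdges(const size_t[] freed, const size_t[2][] edges)
{
    size_t[2][] result;
    foreach (e; edges)
        if (!freed.canFind(e[0]) && freed.canFind(e[1])) result ~= e;
    return result;
}
```

With old object 0 pointing at new object 1, region [1, 2] and only 2 passed as live, the model frees object 1 and reports the edge from 0 to 1 as dangling.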

Jerry <jlquinn optonline.net> writes:

"thedeemon" <dlang thedeemon.com> writes:

 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 What if allocateStuff() writes the address of a newly allocated object into a
 field of an old object that existed before GC.track()? You can't scan only the
 objects created after GC.track(); that might leave dangling references in the
 "old generation".

This is a concern.  Rather than passing a single object into the
cleanup, a list of objects to consider live can be passed in.  That
would cover at least some of these situations, but not all.

Would it still be useful given this limitation?  Would it give someone
looking for tighter control over GC the tools they need?

Feb 14 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

How do you imagine it working in multi-core programs? Does it only 
track thread-local allocations?

Feb 13 2014

Jerry <jlquinn optonline.net> writes:

"Paulo Pinto" <pjmlp progtools.org> writes:

 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all objects
 created from that point until the cleanup call.  The cleanup() call
 tells gc to limit its collection to those objects allocated since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get created, but
 you can limit how much work gets done.

 How do you imagine it working in multi-core programs? Does it only track
 thread-local allocations?

I think this can be handled by storing the thread that requests
tracking, and then each allocation is tracked if it's done from the same
thread that requested tracking.  Then cleanup just considers the objects
that were tracked.

Feb 14 2014
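
One way to picture per-thread tracking in D: module-level variables are thread-local by default, so a tracking flag and list declared at module scope are automatically private to the thread that called track(). The names below (track, trackedNew, trackedCount) are hypothetical and only model the bookkeeping; they do not hook into the real GC.

```d
// Hypothetical sketch. Module-level variables in D are thread-local unless
// marked shared or __gshared, so each thread gets its own copies of these.
private bool tracking;
private void*[] tracked;

// Start recording allocations made by the calling thread.
void track()
{
    tracking = true;
    tracked = null;
}

// Allocate a T and record it if the calling thread is tracking.
T* trackedNew(T)(T value)
{
    auto p = new T;
    *p = value;
    if (tracking) tracked ~= cast(void*) p;
    return p;
}

// Number of allocations recorded for the calling thread.
size_t trackedCount()
{
    return tracked.length;
}
```

An allocation made through trackedNew on another thread is not recorded unless that thread has called track() itself. Objects shared between threads would still need extra care, which this sketch does not address.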

"Namespace" <rswhite4 googlemail.com> writes:

On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

Looks like DIP 46: http://wiki.dlang.org/DIP46
I like the idea.

Feb 14 2014

"tcak" <tcak pcak.com> writes:

On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
  I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

A programmer's aim is to tell the computer what to do. The purpose 
of the GC is to help prevent problems. By default, AFAIK, the GC 
considers every part of memory in case it contains references. 
If the time-consuming step is scanning all of memory, the 
programmer could tell the GC, when he/she is sure it is correct, 
not to scan some parts of memory, which limits the scanning area. 
For example, if I create a char array of 10,000 items, why would 
I want the GC to scan it? I certainly won't put any object 
references in it.

Feb 14 2014
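
D's core.memory already exposes this kind of hint: a block allocated with GC.BlkAttr.NO_SCAN is owned by the GC but is not scanned for pointers. A minimal sketch, where allocUnscanned and isUnscanned are helper names invented here:

```d
import core.memory : GC;

// Allocate n chars in a GC block that the collector will not scan.
// Helper name is hypothetical; GC.malloc and BlkAttr.NO_SCAN are real.
char[] allocUnscanned(size_t n)
{
    auto p = cast(char*) GC.malloc(n, GC.BlkAttr.NO_SCAN);
    p[0 .. n] = '\0';   // GC.malloc does not initialize the memory
    return p[0 .. n];
}

// True if the block starting at `base` carries the NO_SCAN attribute.
// `base` must be the start of a GC block, not an interior pointer.
bool isUnscanned(const void* base)
{
    return (GC.getAttr(base) & GC.BlkAttr.NO_SCAN) != 0;
}
```

As far as I know, arrays of pointer-free element types such as char[] already get this attribute when allocated with new, so the 10,000-item char array in the example should not be scanned in the first place.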

"Paulo Pinto" <pjmlp progtools.org> writes:

On Friday, 14 February 2014 at 09:01:09 UTC, tcak wrote:
 On Friday, 14 February 2014 at 04:41:43 UTC, Jerry wrote:
 Hi all,

 I just had the following thought on limiting the gc in regions.
 I don't
 know if this would address some of Manu's concerns, but here 
 goes:

 My thought is to have something like the following:

 GC.track();
 auto obj = allocateStuff();
 GC.cleanup(obj);

 The idea here is that track() tells GC to explicitly track all 
 objects
 created from that point until the cleanup call.  The cleanup() 
 call
 tells gc to limit its collection to those objects allocated 
 since the
 track() call.  The obj parameter tells gc to consider obj live.

 This way, you can avoid tracking everything that may get 
 created, but
 you can limit how much work gets done.

 Comments? Slams?

 Jerry

 A programmer's aim is to tell the computer what to do. The 
 purpose of the GC is to help prevent problems. By default, 
 AFAIK, the GC considers every part of memory in case it 
 contains references. If the time-consuming step is scanning all 
 of memory, the programmer could tell the GC, when he/she is 
 sure it is correct, not to scan some parts of memory, which 
 limits the scanning area. For example, if I create a char array 
 of 10,000 items, why would I want the GC to scan it? I 
 certainly won't put any object references in it.

This only works when you are the only guy on the team and have a 
small codebase that you can hold in your head.

The moment a mid-size team comes into play, it is chaos.

There is a reason why manually memory-managed languages have lost 
their place in the enterprise.

--
Paulo

Feb 14 2014

"tcak" <tcak pcak.com> writes:

 A programmer's aim is to tell the computer what to do. The 
 purpose of the GC is to help prevent problems. By default, 
 AFAIK, the GC considers every part of memory in case it 
 contains references. If the time-consuming step is scanning all 
 of memory, the programmer could tell the GC, when he/she is 
 sure it is correct, not to scan some parts of memory, which 
 limits the scanning area. For example, if I create a char array 
 of 10,000 items, why would I want the GC to scan it? I 
 certainly won't put any object references in it.

 This only works when you are the only guy on the team and have 
 a small codebase that you can hold in your head.

 The moment a mid-size team comes into play, it is chaos.

 There is a reason why manually memory-managed languages have lost 
 their place in the enterprise.

 --
 Paulo

Many people want to disable the GC to improve performance (other 
reasons are not covered here). If memory problems start after new 
code is added, just remove the GC-disabling parts (as in my 
example with the 10,000-item array). The errors will disappear 
and performance may decrease a little. The code can then be fixed 
to bring performance back up.

I think enabling the GC for only some parts of the code is the 
wrong way around. It should be disabled for some parts of the 
code. This way, if the programmer loses control of memory, he/she 
can remove the GC-disabling code, and tada, everything works 
correctly without any other changes.

Feb 14 2014
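
core.memory offers GC.disable() and GC.enable() for the "disable it for some parts of the code" approach. The sketch below, with buildSquares as an invented example function, suspends automatic collections for one region and restores them on every exit path. Removing the two GC lines leaves the function's behavior unchanged, which is the fallback described above.

```d
import core.memory : GC;

// Invented example: build an array while automatic collections are off.
int[] buildSquares(int n)
{
    GC.disable();             // calls nest; each needs a matching enable()
    scope (exit) GC.enable(); // runs on normal return and on exceptions

    int[] result;
    foreach (i; 0 .. n)
        result ~= i * i;      // still allocates from the GC heap
    return result;
}
```

Per the core.memory documentation, a collection may still run while disabled if the implementation needs one, for instance when memory runs out, so this limits pauses rather than ruling them out.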

Paulo Pinto <pjmlp progtools.org> writes:

On 14.02.2014 16:46, tcak wrote:
 A programmer's aim is to tell the computer what to do. The purpose of
 the GC is to help prevent problems. By default, AFAIK, the GC
 considers every part of memory in case it contains references. If the
 time-consuming step is scanning all of memory, the programmer could
 tell the GC, when he/she is sure it is correct, not to scan some
 parts of memory, which limits the scanning area. For example, if I
 create a char array of 10,000 items, why would I want the GC to scan
 it? I certainly won't put any object references in it.

 This only works when you are the only guy on the team and have a small
 codebase that you can hold in your head.

 The moment a mid-size team comes into play, it is chaos.

 There is a reason why manually memory-managed languages have lost their
 place in the enterprise.

 --
 Paulo

 Many people want to disable the GC to improve performance (other
 reasons are not covered here). If memory problems start after new code
 is added, just remove the GC-disabling parts (as in my example with
 the 10,000-item array). The errors will disappear and performance may
 decrease a little. The code can then be fixed to bring performance
 back up.

 I think enabling the GC for only some parts of the code is the wrong
 way around. It should be disabled for some parts of the code. This
 way, if the programmer loses control of memory, he/she can remove the
 GC-disabling code, and tada, everything works correctly without any
 other changes.

Again, this example only works when you are the only guy working on the 
code.

For example, projects the size of the Linux kernel are only viable in 
languages like C because there are guys validating every single line of 
code that gets added to the kernel.

In most projects that is far from the truth; everyone just checks in 
whatever they feel like. Then, when the thing blows up at the customer 
and high-escalation meetings are under way, a few poor souls, usually 
senior developers, go through the commit history and use tools like 
Insure++ to track down the issue.

Sometimes it takes a whole week to track down such culprits.

I don't miss those days.

--
Paulo

Feb 14 2014

Jerry <jlquinn optonline.net> writes:

"tcak" <tcak pcak.com> writes:

 Many people want to disable the GC to improve performance (other reasons are
 not covered here). If memory problems start after new code is added, just
 remove the GC-disabling parts (as in my example with the 10,000-item array).
 The errors will disappear and performance may decrease a little. The code can
 then be fixed to bring performance back up.

 I think enabling the GC for only some parts of the code is the wrong way
 around. It should be disabled for some parts of the code. This way, if the
 programmer loses control of memory, he/she can remove the GC-disabling code,
 and tada, everything works correctly without any other changes.

My proposal was to leave GC enabled for the whole program.  The track
and cleanup call pair is intended to narrow the scope of GC in some
regions of the code.

Feb 14 2014
