digitalmars.D - A few considerations on garbage collection

Andrei Alexandrescu (59/59) Apr 30 2014 I'm mulling over a couple of design considerations for allocators, and

Andrei Alexandrescu (6/9) Apr 30 2014 Here I meant:
Orvid King (11/24) Apr 30 2014 Although it would be a breaking change, I am intending to propose
bearophile (5/10) Apr 30 2014 If this is true then this needs to be documented in bold text.
Vladimir Panteleev (4/5) Apr 30 2014 I don't think the current GC can either recognize slices or
Dmitry Olshansky (21/81) Apr 30 2014 It doesn't and CAN'T. As long as there is a pointer into the block it

Andrei Alexandrescu (9/13) Apr 30 2014 Yah, if you find a register possibly pointing somewhere in the middle of...

Vladimir Panteleev (4/19) Apr 30 2014 I guess this is related to the reason Java switched its substring

Andrei Alexandrescu (3/22) Apr 30 2014 It it related, but not necessarily a motivator. I came to this naturally...

Steven Schveighoffer (24/35) Apr 30 2014 I think Dmitry's point is that the GC is not aware of the length portion...

Steven Schveighoffer (29/48) Apr 30 2014 ing

Jonathan M Davis via Digitalmars-d (9/23) May 01 2014 On Wed, 30 Apr 2014 20:08:03 -0400

Marco Leise (20/36) May 04 2014 I see that you are trying to account for allocator designs

Orvid King via Digitalmars-d (4/8) May 05 2014 That should work independent of what the GC decides to do, as it

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I'm mulling over a couple of design considerations for allocators, and 
was thinking of the following restriction:

1. No out-of-bounds tricks and no pointer arithmetic. Consider:

int[] a = new int[1000];
a = a[250 .. 750];
int* p = a[500 .. $].ptr;

Subsequently the GC should be within its rights to deallocate any memory 
within the first and last 250 integers allocated, even though in theory 
the user may get to them by using pointer arithmetic.

In particular that means once a slice is shrunk, there's no growing back 
unless another slice is around.

I think the current GC already does that.

2. The same may be the case for classes WITHOUT destructors. Consider:

class A
{
    int[1000] a;
    int b;
    int[1000] c;
}
int* p = &(new A).b;

The collector should be allowed to deallocate any memory except b's own, 
even though that means the class has "holes" in it. The current GC does 
not do that.

2. However, the same shall not be the case for classes with destructors. 
Consider:

class B
{
    int[1000] a;
    int b;
    int[1000] c;
    ~this() { ... }
}
int* p = &(new B).b;

This class has a destructor, so it will be kept around in its entirety 
if an internal pointer is held.

3. Classes meant to have destructors called at collection will ALWAYS 
have been allocated with new (i.e. won't start in the middle of some 
other allocation). In other words, only class objects created with new 
will be properly collected. Those forced in odd places with emplace() 
are the responsibility of the user.

4. Currently, struct objects created with new do NOT have their 
destructors called during collection. I think this may continue, meaning 
that structs created with new are somewhat low-level and are meant to be 
destroyed and deallocated manually.

5. This brings up arrays of structs. As far as I understand, destructors 
will never be called for these, even after all references are gone:

struct S { ~this() { ... } }
auto a = new S[100];

Unlike (4), arrays of structs are high-level and frequently used. I 
think we must do something about it, so I plan to support calling 
destructors for arrays of structs.

6. The point above brings to mind more radical possibilities, such as 
making all arrays reference-counted and allowing compulsive deallocation 
when the reference counter goes down to zero. That would rule out things 
like escaping pointers to data inside arrays, which is quite extreme. 
But probably worth bringing up in a brainstorming. If we disallow 
statically constructs that take addresses we may get away with it.

Please chime in with your thoughts.

Andrei

Apr 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/30/14, 8:33 AM, Andrei Alexandrescu wrote:
 int[] a = new int[1000];
 a = a[250 .. 750];
 int* p = a[500 .. $].ptr;

Here I meant:

int[] a = new int[1000];
int* p = a[500 .. $].ptr;
a = a[250 .. 750];


Andrei

Apr 30 2014

"Orvid King" <blah38621 gmail.com> writes:

On Wednesday, 30 April 2014 at 15:33:29 UTC, Andrei Alexandrescu 
wrote:
 4. Currently, struct objects created with new do NOT have their 
 destructors called during collection. I think this may 
 continue, meaning that structs created with new are somewhat 
 low-level and are meant to be destroyed and deallocated 
 manually.

 5. This brings up arrays of structs. As far as I understand, 
 destructors will never be called for these, even after all 
 references are gone:

 struct S { ~this() { ... } }
 auto a = new S[100];

 Unlike (4), arrays of structs are high-level and frequently 
 used. I think we must do something about it, so I plan to 
 support calling destructors for arrays of structs.

Although it would be a breaking change, I am intending to propose 
that the destructors of heap-allocated structs be handled in the 
same way as destructors of classes in my GC implementation. I am 
also intending to propose a guarantee that, if the program exits 
normally, all destructors will be invoked before the application 
ends. To help facilitate this I intend to have a thread dedicated 
to calling destructors. I'm still working on typing out the 
entire design of the GC, but I don't currently see a reason 
(design-wise) why I'd have to change these things.

Apr 30 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 Subsequently the GC should be within its rights to deallocate 
 any memory within the first and last 250 integers allocated, 
 even though in theory the user may get to them by using pointer 
 arithmetic.

Such use of point arithmetic is not uncommon.


 I think the current GC already does that.

If this is true then this needs to be documented in bold text.

Bye,
bearophile

Apr 30 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wednesday, 30 April 2014 at 15:33:29 UTC, Andrei Alexandrescu 
wrote:
 I think the current GC already does that.

I don't think the current GC can either recognize slices or 
deallocate parts of an object.

Apr 30 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

30-Apr-2014 19:33, Andrei Alexandrescu пишет:
 I'm mulling over a couple of design considerations for allocators, and
 was thinking of the following restriction:

 1. No out-of-bounds tricks and no pointer arithmetic. Consider:

 int[] a = new int[1000];
 a = a[250 .. 750];
 int* p = a[500 .. $].ptr;

 Subsequently the GC should be within its rights to deallocate any memory
 within the first and last 250 integers allocated, even though in theory
 the user may get to them by using pointer arithmetic.

 In particular that means once a slice is shrunk, there's no growing back
 unless another slice is around.

 I think the current GC already does that.

It doesn't and CAN'T. As long as there is a pointer into the block it 
stays. There are plenty of ways to shoot yourself in the foot if this 
rule is not respected. Anyhow, for starters, a conservative GC doesn't 
know what is a slice when ptr/length sits in registers!

 2. The same may be the case for classes WITHOUT destructors. Consider:

 class A
 {
     int[1000] a;
     int b;
     int[1000] c;
 }
 int* p = &(new A).b;

 The collector should be allowed to deallocate any memory except b's own,
 even though that means the class has "holes" in it. The current GC does
 not do that.

 2. However, the same shall not be the case for classes with destructors.
 Consider:

 class B
 {
     int[1000] a;
     int b;
     int[1000] c;
     ~this() { ... }
 }
 int* p = &(new B).b;

 This class has a destructor, so it will be kept around in its entirety
 if an internal pointer is held.

Does it make *anything* simpler/better? I don't see significant gains in 
any sane code.

 3. Classes meant to have destructors called at collection will ALWAYS
 have been allocated with new (i.e. won't start in the middle of some
 other allocation). In other words, only class objects created with new
 will be properly collected. Those forced in odd places with emplace()
 are the responsibility of the user.

OK

 4. Currently, struct objects created with new do NOT have their
 destructors called during collection. I think this may continue, meaning
 that structs created with new are somewhat low-level and are meant to be
 destroyed and deallocated manually.

IIRC they do, it's only arrays of such that doesn't. Anyhow having such 
a dangerous construct built-in (new = resource leak) in the language 
becomes questionable.

 5. This brings up arrays of structs. As far as I understand, destructors
 will never be called for these, even after all references are gone:

 struct S { ~this() { ... } }
 auto a = new S[100];

 Unlike (4), arrays of structs are high-level and frequently used. I
 think we must do something about it, so I plan to support calling
 destructors for arrays of structs.

Making the point about NOT calling destructor that much more 
schizophrenic. Either do it properly or not at all.

 6. The point above brings to mind more radical possibilities, such as
 making all arrays reference-counted and allowing compulsive deallocation
 when the reference counter goes down to zero.

So passing them around becomes costly, and down towards the RCslice we 
march.

 That would rule out things
 like escaping pointers to data inside arrays, which is quite extreme.

Discussions about such haven't went anywhere good. Attempts to redesign 
slices to be ref-counted soon lead to RC-ing everything.

BTW there could be sane code in the wild that may have non-reclaimable 
cycles even if ONLY array slices would become ref-counted.

 But probably worth bringing up in a brainstorming. If we disallow
 statically constructs that take addresses we may get away with it.

 Please chime in with your thoughts.

 Andrei


-- 
Dmitry Olshansky

Apr 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
 It doesn't and CAN'T. As long as there is a pointer into the block it
 stays. There are plenty of ways to shoot yourself in the foot if this
 rule is not respected. Anyhow, for starters, a conservative GC doesn't
 know what is a slice when ptr/length sits in registers!

Yah, if you find a register possibly pointing somewhere in the middle of 
an array, all bets are off.

But let's say that's not the case and all that's pointing in a 1M 
elements int[] is a int[] of 5 elements somewhere in the middle. Do we 
want to support these?

s = s.ptr[-1 .. $ + 1]

and such?


Andrei

Apr 30 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Wednesday, 30 April 2014 at 20:29:37 UTC, Andrei Alexandrescu 
wrote:
 On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
 It doesn't and CAN'T. As long as there is a pointer into the 
 block it
 stays. There are plenty of ways to shoot yourself in the foot 
 if this
 rule is not respected. Anyhow, for starters, a conservative GC 
 doesn't
 know what is a slice when ptr/length sits in registers!

 Yah, if you find a register possibly pointing somewhere in the 
 middle of an array, all bets are off.

 But let's say that's not the case and all that's pointing in a 
 1M elements int[] is a int[] of 5 elements somewhere in the 
 middle. Do we want to support these?

 s = s.ptr[-1 .. $ + 1]

 and such?

I guess this is related to the reason Java switched its substring 
method to make a copy instead of hold a reference?

Apr 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/30/14, 1:34 PM, Vladimir Panteleev wrote:
 On Wednesday, 30 April 2014 at 20:29:37 UTC, Andrei Alexandrescu wrote:
 On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
 It doesn't and CAN'T. As long as there is a pointer into the block it
 stays. There are plenty of ways to shoot yourself in the foot if this
 rule is not respected. Anyhow, for starters, a conservative GC doesn't
 know what is a slice when ptr/length sits in registers!

 Yah, if you find a register possibly pointing somewhere in the middle
 of an array, all bets are off.

 But let's say that's not the case and all that's pointing in a 1M
 elements int[] is a int[] of 5 elements somewhere in the middle. Do we
 want to support these?

 s = s.ptr[-1 .. $ + 1]

 and such?

 I guess this is related to the reason Java switched its substring method
 to make a copy instead of hold a reference?

It it related, but not necessarily a motivator. I came to this naturally 
while designing allocators. -- Andrei

Apr 30 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 30 Apr 2014 16:29:38 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
 It doesn't and CAN'T. As long as there is a pointer into the block it
 stays. There are plenty of ways to shoot yourself in the foot if this
 rule is not respected. Anyhow, for starters, a conservative GC doesn't
 know what is a slice when ptr/length sits in registers!

 Yah, if you find a register possibly pointing somewhere in the middle of  
 an array, all bets are off.

I think Dmitry's point is that the GC is not aware of the length portion  
of a slice, only the pointer portion. By definition, a pointer at a block  
could refer to the whole block.

In other words, in your original example, it is aware that 'a' points at  
the block, but not that a's length is 500. Theoretically, you could assume  
that a pointer may only lock all data *after* the pointer in the block. In  
other words, the first 250 ints could be reallocated if not referred to.  
But I think this is too limiting a behavior, for not much gain.

 But let's say that's not the case and all that's pointing in a 1M  
 elements int[] is a int[] of 5 elements somewhere in the middle. Do we  
 want to support these?

I think we are never ever going to get away from the requirement of users  
to be at least cognizant of these situations.

Consider this:

int[] x;

x.reserve(1_000_000);

should the GC at this point just refuse, citing its rule that you just  
can't ask for that much memory until you want to use it? Or maybe the GC  
comes along and says "you're not using that, I'm going to take it back,"  
when your goal explicitly is to pre-allocate such data.

 s = s.ptr[-1 .. $ + 1]

I can think of a very important use case that does this -- interfaces.  
Possibly you could allow this exception, but I don't think it's worth  
disallowing a user type that might mimic such useful behavior for some  
low-level purpose.

-Steve

Apr 30 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 30 Apr 2014 14:15:03 -0400, Dmitry Olshansky  =

<dmitry.olsh gmail.com> wrote:

 30-Apr-2014 19:33, Andrei Alexandrescu =D0=BF=D0=B8=D1=88=D0=B5=D1=82:=


 4. Currently, struct objects created with new do NOT have their
 destructors called during collection. I think this may continue, mean=


ing
 that structs created with new are somewhat low-level and are meant to=


 be
 destroyed and deallocated manually.


I think structs can and will be destructed from the heap at some point. =
 =

What we do need, however, is some more intelligence in GC destruction.  =

Assumptions don't cut it, especially in multi-threaded code.

 IIRC they do, it's only arrays of such that doesn't. Anyhow having suc=

h  =

 a dangerous construct built-in (new =3D resource leak) in the language=

  =

 becomes questionable.

No, they don't. Only objects are marked as having a finalizer. The  =

finalize flag in the GC assumes that the first size_t in the block is a =
 =

pointer to a vtable. A struct cannot have this.

We need to fundamentally modify how this works if we want finalizers for=
  =

structs to be called, but I think it's worth doing. IIRC, Rainer's preci=
se  =

GC does this.

 5. This brings up arrays of structs. As far as I understand, destruct=


ors
 will never be called for these, even after all references are gone:

 struct S { ~this() { ... } }
 auto a =3D new S[100];

 Unlike (4), arrays of structs are high-level and frequently used. I
 think we must do something about it, so I plan to support calling
 destructors for arrays of structs.


An array of structs is a struct itself, and not a class. While it could =
 =

conceivably be done much easier than structs with the current layout, it=
  =

would eliminate 16-byte arrays, because the first size_t would have to b=
e  =

reserved! Hopefully my explanation above helps to understand this.

 Making the point about NOT calling destructor that much more  =

 schizophrenic. Either do it properly or not at all.

Agree.

-Steve

Apr 30 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wed, 30 Apr 2014 20:08:03 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 On Wed, 30 Apr 2014 14:15:03 -0400, Dmitry Olshansky  
 <dmitry.olsh gmail.com> wrote:
 
 IIRC they do, it's only arrays of such that doesn't. Anyhow having
 such a dangerous construct built-in (new = resource leak) in the
 language becomes questionable.

 
 No, they don't. Only objects are marked as having a finalizer. The  
 finalize flag in the GC assumes that the first size_t in the block is
 a pointer to a vtable. A struct cannot have this.
 
 We need to fundamentally modify how this works if we want finalizers
 for structs to be called, but I think it's worth doing. IIRC,
 Rainer's precise GC does this.

It would be _very_ cool if we could make it so that struct destructors
get run when they're on the GC heap. I'm sure that the fact that
they're not called currently creates a number of subtle bugs when
structs are used which assume that they're destructor is going to be
called when they're destroyed/freed.

- Jonathan M Davis

May 01 2014

Marco Leise <Marco.Leise gmx.de> writes:

Am Wed, 30 Apr 2014 08:33:25 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 I'm mulling over a couple of design considerations for allocators, and 
 was thinking of the following restriction:
 
 1. No out-of-bounds tricks and no pointer arithmetic. Consider:
 
 int[] a = new int[1000];
 int* p = a[500 .. $].ptr;
 a = a[250 .. 750];
 
 Subsequently the GC should be within its rights to deallocate any memory 
 within the first and last 250 integers allocated, even though in theory 
 the user may get to them by using pointer arithmetic.

I see that you are trying to account for allocator designs
that could reuse these memory fragments.
If this is for  safe, then maybe some memory could be released,
but you'd have to statically verify that internal pointers
don't make it into unsafe code where I wonder if any memory
would be released if I wrote:

	size_t length = 100;
	int* p = (new int[](length)).ptr;
	GC.collect();
	p[length-1] = 42;

So it is difficult to give a good answer. I'd say no until it
is clear how it would work outside of  safe.

 6. The point above brings to mind more radical possibilities, such as 
 making all arrays reference-counted and allowing compulsive deallocation 
 when the reference counter goes down to zero. That would rule out things 
 like escaping pointers to data inside arrays, which is quite extreme.

Would that affect all arrays, only arrays containing structs
or only affect arrays containing structs with dtors?

	printf("hello\n".ptr);

should still work after all.

-- 
Marco

May 04 2014

Orvid King via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 5/4/14, Marco Leise via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 Would that affect all arrays, only arrays containing structs
 or only affect arrays containing structs with dtors?

 	printf("hello\n".ptr);

 should still work after all.

That should work independent of what the GC decides to do, as it
should be getting emitted as a literal pointer to "hello\n" in the RO
data-segment.

May 05 2014

D Programming

C/C++ Programming

Other

digitalmars.D - A few considerations on garbage collection