digitalmars.D - Easy & huge GC optimizations
- Etienne (20/20) May 22 2014 I was thinking of how the GC could be optimized further and came across
- Rainer Schuetze (10/30) May 22 2014 rt.lifetime makes heavy use of "NO_SCAN" for all array handling
- Steven Schveighoffer (5/7) May 22 2014 It's done similarly to arrays. The function is here:
- Etienne (25/29) May 22 2014 That's quite a relief, I was afraid of having to do it ;)
- Rainer Schuetze (4/34) May 22 2014 Hmm, I guess I don't get the idea. You cannot skip a pool based on some
- Chris (11/72) May 23 2014 I'm not a fan of machine learning, especially not in cases you
- John Colvin (4/80) May 23 2014 Bear in mind here that most code goes though a whole bunch of
- Etienne (5/8) May 23 2014 I'm happy you're here to say this. Machine learning is the future of
- Chris (12/94) May 23 2014 What I'm saying is that in cases where you do have control you
- Etienne (6/10) May 23 2014 If the GC errs, worst case you lose a few bytes of free space (Type 1
- "Ola Fosheim Grøstad" (2/5) May 23 2014 What kind of machine learning? Branch prediction?
- Etienne (10/55) May 23 2014 It only skips the inner search of the pool, like marking it NO_SCAN if a...
- Etienne (7/9) May 23 2014 Sorry that's not true.
- Rainer Schuetze (11/20) May 23 2014 Ok, so based on the samples, you skip fine grained marking of the
- Etienne (11/13) May 23 2014 You store a void** and the original value, dereference it to see if it's...
- Rainer Schuetze (9/22) May 23 2014 Looking at the address where the reference is stored gives you a hint
- Etienne (5/8) May 23 2014 Yes, you're right, the test without the explicit calls to delete unveil
I was thinking of how the GC could be optimized further and came across some sweet flags that are barely used throughout Phobos in favor of a witch-hunt against the GC:

GC.BlkAttr.NO_SCAN
GC.BlkAttr.NO_INTERIOR

When using the NO_SCAN attribute with GC.setAttr(p, NO_SCAN), you're basically doing removeRange on a GC allocation. It's never going to scan the memory in it, but the memory will stay alive if pointers are found pointing to anything inside it. This is very useful for strings! But I can't even find an example of a string with this flag. I'm totally baffled.

When using the NO_INTERIOR attribute, you're telling the GC that nothing can point to the inside of the allocation if it's bigger than 4096 bytes, and to completely ignore scanning its contents in such a case.

With these two attributes, one could write a simple recursive function in Phobos that adjusts the flags on an object's allocations based on the type info.

Tuple!(int, int, int, string)[] bigTupleArray;
bigTupleArray.optimizeGC(); // sets NO_SCAN on ints, and on the pointee of string

Thoughts?
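For concreteness, here's a rough sketch of what such an optimizeGC helper might look like for the array case. The name and shape are hypothetical; GC.setAttr and the TypeInfo pointer flag are existing core.memory/druntime APIs, and a full version would recurse through aggregate members:

```d
import core.memory : GC;

// Hypothetical sketch of the proposed helper, array case only.
// TypeInfo.flags has bit 1 set when the type may contain pointers,
// so a clear bit means the block's contents are safe to skip scanning.
void optimizeGC(T)(T[] arr)
{
    if (arr.ptr is null)
        return;
    if (!(typeid(T).flags & 1))
        GC.setAttr(arr.ptr, GC.BlkAttr.NO_SCAN);
    // A complete implementation would also walk struct/tuple members
    // and chase indirections like the string payloads mentioned above.
}
```

Note that GC.setAttr only acts on a pointer to the base of a block, so a real implementation would have to be careful with large arrays whose data pointer is offset from the block base.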
May 22 2014
On 22.05.2014 18:42, Etienne wrote:
> I was thinking of how the GC could be optimized further and came across some sweet flags that are barely used throughout Phobos [...] Thoughts?

rt.lifetime makes heavy use of "NO_SCAN" for all array handling (including strings). It uses the compiler-generated flag inside the TypeInfo. Classes also deal with this flag (see _d_newclass). I'm not sure where allocating a struct ends up, so maybe there is some potential there.

"NO_INTERIOR" is currently only used for the hash array used by associative arrays. It is a bit dangerous to use, as any pointer, slice or register still operating on the array is ignored, so collecting it might corrupt your memory.
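This is easy to confirm from user code: because rt.lifetime consults the TypeInfo flag at allocation time, a pointer-free array already comes back NO_SCAN, while an array whose elements contain pointers stays scannable. A small check (GC.getAttr and GC.BlkAttr are from core.memory; small allocations are used so the array pointer is the block base):

```d
import core.memory : GC;

void main()
{
    // int carries no pointers: rt.lifetime allocates the block NO_SCAN
    // based on the compiler-generated TypeInfo flag.
    auto ints = new int[64];
    assert(GC.getAttr(ints.ptr) & GC.BlkAttr.NO_SCAN);

    // string elements are slices (pointer + length), so this block
    // must remain scannable.
    auto strs = new string[64];
    assert(!(GC.getAttr(strs.ptr) & GC.BlkAttr.NO_SCAN));
}
```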
May 22 2014
On Thu, 22 May 2014 14:12:38 -0400, Rainer Schuetze <r.sagitario gmx.de> wrote:
> I'm not sure where allocating a struct ends up, so maybe there is some potential there.

It's done similarly to arrays. The function is here:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/lifetime.d#L991

-Steve
May 22 2014
On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
> "NO_INTERIOR" is currently only used for the hash array used by associative arrays. It is a bit dangerous to use, as any pointer, slice or register still operating on the array is ignored, so collecting it might corrupt your memory.

That's quite a relief, I was afraid of having to do it ;)

I'm currently exploring the possibility of sampling the pointers during marking to check if they're gone, and using Bayesian probabilities to decide whether or not to skip the pool. I explained it all here:

https://github.com/D-Programming-Language/druntime/pull/797#issuecomment-43896016

-- paste --
Basically, when marking, you take 1 in X of the references and send them to a specific array that represents the pool they refer to. Then, next time you're going to collect, you test them individually, and if they're mostly there, you skip marking/freeing for that particular pool during collection. You can force collection on certain pools every 1 in X collections to even out the average lifetime of the references.

You're going to want to have a lower certainty of failure for big allocations, but basically you're using probabilities to avoid pushing a lot of useless load on the processor, especially when you're in a part of an application that's just allocating a lot (sampling will determine that the software is not in a state of data removal).

http://en.wikipedia.org/wiki/Bayes_factor
-- end paste --

The Bayes factor is merely there to choose the appropriate model that fits with the program. Bayesian inference would take care of deciding if a pool should end up being marked. In other words, machine learning.

Would you think it'd be a good optimization opportunity?
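As a sketch of the bookkeeping this would need (names and thresholds hypothetical, and the Bayesian model selection omitted): each pool keeps a small set of sampled reference slots, and a pool whose samples are mostly unchanged becomes a candidate for skipping.

```d
// Hypothetical sketch of the 1-in-X reference sampling described above.
struct Sample
{
    void** slot;     // address the reference was stored at
    void*  expected; // value it held when last sampled
}

struct PoolSamples
{
    Sample[] samples;

    // During marking, retain 1 in `stride` of the references scanned
    // into this pool.
    void maybeRecord(void** slot, size_t scanned, size_t stride = 20)
    {
        if (scanned % stride == 0)
            samples ~= Sample(slot, *slot);
    }

    // Before a collection: if only a few sampled slots changed, the
    // pool is probably still mostly live, so skip its fine-grained
    // marking this cycle.
    bool probablySkip(size_t maxDead = 5)
    {
        size_t dead;
        foreach (s; samples)
            if (*s.slot !is s.expected)
                ++dead;
        return dead < maxDead;
    }
}
```

As Rainer points out later in the thread, an unchanged slot only shows the reference wasn't overwritten, not that the object holding it is still reachable, so this is a heuristic rather than a safety guarantee.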
May 22 2014
On 22.05.2014 21:04, Etienne wrote:
> On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
>> "NO_INTERIOR" is currently only used for the hash array used by associative arrays. It is a bit dangerous to use, as any pointer, slice or register still operating on the array is ignored, so collecting it might corrupt your memory.
>
> That's quite a relief, I was afraid of having to do it ;)
>
> I'm currently exploring the possibility of sampling the pointers during marking to check if they're gone, and using Bayesian probabilities to decide whether or not to skip the pool. [...] Would you think it'd be a good optimization opportunity?

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.
May 22 2014
On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
> Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

I'm not a fan of machine learning, especially not in cases you _can_ control, like memory allocation/deallocation. Guessing is not a good strategy if you can have control over something. Machine learning is only good for vast and unpredictable data (voice recognition, for example). Then it makes sense to apply probability. But if you can actually control what you are doing, why would you want to rely on a stupid and blind machine that decides things for you based on a probability of n%? Things can go wrong and you don't even know why. Mind you, we should rule the machines, not the other way around.
May 23 2014
On Friday, 23 May 2014 at 13:43:53 UTC, Chris wrote:
> I'm not a fan of machine learning, especially not in cases you _can_ control, like memory allocation/deallocation. [...] Mind you, we should rule the machines, not the other way around.

Bear in mind here that most code goes through a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.
May 23 2014
On 2014-05-23 11:41 AM, John Colvin wrote:
> Bear in mind here that most code goes through a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.

I'm happy you're here to say this. Machine learning is the future of algorithms & heuristics, and then it has to work its way up; otherwise there's no real advantage of just doing it at a high level for e.g. voice recognition.
May 23 2014
On Friday, 23 May 2014 at 15:41:39 UTC, John Colvin wrote:
> Bear in mind here that most code goes through a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.

What I'm saying is that in cases where you do have control you should not transfer it to the machine. Either you free memory yourself with free() or the GC mechanism is exact and does not "assume" things. This could cause inexplicable random bugs. I remember that the manual for the GC introduced in Objective-C said something like: "Some objects may never be collected." I'm not an expert on GC, far from it, but I didn't like the sound of it.

I know that CPUs do a good bit of guessing. But that's not the same thing. If they err, they make up for it ("Oops, it's not in the cache! Will get it from HD, just a nanosec!"). If the GC errs, how do you make up for it? Please educate me.
May 23 2014
On 2014-05-23 1:29 PM, Chris wrote:
> I know that CPUs do a good bit of guessing. But that's not the same thing. If they err, they make up for it ("Oops, it's not in the cache! Will get it from HD, just a nanosec!"). If the GC errs, how do you make up for it? Please educate me.

If the GC errs, worst case you lose a few bytes of free space (Type 1 error: skipping collection for an object). If it weren't already known, the worst case would be that destructors are not guaranteed to be called... but that's taken for granted now. Worst case will never be an access violation, even with machine learning.
May 23 2014
On Friday, 23 May 2014 at 17:38:27 UTC, Etienne wrote:
> If the GC errs, worst case you lose a few bytes of free space (Type 1 error: skipping collection for an object). [...] Worst case will never be an access violation, even with machine learning.

Fair enough. But what about programs that allocate a lot and run for ages (a server app, for example)?
May 23 2014
On 2014-05-23 2:08 PM, Chris wrote:
> Fair enough. But what about programs that allocate a lot and run for ages (a server app, for example)?

A server app? Couldn't have asked me for a better example. You can see my native events fork here (I'm working on replacing libevent):

https://github.com/globecsys/vibe.d/tree/native-events/source/vibe/core/events

I actually need to improve the GC because of this and a cache library I'm making for vibe.d:

https://github.com/globecsys/cache.d

If you have 10,000 connections, they each create a 64KB buffer in the GC, and if you don't want to risk a collection every few times a new connection comes in, you need to use some sampling. Using free lists and manual memory management is a bad idea because you're going to need to copy segments into the GC anyway if you want to extend the lifetime of data received over the network. With manual management I have to create a serialized copy of that data; I could store just a pointer if I didn't need to work around the GC, which wastes 95% of its time collecting absolutely nothing.
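Those receive buffers are also exactly what the NO_SCAN flag from the start of the thread is for: a 64KB buffer holds raw bytes, never pointers, so there is no reason for the GC to ever scan its contents. A sketch (makeRecvBuffer is a hypothetical helper; GC.malloc and GC.BlkAttr are real core.memory APIs):

```d
import core.memory : GC;

// Hypothetical helper for a connection's 64KB receive buffer.
// Passing NO_SCAN at allocation time tells the GC the block holds raw
// bytes whose contents never need scanning. (A plain `new ubyte[]`
// already gets this via its TypeInfo; the explicit attribute matters
// for raw GC.malloc or void[] buffers, where the runtime can't infer
// pointer-freedom.)
ubyte[] makeRecvBuffer(size_t size = 64 * 1024)
{
    auto p = cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN);
    return p[0 .. size];
}
```

With 10,000 such buffers that's roughly 640MB the marker never has to touch, whatever sampling scheme sits on top.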
May 23 2014
On Friday, 23 May 2014 at 15:41:39 UTC, John Colvin wrote:
> Bear in mind here that most code goes through a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.

What kind of machine learning? Branch prediction?
May 23 2014
On 2014-05-23 2:17 AM, Rainer Schuetze wrote:
> Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

It only skips the inner search of the pool, like marking it NO_SCAN if a sample of the pointers that pointed to it are still alive. I mean, why would you want to check the pointers and mark every page in a memory zone when you know they're probably all there anyways?

The idea is that you could manage to avoid collection altogether during periods of high allocation. There's no other way to guess it. And specifying "GC.disable()" before making allocations is a little too verbose to consider it an optimization of the GC.
May 23 2014
On 2014-05-23 12:33 PM, Etienne wrote:
> It only skips the inner search of the pool, like marking it NO_SCAN if a sample of the pointers that pointed to it are still alive.

Sorry, that's not true. It's like marking it NO_INTERIOR while it's still SCAN. By default, all the pages would be marked if the sample pointers to the pool are still alive. And so the objective is to be able to skip collections. How many collections are executed only to recover 1-2% of the memory?
May 23 2014
On 23.05.2014 18:41, Etienne wrote:
> Sorry, that's not true. It's like marking it NO_INTERIOR while it's still SCAN. By default, all the pages would be marked if the sample pointers to the pool are still alive.

Ok, so based on the samples, you skip fine-grained marking of the objects inside the pool. The memory still needs to be scanned for references, though. I don't think this buys you a lot. You will have to scan more memory than when you can skip allocations that are no longer referenced.

BTW: How do you detect the sample pointers are alive? Or do you mean just the roots?

> And so the objective is to be able to skip collections. How many collections are executed only to recover 1-2% of the memory?

I agree it would be good to have a measure to detect if it makes sense to run the full collection at all. I don't see how the sample pointers help here.
May 23 2014
On 2014-05-23 2:14 PM, Rainer Schuetze wrote:
> BTW: How do you detect the sample pointers are alive? Or do you mean just the roots?

You store a void** and the original value, and dereference it to see if it's the same value as the original. Loop through 20 of those if you have 500, and you update them during marking by taking 1 in 20. There's mathematical proof that if you don't have 5 of those dead, you're good to go and don't need to collect. Being flexible enough, you can skip 9 in 10 collections altogether, imo.

See this: https://github.com/D-Programming-Language/druntime/pull/803

The mathematical proof is at pages 154-183 of:
http://books.google.ca/books?id=b-XFrpBQ7d0C&lpg=PA119&pg=PA154#v=onepage&q&f=false
May 23 2014
On 23.05.2014 20:52, Etienne wrote:
> You store a void** and the original value, and dereference it to see if it's the same value as the original. Loop through 20 of those if you have 500, and you update them during marking by taking 1 in 20.

Looking at the address where the reference is stored gives you a hint about modifications, but does not say anything about whether the object that contains this reference is still alive.

> There's mathematical proof that if you don't have 5 of those dead, you're good to go and don't need to collect.

You still have to scan/mark the whole memory. Collecting unmarked memory is the easy part that doesn't cost too much time.

> Being flexible enough, you can skip 9 in 10 collections altogether, imo. See this: https://github.com/D-Programming-Language/druntime/pull/803 [...]

AFAICT your test case does not measure garbage collection, but manual memory management using the GC as the memory manager. delete/free are not meant to be called by user code, as these are unsafe operations.
May 23 2014
On 2014-05-23 3:08 PM, Rainer Schuetze wrote:
> AFAICT your test case does not measure garbage collection, but manual memory management using the GC as the memory manager. delete/free are not meant to be called by user code, as these are unsafe operations.

Yes, you're right; the test without the explicit calls to delete unveils only 0.68% freed memory per collection cycle. It's even worse than I had imagined, but it seems to fit with how badly the GC appeared to slow down my applications.
May 23 2014