
digitalmars.D - Inspecting GC memory/stats

reply "Iain Buclaw" <ibuclaw gdcproject.org> writes:
Hi,

I find myself wondering what people use to inspect the GC on a 
running process?

Last night, I had a look at a couple of vibe.d servers that had 
been running for a little over 100 days.  Both run the same code, 
but one is used less (or not at all).

Given that the main app logic is rather simple, and anything that 
might otherwise be held in memory (files) is offloaded onto a 
Redis server, I'd have thought that its consumption would have 
stayed pretty much stable.  But to my surprise, I found that the 
server under more (but not heavy) use is consuming a whopping 
200MB.

Initially I tried to see if this could be shrunk in some way or 
form.  I attached gdb to the process and called gc_minimize() and 
gc_collect() - but neither had any effect.
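
(For reference, the in-process equivalent of what I called from 
gdb - this is just the standard core.memory API, which forwards 
to those extern(C) hooks:)

// In-process equivalent of the gdb calls above, via the standard
// core.memory API (GC.collect/GC.minimize forward to the extern(C)
// gc_collect/gc_minimize hooks in druntime).
import core.memory : GC;

void shrinkHeap()
{
    GC.collect();   // run a full mark-and-sweep collection
    GC.minimize();  // return any completely free pools to the OS
}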

When I noticed gc_stats with an informative *experimental* 
warning, I thought "let's just run it anyway and see what 
happens"... SEGV.  Wonderful.

As I'm probably going to have to wait 100 days for the apparent 
leak (I'm sure it is a leak though) to show itself again, does 
anyone have any nice suggestions on how to confirm that this 
memory is just being held and never freed by the GC?

Iain.
Nov 10 2014
Rainer Schuetze <r.sagitario gmx.de> writes:
On 11.11.2014 08:36, Iain Buclaw wrote:
 Hi,

 I find myself wondering what people use to inspect the GC on a
 running process?

 Last night, I had a look at a couple of vibe.d servers that had been
 running for a little over 100 days.  Both run the same code, but one
 is used less (or not at all).

 Given that the main app logic is rather simple, and anything that
 might otherwise be held in memory (files) is offloaded onto a Redis
 server, I'd have thought that its consumption would have stayed pretty
 much stable.  But to my surprise, I found that the server under more
 (but not heavy) use is consuming a whopping 200MB.

 Initially I tried to see if this could be shrunk in some way or
 form.  I attached gdb to the process and called gc_minimize() and
 gc_collect() - but neither had any effect.
The GC allocates pools of memory of increasing size. It starts with 1 MB, then adds 3 MB for every new pool (the numbers might be slightly different depending on the druntime version). These pools are then used to service any allocation request. gc_minimize can only return memory to the system if all the allocations in a pool are collected, which is very unlikely.
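
As a toy model of that growth scheme (illustrative only - the real 
constants live in druntime's gc module and vary by version):

// Toy model of the pool growth scheme described above: the first pool
// is 1 MB and each new pool is 3 MB larger than the previous one.
// Illustrative only; the real constants depend on the druntime version.
import std.stdio;

void main()
{
    enum MB = 1024 * 1024;
    size_t poolSize = 1 * MB;
    size_t heap = 0;
    foreach (i; 0 .. 11)
    {
        heap += poolSize;
        writefln("pool %2s: %2s MB  (heap total %3s MB)",
                 i, poolSize / MB, heap / MB);
        poolSize += 3 * MB;
    }
}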
 When I noticed gc_stats with an informative *experimental* warning, I
 thought "let's just run it anyway and see what happens"... SEGV.  Wonderful.
I suspect calling gc_stats from the debugger is "experimental" because it returns a struct. With a bit of casting, you might be able to call "_gc.getStats( *cast(GCStats*)some_mem_adr );" instead.
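
From inside the process this is unproblematic, since the compiler 
handles the struct return. Something like the following sketch - note 
the GCStats field names here are from memory of that era's druntime, 
so verify against your version before trusting them:

// Calling the stats hook from inside the process, where the struct
// return is handled by the compiler. The GCStats layout below follows
// druntime's gc.stats of this era -- verify against your version.
struct GCStats
{
    size_t poolsize;     // total size of all pools
    size_t usedsize;     // bytes allocated
    size_t freeblocks;   // number of blocks marked FREE
    size_t freelistsize; // total of memory on free lists
    size_t pageblocks;   // number of blocks marked PAGE
}

extern (C) GCStats gc_stats();

void dumpGCStats()
{
    import std.stdio : writefln;
    auto s = gc_stats();
    writefln("pools: %s bytes, used: %s bytes, free lists: %s bytes",
             s.poolsize, s.usedsize, s.freelistsize);
}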
 As I'm probably going to have to wait 100 days for the apparent leak
 (I'm sure it is a leak though) to show itself again, does anyone have
 any nice suggestions on how to confirm that this memory is just being
 held and never freed by the GC?

 Iain.
Nov 11 2014
parent reply "Iain Buclaw" <ibuclaw gdcproject.org> writes:
On Tuesday, 11 November 2014 at 20:20:26 UTC, Rainer Schuetze 
wrote:
 On 11.11.2014 08:36, Iain Buclaw wrote:
 Hi,

 I find myself wondering what people use to inspect the GC on a
 running process?

 Last night, I had a look at a couple of vibe.d servers that had
 been running for a little over 100 days.  Both run the same code,
 but one is used less (or not at all).

 Given that the main app logic is rather simple, and anything that
 might otherwise be held in memory (files) is offloaded onto a
 Redis server, I'd have thought that its consumption would have
 stayed pretty much stable.  But to my surprise, I found that the
 server under more (but not heavy) use is consuming a whopping
 200MB.

 Initially I tried to see if this could be shrunk in some way or
 form.  I attached gdb to the process and called gc_minimize() and
 gc_collect() - but neither had any effect.
The GC allocates pools of memory of increasing size. It starts with 1 MB, then adds 3 MB for every new pool (the numbers might be slightly different depending on the druntime version). These pools are then used to service any allocation request. gc_minimize can only return memory to the system if all the allocations in a pool are collected, which is very unlikely.
I'm aware of roughly how the GC grows. But it seems an unlikely scenario to have 200MB worth of 3MB pools with at least one object in each. And if it did get to that state, the next question would be, how? I could see that happening if a large number of requests came in all at once, but that would have shown up in the network graphs.
 When I noticed gc_stats with an informative *experimental* warning, I
 thought "let's just run it anyway and see what happens"... SEGV.
 Wonderful.
I suspect calling gc_stats from the debugger is "experimental" because it returns a struct. With a bit of casting, you might be able to call "_gc.getStats( *cast(GCStats*)some_mem_adr );" instead.
No, that is not the reason - more likely the iterative scan is unsafe. I should have looked closer at the backtrace / memory location that was violated (I was in a hurry to get the site back up), but my best guess for the SEGV is that one of the pools in gcx.pooltable[n] or pages in pool.pagetable[n] was pointing to a freed, stomped, or null location.
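
To illustrate what I mean, roughly the kind of walk a stats scan 
would be doing (names loosely after druntime's gcx of this vintage, 
purely illustrative):

// Purely illustrative: the rough shape of the pool/page walk a stats
// scan performs (names loosely after druntime's gc.gcx of this era).
// A freed, stomped, or null Pool* in pooltable faults on dereference.
struct Pool
{
    ubyte* pagetable; // one bin-size byte per page
    size_t npages;
}

size_t countPages(Pool*[] pooltable)
{
    size_t total;
    foreach (pool; pooltable)      // a dangling or null Pool* ...
        total += pool.npages;      // ... SEGVs on this dereference
    return total;
}

Iain.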
Nov 12 2014
Rainer Schuetze <r.sagitario gmx.de> writes:
On 12.11.2014 19:01, Iain Buclaw wrote:
 On Tuesday, 11 November 2014 at 20:20:26 UTC, Rainer Schuetze wrote:
 The GC allocates pools of memory of increasing size. It starts with
 1 MB, then adds 3 MB for every new pool (the numbers might be
 slightly different depending on the druntime version). These pools
 are then used to service any allocation request.

 gc_minimize can only return memory to the system if all the
 allocations in a pool are collected, which is very unlikely.
I'm aware of roughly how the GC grows. But it seems an unlikely scenario to have 200MB worth of 3MB pools with at least one object in each.
The pools have sizes 1, 4, 7, 10, 14, 17, 20, 23, 26, 29 and 33 MB, so only 11 pools overall. Whether each holds at least one live object will depend on your allocation patterns, though.
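
(Summing those: 1 + 4 + 7 + 10 + 14 + 17 + 20 + 23 + 26 + 29 + 33 = 184 MB - already in the ballpark of the ~200MB you are seeing.)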
 And if it did get to that state, the next question would be, how?  I
 could see that happening if a large number of requests came in all at
 once, but that would have shown up in the network graphs.
The only reason I see would be fragmentation, but that should only happen if you have bad allocation patterns for large (>2k) memory blocks, e.g. a large growing array with allocations of smaller (but still >2k) size in between that are never collected.
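
A sketch of that pattern (illustrative only; sizes picked so the 
allocations take the large-object path):

// Illustrative sketch of the fragmentation pattern described above:
// a growing array interleaved with long-lived >2k allocations.
void main()
{
    ubyte[][] pinned; // stays referenced, so never collected
    int[] big;

    foreach (i; 0 .. 100_000)
    {
        big ~= i;                      // periodically reallocates into a larger block
        if (i % 512 == 0)
            pinned ~= new ubyte[4096]; // >2k block lands between the old copies
    }
    // The abandoned old copies of `big` are collectable, but the pinned
    // blocks scattered through the pools keep them partially occupied,
    // so gc_minimize cannot return them to the OS.
}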
 When I noticed gc_stats with an informative *experimental* warning, I
 thought "let's just run it anyway and see what happens"... SEGV.
 Wonderful.
I suspect calling gc_stats from the debugger is "experimental" because it returns a struct. With a bit of casting, you might be able to call "_gc.getStats( *cast(GCStats*)some_mem_adr );" instead.
No, that is not the reason - more likely the iterative scan is unsafe. I should have looked closer at the backtrace / memory location that was violated (I was in a hurry to get the site back up), but my best guess for the SEGV is that one of the pools in gcx.pooltable[n] or pages in pool.pagetable[n] was pointing to a freed, stomped, or null location.
I have used it without problems in the past when working on the GC. Maybe you have stopped the application during a collection?
Nov 14 2014