digitalmars.D.learn - Debugging a Memory Leak
- Maxime Chevalier-Boisvert (27/27) Nov 17 2014 There seems to be a memory leak in the Higgs compiler. This
- Steven Schveighoffer (16/39) Nov 17 2014 Hm... such a function could be created. However, it would be tricky to
- Maxime Chevalier-Boisvert (4/9) Nov 17 2014 Unfortunately, the program doesn't break or crash. It just keeps
- Steven Schveighoffer (13/22) Nov 18 2014 By "break down", I mean it does what you don't want :)
- Vladimir Panteleev (13/14) Nov 17 2014 The D GC has some debugging code which might be a little helpful
- Etienne (14/15) Nov 18 2014 I've tried dumping logs from the garbage collection process and it's the...
There seems to be a memory leak in the Higgs compiler. This problem shows up when running our test suite (`make test` command). A new VM object is created for each unittest block, e.g.:

https://github.com/maximecb/Higgs/blob/master/source/runtime/tests.d#L201

These VM objects are unfortunately *never freed*, not until the whole series of tests has run and the process terminates. The VM objects keep references to many other objects, and so the process keeps using more and more memory, up to over 2GB.

The VM allocates its own JS data heap that it manages itself, i.e.:

https://github.com/maximecb/Higgs/blob/master/source/runtime/gc.d#L186

This memory is clearly marked as NO_SCAN, and so references to the VM in there should presumably not be counted. There is also executable memory I allocate with mmap, but this should also be ignored by the D GC in principle (I do not mark executable code as roots):

https://github.com/maximecb/Higgs/blob/master/source/jit/codeblock.d#L129

I don't know where the problem lies. There could be false pointers, but I'm on a 64-bit system, which should presumably make this less likely. I wish there was a way to ask the D runtime "can you tell me what is pointing to this object?", but the situation is more complex because many objects in my system refer to the VM object, forming a complicated graph of references. If anything points into that graph, the whole thing stays "live".

Help or advice on solving this problem is welcome.
Nov 17 2014
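For readers unfamiliar with the attributes mentioned in the post above, here is a minimal sketch of what a NO_SCAN allocation looks like in D. It assumes the heap is obtained from `core.memory.GC.malloc`; the helper name `allocHeap` and the 1 MiB size are hypothetical and are not taken from the Higgs sources.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: allocate a raw block that the D GC owns but does not
// scan for pointers (NO_SCAN), and that is only kept alive by pointers to
// its first byte (NO_INTERIOR).
ubyte* allocHeap(size_t size)
{
    return cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
}

void main()
{
    auto heap = allocHeap(1 << 20); // 1 MiB, arbitrary size for illustration

    // Read the attributes back to confirm how the block is flagged.
    immutable attrs = GC.getAttr(heap);
    writeln("NO_SCAN set:     ", (attrs & GC.BlkAttr.NO_SCAN) != 0);
    writeln("NO_INTERIOR set: ", (attrs & GC.BlkAttr.NO_INTERIOR) != 0);
}
```

A block flagged this way cannot itself keep other objects alive, so if the VM graph stays reachable, the reference has to come from somewhere the GC does scan: a thread stack, registers, static data, or another scanned heap block.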
On 11/17/14 6:12 PM, Maxime Chevalier-Boisvert wrote:

> I wish there was a way to ask the D runtime "can you tell me what is pointing to this object?", but the situation is more complex because many objects in my system refer to the VM object, there is a complicated graph of references. If anything points into that graph, the whole thing stays "live".

Hm... such a function could be created. However, it would be tricky to make work.

First, you would need a way to store the pointer without having it actually point at the data. Clearly, if you pass the pointer to the function, it's going to be on the stack, so that would then refer to it. You have to somehow obfuscate it the whole time.

Second, you may be given "memory x is pointing at your target", but what does memory x actually mean? That isn't something the GC can deal with. Perhaps when precise scanning is included (and I think we are close on that), you will have at least some type info.

> Help or advice on solving this problem is welcome.

GC problems are *nasty*. My advice is to run the simplest program you can think of that still exhibits the problem, and then put in printf debugging everywhere to see where it breaks down.

Not sure if this is useful.

-Steve
Nov 17 2014
> GC problems are *nasty*. My advice is to run the simplest program you can think of that still exhibits the problem, and then put in printf debugging everywhere to see where it breaks down. Not sure if this is useful.

Unfortunately, the program doesn't break or crash. It just keeps allocating memory that doesn't get freed. There must be some false reference somewhere. I'm not sure how I can printf debug my way out of that.
Nov 17 2014
On 11/17/14 11:41 PM, Maxime Chevalier-Boisvert wrote:

>> GC problems are *nasty*. My advice is to run the simplest program you can think of that still exhibits the problem, and then put in printf debugging everywhere to see where it breaks down. Not sure if this is useful.
>
> Unfortunately, the program doesn't break or crash. It just keeps allocating memory that doesn't get freed. There must be some false reference somewhere. I'm not sure how I can printf debug my way out of that.

By "break down", I mean it does what you don't want :)

You will need to instrument the GC and/or druntime. Note, if there is a false pointer, it's likely stack based, and likely there are not very many of them. But you have NO_INTERIOR set. This means the false pointer MUST point at the beginning of the block in order to keep it alive.

As I said, these are tricky issues. It would not be easy to determine.

One thing you can try -- allocate the block as a class, with a finalizer. This gives you the ability to sense when/if a block is finalized. That can help you determine the point at which your program starts to misbehave.

-Steve
Nov 18 2014
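The finalizer suggestion in the reply above can be sketched roughly as follows. This is only an illustration under assumptions: `TrackedBlock`, `finalizedCount` and `allocateAndDrop` are hypothetical names, and the class stands in for whatever block or VM object is being investigated. The destructor uses C `printf` because a finalizer run by the collector must not allocate from the GC.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

// Hypothetical counter, bumped each time the GC finalizes a tracked block.
__gshared size_t finalizedCount = 0;

// Hypothetical stand-in for the object under investigation.
class TrackedBlock
{
    ubyte[] payload;

    this(size_t size)
    {
        payload = new ubyte[size];
    }

    ~this()
    {
        // Runs when the GC finalizes this object. No GC allocation here.
        ++finalizedCount;
        printf("TrackedBlock finalized\n");
    }
}

// Allocate in a separate function so that no reference is deliberately
// kept in the caller's stack frame.
void allocateAndDrop()
{
    auto block = new TrackedBlock(1024);
}

void main()
{
    foreach (i; 0 .. 10)
        allocateAndDrop();

    GC.collect();
    printf("%zu of 10 blocks finalized so far\n", finalizedCount);
}
```

With a conservative collector the count after `GC.collect()` is not guaranteed to reach 10, since stale stack slots or registers can keep a block alive. Printing the count after each test would show the point at which finalization stops happening at all.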
On Monday, 17 November 2014 at 23:12:10 UTC, Maxime Chevalier-Boisvert wrote:

> Help or advice on solving this problem is welcome.

The D GC has some debugging code which might be a little helpful (check the commented-out debug = X lines in druntime/src/gc/gc.d). Specifically, debug=LOGGING activates some sort of leak detector, though I'm not sure how effective it is as I've never used it.

I've begun work on reviving Diamond to work for D2, multiple threads and x64. Once complete it should be able to answer such questions definitely, but it'll probably take a few days at least. Watch this space:

https://github.com/CyberShadow/druntime/commits/diamond
https://github.com/CyberShadow/Diamond
Nov 17 2014
On 2014-11-17 6:12 PM, Maxime Chevalier-Boisvert wrote:

> Help or advice on solving this problem is welcome.

I've tried dumping logs from the garbage collection process and it's the biggest waste of time. Even if you left a reference somewhere, the logs will not help identify the code that caused it.

Instead, you should do a test with the following: Store in a string[size_t] a list of pointers that should have been collected, along with the variable name. Once you assume they should have been collected, run this: The thread_scanAll function will send you valid memory ranges in your code. Run the stored size_t list against each value contained in the memory range. Accumulate everything that matches into another hashmap, and then fail with the error "Variables [list of identifiers] still have references in the code!"
Nov 18 2014
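The procedure described in the last post could look roughly like the sketch below. It is not the poster's code. It uses fixed-size arrays instead of a `string[size_t]` so that nothing allocates while threads are suspended, and it stores each address bit-inverted, following the earlier remark that the watched pointer has to be obfuscated so the registry does not itself look like a reference. The names `watch`, `reportStackReferences` and the registry variables are hypothetical. `thread_suspendAll`, `thread_scanAll` and `thread_resumeAll` are druntime functions from `core.thread` that are normally called by the GC itself, so calling them directly like this should be treated as an experiment.

```d
import core.thread; // thread_suspendAll, thread_scanAll, thread_resumeAll
import core.stdc.stdio : printf;

// Hypothetical registry of addresses that are expected to be dead.
enum maxWatched = 16;
__gshared size_t[maxWatched] watchedAddr;        // stored bit-inverted
__gshared const(char)*[maxWatched] watchedName;
__gshared bool[maxWatched] stillReferenced;
__gshared size_t watchedCount;

// Remember an address under a variable name, obfuscated with ~ so that the
// registry itself is not mistaken for a pointer to the object.
void watch(const void* p, const(char)* name)
{
    assert(watchedCount < maxWatched);
    watchedAddr[watchedCount] = ~cast(size_t) p;
    watchedName[watchedCount] = name;
    ++watchedCount;
}

// Scan every thread stack word by word and report which watched addresses
// still appear there. Returns the number of watched entries that were found.
size_t reportStackReferences()
{
    void scanRange(void* lo, void* hi) nothrow
    {
        for (auto w = cast(size_t*) lo; w + 1 <= cast(size_t*) hi; ++w)
            foreach (i; 0 .. watchedCount)
                if (*w == ~watchedAddr[i])
                    stillReferenced[i] = true;
    }

    stillReferenced[] = false;
    thread_suspendAll();
    thread_scanAll(&scanRange);
    thread_resumeAll();

    size_t found = 0;
    foreach (i; 0 .. watchedCount)
        if (stillReferenced[i])
        {
            printf("%s still has a reference on a thread stack\n", watchedName[i]);
            ++found;
        }
    return found;
}

void main()
{
    auto vm = new Object;          // stand-in for a VM instance
    watch(cast(void*) vm, "vm");
    vm = null;                     // the program believes it is done with it
    reportStackReferences();
}
```

This only covers thread stacks. A match can also come from a stale copy of the pointer left behind by `watch` or by the compiler, so a hit is a lead to investigate, not proof of a leak. References held in static data or in other GC blocks would need a separate scan.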