digitalmars.D - GC and memory leaks
- Ald Sannes (52/52) Nov 11 2007 Hello.
- Vladimir Panteleev (11/13) Nov 11 2007 AFAIK, DMD's GC does not release memory back to the OS, ever.
- Ald Sannes (5/18) Nov 11 2007 .......
- Vladimir Panteleev (18/19) Nov 11 2007 aks? I already delete everything I declare; guess some memory is alloca...
- Kevin Bealer (7/29) Nov 12 2007 I think even malloc() does not free memory to the OS. Getting memory fr...
- David B. Held (5/8) Nov 13 2007 I thought that too, but I wrote a test in both C++ and D that prove that...
- Oskar Linde (22/30) Nov 14 2007 The glibc malloc/free/realloc implementation (used on virtually all
- Janice Caron (4/5) Nov 11 2007 If that's true, you may be able to fix it by making your array a
Hello.

I have good reasons to believe there are bugs in the GC, or Phobos, or the zlib that comes bundled with DMD 1.023.

Here is my main function:

    void main(char[][] argumentList)
    {
        std.gc.minimize();
        buildTemporaryIndex();
        std.gc.fullCollect();
        buildPermanentIndex();
        findWords();
    }

The three functions are completely isolated from each other; they only communicate through disk IO (by the way, great library for file IO). Before exiting from the first function, I explicitly go through each class's static members and delete them, deleting each element in the case of an array or hash table. Yet the 800 Mbytes of memory are not being freed until the program terminates.

Next issue. I have commented out everything except for the code that decompresses data in files for processing.

    main() { buildTemporaryIndex(); }

    void buildTemporaryIndex()
    {
        char[][] datasetFileList = listdir(Config.getInputDirectory());
        for (int i = 0; i < datasetFileList.length; i += 2)
        {
            indexFileStream = IndexDecompressor.gunzipFile(datasetFileList[i+1]);
            pageFileStream = IndexDecompressor.gunzipFile(datasetFileList[i]);
            delete indexFileStream;
            delete pageFileStream;
            std.gc.fullCollect();
            //break;
        }
    }

    public static char[] gunzipFile(char[] fileName)
    {
        int zipFileSize = getSize(fileName);
        void[] zipFileContentRaw = read(fileName);
        void[] zipFileContent = uncompress(zipFileContentRaw, zipFileSize*2, 24);
        delete zipFileContentRaw;
        //delete zipFileContent;
        return cast(char[]) zipFileContent;
        //return "";
    }

The problem is that, despite the delete statements and the calls to the garbage collector to free all it can, the program holds on to some 100 Mbytes of main memory, which roughly corresponds to the size of the data extracted, until termination. I speculate that, despite gc.noRoots calls in the zlib wrapper, the memory leak happens there; the raw data in array is being taken for pointers that point literally everywhere, thus no memory is ever deallocated.

Third. To parse HTML, I used std.regexp.replace(). On some files it looped, ate all memory in less than a minute and crashed.

What can I do to help find the issues? If it helps, I can post the entire source code, and even the data set (50 Mbytes).

And one more thing. Please fix http://www.digitalmars.com/d/1.0/dcompiler.html, for the link labeled 'latest compiler' points to DMD 1.015.

Thanks
Nov 11 2007
On Sun, 11 Nov 2007 18:34:03 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

> Yet the 800 Mbytes of memory are not being freed until the program terminates.

AFAIK, DMD's GC does not release memory back to the OS, ever. Also, minimize() does nothing, and genCollect() does the same thing as fullCollect().

One thing you could try is Tango's GC, which in my experience behaves better in some circumstances. You can use Tangobos[1] to keep the Phobos API and use Tango's runtime (which includes the GC).

> To parse HTML, I used std.regexp.replace(). On some files it looped, ate all memory in less than a minute and crashed.

std.regexp has some known issues. Unless you're in the mood to debug and fix it (which would be doing all of us a favour), for real work you might be better off finding some libpcre wrappers. Just in case, first check that your input is valid UTF-8 - that got me once (broken UTF-8 sequences make std.regexp crash and burn). There's also a compile-time regexp engine by Pragma and Don Clugston[2].

[1] http://dsource.org/projects/tangobos
[2] http://www.dsource.org/projects/ddl/browser/trunk/meta/regex.d

-- 
Best regards,
 Vladimir mailto:thecybershadow gmail.com
Nov 11 2007
Vladimir Panteleev Wrote:

> On Sun, 11 Nov 2007 18:34:03 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:
>
>> Yet the 800 Mbytes of memory are not being freed until the program terminates.
>
> AFAIK, DMD's GC does not release memory back to the OS, ever. Also, minimize() does nothing, and genCollect() does the same thing as fullCollect(). One thing you could try is Tango's GC, which in my experience behaves better in some circumstances. You can use Tangobos[1] to keep the Phobos API and use Tango's runtime (which includes the GC).

Ok, let's then manage memory manually. Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?

>> To parse HTML, I used std.regexp.replace(). On some files it looped, ate all memory in less than a minute and crashed.
>
> std.regexp has some known issues. Unless you're in the mood to debug and fix it (which would be doing all of us a favour), for real work you might be better off finding some libpcre wrappers. Just in case, first check that your input is valid UTF-8 - that got me once (broken UTF-8 sequences make std.regexp crash and burn).

Thanks. Actually, since all I need is to find text in HTML, an FSA, built with a huge two-level switch structure, proved to be sufficient.
Nov 11 2007
On Sun, 11 Nov 2007 21:10:26 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

> Ok, let's then manage memory manually. Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?

D's "delete" statement does not return memory back to the OS - it just marks the block free for the GC to reuse in further allocations. Truly "manual" memory management means that you'll have to use malloc/free from std.c.stdlib.

One workaround I could suggest is putting the code that has the one-time large memory requirement in a separate DLL. Since it'll have its own GC, the GC will release (almost [1]) all memory back to the OS when the DLL is unloaded. Note that Tango's runtime doesn't do this, and as far as I understood the Tango developers don't care much[2].

[1] http://d.puremagic.com/issues/show_bug.cgi?id=1551
[2] http://www.dsource.org/projects/tango/ticket/669

-- 
Best regards,
 Vladimir mailto:thecybershadow gmail.com
Nov 11 2007
Ald Sannes Wrote:

> Vladimir Panteleev Wrote:
>
>> AFAIK, DMD's GC does not release memory back to the OS, ever. [...]
>
> Ok, let's then manage memory manually. Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?
>
> [...]

I think even malloc() does not free memory to the OS. Getting memory from the OS and returning it to the OS are expensive operations, so most implementations will allocate chunks of memory.

If you want to make sure memory is returned to the OS, you can create files and use mmap(). When you close the file and munmap the memory, the OS will truly get the memory back. Some operations like "new ObjectName", associative arrays, dynamic arrays, and "x ~ y" will automatically use normal memory though, so you would be restricted to using "struct" instead of class, and doing certain other things the "hard way".

But to be honest, I can't think of a good reason (in general) to do these things... whether the O/S owns the free memory or the application's garbage collector does is unlikely to hurt anything, except for your attempts to get an accurate picture of what is going on.

If you want to see if memory is being leaked, run the method that might be leaking memory in a loop. If the loop iterates 10 times and you still only have 800 MB of application size, then I would guess that there is no leaking. If the application keeps getting bigger with each iteration, then that probably indicates a problem.

Kevin
Nov 12 2007
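Kevin's loop test can be sketched in C on Linux. This is only an illustration under stated assumptions: it reads the resident page count from /proc/self/statm, which is Linux-specific, and resident_pages, sample_resident and workload_no_leak are hypothetical names rather than anything from the program discussed in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resident set size of this process in pages, read from /proc/self/statm
 * (Linux-specific assumption). Returns -1 if the file cannot be read. */
long resident_pages(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    long total = 0, resident = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Hypothetical stand-in for the function under suspicion: allocates 8 MiB,
 * writes to it so the pages are really touched, then frees it again. */
void workload_no_leak(void)
{
    size_t size = 8u * 1024u * 1024u;
    char *buf = malloc(size);
    if (buf == NULL)
        return;
    memset(buf, 1, size);
    free(buf);
}

/* Runs fn `iterations` times and stores the resident size measured after
 * each run in `samples`. Returns the number of samples written. */
int sample_resident(void (*fn)(void), int iterations, long *samples)
{
    int i;
    assert(fn != NULL && samples != NULL && iterations >= 0);
    for (i = 0; i < iterations; i++) {
        fn();
        samples[i] = resident_pages();
    }
    return iterations;
}
```

If the samples level off after the first iteration or two, the memory is being reused rather than leaked; a steady climb from one iteration to the next is the pattern that would point at a real leak.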
Kevin Bealer wrote:

> [...] I think even malloc() does not free memory to the OS. [...]

I thought that too, but I wrote a test in both C++ and D that proved that malloc()/free() will return memory to the OS in both cases (at least on Linux).

Dave
Nov 13 2007
David B. Held wrote:

> Kevin Bealer wrote:
>
>> [...] I think even malloc() does not free memory to the OS. [...]
>
> I thought that too, but I wrote a test in both C++ and D that proved that malloc()/free() will return memory to the OS in both cases (at least on Linux).

The glibc malloc/free/realloc implementation (used on virtually all Linux systems) is written by Doug Lea and follows this strategy:

For very large requests, >=128KB (by default), it relies (if possible) on mmap, while smaller requests are handled using sbrk(). Memory-mapped memory is returned to the OS on free(), while sbrk()-allocated memory is not. There are some provisions for returning sbrk()-allocated memory to the OS, but those are disabled by default because of reduced performance and the fact that only freed memory chunks at the very top of the allocated memory range can be freed.

Those behaviors can easily be verified. Malloc()ing, then free()ing lots of small chunks will not return any memory to the OS(*), while doing the same for a few large chunks will.

However, I don't see programs keeping large unused chunks of virtual memory as much of a problem.

*) In the case where provisions for trimming the virtual memory range are enabled, this behavior should still exist if the small allocations are followed by one larger allocation that ends up on top of the virtual memory range, thereby blocking any possible trimming from happening.

-- 
Oskar
Nov 14 2007
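The large-request path described above can be imitated by hand. What follows is a minimal sketch in C, assuming a POSIX system that supports anonymous mappings (MAP_ANONYMOUS); Kevin's earlier suggestion used a file-backed mapping instead, and map_block and unmap_block are hypothetical helper names. Memory obtained this way goes back to the OS at munmap(), whatever the allocator does with its own free lists.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS in strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps `size` bytes of zero-filled, private anonymous memory.
 * Returns NULL on failure. */
void *map_block(size_t size)
{
    void *p;
    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hands the block back to the OS. Returns 0 on success, -1 on failure. */
int unmap_block(void *p, size_t size)
{
    return munmap(p, size);
}
```

The caller has to remember the size of each block, since munmap() needs it; that bookkeeping is part of what malloc() normally hides.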
On 11/11/07, Ald Sannes <aldarri_s yahoo.com> wrote:

> I speculate that, despite gc.noRoots calls in the zlib wrapper, the memory leak happens there; the raw data in array is being taken for pointers that point literally everywhere, thus no memory is ever deallocated.

If that's true, you may be able to fix it by making your array a ubyte[] instead of a void[]. void arrays can contain pointers; ubyte arrays cannot.
Nov 11 2007