digitalmars.D - GC and memory leaks

Ald Sannes <aldarri_s yahoo.com> writes:
Hello.

I have good reason to believe there are bugs in the GC, in Phobos, or in the
zlib that comes bundled with DMD 1.023.
Here is my main function:

import std.gc;

void main(char[][] argumentList)
{
	std.gc.minimize();

	buildTemporaryIndex();

	std.gc.fullCollect();

	buildPermanentIndex();

	findWords();
}

The three functions are completely isolated from each other; they communicate
only through disk IO (by the way, great library for file IO).  Before exiting
from the first function, I explicitly go through each class's static members
and delete them, deleting each element in the case of an array or hash table.

Yet the 800 Mbytes of memory are not being freed until the program terminates.

Next issue.
I have commented out everything except for the code that decompresses data in
files for processing.  

import std.file;	// listdir, getSize, read
import std.gc;		// fullCollect
import std.zlib;	// uncompress

void main()
{
	buildTemporaryIndex();
}

void buildTemporaryIndex()
{
	char[][] datasetFileList = listdir(Config.getInputDirectory());

	// Entries are consumed two at a time, so stop before an unpaired last entry.
	for (int i = 0; i + 1 < datasetFileList.length; i += 2)
	{
		indexFileStream = IndexDecompressor.gunzipFile(datasetFileList[i+1]);
		pageFileStream = IndexDecompressor.gunzipFile(datasetFileList[i]);

		delete indexFileStream;
		delete pageFileStream;

		std.gc.fullCollect();
		//break;
	}
}

	// Member of IndexDecompressor: reads a gzip file and returns its contents.
	public static char[] gunzipFile(char[] fileName)
	{
		int zipFileSize = getSize(fileName);
		void[] zipFileContentRaw = read(fileName);
		// Twice the compressed size is only an initial guess for the output size.
		void[] zipFileContent = uncompress(zipFileContentRaw, zipFileSize*2, 24);

		delete zipFileContentRaw;
		//delete zipFileContent;

		return cast(char[]) zipFileContent;
		//return "";
	}

The problem is that, despite the delete statements and the calls asking the
garbage collector to free all it can, the program holds on to some 100 Mbytes
of main memory, which roughly corresponds to the size of the data extracted,
until termination.
I speculate that, despite gc.noRoots calls in the zlib wrapper, the memory leak
happens there; the raw data in array is being taken for pointers that point
literally everywhere, thus no memory is ever deallocated.

Third.
To parse HTML, I used std.regexp.replace().  On some files, it looped, ate all
memory in less than a minute, and crashed.

What can I do to help find the issues?  If it helps, I can post the entire
source code.  And even the data set (50 Mbytes).


And one more thing.  Please fix 
http://www.digitalmars.com/d/1.0/dcompiler.html, for the link labeled 'latest
compiler' points to DMD 1.015.

Thanks
Nov 11 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 11 Nov 2007 18:34:03 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

 Yet the 800 Mbytes of memory are not being freed until the program terminates.

AFAIK, DMD's GC does not release memory back to the OS, ever. Also, minimize() does nothing, and genCollect() does the same thing as fullCollect().

One thing you could try is Tango's GC, which in my experience behaves better in some circumstances. You can use Tangobos[1] to keep the Phobos API and use Tango's runtime (which includes the GC).

 To parse HTML, I used std.regexp.replace().  On some files, it looped, ate all
 memory in less than a minute, and crashed.

std.regexp has some known issues. Unless you're in the mood to debug and fix it (which would be doing all of us a favour), for real work you might be better off finding some libpcre wrappers. Just in case, first check that your input is valid UTF-8 - that got me once (broken UTF-8 sequences make std.regexp crash and burn).

There's also a compile-time regexp engine by Pragma and Don Clugston[2].

[1] http://dsource.org/projects/tangobos
[2] http://www.dsource.org/projects/ddl/browser/trunk/meta/regex.d

--
Best regards,
 Vladimir                            mailto:thecybershadow gmail.com
Nov 11 2007
Ald Sannes <aldarri_s yahoo.com> writes:
Vladimir Panteleev Wrote:

 On Sun, 11 Nov 2007 18:34:03 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

 Yet the 800 Mbytes of memory are not being freed until the program terminates.
 AFAIK, DMD's GC does not release memory back to the OS, ever. Also, minimize() does nothing, and genCollect() does the same thing as fullCollect(). One thing you could try is Tango's GC, which in my experience behaves better in some circumstances. You can use Tangobos[1] to keep the Phobos API and use Tango's runtime (which includes the GC).

Ok, let's then manage memory manually. Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?

 To parse HTML, I used std.regexp.replace().  On some files, it looped, ate all
 memory in less than a minute, and crashed.
 std.regexp has some known issues. Unless you're in the mood to debug and fix it (which would be doing all of us a favour), for real work you might be better off finding some libpcre wrappers. Just in case, first check that your input is valid UTF-8 - that got me once (broken UTF-8 sequences make std.regexp crash and burn).

Thanks. Actually, since all I need is to find text in HTML, a FSA, built with a huge two-level switch structure, proved to be sufficient.
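
For illustration, a switch-based automaton of this kind might look like the following C sketch. The original D code was not posted, so the three states and the function name extract_text are assumptions made for this example; a real HTML scanner would also need to handle comments, entities, and script blocks. The outer switch selects on the current state and the inner switch on the input character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the poster's actual automaton was not shown. */
enum state { TEXT, TAG, QUOTED };

/* Copies the characters that lie outside of tags from `html` into `out`.
   `cap` is the capacity of `out`, including the terminating NUL; output
   that does not fit is silently dropped.
   Returns the number of characters written (excluding the NUL). */
size_t extract_text(const char *html, char *out, size_t cap)
{
    enum state st = TEXT;
    char quote = 0;   /* quote character that opened an attribute value */
    size_t n = 0;

    assert(html != NULL && out != NULL && cap > 0);

    for (; *html != '\0'; ++html) {
        char c = *html;
        switch (st) {                /* first level: current state */
        case TEXT:
            switch (c) {             /* second level: input character */
            case '<':
                st = TAG;
                break;
            default:
                if (n + 1 < cap)
                    out[n++] = c;
                break;
            }
            break;
        case TAG:
            switch (c) {
            case '>':
                st = TEXT;
                break;
            case '"':
            case '\'':
                quote = c;           /* a '>' inside quotes must not end the tag */
                st = QUOTED;
                break;
            default:
                break;
            }
            break;
        case QUOTED:
            if (c == quote)
                st = TAG;
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because the automaton reads each input character exactly once and keeps only a state and a counter, its memory use is bounded by the output buffer, which avoids the runaway allocation seen with the regular expression.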
Nov 11 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 11 Nov 2007 21:10:26 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

 Ok, let's then manage memory manually.  Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?

D's "delete" statement does not return memory back to the OS - it just marks the block free for the GC to reuse in further reallocations. Truly "manual" memory management means that you'll have to use malloc/free from std.c.stdlib.

One workaround I could suggest is putting the code that has the one-time large memory requirement in a separate DLL. Since it'll have its own GC, the GC will release (almost [1]) all memory back to the OS when the DLL is unloaded. Note that Tango's runtime doesn't do this, and as far as I understood the Tango developers don't care much[2].

[1] http://d.puremagic.com/issues/show_bug.cgi?id=1551
[2] http://www.dsource.org/projects/tango/ticket/669

--
Best regards,
 Vladimir                            mailto:thecybershadow gmail.com
Nov 11 2007
Kevin Bealer <kevinbealer gmail.com> writes:
Ald Sannes Wrote:

 Vladimir Panteleev Wrote:

 On Sun, 11 Nov 2007 18:34:03 +0200, Ald Sannes <aldarri_s yahoo.com> wrote:

 Yet the 800 Mbytes of memory are not being freed until the program terminates.
 AFAIK, DMD's GC does not release memory back to the OS, ever. Also, minimize() does nothing, and genCollect() does the same thing as fullCollect(). One thing you could try is Tango's GC, which in my experience behaves better in some circumstances. You can use Tangobos[1] to keep the Phobos API and use Tango's runtime (which includes the GC).
 Ok, let's then manage memory manually. Where should I look for the leaks? I already delete everything I declare; guess some memory is allocated behind the scenes?

I think even malloc() does not free memory to the OS. Getting memory from the OS and returning it to the OS are expensive operations so most implementations will allocate chunks of memory.

If you want to make sure memory is returned to the OS, you can create files and use mmap(). When you close the file and munmap the memory, the OS will truly get the memory back. Some operations like "new ObjectName", associative arrays, dynamic arrays, and "x ~ y" will automatically use normal memory though, so you would be restricted to using "struct" instead of class, and doing certain other things the "hard way".

But to be honest, I can't think of a good reason (in general) to do these things... whether the O/S owns the free memory or the application's garbage collector does is unlikely to hurt anything, except for your attempts to get an accurate picture of what is going on.

If you want to see if memory is being leaked, run the method that might be leaking memory in a loop. If the loop iterates 10 times and you still only have 800 MB of application size then I would guess that there is no leaking. If the application keeps getting bigger with each iteration, then that probably indicates a problem.

Kevin
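
Both suggestions in this post can be tried out with a short C sketch along the following lines. It is a sketch under stated assumptions, not code from the thread: it uses an anonymous mapping rather than the file-backed mapping described above, it reads the resident set size from /proc/self/statm (so it is Linux-specific), and the function names resident_bytes, use_and_release, and growth_after are made up for this example. use_and_release stands in for the function suspected of leaking.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in bytes, read from /proc/self/statm (Linux only).
   Returns -1 if the information is unavailable. */
long resident_bytes(void)
{
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Stand-in for the suspect function: maps `size` bytes, writes to them so
   they become resident, then unmaps them. Returns 0 on success. */
int use_and_release(size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 1, size);        /* force the pages into memory */
    return munmap(p, size);    /* the pages go back to the OS here */
}

/* Runs the suspect function `rounds` times and stores in *growth how many
   bytes the resident set grew by. Returns 0 on success, -1 on failure. */
int growth_after(int rounds, size_t size, long *growth)
{
    long before, after;
    int i;

    assert(rounds > 0 && growth != NULL);
    before = resident_bytes();
    for (i = 0; i < rounds; ++i)
        if (use_and_release(size) != 0)
            return -1;
    after = resident_bytes();
    if (before < 0 || after < 0)
        return -1;
    *growth = after - before;
    return 0;
}
```

If growth_after(10, size, &growth) reports growth close to zero, the function is probably not leaking; growth that scales with the number of rounds points to a problem.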
 To parse HTML, I used std.regexp.replace().  On some files, it looped, ate all
 memory in less than a minute, and crashed.
 std.regexp has some known issues. Unless you're in the mood to debug and fix it (which would be doing all of us a favour), for real work you might be better off finding some libpcre wrappers. Just in case, first check that your input is valid UTF-8 - that got me once (broken UTF-8 sequences make std.regexp crash and burn).
 Thanks. Actually, since all I need is to find text in HTML, a FSA, built with a huge two-level switch structure, proved to be sufficient.
Nov 12 2007
"David B. Held" <dheld codelogicconsulting.com> writes:
Kevin Bealer wrote:
 [...]
 I think even malloc() does not free memory to the OS.
 [...]

I thought that too, but I wrote a test in both C++ and D that proves that malloc()/free() will return memory to the OS in both cases (at least on Linux).

Dave
Nov 13 2007
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
David B. Held wrote:
 Kevin Bealer wrote:
 [...]
 I think even malloc() does not free memory to the OS.
 [...]
 I thought that too, but I wrote a test in both C++ and D that prove that malloc()/free() will return memory to the OS in both cases (at least on Linux).

The glibc malloc/free/realloc implementation (used on virtually all Linux systems) is written by Doug Lea and follows this strategy:

For very large requests, >=128KB (by default), it relies (if possible) on mmap, while smaller requests are handled using sbrk(). Memory-mapped memory is returned to the OS on free(), while sbrk()-allocated memory is not. There are some provisions for returning sbrk()-allocated memory to the OS, but those are disabled by default because of reduced performance and the fact that only freed memory chunks at the very top of the allocated memory range can be freed.

Those behaviors can easily be verified. malloc()ing, then free()ing lots of small chunks will not return any memory to the OS(*), while doing the same for a few large chunks will.

However, I don't see programs keeping large unused chunks of virtual memory as much of a problem.

*) In the case where provisions for trimming the virtual memory range are enabled, this behavior should still exist if the small allocations are followed by one larger allocation that ends up on top of the virtual memory range, thereby blocking any possible trimming from happening.

--
Oskar
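
A check of this kind might look like the following C sketch. It is Linux-specific because it reads the resident set size from /proc/self/statm, and the names resident_bytes, touch, and released_by_free are assumptions made for this example. The result for small chunks depends on the C library version and its tunables, so the sketch only measures and reports; it does not assume either outcome.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size in bytes, read from /proc/self/statm (Linux only).
   Returns -1 if the information is unavailable. */
long resident_bytes(void)
{
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Writes through a volatile pointer so the compiler cannot drop the
   stores; this makes every page of the block resident. */
static void touch(void *p, size_t size)
{
    volatile char *bytes = p;
    size_t i;

    for (i = 0; i < size; i += 256)
        bytes[i] = 1;
}

/* Allocates `count` blocks of `size` bytes with malloc(), touches them,
   frees them all, and stores in *released how far the resident set
   dropped across the free() calls (positive means memory went back to
   the OS). Returns 0 on success, -1 on failure. */
int released_by_free(size_t count, size_t size, long *released)
{
    void **blocks;
    long before, after;
    size_t i;

    assert(count > 0 && size > 0 && released != NULL);
    blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;
    for (i = 0; i < count; ++i) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL) {
            while (i > 0)
                free(blocks[--i]);
            free(blocks);
            return -1;
        }
        touch(blocks[i], size);
    }
    before = resident_bytes();
    for (i = 0; i < count; ++i)
        free(blocks[i]);
    after = resident_bytes();
    free(blocks);
    if (before < 0 || after < 0)
        return -1;
    *released = before - after;
    return 0;
}
```

Comparing the value reported for many small blocks, for example released_by_free(100000, 64, &r), with the value for a few large ones, for example released_by_free(4, (size_t)16 << 20, &r), is one way to see how a given system behaves.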
Nov 14 2007
"Janice Caron" <caron800 googlemail.com> writes:
On 11/11/07, Ald Sannes <aldarri_s yahoo.com> wrote:
 I speculate that, despite gc.noRoots calls in the zlib wrapper, the memory
leak happens there; the raw data in array is being taken for pointers that
point literally everywhere, thus no memory is ever deallocated.
If that's true, you may be able to fix it by making your array a ubyte[] instead of a void[]. void arrays can contain pointers; ubyte arrays cannot.
Nov 11 2007