www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Odd behaviour with large arrays

reply Niko Korhonen <niktheblak hotmail.com> writes:
Consider the following D program:

int main()
{
	const int ARRAY_SIZE = 1024 * 1024 * 128;
	
	ubyte[] array = new ubyte[ARRAY_SIZE];
	
	printf("Walking through the array...\n");
	
	for (int i = 0; i < ARRAY_SIZE; i++)
		array[i] = cast(ubyte)i;
		
	printf("Walk complete\n");	
	return 0;
}

As you can see, we allocate a large array and walk through it, setting 
the values to something.

When compiled with DMD 0.112 this program executes strangely on my 
machine. The message "Walk complete" gets printed pretty quickly, but 
*after* that (between printing the message and returning from the 
program) the program hangs with 100% CPU usage for a very long time.

What is happening here? Can someone reproduce this? I reckon it has 
something to do with GC because if I replace new byte[...] with malloc, 
it works properly.
Feb 08 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Niko Korhonen wrote:

 When compiled with DMD 0.112 this program executes strangely on my 
 machine. The message "Walk complete" gets printed pretty quickly, but 
 *after* that (between printing the message and returning from the 
 program) the program hangs with 100% CPU usage for a very long time.
 
 What is happening here? Can someone reproduce this? I reckon it has 
 something to do with GC because if I replace new byte[...] with malloc, 
 it works properly.
Same behaviour with GDC. A small pause before it starts "walking", and an enormous CPU-eating hang at the very end of the program... (not sure if it completes, since I terminated it when it got boring) Replacing GC with malloc/free shows no pauses, and returns quickly. Porting same program to Java it's slow to start, but finishes quickly... Seems like the garbage collection still has a long way to go in D ? --anders
Feb 08 2005
parent reply Dave <Dave_member pathlink.com> writes:
In article <cuaalt$1lqi$1 digitaldaemon.com>,
=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= says...
Niko Korhonen wrote:

 When compiled with DMD 0.112 this program executes strangely on my 
 machine. The message "Walk complete" gets printed pretty quickly, but 
 *after* that (between printing the message and returning from the 
 program) the program hangs with 100% CPU usage for a very long time.
 
 What is happening here? Can someone reproduce this? I reckon it has 
 something to do with GC because if I replace new byte[...] with malloc, 
 it works properly.
Same behaviour with GDC. A small pause before it starts "walking", and an enormous CPU-eating hang at the very end of the program... (not sure if it completes, since I terminated it when it got boring) Replacing GC with malloc/free shows no pauses, and returns quickly. Porting same program to Java it's slow to start, but finishes quickly... Seems like the garbage collection still has a long way to go in D ? --anders
On Linux it finishes immediately with DMD v0.112, but thrashes with GDC v0.10. On Windows DMD v0.112 thrashes as you mention. Here's what I get on Linux: Walking through the array... Walk complete real 0m0.744s user 0m0.254s sys 0m0.350s Walking through the array... Walk complete real 0m1.279s user 0m0.742s sys 0m0.329s Walking through the array... Walk complete real 0m1.015s user 0m0.489s sys 0m0.328s Walking through the array... Walk complete real 0m13.854s user 0m12.692s sys 0m0.355s - Dave
Feb 08 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Dave wrote:

Same behaviour with GDC. A small pause before it starts "walking",
and an enormous CPU-eating hang at the very end of the program...
(not sure if it completes, since I terminated it when it got boring)
On Linux it finishes immediately with DMD v0.112, but thrashes with GDC v0.10. On Windows DMD v0.112 thrashes as you mention.
I should have mentioned that I tested on Mac OS X, with GDC 0.10. After trashing around for 30-40 seconds, it eventually completed. --anders
Feb 08 2005
prev sibling next sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <cua7va$1fcg$1 digitaldaemon.com>, Niko Korhonen says...
Consider the following D program:

int main()
{
	const int ARRAY_SIZE = 1024 * 1024 * 128;
	
	ubyte[] array = new ubyte[ARRAY_SIZE];
	
	printf("Walking through the array...\n");
	
	for (int i = 0; i < ARRAY_SIZE; i++)
		array[i] = cast(ubyte)i;
		
	printf("Walk complete\n");	
	return 0;
}

As you can see, we allocate a large array and walk through it, setting 
the values to something.

When compiled with DMD 0.112 this program executes strangely on my 
machine. The message "Walk complete" gets printed pretty quickly, but 
*after* that (between printing the message and returning from the 
program) the program hangs with 100% CPU usage for a very long time.

What is happening here? Can someone reproduce this? I reckon it has 
something to do with GC because if I replace new byte[...] with malloc, 
it works properly.
Just a thought: I think the GC is trying to scan your 128MB of array data for additional references. I haven't tried this, but perhaps use "std.gc.removeRange(array)" to tell the GC to "keep out". Seeing as how it's raw data (all ubytes), the GC shouldn't need to scan it anyway. Leaving it scannable just increases the likelyhood of false positives for object references while scanning... so this is really a best practice with large blobs of data. Also, you can test the theory further by using "std.gc.disable()" and "std.gc.enable()" to see *where* the GC is slowing things down. - EricAnderton at yahoo
Feb 08 2005
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
pragma wrote:

 I haven't tried this, but perhaps use "std.gc.removeRange(array)" to tell the
GC
 to "keep out".  Seeing as how it's raw data (all ubytes), the GC shouldn't need
 to scan it anyway.  Leaving it scannable just increases the likelyhood of false
 positives for object references while scanning... so this is really a best
 practice with large blobs of data.
Adding std.gc.removeRange(array) did not make a difference, on Mac OS X. default:
 62.82s user 2.95s system 55% cpu 1:58.41 total
removeRange:
 70.60s user 2.22s system 58% cpu 2:03.72 total
malloc/free:
 3.09s user 1.00s system 67% cpu 6.096 total
Still takes several minutes to complete the program, when using GC... --anders
Feb 08 2005
prev sibling next sibling parent John Demme <me teqdruid.com> writes:
pragma wrote:
 Also, you can test the theory further by using "std.gc.disable()" and
 "std.gc.enable()" to see *where* the GC is slowing things down.
 
 - EricAnderton at yahoo
Unless it's been implemented without me noticing, these don't do anything. :)
Feb 08 2005
prev sibling parent Niko Korhonen <niktheblak hotmail.com> writes:
pragma wrote:
 Just a thought: I think the GC is trying to scan your 128MB of array data for
 additional references.
That's what I though but then it should trash on Linux too. Aynway, since the GC knows it's a 128MB array of POD, why it would scan through it? And why it doesn't do so with DMD/Linux? I tested it with a more realistic use case: int main() { const int ARRAY_SIZE = 1024 * 1024 * 32; ubyte[] array1 = new ubyte[ARRAY_SIZE]; ubyte[] array2 = new ubyte[ARRAY_SIZE]; ubyte[] array3 = new ubyte[ARRAY_SIZE]; ubyte[] array4 = new ubyte[ARRAY_SIZE]; printf("Walking through the arrays...\n"); for (int i = 0; i < ARRAY_SIZE; i++) array1[i] = cast(ubyte)i; for (int i = 0; i < ARRAY_SIZE; i++) array2[i] = cast(ubyte)i; for (int i = 0; i < ARRAY_SIZE; i++) array3[i] = cast(ubyte)i; for (int i = 0; i < ARRAY_SIZE; i++) array4[i] = cast(ubyte)i; printf("Walk complete\n"); return 0; } I'm still allocating 128 MB, but in smaller chunks. It works nice and smooth, no trashing. Using smaller arrays definitely cures the problem, but I still would like to know whether it is normal/expected behaviour? And why it doesn't trash with DMD/Linux?
Feb 08 2005
prev sibling parent Ilya Minkov <minkov cs.tum.edu> writes:
The following happens: when the program finishes, the heap needs to be 
walked and analyzed for pointers to something that is registered with 
the garbage collector as an object that needs finilization. The walking 
itself is not the problem, but the check for everything which looks like 
it might be a reference into GC memory requieres walking fairly large 
structures and perhaps one or another function call. Thus it is slow.

For the future, the gabage collector is likely to be made more 
intelligent to avoid scanning blocks of non-pointer data, but now it 
doesn't have this sophistication. Besides, better algorithms for 
discarding references which are definately not into GC heap might help - 
but i'm not sure whether there is any room for improvement left in this 
regard. One clue would be to compare different implementations - DMD on 
Windows uses it's own GC, GDC uses Boehm as far as i remember. I don't 
know what GC DMD on Linux uses.

For now i would just allocate such massive non-pointer data with malloc 
and encapsulate it in an object which has a destructor to de-allocate 
the memory.

-i.

Niko Korhonen wrote:
 Consider the following D program:
 
 int main()
 {
     const int ARRAY_SIZE = 1024 * 1024 * 128;
     
     ubyte[] array = new ubyte[ARRAY_SIZE];
     
     printf("Walking through the array...\n");
     
     for (int i = 0; i < ARRAY_SIZE; i++)
         array[i] = cast(ubyte)i;
        
     printf("Walk complete\n");   
     return 0;
 }
 
 As you can see, we allocate a large array and walk through it, setting 
 the values to something.
 
 When compiled with DMD 0.112 this program executes strangely on my 
 machine. The message "Walk complete" gets printed pretty quickly, but 
 *after* that (between printing the message and returning from the 
 program) the program hangs with 100% CPU usage for a very long time.
 
 What is happening here? Can someone reproduce this? I reckon it has 
 something to do with GC because if I replace new byte[...] with malloc, 
 it works properly.
Feb 09 2005