digitalmars.D - Memory allocation in D
- Marius Muja (53/53) Nov 28 2007 Hi,
- BCS (6/22) Nov 28 2007 If you are using windows and the D new version allocates 3GB of ram then...
- Sean Kelly (13/40) Nov 29 2007 With D as it is now, this will call GC.malloc(2049), which will allocate...
- Sean Kelly (5/5) Nov 29 2007 This thread inspired me to make a number of changes to the GC and such
- Craig Black (3/8) Nov 29 2007 Cool! Maybe you could show some benchmarks when you get done?
- Sean Kelly (21/29) Nov 30 2007 Unfortunately, I think I'm not going to commit these changes. I had
- Robert Fraser (10/22) Nov 30 2007 I'm sure you've probably read it, but I'll post it anyway:
- Marius Muja (12/21) Nov 30 2007 That was my problem also. I was using array sizes that were powers of
- Sean Kelly (8/30) Nov 30 2007 Weird. D's GC is basically a "big bag of pages" allocator, where groups...
- Marius Muja (4/10) Nov 30 2007 That seems to be the case. And because in my test program I was always
- Jason House (5/41) Nov 30 2007 If object sizes are a power of 2, is that because of special padding of ...
- Sean Kelly (10/48) Nov 30 2007 Only arrays get the +1 byte added to their size during allocations. If
- Sean Kelly (8/22) Nov 30 2007 So I went and looked up the clause and I'm not clear whether it requires...
- James Dennett (4/24) Dec 19 2007 C and C++ only require that the address be valid for pointer
- Sean Kelly (7/28) Dec 19 2007 Thanks. This was what I inferred, but I had heard (or misheard) that it...
- Oskar Linde (12/20) Dec 01 2007 The +1 allocation for arrays is AFAIK there to make sure that it is
- Sean Kelly (5/14) Dec 01 2007 Okay that makes sense. The Boehm GC tied the +1 byte to whether
- Jérôme M. Berger (22/42) Nov 29 2007 Since you are using Linux, I would like to point out that a call to malloc...
- Jérôme M. Berger (18/19) Nov 29 2007 Sorry, this should be "nans" of course, but the main point remains valid...
Hi,

I'm working on a program that is computationally and memory intensive. I have noticed that if I allocate the memory using D's new operator I can allocate only about half as much memory compared to C's malloc (and also the memory allocation is much slower). Does anybody know why D's memory allocator uses twice as much memory compared to malloc? I used the following test program to test this:

    version (Tango) {
        import tango.core.Memory;
        import tango.stdc.stdlib;
        import tango.io.Stdout;
    }
    else {
        import std.stdio;
        import std.c.stdlib;
    }

    const int ALLOC_SIZE = 1024*1024;

    void test_d_alloc()
    {
        float[] a = new float[ALLOC_SIZE];
    }

    void test_malloc()
    {
        float* ptr = cast(float*)malloc(ALLOC_SIZE*float.sizeof);
        if (ptr is null) throw new Exception("Out of memory");
    }

    void main()
    {
        version (Tango) {
            GC.disable();
        } else {
            std.gc.disable();
        }
        int i = 0;
        while (true) {
            version (Tango) {
                Stdout.formatln("{}", ++i);
            } else {
                writefln(++i);
            }
            // test_d_alloc();
            test_malloc();
        }
    }

When the above program uses the test_d_alloc() function it executes the loop about half as many times compared to when it's using the test_malloc() function (until it terminates with an out of memory error). Also when the allocation is done in smaller increments (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after allocating the maximum amount of memory (3G) instead of terminating with a nice OutOfMemory error.

Marius
Nov 28 2007
Reply to Marius,

> Hi, I'm working on a program that is computationally and memory
> intensive. I have noticed that if I allocate the memory using D's new
> operator I can allocate only about half as much memory compared to C's
> malloc (and also the memory allocation is much slower). Does anybody
> know why D's memory allocator uses twice as much memory compared to
> malloc?
[...]
> When the above program uses the test_d_alloc() function it executes the
> loop about half as many times compared to when it's using the
> test_malloc() function (until it terminates with an out of memory
> error). Also when the allocation is done in smaller increments
> (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after
> allocating the maximum amount of memory (3G) instead of terminating
> with a nice OutOfMemory error.

If you are using Windows and the D new version allocates 3GB of RAM then it's using everything it can get. No matter what, you have 75% of the address space, so I would suspect that the malloc version isn't actually allocating as much as it should. Or am I reading something wrong?
Nov 28 2007
Marius Muja wrote:
> Hi, I'm working on a program that is computationally and memory
> intensive. I have noticed that if I allocate the memory using D's new
> operator I can allocate only about half as much memory compared to C's
> malloc (and also the memory allocation is much slower). Does anybody
> know why D's memory allocator uses twice as much memory compared to
> malloc? I used the following test program to test this:
>
>     version (Tango) {
>         import tango.core.Memory;
>         import tango.stdc.stdlib;
>         import tango.io.Stdout;
>     }
>     else {
>         import std.stdio;
>         import std.c.stdlib;
>     }
>
>     const int ALLOC_SIZE = 1024*1024;
>
>     void test_d_alloc()
>     {
>         float[] a = new float[ALLOC_SIZE];
>     }

With D as it is now, this will call GC.malloc(2049), which will allocate one page of memory per call. The D runtime adds 1 to every allocation with "new"; according to Walter, this was to prevent certain confusing errors. It will allocate a page per call because the D GC allocates from pools devoted to fixed-size blocks which grow in powers of two up to 4096 bytes (one page on most systems). Allocations beyond 4096 bytes are "big" allocations, and a multiple of 4096 bytes will be allocated to fit the requested size. For these allocations, when the memory is released I believe the pages go back into a free pool. I'm not sure offhand if empty pages used for smaller blocks do or not.

You might want to try calling GC.malloc() instead and compare results.

Sean
Nov 29 2007
This thread inspired me to make a number of changes to the GC and such in Tango. They should improve efficiency in terms of eliminating wasted space and avoiding collections in some situations. They'll probably take a few days, so stay tuned. Sean
Nov 29 2007
"Sean Kelly" <sean f4.ca> wrote in message news:fin4fl$1h7v$1 digitalmars.com...
> This thread inspired me to make a number of changes to the GC and such
> in Tango. They should improve efficiency in terms of eliminating wasted
> space and avoiding collections in some situations. They'll probably
> take a few days, so stay tuned.
>
> Sean

Cool! Maybe you could show some benchmarks when you get done?
Nov 29 2007
Craig Black wrote:
> "Sean Kelly" <sean f4.ca> wrote in message news:fin4fl$1h7v$1 digitalmars.com...
>> This thread inspired me to make a number of changes to the GC and such
>> in Tango. They should improve efficiency in terms of eliminating wasted
>> space and avoiding collections in some situations. They'll probably
>> take a few days, so stay tuned.
> Cool! Maybe you could show some benchmarks when you get done?

Unfortunately, I think I'm not going to commit these changes. I had some clever ideas for how to replace the "allocate an extra byte" functionality with an approach that would have resulted in less wasted space, but it's become more complicated than the added efficiency is worth. Basically, I was hoping to keep an extra free page at the end of the allocated memory region used by the process. This assumes that all memory used by the process is contiguous however, or at least that the intervening pages are valid. But neither assumption is reasonable, and while some sort of fancy manual fix may have worked with some memory management APIs, it wouldn't have worked with others.

For what it's worth, my concern with the current behavior is that its worst-case scenario coincides with typical usage. It is extremely common for a programmer to use array sizes that are powers of two, and this is the situation which results in the maximum unused space per block (because the runtime adds 1 to the size of all array allocations passed to the GC). Totally stinks, and I'm running out of ideas. I don't suppose someone can offer a suggestion for how to address this, other than "just always allocate exactly what they request and if they get a page fault then too bad for them?"

Sean
Nov 30 2007
Sean Kelly wrote:
> For what it's worth, my concern with the current behavior is that its
> worst-case scenario coincides with typical usage. It is extremely common
> for a programmer to use array sizes that are powers of two, and this is
> the situation which results in the maximum unused space per block
> (because the runtime adds 1 to the size of all array allocations passed
> to the GC). Totally stinks, and I'm running out of ideas. I don't
> suppose someone can offer a suggestion for how to address this, other
> than "just always allocate exactly what they request and if they get a
> page fault then too bad for them?"

I'm sure you've probably read it, but I'll post it anyway:

http://www.cs.purdue.edu/homes/grr/ISMM98-grr-mps-cef.pdf

In particular: "... [Objects] slightly larger than a page boundary lead to significant internal fragmentation in the page-based large object allocator as well. The high internal fragmentation associated with both large small objects and small large objects suggest that one should create an intermediate class of medium objects."
Nov 30 2007
Sean Kelly wrote:
> For what it's worth, my concern with the current behavior is that its
> worst-case scenario coincides with typical usage. It is extremely common
> for a programmer to use array sizes that are powers of two, and this is
> the situation which results in the maximum unused space per block
> (because the runtime adds 1 to the size of all array allocations passed
> to the GC).

That was my problem also. I was using array sizes that were powers of two, and because of the extra 1 there were many memory pages wasted. For example, in one of my tests I was allocating float arrays of size 1024 (4096 bytes) which were taking 2 pages instead of 1 because of the extra 1 added, and that's where I was observing that with D's new operator I can only allocate half as much memory as with malloc. Oddly, the same problem occurs if I allocate much larger arrays of exactly 1024*1024 floats (4MB size). It seems that instead of 4MB, D's runtime allocated 8MB for each of these arrays. Is D's runtime using 4MB pages for large memory allocations?

Marius
Nov 30 2007
Marius Muja wrote:
> Sean Kelly wrote:
>> [...]
> That was my problem also. I was using array sizes that were powers of
> two and because of the extra 1 there were many memory pages wasted. For
> example in one of my tests I was allocating float arrays of size 1024
> (4096 bytes) which were taking 2 pages instead of 1 because of the extra
> 1 added, and that's where I was observing that with D's new operator I
> can only allocate half as much memory than with malloc. Oddly the same
> problem occurs if I allocate much larger arrays of exactly 1024*1024
> floats (4MB size). Seems that instead of 4MB, D's runtime allocated 8MB
> for each of these arrays. Is D's runtime using 4MB pages for large
> memory allocations?

Weird. D's GC is basically a "big bag of pages" allocator, where groups of pages are divided into pools. The maximum size a pool will be on its own is 8 megs, with very large single allocations getting their own pool that is as big as necessary. If I had to guess, I'd say the 8MB you're seeing is an artifact of this allocation scheme, and I'd suspect that much of the 8MB is committed but marked as free pages in your program.

Sean
Nov 30 2007
> Weird. D's GC is basically a "big bag of pages" allocator, where groups
> of pages are divided into pools. The maximum size a pool will be on its
> own is 8 megs, with very large single allocations getting their own pool
> that is as big as necessary. If I had to guess, I'd say the 8MB you're
> seeing is an artifact of this allocation scheme, and I'd suspect that
> much of the 8MB is committed but marked as free pages in your program.

That seems to be the case. And because in my test program I was always allocating 4MB-sized arrays, the extra pages that were committed but marked as free couldn't be used.

Marius
Nov 30 2007
Sean Kelly wrote:
> [...]
> For what it's worth, my concern with the current behavior is that its
> worst-case scenario coincides with typical usage. It is extremely common
> for a programmer to use array sizes that are powers of two, and this is
> the situation which results in the maximum unused space per block
> (because the runtime adds 1 to the size of all array allocations passed
> to the GC). Totally stinks, and I'm running out of ideas. I don't
> suppose someone can offer a suggestion for how to address this, other
> than "just always allocate exactly what they request and if they get a
> page fault then too bad for them?"

If object sizes are a power of 2, is that because of special padding of a struct/object to make it align well? Maybe some clever trick could be done to use wasted space, if it exists, for special tracking data... Otherwise, allocating a bit more seems the most sensible.
Nov 30 2007
Jason House wrote:
> Sean Kelly wrote:
>> [...]
>> For what it's worth, my concern with the current behavior is that its
>> worst-case scenario coincides with typical usage. It is extremely
>> common for a programmer to use array sizes that are powers of two, and
>> this is the situation which results in the maximum unused space per
>> block (because the runtime adds 1 to the size of all array allocations
>> passed to the GC).
> If object sizes are a power of 2, is that because of special padding of
> a struct/object to make it align well? Maybe some clever trick could be
> done to use wasted space, if it exists, for special tracking data...
> Otherwise, allocating a bit more seems the most sensible.

Only arrays get the +1 byte added to their size during allocations. If an object occupies 16 bytes then it will end up in a 16 byte bucket. The +1 issue for arrays exists to make the program a bit more forgiving for "past the end" errors. Interestingly, the ANSI C standard actually requires this behavior in its allocators. I have yet to decide whether or not this is an artifact of C using terminators to mark the end of a block, since using terminators seems more likely to result in such bugs than the length scheme D employs.

Sean
Nov 30 2007
Sean Kelly wrote:
> Jason House wrote:
>> If object sizes are a power of 2, is that because of special padding
>> of a struct/object to make it align well? Maybe some clever trick
>> could be done to use wasted space, if it exists, for special tracking
>> data... Otherwise, allocating a bit more seems the most sensible.
> Only arrays get the +1 byte added to their size during allocations. If
> an object occupies 16 bytes then it will end up in a 16 byte bucket.
> The +1 issue for arrays exists to make the program a bit more forgiving
> for "past the end" errors. Interestingly, the ANSI C standard actually
> requires this behavior in its allocators.

So I went and looked up the clause and I'm not clear whether it requires the location to be physically available or merely for the address to be valid for pointer mathematics. I also realized that I don't need to worry about malloc and vmalloc because they're already ANSI C compliant, so it's just potentially a matter of sorting out VirtualAlloc and mmap. I'm going to do a bit more research and see what comes out of it.

Sean
Nov 30 2007
Sean Kelly wrote:
> So I went and looked up the clause and I'm not clear whether it
> requires the location to be physically available or merely for the
> address to be valid for pointer mathematics.

C and C++ only require that the address be valid for pointer arithmetic; it need not be dereferenceable.

-- James
Dec 19 2007
James Dennett wrote:
> C and C++ only require that the address be valid for pointer
> arithmetic; it need not be dereferenceable.

Thanks. This was what I inferred, but I had heard (or misheard) that it was the stronger guarantee and got confused when I didn't see this in the spec. Unfortunately, as others have said, the "past the end" issue can cause other problems with garbage collection, so it seems that in D the extra byte of free space should typically be reserved anyway.

Sean
Dec 19 2007
Sean Kelly wrote:
> Only arrays get the +1 byte added to their size during allocations. If
> an object occupies 16 bytes then it will end up in a 16 byte bucket.
> The +1 issue for arrays exists to make the program a bit more forgiving
> for "past the end" errors. Interestingly, the ANSI C standard actually
> requires this behavior in its allocators. I have yet to decide whether
> or not this is an artifact of C using terminators to mark the end of a
> block, since using terminators seems more likely to result in such bugs
> than the length scheme D employs.

The +1 allocation for arrays is AFAIK there to make sure that it is valid to have a pointer point one past the end. You want this a) to allow efficient iterators, and b) to allow [$..$] slices. The problems that would arise if the GC didn't allocate that extra byte are:

* appending to a [$..$] slice would in some cases be confused by the GC as appending to a [0..0] slice of the following allocated array, thereby corrupting that memory.
* a pointer past the end would prevent the consecutive memory area from being reclaimed by the GC.

-- Oskar
Dec 01 2007
Oskar Linde wrote:
> The problems that would arise if the GC didn't allocate that extra byte
> are:
> * appending to a [$..$] slice would in some cases be confused by the GC
>   as appending to a [0..0] slice of the following allocated array,
>   thereby corrupting that memory.
> * a pointer past the end would prevent the consecutive memory area from
>   being reclaimed by the GC.

Okay, that makes sense. The Boehm GC tied the +1 byte to whether internal pointers were allowed, so I was wondering if it was something along these lines. Thanks.

Sean
Dec 01 2007
Marius Muja wrote:
> Hi, I'm working on a program that is computationally and memory
> intensive. I have noticed that if I allocate the memory using D's new
> operator I can allocate only about half as much memory compared to C's
> malloc (and also the memory allocation is much slower). Does anybody
> know why D's memory allocator uses twice as much memory compared to
> malloc? I used the following test program to test this:
> ... snip ...
> When the above program uses the test_d_alloc() function it executes the
> loop about half as many times compared to when it's using the
> test_malloc() function (until it terminates with an out of memory
> error). Also when the allocation is done in smaller increments
> (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after
> allocating the maximum amount of memory (3G) instead of terminating
> with a nice OutOfMemory error.

Since you are using Linux, I would like to point out that a call to malloc may succeed even if there is no memory available at the time. The program will only fail when it actually tries to access the memory. Since the D new fills the array with zeroes, it will fail immediately upon reaching the memory limit. The standard malloc however may appear to work much longer unless you actually try to use the memory.

Jerome

+------------------------- Jerome M. BERGER ---------------------+
| mailto:jeberger free.fr  |  ICQ: 238062172                     |
| http://jeberger.free.fr/ |  Jabber: jeberger jabber.fr         |
+---------------------------------+------------------------------+
Nov 29 2007
Jérôme M. Berger wrote:
> Since the D new fills the array with zeroes,

Sorry, this should be "nans" of course, but the main point remains valid...

Jerome
Nov 29 2007