digitalmars.D - DB/DBMS in D
- Vladimir A. Reznichenko (11/11) Feb 16 2009 Dear Mr./Ms.,
- Chris R Miller (14/24) Feb 16 2009 I would argue the opposite: that in a long-running process such as an
- Vladimir A. Reznichenko (12/41) Feb 16 2009 That's clear.
- bearophile (5/8) Feb 16 2009 When I have asked a similar question, people have told me that the curre...
- grauzone (4/4) Feb 16 2009 I suspect that in long running applications, there's more and more
- Sean Kelly (27/41) Feb 16 2009 The GC contains a collection of pools, each containing N contiguous
Dear Mr./Ms., I'd like to ask you about the garbage collector. It slows down an application, doesn't it? In the case of a DBMS, this is critical. I haven't found any articles or tests about this. It would also be great to learn how memory management is implemented in DMD: fragmentation, allocation, reallocation. And if well-known algorithms are used there, could they be named? C/C++ is the classic choice for such projects (DBMS), but the D language is a great one and the best for me :). I want to find out whether it can be used for this. Faithfully yours.
Feb 16 2009
Vladimir A. Reznichenko wrote:
> Dear Mr./Ms., I'd like to ask you about the garbage collector. It slows down an application, doesn't it? In the case of a DBMS, this is critical. I haven't found any articles or tests about this. It would also be great to learn how memory management is implemented in DMD: fragmentation, allocation, reallocation. And if well-known algorithms are used there, could they be named? C/C++ is the classic choice for such projects (DBMS), but the D language is a great one and the best for me :). I want to find out whether it can be used for this.

I would argue the opposite: that in a long-running process such as an RDBMS you would *want* the garbage collector to ensure that there are no memory leaks. You could have either a super-fast database which leaks memory (so your users would have to restart it periodically) OR you could use a garbage collector, take the performance penalty (not that much - quite frankly, complaining about the garbage collector is like complaining that the silverware is gold and not platinum) and have the assurance that your memory leakage will be kept to an absolute minimum (or eliminated entirely, if you remember to properly declare weak references).

Obviously it is possible to use a language like C++ and write code which doesn't leak memory... however, that level of effort isn't going to give you significant increases in performance compared to D. D is just plain fast.
Feb 16 2009
== Quote from Chris R Miller (lordsauronthegreat gmail.com)'s article
> I would argue the opposite: that in a long-running process such as an RDBMS you would *want* the garbage collector to ensure that there are no memory leaks. You could have either a super-fast database which leaks memory (so your users would have to restart it periodically) OR you could use a garbage collector, take the performance penalty (not that much - quite frankly, complaining about the garbage collector is like complaining that the silverware is gold and not platinum) and have the assurance that your memory leakage will be kept to an absolute minimum (or eliminated entirely, if you remember to properly declare weak references). Obviously it is possible to use a language like C++ and write code which doesn't leak memory... however, that level of effort isn't going to give you significant increases in performance compared to D. D is just plain fast.

That's clear. What is not clear to me is the level of memory fragmentation. In C++, memory is deallocated as soon as an object is deleted. With a GC, a deleted object is kept around until its memory is reused. If the GC operates on some range of addresses and places all objects there (like using a buffer), we get fragmentation, and the longer the process runs, the harder it is to eliminate. But if the GC stores a collection of pointers to objects located somewhere in memory in undefined order, then of course we can find a deleted object, update it and reuse it - this could be even faster. Which of these two approaches is implemented in the DMD GC?
Feb 16 2009
Vladimir A. Reznichenko:
> In case of using GC deleted object is kept before reused. If GC operates on some range of addresses, and places all objects there (like using buffer) we get fragmentation. The longer we run process the harder to eliminate it.

When I asked a similar question, people told me that the current D GC allocates memory in blocks whose sizes are powers of two (until they become big enough). One consequence is that saving a few bytes in a struct is often an illusion: if you use 9 bytes, the GC allocator gives you a 16-byte memory block. This happens in associative arrays too, so saving small amounts of memory is sometimes impossible; you have to use the C heap allocator or your own pools, arenas, etc.

I also suggest you perform some experiments: try to fragment memory and see whether memory use grows or not (and show us the code). Experiments take a bit of time and they can be wrong, but very often they also show you interesting surprises. I have seen such "surprises" very often while doing speed benchmarks.

Bye,
bearophile
Feb 16 2009
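The rounding bearophile mentions can be checked by asking the collector how large a block it actually handed out. The following is a minimal sketch, assuming a current D2 druntime in which `core.memory` exposes `GC.malloc`, `GC.sizeOf` and `GC.free`; the runtime discussed in this 2009 thread may behave differently, and the helper name `blockSizeFor` is made up for illustration.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper (not part of druntime): allocate `requested` bytes
/// from the GC and report the size of the block the collector reserved.
size_t blockSizeFor(size_t requested)
{
    void* p = GC.malloc(requested);
    scope (exit) GC.free(p);   // release the block once it has been measured
    return GC.sizeOf(p);       // size of the whole block backing `p`
}

void main()
{
    immutable size_t[] requests = [1, 9, 17, 33, 100, 1000, 5000];
    foreach (n; requests)
        writefln("requested %5d bytes -> block of %5d bytes", n, blockSizeFor(n));
}
```

The exact block sizes depend on the runtime version, so the program prints them rather than assuming them. Comparing the two columns shows how much memory a given struct size really costs when it is allocated from the GC heap.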
I suspect that in long-running applications there's more and more unfreed garbage, because the conservative GC thinks it's still alive. And the garbage references other garbage, and so on. Can someone confirm or refute this?
Feb 16 2009
Vladimir A. Reznichenko wrote:
> Dear Mr./Ms., I'd like to ask you about the garbage collector. It slows down an application, doesn't it? In the case of a DBMS, this is critical. I haven't found any articles or tests about this. It would also be great to learn how memory management is implemented in DMD: fragmentation, allocation, reallocation. And if well-known algorithms are used there, could they be named?

The GC contains a collection of pools, each containing N contiguous pages of memory. Each page can be individually assigned for storing a particular fixed memory block size, with sizes as powers of two from 16 to 4096 bytes (1 page, on most systems). For allocations beyond 4096 bytes, the minimum necessary number of contiguous pages will be used to hold the memory block. All free blocks of a particular size are held in a free list.

When an allocation occurs, the GC first checks the appropriate free list to see if there is a block of the right size available. If not, it looks for an available page in an existing memory pool that can be turned into a page of blocks of the appropriate size. If there is none, a mark/sweep garbage collection cycle occurs. Then the GC looks again for a free block, then for a free page, and if there still aren't any it allocates a new pool from the OS. After garbage collection, the D2 GC will check to see if any pools are completely empty and release these back to the OS.

I'm working from memory, but that's roughly the way the GC works. You can also explicitly delete GC memory via the 'delete' expression, though there's been some contention about whether having a 'delete' operation for GCed memory is actually a good idea.

> C/C++ is the classic choice for such projects (DBMS), but the D language is a great one and the best for me :). I want to find out whether it can be used for this.

D is just as fast as C/C++. For a DBMS, you'll mostly want to be aware of the potential memory overhead of the GC allocation scheme and the cost of the "stop the world" collection cycles.

If you decide you don't want to use the GC at all, you're also able to call the C malloc and free. Only the built-in AA and some string operations (concatenation, etc.) allocate GC memory behind the scenes. The rest comes from explicit 'new' calls that you make in your own code.
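Calling the C allocator from D, as suggested above, might look like the sketch below. It assumes a current D2 toolchain, where the C library bindings live in `core.stdc.stdlib` and `core.stdc.string`; the `Record` type and the `makeRecord` helper are hypothetical stand-ins for whatever a DBMS would actually store.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memset;

/// Hypothetical fixed-size record, standing in for a DBMS tuple or page header.
struct Record
{
    ulong id;
    ubyte[56] payload;
}

/// Allocate a Record on the C heap. The GC neither scans nor frees this memory.
Record* makeRecord(ulong id)
{
    auto r = cast(Record*) malloc(Record.sizeof);
    if (r is null)
        return null;               // out of memory: the caller must check
    memset(r, 0, Record.sizeof);   // malloc does not zero the block
    r.id = id;
    return r;
}

void main()
{
    Record* r = makeRecord(42);
    assert(r !is null);
    scope (exit) free(r);          // manual release, exactly as in C
    assert(r.id == 42);
}
```

Memory obtained this way never triggers or lengthens a collection cycle, but it is also invisible to the collector: if such a block stores references into GC-managed memory, the collector has to be told about it (for example with `GC.addRange` from `core.memory`), otherwise the referenced objects may be collected.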
Feb 16 2009
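The memory overhead of the allocation scheme Sean Kelly describes can be estimated with a small model. This is only a sketch of the description in his post (power-of-two size classes from 16 to 4096 bytes, whole pages beyond that), not the collector's actual code; the 4096-byte page size and the function name `reservedFor` are assumptions made for illustration.

```d
import std.stdio : writefln;

enum size_t pageSize = 4096;  // assumed page size ("1 page, on most systems")
enum size_t minBlock = 16;    // smallest size class named in the post

/// Hypothetical model: bytes reserved for a request of `n` bytes
/// under the scheme described in the post.
size_t reservedFor(size_t n)
{
    if (n <= pageSize)
    {
        size_t block = minBlock;
        while (block < n)
            block *= 2;       // next size class: 16, 32, 64, ..., 4096
        return block;
    }
    // Larger requests take the minimum number of whole contiguous pages.
    return (n + pageSize - 1) / pageSize * pageSize;
}

void main()
{
    immutable size_t[] requests = [9, 16, 17, 100, 4096, 4097, 10_000];
    foreach (n; requests)
        writefln("request %6d -> reserved %6d (overhead %5d)",
                 n, reservedFor(n), reservedFor(n) - n);
}
```

Under this model a 9-byte request reserves 16 bytes, a 17-byte request reserves 32, and a 4097-byte request reserves two pages (8192 bytes). The worst case for small blocks is a request just above a power of two, which wastes almost half of the reserved block.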