digitalmars.D - Garbage collection, and practical strategies to avoid allocation
So let's talk about garbage collection, and practical strategies to avoid allocation.

GC related discussions come up basically every day, perhaps multiple times a day on IRC, and the recent reddit 2.063 release thread is dominated by C++ programmers who are keenly interested in D, but are scared by the GC. I can say with confidence, as someone who has gone out on a limb and actually invested a lot of time and energy in D, I'm as nervous (or more so) as they are, and feel their turmoil deeply!

So where are we? I can only speak from my industry's perspective. As I see it, we are here:

- Stopping the world is unacceptable.
- Scanning the whole heap is costly. It also seems extremely wasteful to scan the whole heap when the overall temporal stability of a realtime heap is extremely high (just a few temps allocated from frame to frame on average, and mostly on the stack!).
- GC runs at unpredictable moments.
- GC collection cycles become more and more frequent the less unallocated memory overhead you have. Hint: video games usually run within kb of the system's available memory. How often will full heap-scanning collections be issued to collect a couple of transient/temporary allocations when there are only a few kb of free memory? Conceivably, more than once per frame...
- In a low-free-memory environment, what is the cumulative effect of fragmentation? Can this be measured, or will it be a nasty surprise 2 months from shipping a 20-million dollar project? (Hint: a discovery of this sort could very well ruin a project and destroy a company.)

Basically nobody will have experienced these issues to their fullest extent on a PC with plentiful memory. But they must be considered if the audience still stuck in C++ is to take D seriously (who I predict are D's greatest potential user-base).

While I do think a sufficiently advanced GC might satisfy the realtime environment, the more I think about it, the more I am thinking a GC is not applicable to the embedded (or memory limited) environment. So what options exist?

I'm thinking more and more that I like the idea of a ref-counting GC. People argue that managing ref-counts may be slower; perhaps true, it requires a small amount of localised overhead, but if allocation frequency is low, it shouldn't be much. I think the key advantages though are:

- determinism: memory will be immediately freed (or perhaps deferred by a small but predictable amount of time, let's say, 1 frame)
- elimination of full-heap scans, which take the most time
- the refcount table is self-contained, and won't destroy the dcache like a heap scan
- less tendency to fragment, since transient allocations can come into and leave existence before something else allocates beside it

But this is only part of the problem. Naturally, the fastest allocation is the one that never happened. So an equally, or even more important aspect of the puzzle is offering clearly documented advice, and convenient syntax/patterns on how to avoid allocation in general. It should be made EASY to avoid allocation. This way people will tend to do it by habit, further alleviating the problem.

I've made the case that libraries should avoid surprise allocations at all costs. Maybe this leads back to @nogc conversations (a concept I'm not personally sold on), but something needs to be done, and best-practises/conventions need to be defined.

So I'd go so far as to say, perhaps these 2 points should be considered as key goals for 2.064?
* find a solution for deterministic embedded garbage collection
* decide some realistic best-practises/conventions for reliably (and conveniently!) avoiding allocations

Showing progress on these will show real progress on D for ex-C++ users. In my time using D, there has been exactly ZERO progress on the GC issue, which is discouraging to say the least, perhaps even kinda scary. (Fortunately, people are thinking about it, and there were 2 great talks at dconf, but I don't think either addresses the specific issues I raise.)

Discuss... (or perhaps, "destroooy")
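[A tiny illustration of the "avoid the allocation entirely" point above: format into a stack buffer with std.format.sformat instead of building a new GC-allocated string each frame. This is a sketch only; drawHud and the render call are invented names for the example.]

    import std.format : sformat;

    void drawHud(float fps)
    {
        char[64] buf;                                   // stack storage, no GC involvement
        const char[] text = sformat(buf[], "fps: %.1f", fps);
        // render(text);                                // hypothetical renderer call
    }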
May 31 2013
just to toss in my quick thoughts, I wrote a couple comments on the recent reddit thread about using D with a minimal runtime and some of the talk may be relevant here too:

http://www.reddit.com/r/programming/comments/1fc9jt/dmd_2063_the_d_programming_language_reference/ca94mek

Some little things we could do is add overloads to some functions that return string to be able to take a buffer argument too:

    string to(T : string)(int a)
    {
        char[] buffer = new char[](16);
        return assumeUnique(to(a, buffer));
    }

    char[] to(int a, char[] buffer)
    {
        // deposit it straight into buffer, return the slice into
        // buffer that is actually used
    }

and so on for all kinds of functions. Kinda a pain in the butt but when you need it, you have the second variant, and if you don't care, the convenient first one is still available (also avoids breaking code!)

I also mentioned on irc yesterday that I think a good, low cost idea to help find allocations is to add a global flag to druntime that makes gc_malloc throw an Error. Then you can set this flag in a critical section of code and at least get a runtime notice of a missed allocation in testing while still having the gc for the rest of the app. Another member also suggested you can do that easily enough by running the program in a debugger and setting a breakpoint at gc_malloc, but I think the flag would be simpler yet.
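[For illustration, this is how the buffer-taking overload sketched above might be used from caller code; the two-argument to is Adam's proposal, not an existing Phobos function, so treat it as hypothetical.]

    char[16] buf;                  // stack storage, nothing GC-allocated
    char[] s = to(1234, buf[]);    // s is the slice of buf actually holding "1234"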
May 31 2013
On Saturday, 1 June 2013 at 02:16:02 UTC, Adam D. Ruppe wrote:
> just to toss in my quick thoughts, I wrote a couple comments on the recent reddit thread about using D with a minimal runtime and some of the talk may be relevant here too: http://www.reddit.com/r/programming/comments/1fc9jt/dmd_2063_the_d_programming_language_reference/ca94mek
>
> Some little things we could do is add overloads to some functions that return string to be able to take a buffer argument too:
>
>     string to(T : string)(int a)
>     {
>         char[] buffer = new char[](16);
>         return assumeUnique(to(a, buffer));
>     }
>
>     char[] to(int a, char[] buffer)
>     {
>         // deposit it straight into buffer, return the slice into
>         // buffer that is actually used
>     }

I played around with adding an overload that accepted an output range to some of the std.string functions identified in my run of -vgc over phobos[1] (after Jonathan pointed out this is probably the best approach and is already what formattedWrite does). It worked fine but it did make me realize there aren't a lot of output ranges available to plug in at the moment (appender and lockingTextWriter are the only two that come to mind though there may be others). Appender isn't useful if your goal is to avoid the GC. Array!char et al aren't output ranges (whether they should be or not I have no idea). static arrays would need some sort of wrapper to make them output ranges I believe unless it was decided that put() should work by replacing the front and calling popFront for them (which I kind of doubt is the desired behavior).

(feel free to correct me on any of this, range experts)

1. http://goo.gl/HP78r
May 31 2013
On Saturday, 1 June 2013 at 02:47:40 UTC, Brad Anderson wrote:
> static arrays would need some sort of wrapper to make them output ranges I believe unless it was decided that put() should work by replacing the front and calling popFront for them (which I kind of doubt is the desired behavior). (feel free to correct me on any of this, range experts)

Actually I'm completely wrong about this. Slices already work this way and you can use a slice of a static array just fine as an output range.
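[A minimal sketch of the correction above, written against current Phobos; put()'s handling of character sinks has improved since 2013, so the exact behavior at the time of this thread may have differed.]

    import std.range.primitives : put;

    void main()
    {
        char[8] storage;           // fixed-size stack buffer
        char[] sink = storage[];   // a slice of it is a usable output range
        put(sink, "hi");           // put() writes into the slice and advances it
        // the written prefix is everything the sink has advanced past
        assert(storage[0 .. storage.length - sink.length] == "hi");
    }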
May 31 2013
On Saturday, June 01, 2013 04:47:39 Brad Anderson wrote:
> I played around with adding an overload that accepted an output range to some of the std.string functions identified in my run of -vgc over phobos[1] (after Jonathan pointed out this is probably the best approach and is already what formattedWrite does). It worked fine but it did make me realize there aren't a lot of output ranges available to plug in at the moment (appender and lockingTextWriter are the only two that come to mind though there may be others). Appender isn't useful if your goal is to avoid the GC. Array!char et al aren't output ranges (whether they should be or not I have no idea). static arrays would need some sort of wrapper to make them output ranges I believe unless it was decided that put() should work by replacing the front and calling popFront for them (which I kind of doubt is the desired behavior).
>
> (feel free to correct me on any of this, range experts)
>
> 1. http://goo.gl/HP78r

Dynamic arrays are output ranges. The one potential hitch there though relates to the fact that they get written to rather than appended to. This is actually exactly what you want in a situation like Manu's. However, that means that you have to worry about an output range running out of space and how you deal with that. If it's known how much will need to be appended, presumably you can check length if hasLength!R is true. Otherwise, I guess that the right thing to do is to check empty (arrays get shrunk as they're written to, so they'll be empty when you can't call put on them anymore). Unfortunately, put doesn't seem to worry about the case where the output range is full/empty, so the result when calling put on an empty range is undefined.

The situation is even worse with narrow strings (assuming that put works with them - I'm not sure that it does at the moment) given that even if you knew their length (which you wouldn't if you were going by hasLength), you wouldn't know whether a put would succeed when the string was nearly empty, as the actual number of elements that the dchar would take up would depend on its value.

In general, I don't think that output ranges have really been sorted out on quite the level that input ranges have been, and I think that some discussion is in order with regards to how to handle things like when the range can't be put into anymore. Given that one reason to use output ranges is for performance-critical code that doesn't want to allocate, throwing when the range is empty is probably a bad idea, and it's unclear that we can reasonably determine in the general case whether you can put to an output range before you actually try it. One solution to that would be to make put return a bool indicating whether it succeeded or not, but it's an issue that definitely needs to be explored a bit.

So, I very much think that the correct thing is to use output ranges, but how to use them needs to be better defined, and we probably need to better sort out some of their details (like the finish function which the hash stuff uses).

- Jonathan M Davis
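[To make the "running out of space" concern concrete, here is a minimal sketch of the kind of check a caller can do today with a slice-backed sink. tryPut is an illustrative helper, not a Phobos function, and the what-to-do-when-full policy is deliberately left open - that is exactly the design question raised above.]

    import std.range.primitives : put;

    // Illustrative helper: refuse to overflow a slice-backed sink.
    bool tryPut(ref char[] sink, const(char)[] s)
    {
        if (sink.length < s.length)
            return false;      // not enough room left; throw/truncate/flush is caller policy
        put(sink, s);          // writes into the slice and advances it
        return true;
    }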
May 31 2013
On Saturday, 1 June 2013 at 04:16:44 UTC, Jonathan M Davis wrote:
> [snip] The situation is even worse with narrow strings (assuming that put works with them - I'm not sure that it does at the moment) given that even if you knew their length (which you wouldn't if you were going by hasLength), you wouldn't know whether a put would succeed when the string was nearly empty, as the actual number of elements that the dchar would take up would depend on its value.

Furthermore--you probably know this because of the conversation on GitHub but for anyone else reading--narrow strings are also hard to work with because you can't put a dchar on them currently. Kenji had a pull request (since closed) to help with narrow string issues: https://github.com/D-Programming-Language/phobos/pull/1000

That and all the issues and unknowns you describe would need to be fixed if output ranges are going to be the approach. I'm reminded of a forum post by monarch_dodra that talked about some of the issues you brought up: http://forum.dlang.org/thread/xyvnifnetythvrhtcexm@forum.dlang.org
May 31 2013
On 2013-06-01 04:16, Adam D. Ruppe wrote:
> Some little things we could do is add overloads to some functions that return string to be able to take a buffer argument too:
>
>     string to(T : string)(int a)
>     {
>         char[] buffer = new char[](16);
>         return assumeUnique(to(a, buffer));
>     }
>
>     char[] to(int a, char[] buffer)
>     {
>         // deposit it straight into buffer, return the slice into
>         // buffer that is actually used
>     }

This approach is used all over the place in Tango.

-- 
/Jacob Carlborg
Jun 01 2013
On Saturday, 1 June 2013 at 02:03:07 UTC, Manu wrote:
> So let's talk about garbage collection, and practical strategies to avoid allocation.
> Discuss... (or perhaps, "destroooy")

Here is my take: http://forum.dlang.org/post/tftjtzmfuauxwcgcolct@forum.dlang.org

Sorry, I didn't see your new discussion.
May 31 2013
Manu:
> - Scanning the whole heap is costly

Rust avoids this by giving a different heap to each thread, and a common heap for shared data managed with unique references.

> - GC runs at unpredictable moments

Is this true? I think the D GC runs only when you allocate something.

Bye,
bearophile
Jun 01 2013
On Saturday, June 01, 2013 09:43:49 bearophile wrote:
>> - GC runs at unpredictable moments
>
> Is this true? I think the D GC runs only when you allocate something.

Sure, but which of these calls to new is going to cause the GC to run?

    auto a = new Foo;
    ...
    auto b = new Bar;
    ...
    auto c = new Baz;

It could be that all or none of them do. It all depends on what memory is available at the time and what exactly they're trying to allocate. So, while you may be able to know that the GC will only run when new is called, you don't know which new will trigger it, and it can (and probably will) vary on each run of the program. Pretty much the only way to control it is to disable the GC and then explicitly do a collection when you can afford a collection to run (though in a program like those Manu typically writes, where you're generating something like 60 frames per second, the GC is pretty much never fast enough to be run - not with anything even vaguely like its current implementation).

- Jonathan M Davis
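[A minimal sketch of the "disable the GC and collect at a point you choose" approach described above, using the core.memory API. The frame loop is just a stand-in, and note that even a disabled GC may still collect if an allocation would otherwise fail.]

    import core.memory : GC;

    void frame()
    {
        // game update / render; any `new` here won't trigger a collection
        // while the collector is disabled (barring an out-of-memory condition)
    }

    void mainLoop()
    {
        GC.disable();
        scope(exit) GC.enable();

        foreach (_; 0 .. 3)        // stand-in for the real frame loop
        {
            frame();
            GC.collect();          // deterministic point, but still a full stop-the-world scan
        }
    }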
Jun 01 2013
Nothing will get done until someone decides to put in the effort to fix the problem. D's biggest drawback at this point is the GC, and one would think with all the smart people around here someone would have solved this problem by now.

We need a solution that allows one to "plug and play" different allocation strategies. One shouldn't have to rely on any particularly bad GC implementation if they don't want to (but as of now are forced to). Stop-the-world GC is just plain crap except for the most basic apps. Any app that requires performance is going to have issues with this, the reason being simply that there are much better ways.

The ability to avoid the GC is of utmost importance for real time apps, regardless of what some want people to think. This can't be done now because D internally uses the GC. Hence this must be worked around. @nogc/@gc has been brought up several times... seems like a start and can't hurt.

IMO the best way to deal with the situation is to use manual memory management with a GC backend. E.g., pretend the GC is not there, just like the good ol' days. If you forget to deallocate something, maybe get a warning from a sufficiently intelligent compiler; if the GC is turned on then it will clean up for you behind the scenes. If you need performance, turn it off or disable it temporarily. This is the best of both worlds and allows one to easily go from one extreme to the other.

One can actually implement a pattern to allow the user to select one extreme or the other. One can already do this in D in user code, but since the D core does not follow this approach it doesn't do one much good unless they want to avoid slices, phobos, etc...

I think it will go a long way to get a standard for D regarding memory allocation that follows these lines. All new code will be written using the standard and old code will be updated.
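[As a concrete illustration of "manual management with the GC still there as a backstop", here is a minimal sketch using malloc plus std.conv.emplace; the Particle type and the helper names are invented for the example.]

    import core.stdc.stdlib : malloc, free;
    import std.conv : emplace;

    struct Particle { float x, y, z; }

    Particle* makeParticle(float x, float y, float z)
    {
        // manually managed storage: the GC never scans or frees this block
        auto p = cast(Particle*) malloc(Particle.sizeof);
        assert(p !is null);
        return emplace(p, x, y, z);
    }

    void freeParticle(Particle* p)
    {
        destroy(*p);   // run any destructor deterministically
        free(p);
    }

    // If Particle held pointers into GC-owned memory, the block would also need
    // core.memory.GC.addRange/removeRange so the collector can see those references.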
Jun 01 2013
On 2013-06-01 02:02:53 +0000, Manu <turkeyman gmail.com> said:
> * find a solution for deterministic embedded garbage collection

I think reference counting while still continuing to use the current GC to release cycles is the way to go. It wouldn't be too hard to implement.

This could make it realistic to disable the GC entirely in a program if you need everything to be deterministic. The GC could still be of assistance as a debug tool to help find those rogue cycles that shouldn't be there by configuring it to emit logs about what it deallocates.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
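[For a taste of what the ref-counting half already looks like in Phobos today, here is a minimal sketch with std.typecons.RefCounted; it does not handle cycles, which is exactly where the GC backstop Michel describes would come in. Buffer and example are invented names.]

    import std.typecons : RefCounted;

    struct Buffer
    {
        int[64] data;
    }

    void example()
    {
        auto a = RefCounted!Buffer(Buffer.init); // payload allocated outside the GC heap
        auto b = a;                              // copying bumps the reference count to 2
        b.data[0] = 42;
        assert(a.data[0] == 42);                 // both refer to the same payload
        // when a and b go out of scope the count drops to 0 and the payload is
        // destroyed and freed immediately - deterministic, no collection pass needed
    }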
Jun 01 2013
On Sat, 01 Jun 2013 07:10:07 -0400, Michel Fortin <michel.fortin michelf.ca> wrote:
> On 2013-06-01 02:02:53 +0000, Manu <turkeyman gmail.com> said:
>> * find a solution for deterministic embedded garbage collection
>
> I think reference counting while still continuing to use the current GC to release cycles is the way to go. It wouldn't be too hard to implement.

+1

I was going to write this exact post, but you already did :)

-Steve
Jun 03 2013
In my eyes there is just one reason there is no better GC yet: it requires compiler support. And that's a huge problem. The number of people in this community who would actually be able to make the modifications to the compiler is very small, and they tend to not have much time to do it.

A really good concurrent GC which properly deals with short-lived allocations would need the following compiler support:

1) Callback on pointer write (if the pointer is on the heap)
2) Support for precisely scanning the stack
3) Callback on assignment of __gshared or static shared variables
4) Callback on cast to shared
5) Callback on new of a shared object

The only thing the compiler already supports is the RTInfo template, which can be used to precisely scan the heap, but that's not enough for a really good GC.

Kind Regards
Benjamin Thaut
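[Purely to illustrate point 1 in the list above - a pointer-write callback, i.e. a write barrier - here is a hypothetical sketch of what the compiler would have to lower a heap pointer store into. None of these hooks exist in dmd/druntime; every name here is invented, and the dynamic array is only a stand-in for a real remembered set.]

    // Hypothetical runtime hook for a concurrent collector.
    __gshared void*[] dirtySlots;   // stand-in for a real remembered set

    void gc_writeBarrier(void** slot, void* newValue)
    {
        dirtySlots ~= cast(void*) slot;   // record the mutated slot for later re-scanning
        *slot = newValue;                 // then perform the actual pointer store
    }

    // The compiler would rewrite a heap pointer store such as
    //     node.next = other;
    // into something morally equivalent to
    //     gc_writeBarrier(cast(void**) &node.next, cast(void*) other);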
Jun 01 2013