digitalmars.D - Potential GSoC project - GC improvements
- Jeremy DeHaan (11/11) Mar 09 2016 Hey all,
- Joakim (10/21) Mar 09 2016 I'd think working on the GC would make a great GSoC project, as
- thedeemon (4/8) Mar 10 2016 Do you think Windows version of this cdgc is feasible at all?
- Joakim (8/14) Mar 10 2016 I don't know enough about GCs and Windows to say. I'm assuming
- Adam Wilson (7/18) Mar 09 2016 The follow-on projects enabled by a precise GC would be quite useful...
- NX (7/13) Mar 09 2016 http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html
- Andrei Alexandrescu (3/6) Mar 10 2016 I agree. A first step would be easy to do with std.allocator's
- Jeremy DeHaan (7/14) Mar 10 2016 I was looking into this, but I am slightly hesitant. Should the
- ZombineDev (9/25) Mar 10 2016 There's no problem in using std.experimental.* internally,
- Daniel Kozak via Digitalmars-d (3/23) Mar 10 2016 But we allready need existing D compiler to build D frontend, so maybe
- ZombineDev (11/44) Mar 10 2016 The build process looks something like this currently:
- Daniel Kozak via Digitalmars-d (6/45) Mar 10 2016 I know how it works now :). But if D is stable enought 1 should be good...
- ZombineDev (13/69) Mar 10 2016 Well for 4. maybe, but after a couple of years... Most of the
- Jeremy DeHaan (15/15) Mar 11 2016 Thank you all for the feedback.
- Rikki Cattermole (6/20) Mar 11 2016 Ugh I would say that is a lot.
- Adam Wilson (21/35) Mar 12 2016 If I may make a suggestion. The lock free work is unlikely to require
- Chris Wright (19/30) Mar 12 2016 Unions mean we can't be entirely precise, but we can get pretty close.
- Jeremy DeHaan (29/43) Mar 12 2016 Rainer has two different precise GC's in pull requests right now
- Adam Wilson (47/89) Mar 12 2016 I would contend that it's not quite that simple. Of course the precise
- Chris Wright (95/138) Mar 12 2016 On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:
- Adam Wilson (75/213) Mar 13 2016 I didn't intend to confuse, but I was not writing to neophyte's, and I
- Chris Wright (47/65) Mar 13 2016 It's not a standard term. Google's only seeing about four references to
- Adam Wilson (19/84) Mar 13 2016 Is there an implementation of a conservative moving (compacting) GC out
- deadalnix (3/6) Mar 13 2016 That is impossible, you need to know what is and isn't a pointer
- Chris Wright (12/20) Mar 13 2016 Let's say we do the most precise GC possible for D. We can categorize
- Adam Wilson (6/13) Mar 13 2016 It was a rhetorical question. I was just leaving the door open for
- Chris Wright (11/13) Mar 13 2016 You made a large number of assertions about garbage collection and they
- Adam Wilson (27/40) Mar 13 2016 Erm. That seems a bit extreme. You are asserting that D can't have a
- thedeemon (26/32) Mar 13 2016 IIRC, Rainer called it "mostly precise", and for a good reason.
- Adam Wilson (27/55) Mar 13 2016 He did. And C# supports pinning so at some point all GC's are partially
- Jeremy DeHaan (15/21) Mar 14 2016 I haven't had power for a couple of days, but it looks like the
- jmh530 (8/10) Mar 14 2016 I would characterize it as very interesting, although I know very
- Adam Wilson (16/36) Mar 14 2016 Awesome! It's a complex topic, but you'll get broad exposure to all
- Rainer Schuetze (27/47) Mar 18 2016 Being always way behind reading the forum these days, I'm a little late
- Jeremy deHaan (6/12) Mar 18 2016 Thank you for the feedback. I'm still working on my proposal so
- Rainer Schuetze (33/50) Mar 19 2016 Well, there are a number of unfinished parts left:
- Adam Wilson (13/64) Mar 19 2016 I would argue that precision is still important as it enables a
- Craig Dillabaugh (3/11) Mar 18 2016 In case you didn't already know, Adam is now an official mentor
- Dmitry Olshansky (7/21) Mar 13 2016 Let's not forget another low-hanging fruit which is parallel marking.
Hey all, I'm trying to think of good project ideas for this years GSoC, and one in particular I thought would be a great was working on and improving the GC. I'm not sure what the scope of this project would be like, but at the moment I am thinking writing a generational collector would be a good place to start. I have more ideas, but I don't have a proposal yet. I just wanted some initial feedback on the idea, perhaps some advice for scope with the time frame in mind, and hopefully someone on the mentor list willing to take this on if it were to be accepted. Jeremy
Mar 09 2016
On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:Hey all, I'm trying to think of good project ideas for this years GSoC, and one in particular I thought would be a great was working on and improving the GC. I'm not sure what the scope of this project would be like, but at the moment I am thinking writing a generational collector would be a good place to start. I have more ideas, but I don't have a proposal yet. I just wanted some initial feedback on the idea, perhaps some advice for scope with the time frame in mind, and hopefully someone on the mentor list willing to take this on if it were to be accepted.I'd think working on the GC would make a great GSoC project, as it's needed and fairly self-contained while having large impact. Perhaps someone could build off of Sociomantic's concurrent GC (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume has been ported to D2, porting it to Windows or whatever else remains to be done and then adding to it. There were some good ideas in this thread from a couple years ago too: http://forum.dlang.org/thread/ucsamkqvrihfuspnxfsk forum.dlang.org
Mar 09 2016
On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:Perhaps someone could build off of Sociomantic's concurrent GC (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume has been ported to D2, porting it to Windows or whatever else remains to be done and then adding to it.Do you think Windows version of this cdgc is feasible at all? It relies on fork, something that gets emulated on Windows but too slowly, as I understand.
Mar 10 2016
On Thursday, 10 March 2016 at 11:12:47 UTC, thedeemon wrote:On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:I don't know enough about GCs and Windows to say. I'm assuming it can be done with some effort, but I don't know how much. Even if it's too much work, it'd be worth it just for POSIX systems, which make up almost 70% of all general-purpose computers running apps these days. :) If it's good enough, it could eventually become the default on there, while leaving the legacy GC for Windows.Perhaps someone could build off of Sociomantic's concurrent GC (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume has been ported to D2, porting it to Windows or whatever else remains to be done and then adding to it.Do you think Windows version of this cdgc is feasible at all? It relies on fork, something that gets emulated on Windows but too slowly, as I understand.
Mar 10 2016
Jeremy DeHaan wrote:Hey all, I'm trying to think of good project ideas for this years GSoC, and one in particular I thought would be a great was working on and improving the GC. I'm not sure what the scope of this project would be like, but at the moment I am thinking writing a generational collector would be a good place to start. I have more ideas, but I don't have a proposal yet. I just wanted some initial feedback on the idea, perhaps some advice for scope with the time frame in mind, and hopefully someone on the mentor list willing to take this on if it were to be accepted. JeremyThe follow-on projects enabled by a precise GC would be quite useful... I'd be interested in mentoring if nobody else with more experience is interested. -- // Adam Wilson // quiet.dlang.dev
Mar 09 2016
On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:Hey all, I'm trying to think of good project ideas for this years GSoC, and one in particular I thought would be a great was working on and improving the GC. I'm not sure what the scope of this project would be like, but at the moment I am thinking writing a generational collector would be a good place to start.http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html Good luck with generational GC... I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.
Mar 09 2016
On 3/9/16 10:40 PM, NX wrote:I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built. The solution is to copy the stuff you need to https://github.com/D-Programming-Language/druntime/tree/master/src/rt/util or some place like this.On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):The build process looks something like this currently: (a bit simplified) 1. Download old dmd. 2. Build new dmd with 1. 3. Build druntime with 2. 4. Build phobos with 2 and link it with 3. 5. Package 2, 3 and 4 into a release zip You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:I know how it works now :). But if D is stable enought 1 should be good enought to do 3 and 4 (yes you can't rely on features and fixes from 2). I think we should avoid to use latest features when developing dmd, druntime and phobos (maybe we should use only some subset or make some rules like new version must be compilable by 3 previous version of dmd)Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):The build process looks something like this currently: (a bit simplified) 1. Download old dmd. 2. Build new dmd with 1. 3. Build druntime with 2. 4. Build phobos with 2 and link it with 3. 5. Package 2, 3 and 4 into a release zip You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
On Thursday, 10 March 2016 at 18:49:28 UTC, Daniel Kozak wrote:Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):Well for 4. maybe, but after a couple of years... Most of the time Phobos can't be compiled with an older version of DMD not because it relies on new features, but because there are bugs in the older version of DMD. As for 3. - druntime is very tightly coupled with the compiler. Quite often changes in the compiler rely on changes in druntime - e.g. the new DWARF exceptions, C++ and Objective-C support, better code coverage and GC profiling, things in object.d, built-in AAs and arrays, linking with MSVC 2015 libc, shared libraries, TLS, etc. On the other hand, I think that using Phobos in DMDFE is a good idea and is far more likely to happen.On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:I know how it works now :). But if D is stable enought 1 should be good enought to do 3 and 4 (yes you can't rely on features and fixes from 2). I think we should avoid to use latest features when developing dmd, druntime and phobos (maybe we should use only some subset or make some rules like new version must be compilable by 3 previous version of dmd)Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):The build process looks something like this currently: (a bit simplified) 1. Download old dmd. 2. Build new dmd with 1. 3. Build druntime with 2. 4. Build phobos with 2 and link it with 3. 5. Package 2, 3 and 4 into a release zip You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.On 3/9/16 10:40 PM, NX wrote:I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.I think the best possible improvement for GC is making it lock-free. Currently, GC lock cause some serious performance penalties for multithreaded code when frequent allocations take place.I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
Mar 10 2016
Thank you all for the feedback. I think I might still need a little more feedback as to what the project should actually entail, but here's what it's looking like so far: Implement lock free allocation using std.experimental.allocator's freelists (SharedFreeList? It was the only thing in the documentation that specifically mentions lock free allocation) Implement a generational garbage collector Implement precise garbage collection (possibly piggybacking off of Rainer's work) Does anyone think that might be too much to take on or does it sound pretty good? I have other garbage collector things I'd like to explore, but I they should probably go in the "if time allows" category at this point. Jeremy
Mar 11 2016
On 12/03/16 4:13 AM, Jeremy DeHaan wrote:Thank you all for the feedback. I think I might still need a little more feedback as to what the project should actually entail, but here's what it's looking like so far: Implement lock free allocation using std.experimental.allocator's freelists (SharedFreeList? It was the only thing in the documentation that specifically mentions lock free allocation) Implement a generational garbage collector Implement precise garbage collection (possibly piggybacking off of Rainer's work) Does anyone think that might be too much to take on or does it sound pretty good? I have other garbage collector things I'd like to explore, but I they should probably go in the "if time allows" category at this point. JeremyUgh I would say that is a lot. So here is what I would do, have two lists of goals. One where you would happy if you got done the other if you are able to do it, you will. That way your prioritize what you want to get done and not the others.
Mar 11 2016
Jeremy DeHaan wrote:Thank you all for the feedback. I think I might still need a little more feedback as to what the project should actually entail, but here's what it's looking like so far: Implement lock free allocation using std.experimental.allocator's freelists (SharedFreeList? It was the only thing in the documentation that specifically mentions lock free allocation) Implement a generational garbage collector Implement precise garbage collection (possibly piggybacking off of Rainer's work) Does anyone think that might be too much to take on or does it sound pretty good? I have other garbage collector things I'd like to explore, but I they should probably go in the "if time allows" category at this point. JeremyIf I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance. Once the GC is fully precise we can implement a fully compacting GC, which improves the usefulness of generational collection. Additionally, precision allows a significant amount of work to be done in improving the performance of the GC in multi-threaded scenarios. It should be quite possible to avoid needing fork() or anything like it altogether. I know that the .NET GC doesn't need to use anything like it. Also, I would strongly recommend getting this book and reading it cover to cover before starting: http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM I think Rainer's Precise GC only dealt with Heap objects. What needs work is Register and Stack scanning. Expanding on Rainer's existing Precise GC work is the right idea, but Register and Stack scanning is a very big project in it's own right.I suspect it will take up the remainder of your GSoC time. :) -- // Adam Wilson // quiet.dlang.dev
Mar 12 2016
On Sat, 12 Mar 2016 00:50:06 -0800, Adam Wilson wrote:If I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance. Once the GC is fully precise we can implement a fully compacting GC,Unions mean we can't be entirely precise, but we can get pretty close. Stack maps and type pointer maps are entirely separate efforts, and type pointer maps will be easier to implement, so the first pass will probably only include type pointer maps. Switching to a compacting GC is risky. It's a transparent change if you're only working with D code, but if you interface to other languages, it breaks things horribly.which improves the usefulness of generational collection. Additionally, precision allows a significant amount of work to be done in improving the performance of the GC in multi-threaded scenarios.Primarily because less memory needs to be scanned on average, so less work needs to be done and GC pauses are shorter. There's nothing specific to multithreading there.It should be quite possible to avoid needing fork() or anything like it altogether.fork() is used to avoid pausing the program at all during the mark phase.I know that the .NET GC doesn't need to use anything like it.The .NET GC stops all managed threads during a collection. During a nursery or gen 1 collection, on average it's probably fast enough that forking wouldn't give that much benefit. During a full collection, if your application is latency-sensitive, a forking strategy would be better. But .NET can't do that because Windows doesn't have a fork() system call, and it's expensive to emulate.
Mar 12 2016
On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:If I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance.Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.Once the GC is fully precise we can implement a fully compacting GC, which improves the usefulness of generational collection. Additionally, precision allows a significant amount of work to be done in improving the performance of the GC in multi-threaded scenarios. It should be quite possible to avoid needing fork() or anything like it altogether. I know that the .NET GC doesn't need to use anything like it.A compacting GC could be built on top of a generational GC without much difficulty I would think, if we wanted to go that route. The compaction would just happen as part of a collection cycle when things are moved into the next generation. I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exists in C and C++ code. Another reason I want to work on a generational GC is because this can also lead into a concurrent GC without trying to emulate fork() on windows. The .Net GC has 3 generations with the last one having its collections running concurrently because it is unlikely to affect anything else going on. They don't bother running the other generations concurrently because their collections are really short. We could do something similar. Perhaps someone more intimate with GC's than I am can speak up, but I think that a generational GC would be the best use of time in relation to performance gains. Other things can then be implemented on top of it later.Also, I would strongly recommend getting this book and reading it cover to cover before starting: http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMMThank you for the link to the book. I was planning on this one http://www.amazon.com/gp/product/0471941484/ , but maybe I will buy them both.
Mar 12 2016
Jeremy DeHaan wrote:On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:I would contend that it's not quite that simple. Of course the precise GC is slower, it's scanning more stuff. The current conservative GC by definition can't figure out large chunks of memory so it won't even bother scanning them. What it does to those things it can't scan. And what it can't scan it has to pin, which means the memory can't be moved. You can't build a fully generational GC until you can move everything. They talk about this problem in the books. However, once you have a precise GC, a fully moving GC becomes possible and only then do the performance benefits of a generational GC become realized. I strongly suspect that a partially moving GC will perform worse than the precise GC. And I know for a fact that it will suffer enormously from fragmentation and Out-of-Memory issues. Precision is the basis from which all higher order GC functions flow.If I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance.Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.Erm, a generational GC is, in practice, a moving GC as it must move the blobs around from the various heaps. The commonly used method is Stop-and-Copy. As for the C/C++ code problem various solutions have been proposed here. For one, D knows the difference between the two, so pinning is possible there. Or a separate heap space can be used. AFIAK each OS allows you to specify multiple heaps, so you could specify an unmanaged heap to go along with the managed heaps. IIRC, C++/CLI does something similar on Windows. This would be the preferred solution IMO. (Look into HeapCreate on Windows for example.)Once the GC is fully precise we can implement a fully compacting GC, which improves the usefulness of generational collection. Additionally, precision allows a significant amount of work to be done in improving the performance of the GC in multi-threaded scenarios. It should be quite possible to avoid needing fork() or anything like it altogether. I know that the .NET GC doesn't need to use anything like it.A compacting GC could be built on top of a generational GC without much difficulty I would think, if we wanted to go that route. The compaction would just happen as part of a collection cycle when things are moved into the next generation. I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exists in C and C++ code.Another reason I want to work on a generational GC is because this can also lead into a concurrent GC without trying to emulate fork() on windows. The .Net GC has 3 generations with the last one having its collections running concurrently because it is unlikely to affect anything else going on. They don't bother running the other generations concurrently because their collections are really short. We could do something similar.A concurrent GC without precision is mostly useless because the GC roots are primarily derived from the stack which the thread was spawned and the stack roots are not scanned in a conservative GC. Additionally, the Async/Await pattern is not practical without a precise GC.Perhaps someone more intimate with GC's than I am can speak up, but I think that a generational GC would be the best use of time in relation to performance gains. Other things can then be implemented on top of it later.The problem is that performance is the most obvious problem with the GC, but not the most important problem, or most useful. Precision enables a fully moving, fully generational, fully concurrent, GC. It also enables other major features that have nothing to do with GC.Obviously you can do whatever you want. :) And I'll admit that precision isn't sexy, and it won't improve performance right away, but it opens the door way to further work that will provide those things. It's hard and thankless work, but you'll be better setup for GSoC 2017 for adding generational and concurrency support. My opinion is that we need to focus on improving the core of the GC so that it can better perform the high-order GC functions. Yes, it will make the GC slower, but it's already bad so I doubt people will notice much. We've discussed (and by discussed I mean thermonuclear flame-wars) the GC before numerous times before at great length on this forum, so I am glad to see that this thread hasn't turned into that, yet. -- // Adam Wilson // quiet.dlang.devAlso, I would strongly recommend getting this book and reading it cover to cover before starting: http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMMThank you for the link to the book. I was planning on this one http://www.amazon.com/gp/product/0471941484/ , but maybe I will buy them both.
Mar 12 2016
On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote: To start off, let's talk terminology. You seem to be using nonstandard terminology and possibly misunderstanding standard terminology. A GC scan is the mark phase of a mark/sweep collector (and specifically the part where the GC examines roots and allocated memory, if that matters). The GC searches for pointers to GC memory. Simple GCs only need to record whether there is a pointer to a given allocation, not where those pointers are coming from. A conservative GC is one that will encounter some things that aren't pointers but must be scanned as if they were pointers. In D, for instance, a union between a pointer and a size_t forces collectors to be conservative. A false pointer is something that a conservative GC must treat as a pointer and appears to point to an object allocated by that GC. For instance, unions in D allow us to create false pointers. A precise GC is a GC that never mistakes a non-pointer for a pointer. A moving GC is a GC that can relocate objects. A generational GC is one that can perform a collection on part of the heap without dealing with the full heap. Those parts tend to be organized by how long an object has survived. This does not require a precise or moving GC, though it works better with a precise, moving GC. A write barrier is a way to record which sections of memory have been altered since the last GC collection. This lets the GC scan only the changed data -- useful to any GC, especially useful to generational ones. A "partially moving" GC does not exist, as far as I know.Jeremy DeHaan wrote:Precise GCs are slower because they must reference some data telling them which items on the stack and in allocations represent pointers and which do not. This data takes up cache space, and having less cache for memory being scanned slows things down. Correlating two pieces of data takes extra time, too. The hope with a precise GC is to avoid false pointers. A false pointer to an object that holds references to many other objects can be costly, both in GC time and in excess memory usage. In D, we have a tiny bit of type information surfaced to the GC to reduce the incidence of false pointers. We could additionally take advantage of the fact that numbers tend to be small and people tend to use 4 to 8 byte numbers: we could arrange our address space to avoid ranges where false pointers are likely. But that's probably unnecessary.On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:I would contend that it's not quite that simple. Of course the precise GC is slower, it's scanning more stuff.If I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance.Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.The current conservative GC by definition can't figure out large chunks of memory so it won't even bother scanning them.It must scan anything that might be a pointer. A precise GC reduces the number of things that might be pointers but aren't to zero. Therefore a precise GC scans less. If a conservative GC didn't scan something that might be a pointer, it would sometimes free an object that still had live references to it. This would result in memory corruption or segmentation faults.What it does to those things it can't scan. And what it can't scan it has to pin, which means the memory can't be moved.If a moving GC cannot precisely identify all pointers to an object, it is not allowed to move that object. If it can, it is allowed to move that object. A conservative GC can be a moving GC. It must classify things as "definitely pointer", "maybe pointer", and "not pointer". It cannot move an object with a "maybe pointer" pointing to it, but it can move an object with several "definitely pointers" pointing to it. In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.You can't build a fully generational GC until you can move everything.You can build a more efficient generational GC if you can move objects. However, it's not impossible to build a non-moving generational GC. Instead of having large blocks of address space for each generation, you would have each Pool indicate its generation. Then you have to find the Pool for a pointer and check that flag. That's comparatively slow, but it'd still be faster to do a Gen 0 scan that way than to do a full scan with D's current collector (assuming you implement write barriers).They talk about this problem in the books. However, once you have a precise GC, a fully moving GC becomes possible and only then do the performance benefits of a generational GC become realized.The performance benefit of a moving and generational GC over a non-moving generational GC is that you can have a contiguous section of address space for each generation, allocated densely. You get this with less overhead. However, a moving GC can be a pessimization for user code. If you allocate several objects in sequence, with many GCs you'll get memory allocated for them that is contiguous or nearly contiguous. For small objects, this is probably going to be on the same page of memory. But a moving GC is free to move them to different pages. So you need to exercise caution.The Boehm collector is generational, conservative, and not moving. You probably conflate "moving" and "generational" because of the JVM.I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exists in C and C++ code.Erm, a generational GC is, in practice, a moving GC as it must move the blobs around from the various heaps.As for the C/C++ code problem various solutions have been proposed here. For one, D knows the difference between the two, so pinning is possible there.There's nothing in the type system that says whether you can pass a variable to a C/C++ function. So, no, D *doesn't* know the difference between them. And there's no GC.pin function, so people aren't currently telling your GC when something's accessible to C/C++ code. That means a moving collector in D will produce segmentation faults and memory corruption in existing, currently working code.AFIAK each OS allows you to specify multiple heapsYou get one address space. You get to use mmap to ask for memory at a given address. There's no heap_t data structure to provide to malloc or mmap. You might have been confused by literature referring to multiple heaps. That's just a single allocator deciding to divide its address space logically. There's no OS-level enforcement.A concurrent GC without precision is mostly uselessThe Boehm collector is not precise. It is concurrent. Its concurrency reduces the amount of time a GC collection takes. This is mostly orthogonal to the benefits of a precise collector in reducing the amount of memory a process uses.because the GC roots are primarily derived from the stack which the thread was spawned and the stack roots are not scanned in a conservative GC.They *are* scanned in a conservative GC. If you don't scan memory that might contain pointers (such as the stack), your GC will inevitably cause segmentation faults and memory corruption. D's GC is conservative and does not cause segmentation faults or memory corruption, therefore it must scan the stack.
Mar 12 2016
Chris Wright wrote:On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote: To start off, let's talk terminology. You seem to be using nonstandard terminology and possibly misunderstanding standard terminology. A GC scan is the mark phase of a mark/sweep collector (and specifically the part where the GC examines roots and allocated memory, if that matters). The GC searches for pointers to GC memory. Simple GCs only need to record whether there is a pointer to a given allocation, not where those pointers are coming from. A conservative GC is one that will encounter some things that aren't pointers but must be scanned as if they were pointers. In D, for instance, a union between a pointer and a size_t forces collectors to be conservative. A false pointer is something that a conservative GC must treat as a pointer and appears to point to an object allocated by that GC. For instance, unions in D allow us to create false pointers. A precise GC is a GC that never mistakes a non-pointer for a pointer. A moving GC is a GC that can relocate objects. A generational GC is one that can perform a collection on part of the heap without dealing with the full heap. Those parts tend to be organized by how long an object has survived. This does not require a precise or moving GC, though it works better with a precise, moving GC.I didn't intend to confuse, but I was not writing to neophyte's, and I used linguistic shortcuts.A write barrier is a way to record which sections of memory have been altered since the last GC collection. This lets the GC scan only the changed data -- useful to any GC, especially useful to generational ones. A "partially moving" GC does not exist, as far as I know.Yep, it's a Bad Idea.Yep the data is usually held with the object in a header. Hence the "scanning more stuff".Jeremy DeHaan wrote:Precise GCs are slower because they must reference some data telling them which items on the stack and in allocations represent pointers and which do not. This data takes up cache space, and having less cache for memory being scanned slows things down. Correlating two pieces of data takes extra time, too.On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:I would contend that it's not quite that simple. Of course the precise GC is slower, it's scanning more stuff.If I may make a suggestion. The lock free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance.Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.The hope with a precise GC is to avoid false pointers. A false pointer to an object that holds references to many other objects can be costly, both in GC time and in excess memory usage. In D, we have a tiny bit of type information surfaced to the GC to reduce the incidence of false pointers. We could additionally take advantage of the fact that numbers tend to be small and people tend to use 4 to 8 byte numbers: we could arrange our address space to avoid ranges where false pointers are likely. But that's probably unnecessary.We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.Yep, precise GC's aren't totally perf-negative.The current conservative GC by definition can't figure out large chunks of memory so it won't even bother scanning them.It must scan anything that might be a pointer. A precise GC reduces the number of things that might be pointers but aren't to zero. Therefore a precise GC scans less.If a conservative GC didn't scan something that might be a pointer, it would sometimes free an object that still had live references to it. This would result in memory corruption or segmentation faults.Actually, we probably could with modifications to the compiler. A little metadata goes a long way. And I know that Walter would accept that PR. I don't know for sure if it can be done, but I've seen precious little evidence one way or the other. Let's investigate before making absolute statements about what we can do. I know from discussions with Walter that the compiler's AST contains a wealth of information, and that more can be added.What it does to those things it can't scan. And what it can't scan it has to pin, which means the memory can't be moved.If a moving GC cannot precisely identify all pointers to an object, it is not allowed to move that object. If it can, it is allowed to move that object. A conservative GC can be a moving GC. It must classify things as "definitely pointer", "maybe pointer", and "not pointer". It cannot move an object with a "maybe pointer" pointing to it, but it can move an object with several "definitely pointers" pointing to it. In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.My apologies I didn't intend to state that it was impossible. I was attempting to emphatically question the wisdom of doing so, particular when the potential performance benefits are dubious at best. My point is that we should do this right, if we keep blindly pursuing performance without taking a considered approach to achieving it, we'll never get there. The GC will just become an ever larger pile of hacks that increase performance in specific cases.You can't build a fully generational GC until you can move everything.You can build a more efficient generational GC if you can move objects. However, it's not impossible to build a non-moving generational GC. Instead of having large blocks of address space for each generation, you would have each Pool indicate its generation. Then you have to find the Pool for a pointer and check that flag. That's comparatively slow, but it'd still be faster to do a Gen 0 scan that way than to do a full scan with D's current collector (assuming you implement write barriers).That's actually pretty important. Precision enables moving, moving has clear and enumerable performance benefits. QED.They talk about this problem in the books. However, once you have a precise GC, a fully moving GC becomes possible and only then do the performance benefits of a generational GC become realized.The performance benefit of a moving and generational GC over a non-moving generational GC is that you can have a contiguous section of address space for each generation, allocated densely. You get this with less overhead.However, a moving GC can be a pessimization for user code. If you allocate several objects in sequence, with many GCs you'll get memory allocated for them that is contiguous or nearly contiguous. For small objects, this is probably going to be on the same page of memory. But a moving GC is free to move them to different pages. So you need to exercise caution.Yes ... and this is also dealt with in the books, including basic algorithms to help combat these problems. Additionally, as evidenced by the fact that every major GC (JVM, CLR, et al.) system implements a moving GC, I'd argue that the benefits greatly outweigh any potential risks. And we have pull requests to further optimize the algorithms. We don't have a system at all yet, let's build it first, then we can work on optimizing it.Ok, yes, but lets be honest, it's what we actually want, and more importantly, expect. Precision makes it happen, efficiently. Even the C++ community doesn't take the Boehm GC very seriously. We have an opportunity to prove the C-style language can have a performant GC.The Boehm collector is generational, conservative, and not moving. You probably conflate "moving" and "generational" because of the JVM.I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exists in C and C++ code.Erm, a generational GC is, in practice, a moving GC as it must move the blobs around from the various heaps.D does differentiate between a GC reference and pointer. Pin the pointers. Yes, you can create get pointer from a reference, and doing so is a pinning operation. We'd have to add a GC.pin function, so long as the compiler emits it for us. :)As for the C/C++ code problem various solutions have been proposed here. For one, D knows the difference between the two, so pinning is possible there.There's nothing in the type system that says whether you can pass a variable to a C/C++ function. So, no, D *doesn't* know the difference between them. And there's no GC.pin function, so people aren't currently telling your GC when something's accessible to C/C++ code.That means a moving collector in D will produce segmentation faults and memory corruption in existing, currently working code.Windows has that concept via HeapCreate shown here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366599(v=vs.85).aspx I'd be surprised if it's not available on other OS's, and yes, it is OS enforced. It might not work exactly how you like it, but that's a far cry from "not provided".AFIAK each OS allows you to specify multiple heapsYou get one address space. You get to use mmap to ask for memory at a given address. There's no heap_t data structure to provide to malloc or mmap. You might have been confused by literature referring to multiple heaps. That's just a single allocator deciding to divide its address space logically. There's no OS-level enforcement.I may have been confusing any precise GC with Rainer's precise GC which only did precise Heap scans. Apologies for the confusion. I will admit that I've done better work with my wording than what is in my last message. However, my fundamental point is sound, precision has benefits across all GC workloads and enables scenarios beyond the GC itself. Extending the compiler can work around many of the issues with false pointers. And let's engage the elephant in the room. One of Rainer's main learning's was the precision permeates the compiler and GC, you have to deal with it everywhere. So you end up rebuilding the GC from the ground up when you add it. What you are proposing is building a conservative-generational GC that will have to be jettisoned and rewritten *again* once precision is added for a performance boost that, by your own admission, may not actually happen. How is this a good use of time? My statement stands. Precision is the basis from which all high-order GC functions flow. Yes, you can work around not having it, but such workarounds tend to be just that. More importantly you are inevitably going to end up doing a half-implementation of precision anyways because each of the high-order functions need parts of that information. I know precision isn't fun. But it's extremely important work that needs to get done. And once we have it, the world of GC algorithms opens before us. Until then we are intentionally limiting ourselves. One of the biggest complaints about D is the GC, fine, so instead of doing yet another hack-job on something that may not actually improve GC performance, let's do it right and lay the groundwork for a modern GC that allows us to really compete with JVM/CLR. -- // Adam Wilson // quiet.dlang.devA concurrent GC without precision is mostly uselessThe Boehm collector is not precise. It is concurrent. Its concurrency reduces the amount of time a GC collection takes. This is mostly orthogonal to the benefits of a precise collector in reducing the amount of memory a process uses.because the GC roots are primarily derived from the stack which the thread was spawned and the stack roots are not scanned in a conservative GC.They *are* scanned in a conservative GC. If you don't scan memory that might contain pointers (such as the stack), your GC will inevitably cause segmentation faults and memory corruption. D's GC is conservative and does not cause segmentation faults or memory corruption, therefore it must scan the stack.
Mar 13 2016
On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:It's not a standard term. Google's only seeing about four references to the term, none of them authoritative or definitive. Since it's non- standard, it would behoove you to define it. I think you mean a GC in which some objects are pinned (due to limitations in the GC rather than the user explicitly requesting that they be pinned). Regardless, you ought to define the term and explain why it's a bad idea instead of providing a bare assertion. It doesn't make any difference to the operation of the GC *why* something's pinned; what matters is how much is pinned. So, for instance, you could point to a conservative moving GC, select one circumstance in which it can't move objects, tell us how much memory that ends up pinning on average, and then you'll have an argument.A "partially moving" GC does not exist, as far as I know.Yep, it's a Bad Idea.Sure. My point is that we don't *need* a lot of type information in the GC to get reasonable advantages. A little type information goes a long way, and a little caution in placing objects in the heap can help too.In D, we have a tiny bit of type information surfaced to the GC to reduce the incidence of false pointers. We could additionally take advantage of the fact that numbers tend to be small and people tend to use 4 to 8 byte numbers: we could arrange our address space to avoid ranges where false pointers are likely. But that's probably unnecessary.We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.No, we couldn't. I've been assuming that we can make whatever modifications we want to the compiler for free. The compiler has static type information. This explicitly contains ambiguities. When it's based solely on data available at compile time, the compiler can resolve the ambiguities -- and you could have written the code without using a union. But in the majority of cases, people use a union because they must. That means which values are set depends on runtime data, and the compiler can't tell you which fields will be set. We could provide a runtime function that is invoked when you set a value inside a union. That won't work very well. Consider: struct A { long a; long b; } struct B { double a; void* p; } union AB { A a; B b; } extern void foo(ref A a); AB ab; ab.b.p = GC.malloc(512); foo(ab.a); Was that pointer overwritten? No clue! The compiler can't tell. When compiling the definition of foo(), it didn't have any idea whether the struct was a union member. Now we have to intercept *every* indirect write to *any* value with a runtime call, even if you never use a union in your program -- Oh look, now Python's beating us by an order of magnitude in benchmarks. What do we get for it? Well, the 1% of programs that use this pattern will allow the GC to move an additional 0.1% of their objects. Or we leave the GC conservative in this one case. We can detect this case pretty easily and treat apparent pointers as pinned. Everything will work, and the GC will be better at releasing memory than it currently is.In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.Actually, we probably could with modifications to the compiler.
Mar 13 2016
Chris Wright wrote:On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:Is there an implementation of a conservative moving (compacting) GC out there? I'm not aware of one, but there are a lot of GC's out there. Boehm isn't.It's not a standard term. Google's only seeing about four references to the term, none of them authoritative or definitive. Since it's non- standard, it would behoove you to define it. I think you mean a GC in which some objects are pinned (due to limitations in the GC rather than the user explicitly requesting that they be pinned). Regardless, you ought to define the term and explain why it's a bad idea instead of providing a bare assertion. It doesn't make any difference to the operation of the GC *why* something's pinned; what matters is how much is pinned. So, for instance, you could point to a conservative moving GC, select one circumstance in which it can't move objects, tell us how much memory that ends up pinning on average, and then you'll have an argument.A "partially moving" GC does not exist, as far as I know.Yep, it's a Bad Idea.Ok, so yes, it needs conservative scanning in this case. Because even in absolutely against pinning and unions aren't going away, and if the GC would have to be go conservative then so be it. And given that unions are a pretty niche use case, I don't think we should drop the idea of precision just because it can't be 100% precise, so we end up with a few conservatively-scanned edge cases, big deal, the benefits of precision to the whole ecosystem would be enormous in the general case. Is this a debate about precise vs. non-precise GC or are we just bikeshedding about terminology and technical details? ;) I moved recently and lost track of the GC book for awhile so I've been refreshing my terminology. :P -- // Adam Wilson // quiet.dlang.devSure. My point is that we don't *need* a lot of type information in the GC to get reasonable advantages. A little type information goes a long way, and a little caution in placing objects in the heap can help too.In D, we have a tiny bit of type information surfaced to the GC to reduce the incidence of false pointers. We could additionally take advantage of the fact that numbers tend to be small and people tend to use 4 to 8 byte numbers: we could arrange our address space to avoid ranges where false pointers are likely. But that's probably unnecessary.We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.No, we couldn't. I've been assuming that we can make whatever modifications we want to the compiler for free. The compiler has static type information. This explicitly contains ambiguities. When it's based solely on data available at compile time, the compiler can resolve the ambiguities -- and you could have written the code without using a union. But in the majority of cases, people use a union because they must. That means which values are set depends on runtime data, and the compiler can't tell you which fields will be set. We could provide a runtime function that is invoked when you set a value inside a union. That won't work very well. Consider: struct A { long a; long b; } struct B { double a; void* p; } union AB { A a; B b; } extern void foo(ref A a); AB ab; ab.b.p = GC.malloc(512); foo(ab.a); Was that pointer overwritten? No clue! The compiler can't tell. When compiling the definition of foo(), it didn't have any idea whether the struct was a union member. Now we have to intercept *every* indirect write to *any* value with a runtime call, even if you never use a union in your program -- Oh look, now Python's beating us by an order of magnitude in benchmarks. What do we get for it? Well, the 1% of programs that use this pattern will allow the GC to move an additional 0.1% of their objects. Or we leave the GC conservative in this one case. We can detect this case pretty easily and treat apparent pointers as pinned. Everything will work, and the GC will be better at releasing memory than it currently is.In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.Actually, we probably could with modifications to the compiler.
Mar 13 2016
On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:Is there an implementation of a conservative moving (compacting) GC out there? I'm not aware of one, but there are a lot of GC's out there. Boehm isn't.That is impossible, you need to know what is and isn't a pointer to be able to move things around.
Mar 13 2016
On Sun, 13 Mar 2016 23:46:51 +0000, deadalnix wrote:On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:Let's say we do the most precise GC possible for D. We can categorize things into: * maybe a pointer; it's part of a union that contains a non-pointer element * definitely a pointer (we know because of stack pointer maps and type information, and there aren't any unions mucking it up) * definitely not a pointer We can move anything that has only "definitely a pointer"s pointing at it. We can't move anything that has one or more "maybe a pointer"s pointing at it. I keep saying this...Is there an implementation of a conservative moving (compacting) GC out there? I'm not aware of one, but there are a lot of GC's out there. Boehm isn't.That is impossible, you need to know what is and isn't a pointer to be able to move things around.
Mar 13 2016
deadalnix wrote:On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:It was a rhetorical question. I was just leaving the door open for someone to surprise me. ;-) -- // Adam Wilson // quiet.dlang.devIs there an implementation of a conservative moving (compacting) GC out there? I'm not aware of one, but there are a lot of GC's out there. Boehm isn't.That is impossible, you need to know what is and isn't a pointer to be able to move things around.
Mar 13 2016
On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:Is this a debate about precise vs. non-precise GC or are we just bikeshedding about terminology and technical details?You made a large number of assertions about garbage collection and they were almost all wrong. It's not about minutiae -- D can't reasonably have a precise collector, and you were saying that nothing else useful can happen before you make the GC precise. A non-precise collector can and often should be concurrent, and you were saying this was useless. These are big things. They affect prioritization of major projects. Getting them wrong could mislead a GSoC student to the point of not accomplishing much when they could have plucked lower-hanging fruit successfully.
Mar 13 2016
Chris Wright wrote:On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:Erm. That seems a bit extreme. You are asserting that D can't have a precise garbage collector, yet only have presented one case where the compiler cannot provide definitive information about the classification of pointer. Unions. Ok, so Unions, a relatively low-usage scenario with a valid work-around (assume it's a pointer). We can teach the compiler and GC how to pin C-style pointers so that's not it. Lastly, Rainer seemed to think a precise GC could be done, and he then went and did it ... so "can't reasonably have a precise collector" is a factually incorrect assertion. My GC lingo may be rusty, and I will admit to using superlatives incorrectly myself, but based on the existing evidence my assertions are hardly "almost all wrong". I'll clean up my lingo and retool my language if you'll tone down the superlatives. Of course GSoC students need guidance, I've been a GSoC Mentor for DLang before. If I didn't think that a precise GC was not only extremely important to D, but also within the ability of the student, I would not have pushed for it. "low-hanging fruit" is the reason D's quality is so spotty. We now have a chance with GSoC to get some quality time dedicated to a pain point, we should utilize that time to work on foundational issues. Then we (the community) can flame each other to death about the best algorithms to achieve the performance we desire. That is assuming we haven't scared him off. -- // Adam Wilson // quiet.dlang.devIs this a debate about precise vs. non-precise GC or are we just bikeshedding about terminology and technical details?You made a large number of assertions about garbage collection and they were almost all wrong. It's not about minutiae -- D can't reasonably have a precise collector, and you were saying that nothing else useful can happen before you make the GC precise. A non-precise collector can and often should be concurrent, and you were saying this was useless. These are big things. They affect prioritization of major projects. Getting them wrong could mislead a GSoC student to the point of not accomplishing much when they could have plucked lower-hanging fruit successfully.
Mar 13 2016
On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:Lastly, Rainer seemed to think a precise GC could be done, and he then went and did it ... so "can't reasonably have a precise collector" is a factually incorrect assertion.IIRC, Rainer called it "mostly precise", and for a good reason. If we're ok with calling a partially conservative GC a precise one, then go on. ;) As I understand Rainer's GC works in VisualD now and it did solve the problem of leaks he had with the default GC, so there is certainly benefit in pursuing preciseness, even non-100% one. I think continuing Rainer's work toward "even more precise" GC, and adding important optimizations like allocations without a global lock and parallel scanning would be a great project, and a realistic one. As for moving and generational GC, I have big doubts one can do it without breaking 90% of old code written in D (passing with write barriers everywhere. Now D competes in speed with C++, with your proposed changes it will only try to compete with Java might be overestimating write-barriers costs, please feel free to prove me wrong. Fun part: if you make a moving GC and try to automate pinning and let the compiler do it instead of requiring programmers to rewrite most of their D code, you'll probably end up with a "partially moving GC" where half of the heap is pinned, some objects are moving here and there, and GC rarely releases memory back to OS because of those pinned objects.My GC lingo may be rusty, and I will admit to using superlatives incorrectly myself, but based on the existing evidence my assertions are hardly "almost all wrong".They did look so, sorry. Additional explanations helped.
Mar 13 2016
thedeemon wrote:On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:conservative. ;-)Lastly, Rainer seemed to think a precise GC could be done, and he then went and did it ... so "can't reasonably have a precise collector" is a factually incorrect assertion.IIRC, Rainer called it "mostly precise", and for a good reason. If we're ok with calling a partially conservative GC a precise one, then go on. ;) As I understand Rainer's GC works in VisualD now and it did solve the problem of leaks he had with the default GC, so there is certainly benefit in pursuing preciseness, even non-100% one.I think continuing Rainer's work toward "even more precise" GC, and adding important optimizations like allocations without a global lock and parallel scanning would be a great project, and a realistic one.On this we are in complete agreement.As for moving and generational GC, I have big doubts one can do it without breaking 90% of old code written in D (passing pointers to C everywhere. Now D competes in speed with C++, with your proposed changes be lost. On the other hand I might be overestimating write-barriers costs, please feel free to prove me wrong.Well, conclusively proving the cost of write-barriers would require profiling. But I think we should be able to optimize the use of write-barriers to reduce the impact. Here is the thing, nobody wants to admit it, but we have lost the battle with C++, although to be honest, I don't think we ever had a prayer of winning it in the first place. They want C++ with the "warts" removed, for some personally distinct set of "warts", they're not really interested in D for itself. Maybe ... why don't instead of trying to compete with everybody else, we do our own thing, and do it very well. As long as we're just another "me too" operation people will treat us like we are. Let's focus on being the best D language out there. And D needs a GC, so let's build the best damn garbage collector the natively-compiled world has every seen.Fun part: if you make a moving GC and try to automate pinning and let the compiler do it instead of requiring programmers to rewrite most of their D code, you'll probably end up with a "partially moving GC" where half of the heap is pinned, some objects are moving here and there, and GC rarely releases memory back to OS because of those pinned objects.Yea, this is true, although I think getting most of the way there is going to be very useful in it's own right. And it would open the possibility of experimenting with different algorithms and strategies. Right now we're pretty limited in the types of algorithms we can use, precision offers us more options and more experiments. Let's get the foundation in place, then we can go nuts.Thank you. :) -- // Adam Wilson // quiet.dlang.devMy GC lingo may be rusty, and I will admit to using superlatives incorrectly myself, but based on the existing evidence my assertions are hardly "almost all wrong".They did look so, sorry. Additional explanations helped.
Mar 13 2016
I haven't had power for a couple of days, but it looks like the discussion has gone along pretty ok. After reading everything, I think I'm inclined to agree with Adam and the main focus of my proposal will be a precise GC (or as precise as we can get). I'll definitely need some guidance, but I'll be learning everything I can about GC's and precision until the start of the project. On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:Maybe ... why don't instead of trying to compete with everybody else, we do our own thing, and do it very well. As long as we're just another "me too" operation people will treat us like we are. Let's focus on being the best D language out there. And D needs a GC, so let's build the best damn garbage collector the natively-compiled world has every seen.That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it get's accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Mar 14 2016
On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:I haven't had power for a couple of days, but it looks like the discussion has gone along pretty ok.I would characterize it as very interesting, although I know very little about how GCs are implemented. I have a question, for the more knowledgeable... There was some discussion about the difficulty of handling unions and being able to ensure that pointers are pointers in a precise GC. Is there less work involved in getting the GC to be precise for safe code? Does the current GC take this into account?
Mar 14 2016
Jeremy DeHaan wrote:I haven't had power for a couple of days, but it looks like the discussion has gone along pretty ok. After reading everything, I think I'm inclined to agree with Adam and the main focus of my proposal will be a precise GC (or as precise as we can get). I'll definitely need some guidance, but I'll be learning everything I can about GC's and precision until the start of the project.Awesome! It's a complex topic, but you'll get broad exposure to all corners of the D ecosystem, it's the kind of experience that you can apply to all aspects of software engineering.On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:Will do. And given the nature of the work you want to do, I think (personal opinion follows) your application will sail through the process. The application window opened today so my first recommendation would be to get started on your proposal as soon as possible, if you haven't already. GSoC has a basic outline of what you'll need to include here: http://write.flossmanuals.net/gsocstudentguide/writing-a-proposal/ I'd like to review the proposal before you submit it. And welcome aboard the crazy-train known as GSoC, it's a lot of work but it'll be buckets of fun! :) -- // Adam Wilson // quiet.dlang.devMaybe ... why don't instead of trying to compete with everybody else, we do our own thing, and do it very well. As long as we're just another "me too" operation people will treat us like we are. Let's focus on being the best D language out there. And D needs a GC, so let's build the best damn garbage collector the natively-compiled world has every seen.That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it get's accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Mar 14 2016
On 15.03.2016 02:34, Jeremy DeHaan wrote:I haven't had power for a couple of days, but it looks like the discussion has gone along pretty ok. After reading everything, I think I'm inclined to agree with Adam and the main focus of my proposal will be a precise GC (or as precise as we can get). I'll definitely need some guidance, but I'll be learning everything I can about GC's and precision until the start of the project. On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: The precise GC used by Visual D is not only heap-precise, but also has type information to scan the data and TLS segments precisely. This is done with a patched dmd now based on version 2.066 and is not part of the PR. It uses the solution that I presented at DConf 2013: http://dconf.org/2013/talks/schuetze.pdf Creating information for stack and registers seems a lot of work with uncertain return: Apart from having to describe your code in very high detail, there is usually no guarantee that you can unwind the stack properly, especially when calling through C/C++ functions that you have no control of. A moving collector is possible IMO, but it needs write barriers, and the idea of adding these explicitly has met quite some resistance, especially by Walter. Write barriers are also necessary for incremental collections. It changes the way you can interact with C/C++ because it's no longer good enough to just keep a pointer used by C visible to the GC, you have to pin it explicitly. You can have very coarse barriers by using the page protection mechanism of the hardware, that's what I implemented for Windows, see http://rainers.github.io/visuald/druntime/concurrentgc.html. This is similar to Sociomantics concurrent GC using fork. It scans in the background, but not with multiple threads in parallel. Unfortunately, this work has rotton almost 2 years now, but I hope to get back to it some day...Maybe ... why don't instead of trying to compete with everybody else, we do our own thing, and do it very well. As long as we're just another "me too" operation people will treat us like we are. Let's focus on being the best D language out there. And D needs a GC, so let's build the best damn garbage collector the natively-compiled world has every seen.That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it get's accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Mar 18 2016
On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:On 15.03.2016 02:34, Jeremy DeHaan wrote:Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...[...]Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]
Mar 18 2016
On 18.03.2016 22:04, Jeremy deHaan wrote:On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:Well, there are a number of unfinished parts left: - as far as I understand the precise GC PR https://github.com/D-Programming-Language/druntime/pull/1022 is currently stalled because it doesn't yield better overall performance than the current GC in most situations. The reasoning is that it should be able compensate the additional time during allocation (saving pointer information) by improved scanning using that information to skip non-pointers. That didn't work out yet, though. - last time I checked generating RTInfo (which contains the pointer info) was a bit unreliable, see https://github.com/D-Programming-Language/dmd/pull/2480 and https://github.com/D-Programming-Language/dmd/pull/3958. - I only implemented the DATA and TLS section TypeInfo emission for the OMF backend, this needs to be ported to other platforms. Martin Nowak recently considered to just emit pointer locations. This would make the scanning function a bit simpler, but might need a bit more memory in the binary. Regarding the concurrent GC: I consider my implementation a prototype with rough edges and a lot of optimization opportunities. I'm not sure whether page protection is good enough to implement a generational GC, but it might still be possible to take advantage of the information that a page hasn't been written to. Judging from https://blog.golang.org/go15gc Go 1.5 seems to be using a similar GC (concurrent mark-and-sweep), though with proper write barriers. I guess there is a lot of stuff that can be used from their experience.On 15.03.2016 02:34, Jeremy DeHaan wrote:Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...[...]Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]so what would you suggest be my main focus?Given that 32-bit applications are becoming legacy, and false pointers are not a common problem in 64-bit processes (they do happen eventually, though) I suspect that concurrency of the GC would make a much larger impact on the D language than preciseness. A good target should be to reduce stop-the-world-time to something acceptable for interactive programs, i.e. well below 50ms.
Mar 19 2016
Rainer Schuetze wrote:On 18.03.2016 22:04, Jeremy deHaan wrote:I would argue that precision is still important as it enables a compacting GC. While technically possible to implement a generational GC without compaction, it's generally not considered wise. And even though 64-bit does not suffer from false pointers as much as 32-bit does it still will, especially when used in the big-data, analytics, and scientific scenarios that D seems to have made real traction in. While false pointers are a problem for a simple command line app, and probably even most Vibe.D servers, there is a significant class of work being done in D today that would be directly effected by them. -- // Adam Wilson // quiet.dlang.devOn Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:Well, there are a number of unfinished parts left: - as far as I understand the precise GC PR https://github.com/D-Programming-Language/druntime/pull/1022 is currently stalled because it doesn't yield better overall performance than the current GC in most situations. The reasoning is that it should be able compensate the additional time during allocation (saving pointer information) by improved scanning using that information to skip non-pointers. That didn't work out yet, though. - last time I checked generating RTInfo (which contains the pointer info) was a bit unreliable, see https://github.com/D-Programming-Language/dmd/pull/2480 and https://github.com/D-Programming-Language/dmd/pull/3958. - I only implemented the DATA and TLS section TypeInfo emission for the OMF backend, this needs to be ported to other platforms. Martin Nowak recently considered to just emit pointer locations. This would make the scanning function a bit simpler, but might need a bit more memory in the binary. Regarding the concurrent GC: I consider my implementation a prototype with rough edges and a lot of optimization opportunities. I'm not sure whether page protection is good enough to implement a generational GC, but it might still be possible to take advantage of the information that a page hasn't been written to. Judging from https://blog.golang.org/go15gc Go 1.5 seems to be using a similar GC (concurrent mark-and-sweep), though with proper write barriers. I guess there is a lot of stuff that can be used from their experience. > so what would you suggest be my main focus? Given that 32-bit applications are becoming legacy, and false pointers are not a common problem in 64-bit processes (they do happen eventually, though) I suspect that concurrency of the GC would make a much larger impact on the D language than preciseness. A good target should be to reduce stop-the-world-time to something acceptable for interactive programs, i.e. well below 50ms.On 15.03.2016 02:34, Jeremy DeHaan wrote:Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...[...]Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]
Mar 19 2016
On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:I haven't had power for a couple of days, but it looks like the discussion has gone along pretty ok. After reading everything, I think I'm inclined to agree with Adam and the main focus of my proposal will be a precise GC (or as precise as we can get). I'll definitely need some guidance, but I'll be learning everything I can about GC's and precision until the start of the project. [...]In case you didn't already know, Adam is now an official mentor so you are all set to go. Good luck!
Mar 18 2016
On 11-Mar-2016 18:13, Jeremy DeHaan wrote:Thank you all for the feedback. I think I might still need a little more feedback as to what the project should actually entail, but here's what it's looking like so far: Implement lock free allocation using std.experimental.allocator's freelists (SharedFreeList? It was the only thing in the documentation that specifically mentions lock free allocation) Implement a generational garbage collector Implement precise garbage collection (possibly piggybacking off of Rainer's work)Let's not forget another low-hanging fruit which is parallel marking. Regardless of the kind of collector we use parallel marking is a must have in our multicore world, and should be stright-forward to implement in D.Does anyone think that might be too much to take on or does it sound pretty good? I have other garbage collector things I'd like to explore, but I they should probably go in the "if time allows" category at this point. Jeremy-- Dmitry Olshansky
Mar 13 2016