
digitalmars.D - Potential GSoC project - GC improvements

reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
Hey all,

I'm trying to think of good project ideas for this year's GSoC, 
and one in particular that I thought would be great is working on 
and improving the GC. I'm not sure what the scope of this project 
would be like, but at the moment I am thinking writing a 
generational collector would be a good place to start.

I have more ideas, but I don't have a proposal yet. I just wanted 
some initial feedback on the idea, perhaps some advice for scope 
with the time frame in mind, and hopefully someone on the mentor 
list willing to take this on if it were to be accepted.

     Jeremy
Mar 09 2016
next sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:
 Hey all,

 I'm trying to think of good project ideas for this years GSoC, 
 and one in particular I thought would be a great was working on 
 and improving the GC. I'm not sure what the scope of this 
 project would be like, but at the moment I am thinking writing 
 a generational collector would be a good place to start.

 I have more ideas, but I don't have a proposal yet. I just 
 wanted some initial feedback on the idea, perhaps some advice 
 for scope with the time frame in mind, and hopefully someone on 
 the mentor list willing to take this on if it were to be 
 accepted.
I'd think working on the GC would make a great GSoC project, as it's needed and fairly self-contained while having large impact. Perhaps someone could build off of Sociomantic's concurrent GC (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume has been ported to D2, porting it to Windows or whatever else remains to be done and then adding to it. There were some good ideas in this thread from a couple years ago too: http://forum.dlang.org/thread/ucsamkqvrihfuspnxfsk forum.dlang.org
Mar 09 2016
parent reply thedeemon <dlang thedeemon.com> writes:
On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:
 Perhaps someone could build off of Sociomantic's concurrent GC 
 (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), 
 which I assume has been ported to D2, porting it to Windows or 
 whatever else remains to be done and then adding to it.
Do you think a Windows version of this cdgc is feasible at all? It relies on fork, something that can be emulated on Windows, but too slowly, as I understand it.
Mar 10 2016
parent Joakim <dlang joakim.fea.st> writes:
On Thursday, 10 March 2016 at 11:12:47 UTC, thedeemon wrote:
 On Thursday, 10 March 2016 at 05:01:37 UTC, Joakim wrote:
 Perhaps someone could build off of Sociomantic's concurrent GC 
 (https://www.sociomantic.com/blog/2013/06/porting-cdgc-to-d2/), which I assume
has been ported to D2, porting it to Windows or whatever else remains to be
done and then adding to it.
Do you think Windows version of this cdgc is feasible at all? It relies on fork, something that gets emulated on Windows but too slowly, as I understand.
I don't know enough about GCs and Windows to say. I'm assuming it can be done with some effort, but I don't know how much. Even if it's too much work, it'd be worth it just for POSIX systems, which make up almost 70% of all general-purpose computers running apps these days. :) If it's good enough, it could eventually become the default on there, while leaving the legacy GC for Windows.
Mar 10 2016
prev sibling next sibling parent Adam Wilson <flyboynw gmail.com> writes:
Jeremy DeHaan wrote:
 Hey all,

 I'm trying to think of good project ideas for this years GSoC, and one
 in particular I thought would be a great was working on and improving
 the GC. I'm not sure what the scope of this project would be like, but
 at the moment I am thinking writing a generational collector would be a
 good place to start.

 I have more ideas, but I don't have a proposal yet. I just wanted some
 initial feedback on the idea, perhaps some advice for scope with the
 time frame in mind, and hopefully someone on the mentor list willing to
 take this on if it were to be accepted.

      Jeremy
The follow-on projects enabled by a precise GC would be quite useful... I'd be interested in mentoring if nobody else with more experience is interested. -- // Adam Wilson // quiet.dlang.dev
Mar 09 2016
prev sibling parent reply NX <nightmarex1337 hotmail.com> writes:
On Wednesday, 9 March 2016 at 22:49:28 UTC, Jeremy DeHaan wrote:
 Hey all,

 I'm trying to think of good project ideas for this years GSoC, 
 and one in particular I thought would be a great was working on 
 and improving the GC. I'm not sure what the scope of this 
 project would be like, but at the moment I am thinking writing 
 a generational collector would be a good place to start.
http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html

Good luck with a generational GC... I think the best possible improvement for the GC is making it lock-free. Currently, the GC lock causes some serious performance penalties for multithreaded code when frequent allocations take place.
Mar 09 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it lock-free.
 Currently, GC lock cause some serious performance penalties for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
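
As a rough illustration of that suggestion (not an agreed design -- the block size, the Mallocator fallback, and all names below are assumptions), a per-thread front end built on std.experimental.allocator's FreeList could look like the sketch below; the lock-free shared variant would use SharedFreeList instead:

import std.experimental.allocator.building_blocks.free_list : FreeList;
import std.experimental.allocator.mallocator : Mallocator;

// Module-level variables are thread-local in D, so every thread gets its
// own freelist and small allocations never contend on a global GC lock.
FreeList!(Mallocator, 64) tlsSmallBlocks;

void main()
{
    void[] buf = tlsSmallBlocks.allocate(64); // served from this thread's list
    // ... use buf ...
    tlsSmallBlocks.deallocate(buf);           // returned to the same list
}
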
Mar 10 2016
parent reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu 
wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it 
 lock-free.
 Currently, GC lock cause some serious performance penalties for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the GC use something that is still in std.experimental, or do we think that's OK? I also know that there are some people who think we should avoid using Phobos in druntime.
Mar 10 2016
parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:
 On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu 
 wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it 
 lock-free.
 Currently, GC lock cause some serious performance penalties 
 for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.
There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since its files are not present when Druntime is built. The solution is to copy the stuff you need to https://github.com/D-Programming-Language/druntime/tree/master/src/rt/util or some place like this.
Mar 10 2016
next sibling parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:
 On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it lock-free.
 Currently, GC lock cause some serious performance penalties for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.
There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.
But we already need an existing D compiler to build the D frontend, so maybe we can use this full D compiler for building druntime in the future?
Mar 10 2016
parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:
 Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan 
 wrote:
 On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei 
 Alexandrescu wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it 
 lock-free.
 Currently, GC lock cause some serious performance penalties 
 for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.
There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.
But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?
The build process currently looks something like this (a bit simplified):

1. Download old dmd.
2. Build new dmd with 1.
3. Build druntime with 2.
4. Build phobos with 2 and link it with 3.
5. Package 2, 3 and 4 into a release zip.

You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also, you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.
Mar 10 2016
parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:
 Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan wrote:
 On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei Alexandrescu wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it lock-free.
 Currently, GC lock cause some serious performance penalties for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.
There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.
But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?
The build process looks something like this currently: (a bit simplified) 1. Download old dmd. 2. Build new dmd with 1. 3. Build druntime with 2. 4. Build phobos with 2 and link it with 3. 5. Package 2, 3 and 4 into a release zip You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.
I know how it works now :). But if D is stable enough, 1 should be good enough to do 3 and 4 (yes, you can't rely on features and fixes from 2). I think we should avoid using the latest features when developing dmd, druntime and phobos (maybe we should use only some subset, or make some rules like: a new version must be compilable by the 3 previous versions of dmd).
Mar 10 2016
parent ZombineDev <petar.p.kirov gmail.com> writes:
On Thursday, 10 March 2016 at 18:49:28 UTC, Daniel Kozak wrote:
 Dne 10.3.2016 v 19:39 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 17:46:15 UTC, Daniel Kozak wrote:
 Dne 10.3.2016 v 18:15 ZombineDev via Digitalmars-d napsal(a):
 On Thursday, 10 March 2016 at 16:46:59 UTC, Jeremy DeHaan 
 wrote:
 On Thursday, 10 March 2016 at 15:24:48 UTC, Andrei 
 Alexandrescu wrote:
 On 3/9/16 10:40 PM, NX wrote:
 I think the best possible improvement for GC is making it 
 lock-free.
 Currently, GC lock cause some serious performance 
 penalties for
 multithreaded code when frequent allocations take place.
I agree. A first step would be easy to do with std.allocator's thread-local freelists. -- Andrei
I was looking into this, but I am slightly hesitant. Should the gc use something in std.experimental? Or should we think that's ok? I also know that there are some people that think we should avoid using Phobos in druntime.
There's no problem in using std.experimental.* internally, because if the API changes, the one who changes the API will also need to change all internal usages, otherwise his pull request won't compile. About using Phobos in Druntime - you can't directly import Phobos, since it's files are not present when Druntime is built.
But we allready need existing D compiler to build D frontend, so maybe we can use this full D compiler for building druntime in the future?
The build process looks something like this currently: (a bit simplified) 1. Download old dmd. 2. Build new dmd with 1. 3. Build druntime with 2. 4. Build phobos with 2 and link it with 3. 5. Package 2, 3 and 4 into a release zip You can't use 1 for building 3 (or 4), because 3 (or 4) may depend on a new feature only present in 2. Also you can't rely on 5 for building 3 (or 4) because you'll get a cyclic dependency.
I know how it works now :). But if D is stable enought 1 should be good enought to do 3 and 4 (yes you can't rely on features and fixes from 2). I think we should avoid to use latest features when developing dmd, druntime and phobos (maybe we should use only some subset or make some rules like new version must be compilable by 3 previous version of dmd)
Well for 4. maybe, but after a couple of years... Most of the time Phobos can't be compiled with an older version of DMD not because it relies on new features, but because there are bugs in the older version of DMD. As for 3. - druntime is very tightly coupled with the compiler. Quite often changes in the compiler rely on changes in druntime - e.g. the new DWARF exceptions, C++ and Objective-C support, better code coverage and GC profiling, things in object.d, built-in AAs and arrays, linking with MSVC 2015 libc, shared libraries, TLS, etc. On the other hand, I think that using Phobos in DMDFE is a good idea and is far more likely to happen.
Mar 10 2016
prev sibling parent reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
Thank you all for the feedback.

I think I might still need a little more feedback as to what the 
project should actually entail, but here's what it's looking like 
so far:

Implement lock free allocation using std.experimental.allocator's 
freelists (SharedFreeList? It was the only thing in the 
documentation that specifically mentions lock free allocation)

Implement a generational garbage collector

Implement precise garbage collection (possibly piggybacking off 
of Rainer's work)


Does anyone think that might be too much to take on or does it 
sound pretty good? I have other garbage collector things I'd like 
to explore, but they should probably go in the "if time allows" 
category at this point.

    Jeremy
Mar 11 2016
next sibling parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 12/03/16 4:13 AM, Jeremy DeHaan wrote:
 Thank you all for the feedback.

 I think I might still need a little more feedback as to what the project
 should actually entail, but here's what it's looking like so far:

 Implement lock free allocation using std.experimental.allocator's
 freelists (SharedFreeList? It was the only thing in the documentation
 that specifically mentions lock free allocation)

 Implement a generational garbage collector

 Implement precise garbage collection (possibly piggybacking off of
 Rainer's work)


 Does anyone think that might be too much to take on or does it sound
 pretty good? I have other garbage collector things I'd like to explore,
 but I they should probably go in the "if time allows" category at this
 point.

     Jeremy
Ugh, I would say that is a lot. So here is what I would do: have two lists of goals. One with the things you would be happy to get done, and the other with things you will do if you are able to. That way you prioritize what you want to get done over the rest.
Mar 11 2016
prev sibling next sibling parent reply Adam Wilson <flyboynw gmail.com> writes:
Jeremy DeHaan wrote:
 Thank you all for the feedback.

 I think I might still need a little more feedback as to what the project
 should actually entail, but here's what it's looking like so far:

 Implement lock free allocation using std.experimental.allocator's
 freelists (SharedFreeList? It was the only thing in the documentation
 that specifically mentions lock free allocation)

 Implement a generational garbage collector

 Implement precise garbage collection (possibly piggybacking off of
 Rainer's work)


 Does anyone think that might be too much to take on or does it sound
 pretty good? I have other garbage collector things I'd like to explore,
 but I they should probably go in the "if time allows" category at this
 point.

     Jeremy
If I may make a suggestion. The lock-free work is unlikely to require the entirety of GSoC. And the precise GC is the next most important thing on your list and will have the biggest impact on GC performance.

Once the GC is fully precise we can implement a fully compacting GC, which improves the usefulness of generational collection. Additionally, precision allows a significant amount of work to be done in improving the performance of the GC in multi-threaded scenarios. It should be quite possible to avoid needing fork() or anything like it altogether. I know that the .NET GC doesn't need to use anything like it.

Also, I would strongly recommend getting this book and reading it cover to cover before starting: http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM

I think Rainer's precise GC only dealt with heap objects. What needs work is register and stack scanning. Expanding on Rainer's existing precise GC work is the right idea, but register and stack scanning is a very big project in its own right. I suspect it will take up the remainder of your GSoC time. :)

-- // Adam Wilson // quiet.dlang.dev
Mar 12 2016
next sibling parent Chris Wright <dhasenan gmail.com> writes:
On Sat, 12 Mar 2016 00:50:06 -0800, Adam Wilson wrote:
 If I may make a suggestion. The lock free work is unlikely to require
 the entirety of GSoC. And the precise GC is the next most important
 thing on your list and will have the biggest impact on GC performance.
 
 Once the GC is fully precise we can implement a fully compacting GC,
Unions mean we can't be entirely precise, but we can get pretty close. Stack maps and type pointer maps are entirely separate efforts, and type pointer maps will be easier to implement, so the first pass will probably only include type pointer maps. Switching to a compacting GC is risky. It's a transparent change if you're only working with D code, but if you interface to other languages, it breaks things horribly.
 which improves the usefulness of generational collection. Additionally,
 precision allows a significant amount of work to be done in improving
 the performance of the GC in multi-threaded scenarios.
Primarily because less memory needs to be scanned on average, so less work needs to be done and GC pauses are shorter. There's nothing specific to multithreading there.
 It should be
 quite possible to avoid needing fork() or anything like it altogether.
fork() is used to avoid pausing the program at all during the mark phase.
 I know that the .NET GC doesn't need to use anything like it.
The .NET GC stops all managed threads during a collection. During a nursery or gen 1 collection, on average it's probably fast enough that forking wouldn't give that much benefit. During a full collection, if your application is latency-sensitive, a forking strategy would be better. But .NET can't do that because Windows doesn't have a fork() system call, and it's expensive to emulate.
Mar 12 2016
prev sibling parent reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
 If I may make a suggestion. The lock free work is unlikely to 
 require the entirety of GSoC. And the precise GC is the next 
 most important thing on your list and will have the biggest 
 impact on GC performance.
Rainer has two different precise GCs in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is that I think it is where we would see the most benefit in terms of general performance.
 Once the GC is fully precise we can implement a fully 
 compacting GC, which improves the usefulness of generational 
 collection. Additionally, precision allows a significant amount 
 of work to be done in improving the performance of the GC in 
 multi-threaded scenarios. It should be quite possible to avoid 
 needing fork() or anything like it altogether. I know that the 
 .NET GC doesn't need to use anything like it.
A compacting GC could be built on top of a generational GC without much difficulty I would think, if we wanted to go that route. The compaction would just happen as part of a collection cycle when things are moved into the next generation. I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exist in C and C++ code.

Another reason I want to work on a generational GC is that it can also lead into a concurrent GC without trying to emulate fork() on Windows. The .NET GC has 3 generations, with the last one having its collections run concurrently because it is unlikely to affect anything else going on. They don't bother running the other generations concurrently because their collections are really short. We could do something similar.

Perhaps someone more intimate with GCs than I am can speak up, but I think that a generational GC would be the best use of time in relation to performance gains. Other things can then be implemented on top of it later.
 Also, I would strongly recommend getting this book and reading 
 it cover to cover before starting:
 http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM
Thank you for the link to the book. I was planning on this one http://www.amazon.com/gp/product/0471941484/ , but maybe I will buy them both.
Mar 12 2016
parent reply Adam Wilson <flyboynw gmail.com> writes:
Jeremy DeHaan wrote:
 On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
 If I may make a suggestion. The lock free work is unlikely to require
 the entirety of GSoC. And the precise GC is the next most important
 thing on your list and will have the biggest impact on GC performance.
Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.
I would contend that it's not quite that simple. Of course the precise GC is slower; it's scanning more stuff. The current conservative GC by definition can't figure out large chunks of memory, so it won't even bother scanning them. So what does it do with those things it can't scan? What it can't scan it has to pin, which means the memory can't be moved. You can't build a fully generational GC until you can move everything. They talk about this problem in the books. However, once you have a precise GC, a fully moving GC becomes possible, and only then do the performance benefits of a generational GC become realized. I strongly suspect that a partially moving GC will perform worse than the precise GC. And I know for a fact that it will suffer enormously from fragmentation and out-of-memory issues. Precision is the basis from which all higher order GC functions flow.
 Once the GC is fully precise we can implement a fully compacting GC,
 which improves the usefulness of generational collection.
 Additionally, precision allows a significant amount of work to be done
 in improving the performance of the GC in multi-threaded scenarios. It
 should be quite possible to avoid needing fork() or anything like it
 altogether. I know that the .NET GC doesn't need to use anything like it.
A compacting GC could be built on top of a generational GC without much difficulty I would think, if we wanted to go that route. The compaction would just happen as part of a collection cycle when things are moved into the next generation. I have concerns about doing any compaction though, mostly because D can have both references and pointers to objects, and I don't know how we would want to go about updating pointers. Especially since pointers to D objects can exists in C and C++ code.
Erm, a generational GC is, in practice, a moving GC, as it must move the blobs around from the various heaps. The commonly used method is stop-and-copy. As for the C/C++ code problem, various solutions have been proposed here. For one, D knows the difference between the two, so pinning is possible there. Or a separate heap space can be used. AFAIK each OS allows you to specify multiple heaps, so you could specify an unmanaged heap to go along with the managed heaps. IIRC, C++/CLI does something similar on Windows. This would be the preferred solution IMO. (Look into HeapCreate on Windows, for example.)
 Another reason I want to work on a generational GC is because this can
 also lead into a concurrent GC without trying to emulate fork() on
 windows. The .Net GC has 3 generations with the last one having its
 collections running concurrently because it is unlikely to affect
 anything else going on. They don't bother running the other generations
 concurrently because their collections are really short. We could do
 something similar.
A concurrent GC without precision is mostly useless because the GC roots are primarily derived from the stack on which the thread was spawned, and the stack roots are not scanned in a conservative GC. Additionally, the async/await pattern is not practical without a precise GC.
 Perhaps someone more intimate with GC's than I am can speak up, but I
 think that a generational GC would be the best use of time in relation
 to performance gains. Other things can then be implemented on top of it
 later.
The problem is that performance is the most obvious problem with the GC, but not the most important one, nor the most useful to tackle first. Precision enables a fully moving, fully generational, fully concurrent GC. It also enables other major features that have nothing to do with the GC.
 Also, I would strongly recommend getting this book and reading it
 cover to cover before starting:
 http://www.amazon.com/gp/product/1420082795/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1944687562&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0471941484&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=0QD9X3E5QATSBCBT6BMM
Thank you for the link to the book. I was planning on this one http://www.amazon.com/gp/product/0471941484/ , but maybe I will buy them both.
Obviously you can do whatever you want. :) And I'll admit that precision isn't sexy, and it won't improve performance right away, but it opens the door to further work that will provide those things. It's hard and thankless work, but you'll be better set up for GSoC 2017 for adding generational and concurrency support. My opinion is that we need to focus on improving the core of the GC so that it can better perform the higher-order GC functions. Yes, it will make the GC slower, but it's already bad so I doubt people will notice much. We've discussed (and by discussed I mean thermonuclear flame-wars) the GC numerous times before at great length on this forum, so I am glad to see that this thread hasn't turned into that, yet. -- // Adam Wilson // quiet.dlang.dev
Mar 12 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

To start off, let's talk terminology. You seem to be using nonstandard 
terminology and possibly misunderstanding standard terminology.

A GC scan is the mark phase of a mark/sweep collector (and specifically 
the part where the GC examines roots and allocated memory, if that 
matters). The GC searches for pointers to GC memory. Simple GCs only need 
to record whether there is a pointer to a given allocation, not where 
those pointers are coming from.

A conservative GC is one that will encounter some things that aren't 
pointers but must be scanned as if they were pointers. In D, for 
instance, a union between a pointer and a size_t forces collectors to be 
conservative.

A false pointer is something that a conservative GC must treat as a 
pointer and appears to point to an object allocated by that GC. For 
instance, unions in D allow us to create false pointers.
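
As a small hypothetical illustration (the union and sizes below are made up 
for this example, not taken from druntime):

union IntOrPtr
{
    size_t n;
    void*  p;
}

void main()
{
    import core.memory : GC;

    auto block = GC.malloc(1024);
    IntOrPtr u;
    u.n = cast(size_t) block; // stored as a plain integer...
    block = null;
    GC.collect();             // ...but a conservative GC must assume u.p
                              // could be a live pointer and keep the
                              // 1024-byte block alive: a false pointer.
}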

A precise GC is a GC that never mistakes a non-pointer for a pointer.

A moving GC is a GC that can relocate objects.

A generational GC is one that can perform a collection on part of the 
heap without dealing with the full heap. Those parts tend to be organized 
by how long an object has survived. This does not require a precise or 
moving GC, though it works better with a precise, moving GC.

A write barrier is a way to record which sections of memory have been 
altered since the last GC collection. This lets the GC scan only the 
changed data -- useful to any GC, especially useful to generational ones.
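
For illustration only, a card-marking write barrier could look roughly like 
this (an assumed sketch -- the names, card size, and omitted initialization 
are placeholders, not how druntime does it):

enum cardSize = 512;          // bytes of heap covered by one card
__gshared ubyte[] cardTable;  // one "dirty" byte per card (sized elsewhere)
__gshared void*   heapBase;   // start of the managed heap (set elsewhere)

void writeBarrier(void** slot, void* newValue)
{
    *slot = newValue;   // the ordinary pointer store
    // Record which region changed, so the next partial collection only
    // re-scans dirty cards instead of the whole heap.
    auto card = (cast(size_t) slot - cast(size_t) heapBase) / cardSize;
    cardTable[card] = 1;
}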

A "partially moving" GC does not exist, as far as I know.

 Jeremy DeHaan wrote:
 On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
 If I may make a suggestion. The lock free work is unlikely to require
 the entirety of GSoC. And the precise GC is the next most important
 thing on your list and will have the biggest impact on GC performance.
Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.
I would contend that it's not quite that simple. Of course the precise GC is slower, it's scanning more stuff.
Precise GCs are slower because they must reference some data telling them which items on the stack and in allocations represent pointers and which do not. This data takes up cache space, and having less cache for memory being scanned slows things down. Correlating two pieces of data takes extra time, too. The hope with a precise GC is to avoid false pointers. A false pointer to an object that holds references to many other objects can be costly, both in GC time and in excess memory usage. In D, we have a tiny bit of type information surfaced to the GC to reduce the incidence of false pointers. We could additionally take advantage of the fact that numbers tend to be small and people tend to use 4 to 8 byte numbers: we could arrange our address space to avoid ranges where false pointers are likely. But that's probably unnecessary.
 The current conservative GC by
 definition can't figure out large chunks of memory so it won't even
 bother scanning them.
It must scan anything that might be a pointer. A precise GC reduces the number of things that might be pointers but aren't to zero. Therefore a precise GC scans less. If a conservative GC didn't scan something that might be a pointer, it would sometimes free an object that still had live references to it. This would result in memory corruption or segmentation faults.
 What it does to those things it can't scan. And
 what it can't scan it has to pin, which means the memory can't be moved.
If a moving GC cannot precisely identify all pointers to an object, it is not allowed to move that object. If it can, it is allowed to move that object. A conservative GC can be a moving GC. It must classify things as "definitely pointer", "maybe pointer", and "not pointer". It cannot move an object with a "maybe pointer" pointing to it, but it can move an object with several "definitely pointers" pointing to it. In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.
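
A rough sketch of that classification as the mark phase might apply it (all 
names here are illustrative, not druntime's):

enum PtrKind { notPointer, maybePointer, definitelyPointer }

struct BlockInfo
{
    bool reachable;
    bool pinned;   // set when any "maybe pointer" refers to this block
}

void markCandidate(PtrKind kind, BlockInfo* block)
{
    final switch (kind)
    {
        case PtrKind.notPointer:
            return;                  // plain data, nothing to do
        case PtrKind.maybePointer:
            block.pinned = true;     // keep it alive, but never move it
            break;
        case PtrKind.definitelyPointer:
            break;                   // keep it alive, still free to move
    }
    block.reachable = true;
    // ...then scan the block's interior for further pointers.
}
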
 You can't build a fully generational GC until you can move everything.
You can build a more efficient generational GC if you can move objects. However, it's not impossible to build a non-moving generational GC. Instead of having large blocks of address space for each generation, you would have each Pool indicate its generation. Then you have to find the Pool for a pointer and check that flag. That's comparatively slow, but it'd still be faster to do a Gen 0 scan that way than to do a full scan with D's current collector (assuming you implement write barriers).
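
Sketched out, that pool-per-generation bookkeeping might look something like 
this (purely illustrative names and layout):

struct Pool
{
    void*  base;
    size_t size;
    ubyte  generation;   // 0 = nursery, higher numbers = older
}

__gshared Pool[] pools;

// Find the pool containing a candidate pointer, then decide whether a
// generation-N collection needs to scan it at all.
Pool* findPool(void* p)
{
    auto addr = cast(size_t) p;
    foreach (ref pool; pools)
    {
        auto base = cast(size_t) pool.base;
        if (addr >= base && addr < base + pool.size)
            return &pool;
    }
    return null;   // not a pointer into the GC heap
}

bool shouldScan(const(Pool)* pool, ubyte collectingGen)
{
    return pool !is null && pool.generation <= collectingGen;
}
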
 They talk about this problem in the books. However, once you have a
 precise GC, a fully moving GC becomes possible and only then do the
 performance benefits of a generational GC become realized.
The performance benefit of a moving and generational GC over a non-moving generational GC is that you can have a contiguous section of address space for each generation, allocated densely. You get this with less overhead. However, a moving GC can be a pessimization for user code. If you allocate several objects in sequence, with many GCs you'll get memory allocated for them that is contiguous or nearly contiguous. For small objects, this is probably going to be on the same page of memory. But a moving GC is free to move them to different pages. So you need to exercise caution.
 I have concerns about doing any compaction
 though, mostly because D can have both references and pointers to
 objects, and I don't know how we would want to go about updating
 pointers. Especially since pointers to D objects can exists in C and
 C++
 code.
Erm, a generational GC is, in practice, a moving GC as it must move the blobs around from the various heaps.
The Boehm collector is generational, conservative, and not moving. You probably conflate "moving" and "generational" because of the JVM.
 As for the C/C++ code problem various solutions have been proposed here.
 For one, D knows the difference between the two, so pinning is possible
 there.
There's nothing in the type system that says whether you can pass a variable to a C/C++ function. So, no, D *doesn't* know the difference between them. And there's no GC.pin function, so people aren't currently telling your GC when something's accessible to C/C++ code. That means a moving collector in D will produce segmentation faults and memory corruption in existing, currently working code.
 AFIAK each OS allows you to specify multiple heaps
You get one address space. You get to use mmap to ask for memory at a given address. There's no heap_t data structure to provide to malloc or mmap. You might have been confused by literature referring to multiple heaps. That's just a single allocator deciding to divide its address space logically. There's no OS-level enforcement.
 A concurrent GC without precision is mostly useless
The Boehm collector is not precise. It is concurrent. Its concurrency reduces the amount of time a GC collection takes. This is mostly orthogonal to the benefits of a precise collector in reducing the amount of memory a process uses.
 because the GC roots
 are primarily derived from the stack which the thread was spawned and
 the stack roots are not scanned in a conservative GC.
They *are* scanned in a conservative GC. If you don't scan memory that might contain pointers (such as the stack), your GC will inevitably cause segmentation faults and memory corruption. D's GC is conservative and does not cause segmentation faults or memory corruption, therefore it must scan the stack.
Mar 12 2016
parent reply Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

 To start off, let's talk terminology. You seem to be using nonstandard
 terminology and possibly misunderstanding standard terminology.

 A GC scan is the mark phase of a mark/sweep collector (and specifically
 the part where the GC examines roots and allocated memory, if that
 matters). The GC searches for pointers to GC memory. Simple GCs only need
 to record whether there is a pointer to a given allocation, not where
 those pointers are coming from.

 A conservative GC is one that will encounter some things that aren't
 pointers but must be scanned as if they were pointers. In D, for
 instance, a union between a pointer and a size_t forces collectors to be
 conservative.

 A false pointer is something that a conservative GC must treat as a
 pointer and appears to point to an object allocated by that GC. For
 instance, unions in D allow us to create false pointers.

 A precise GC is a GC that never mistakes a non-pointer for a pointer.

 A moving GC is a GC that can relocate objects.

 A generational GC is one that can perform a collection on part of the
 heap without dealing with the full heap. Those parts tend to be organized
 by how long an object has survived. This does not require a precise or
 moving GC, though it works better with a precise, moving GC.
I didn't intend to confuse, but I was not writing to neophytes, and I used linguistic shortcuts.
 A write barrier is a way to record which sections of memory have been
 altered since the last GC collection. This lets the GC scan only the
 changed data -- useful to any GC, especially useful to generational ones.

 A "partially moving" GC does not exist, as far as I know.
Yep, it's a Bad Idea.
 Jeremy DeHaan wrote:
 On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
 If I may make a suggestion. The lock free work is unlikely to require
 the entirety of GSoC. And the precise GC is the next most important
 thing on your list and will have the biggest impact on GC performance.
Rainer has two different precise GC's in pull requests right now and both are slower than the current one unless there are false pointers. I would expect anything I come up with to largely act the same. The reason I keep pushing for a generational garbage collector is because I think it would be where we would see the most benefit in terms of general performance.
I would contend that it's not quite that simple. Of course the precise GC is slower, it's scanning more stuff.
Precise GCs are slower because they must reference some data telling them which items on the stack and in allocations represent pointers and which do not. This data takes up cache space, and having less cache for memory being scanned slows things down. Correlating two pieces of data takes extra time, too.
Yep the data is usually held with the object in a header. Hence the "scanning more stuff".
 The hope with a precise GC is to avoid false pointers. A false pointer to
 an object that holds references to many other objects can be costly, both
 in GC time and in excess memory usage.

 In D, we have a tiny bit of type information surfaced to the GC to reduce
 the incidence of false pointers. We could additionally take advantage of
 the fact that numbers tend to be small and people tend to use 4 to 8 byte
 numbers: we could arrange our address space to avoid ranges where false
 pointers are likely. But that's probably unnecessary.
We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.
 The current conservative GC by
 definition can't figure out large chunks of memory so it won't even
 bother scanning them.
It must scan anything that might be a pointer. A precise GC reduces the number of things that might be pointers but aren't to zero. Therefore a precise GC scans less.
Yep, precise GC's aren't totally perf-negative.
 If a conservative GC didn't scan something that might be a pointer, it
 would sometimes free an object that still had live references to it. This
 would result in memory corruption or segmentation faults.

 What it does to those things it can't scan. And
 what it can't scan it has to pin, which means the memory can't be moved.
If a moving GC cannot precisely identify all pointers to an object, it is not allowed to move that object. If it can, it is allowed to move that object. A conservative GC can be a moving GC. It must classify things as "definitely pointer", "maybe pointer", and "not pointer". It cannot move an object with a "maybe pointer" pointing to it, but it can move an object with several "definitely pointers" pointing to it. In D, thanks to unions, we can never create a fully precise collector, but we can create a moving collector. Objects pointed to from a union can't be moved, but few objects will be pointed to from a union.
Actually, we probably could with modifications to the compiler. A little metadata goes a long way. And I know that Walter would accept that PR. I don't know for sure if it can be done, but I've seen precious little evidence one way or the other. Let's investigate before making absolute statements about what we can do. I know from discussions with Walter that the compiler's AST contains a wealth of information, and that more can be added.
 You can't build a fully generational GC until you can move everything.
You can build a more efficient generational GC if you can move objects. However, it's not impossible to build a non-moving generational GC. Instead of having large blocks of address space for each generation, you would have each Pool indicate its generation. Then you have to find the Pool for a pointer and check that flag. That's comparatively slow, but it'd still be faster to do a Gen 0 scan that way than to do a full scan with D's current collector (assuming you implement write barriers).
My apologies, I didn't intend to state that it was impossible. I was attempting to emphatically question the wisdom of doing so, particularly when the potential performance benefits are dubious at best. My point is that we should do this right; if we keep blindly pursuing performance without taking a considered approach to achieving it, we'll never get there. The GC will just become an ever larger pile of hacks that increase performance in specific cases.
 They talk about this problem in the books. However, once you have a
 precise GC, a fully moving GC becomes possible and only then do the
 performance benefits of a generational GC become realized.
The performance benefit of a moving and generational GC over a non-moving generational GC is that you can have a contiguous section of address space for each generation, allocated densely. You get this with less overhead.
That's actually pretty important. Precision enables moving, and moving has clear and enumerable performance benefits. QED.
 However, a moving GC can be a pessimization for user code. If you
 allocate several objects in sequence, with many GCs you'll get memory
 allocated for them that is contiguous or nearly contiguous. For small
 objects, this is probably going to be on the same page of memory. But a
 moving GC is free to move them to different pages. So you need to
 exercise caution.
Yes ... and this is also dealt with in the books, including basic algorithms to help combat these problems. Additionally, as evidenced by the fact that every major GC (JVM, CLR, et al.) system implements a moving GC, I'd argue that the benefits greatly outweigh any potential risks. And we have pull requests to further optimize the algorithms. We don't have a system at all yet, let's build it first, then we can work on optimizing it.
 I have concerns about doing any compaction
 though, mostly because D can have both references and pointers to
 objects, and I don't know how we would want to go about updating
 pointers. Especially since pointers to D objects can exists in C and
 C++
 code.
Erm, a generational GC is, in practice, a moving GC as it must move the blobs around from the various heaps.
The Boehm collector is generational, conservative, and not moving. You probably conflate "moving" and "generational" because of the JVM.
Ok, yes, but let's be honest, it's what we actually want and, more importantly, expect. Precision makes it happen, efficiently. Even the C++ community doesn't take the Boehm GC very seriously. We have an opportunity to prove that a C-style language can have a performant GC.
 As for the C/C++ code problem various solutions have been proposed here.
 For one, D knows the difference between the two, so pinning is possible
 there.
There's nothing in the type system that says whether you can pass a variable to a C/C++ function. So, no, D *doesn't* know the difference between them. And there's no GC.pin function, so people aren't currently telling your GC when something's accessible to C/C++ code.
D does differentiate between a GC reference and a pointer. Pin the pointers. Yes, you can get a pointer from a reference, and doing so is a pinning operation. We'd have to add a GC.pin function, so long as the compiler emits it for us. :)
 That means a moving collector in D will produce segmentation faults and
 memory corruption in existing, currently working code.

 AFIAK each OS allows you to specify multiple heaps
You get one address space. You get to use mmap to ask for memory at a given address. There's no heap_t data structure to provide to malloc or mmap. You might have been confused by literature referring to multiple heaps. That's just a single allocator deciding to divide its address space logically. There's no OS-level enforcement.
Windows has that concept via HeapCreate shown here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366599(v=vs.85).aspx I'd be surprised if it's not available on other OS's, and yes, it is OS enforced. It might not work exactly how you like it, but that's a far cry from "not provided".
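
For reference, a minimal sketch of carving out such a separate Win32 heap 
(assuming the usual core.sys.windows bindings; purely illustrative, not a 
proposed druntime design):

version (Windows)
{
    import core.sys.windows.windows;

    void demo()
    {
        // A private, OS-managed heap, separate from the default process heap.
        HANDLE unmanaged = HeapCreate(0, 0 /*initial size*/, 0 /*growable*/);
        void* block = HeapAlloc(unmanaged, 0, 1024);
        // ... hand `block` to C/C++ code; the GC never touches this heap ...
        HeapFree(unmanaged, 0, block);
        HeapDestroy(unmanaged);
    }
}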
 A concurrent GC without precision is mostly useless
The Boehm collector is not precise. It is concurrent. Its concurrency reduces the amount of time a GC collection takes. This is mostly orthogonal to the benefits of a precise collector in reducing the amount of memory a process uses.
 because the GC roots
 are primarily derived from the stack which the thread was spawned and
 the stack roots are not scanned in a conservative GC.
They *are* scanned in a conservative GC. If you don't scan memory that might contain pointers (such as the stack), your GC will inevitably cause segmentation faults and memory corruption. D's GC is conservative and does not cause segmentation faults or memory corruption, therefore it must scan the stack.
I may have been confusing any precise GC with Rainer's precise GC, which only did precise heap scans. Apologies for the confusion. I will admit that I've done better work with my wording than what is in my last message. However, my fundamental point is sound: precision has benefits across all GC workloads and enables scenarios beyond the GC itself. Extending the compiler can work around many of the issues with false pointers.

And let's engage the elephant in the room. One of Rainer's main learnings was that precision permeates the compiler and the GC; you have to deal with it everywhere. So you end up rebuilding the GC from the ground up when you add it. What you are proposing is building a conservative-generational GC that will have to be jettisoned and rewritten *again* once precision is added, for a performance boost that, by your own admission, may not actually happen. How is this a good use of time?

My statement stands. Precision is the basis from which all high-order GC functions flow. Yes, you can work around not having it, but such workarounds tend to be just that. More importantly, you are inevitably going to end up doing a half-implementation of precision anyway, because each of the high-order functions needs parts of that information.

I know precision isn't fun. But it's extremely important work that needs to get done. And once we have it, the world of GC algorithms opens before us. Until then we are intentionally limiting ourselves. One of the biggest complaints about D is the GC; fine, so instead of doing yet another hack-job on something that may not actually improve GC performance, let's do it right and lay the groundwork for a modern GC that allows us to really compete with JVM/CLR.

-- // Adam Wilson // quiet.dlang.dev
Mar 13 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:
 A "partially moving" GC does not exist, as far as I know.
Yep, it's a Bad Idea.
It's not a standard term. Google's only seeing about four references to the term, none of them authoritative or definitive. Since it's non-standard, it would behoove you to define it. I think you mean a GC in which some objects are pinned (due to limitations in the GC rather than the user explicitly requesting that they be pinned). Regardless, you ought to define the term and explain why it's a bad idea instead of providing a bare assertion. It doesn't make any difference to the operation of the GC *why* something's pinned; what matters is how much is pinned. So, for instance, you could point to a conservative moving GC, select one circumstance in which it can't move objects, tell us how much memory that ends up pinning on average, and then you'll have an argument.
 In D, we have a tiny bit of type information surfaced to the GC to
 reduce the incidence of false pointers. We could additionally take
 advantage of the fact that numbers tend to be small and people tend to
 use 4 to 8 byte numbers: we could arrange our address space to avoid
 ranges where false pointers are likely. But that's probably
 unnecessary.
We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.
Sure. My point is that we don't *need* a lot of type information in the GC to get reasonable advantages. A little type information goes a long way, and a little caution in placing objects in the heap can help too.
 In D, thanks to unions, we can never create a fully precise collector,
 but we can create a moving collector. Objects pointed to from a union
 can't be moved, but few objects will be pointed to from a union.
Actually, we probably could with modifications to the compiler.
No, we couldn't. I've been assuming that we can make whatever modifications we want to the compiler for free.

The compiler has static type information. This explicitly contains ambiguities. When it's based solely on data available at compile time, the compiler can resolve the ambiguities -- and you could have written the code without using a union. But in the majority of cases, people use a union because they must. That means which values are set depends on runtime data, and the compiler can't tell you which fields will be set.

We could provide a runtime function that is invoked when you set a value inside a union. That won't work very well. Consider:

struct A { long a; long b; }
struct B { double a; void* p; }
union AB { A a; B b; }
extern void foo(ref A a);

AB ab;
ab.b.p = GC.malloc(512);
foo(ab.a);

Was that pointer overwritten? No clue! The compiler can't tell. When compiling the definition of foo(), it didn't have any idea whether the struct was a union member. Now we have to intercept *every* indirect write to *any* value with a runtime call, even if you never use a union in your program -- Oh look, now Python's beating us by an order of magnitude in benchmarks. What do we get for it? Well, the 1% of programs that use this pattern will allow the GC to move an additional 0.1% of their objects.

Or we leave the GC conservative in this one case. We can detect this case pretty easily and treat apparent pointers as pinned. Everything will work, and the GC will be better at releasing memory than it currently is.
Mar 13 2016
parent reply Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:
 A "partially moving" GC does not exist, as far as I know.
Yep, it's a Bad Idea.
It's not a standard term. Google's only seeing about four references to the term, none of them authoritative or definitive. Since it's non- standard, it would behoove you to define it. I think you mean a GC in which some objects are pinned (due to limitations in the GC rather than the user explicitly requesting that they be pinned). Regardless, you ought to define the term and explain why it's a bad idea instead of providing a bare assertion. It doesn't make any difference to the operation of the GC *why* something's pinned; what matters is how much is pinned. So, for instance, you could point to a conservative moving GC, select one circumstance in which it can't move objects, tell us how much memory that ends up pinning on average, and then you'll have an argument.
Is there an implementation of a conservative moving (compacting) GC out there? I'm not aware of one, but there are a lot of GC's out there. Boehm isn't.
 In D, we have a tiny bit of type information surfaced to the GC to
 reduce the incidence of false pointers. We could additionally take
 advantage of the fact that numbers tend to be small and people tend to
 use 4 to 8 byte numbers: we could arrange our address space to avoid
 ranges where false pointers are likely. But that's probably
 unnecessary.
We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.
Sure. My point is that we don't *need* a lot of type information in the GC to get reasonable advantages. A little type information goes a long way, and a little caution in placing objects in the heap can help too.
 In D, thanks to unions, we can never create a fully precise collector,
 but we can create a moving collector. Objects pointed to from a union
 can't be moved, but few objects will be pointed to from a union.
Actually, we probably could with modifications to the compiler.
No, we couldn't. I've been assuming that we can make whatever modifications we want to the compiler for free. The compiler has static type information. This explicitly contains ambiguities. When it's based solely on data available at compile time, the compiler can resolve the ambiguities -- and you could have written the code without using a union. But in the majority of cases, people use a union because they must. That means which values are set depends on runtime data, and the compiler can't tell you which fields will be set. We could provide a runtime function that is invoked when you set a value inside a union. That won't work very well. Consider: struct A { long a; long b; } struct B { double a; void* p; } union AB { A a; B b; } extern void foo(ref A a); AB ab; ab.b.p = GC.malloc(512); foo(ab.a); Was that pointer overwritten? No clue! The compiler can't tell. When compiling the definition of foo(), it didn't have any idea whether the struct was a union member. Now we have to intercept *every* indirect write to *any* value with a runtime call, even if you never use a union in your program -- Oh look, now Python's beating us by an order of magnitude in benchmarks. What do we get for it? Well, the 1% of programs that use this pattern will allow the GC to move an additional 0.1% of their objects. Or we leave the GC conservative in this one case. We can detect this case pretty easily and treat apparent pointers as pinned. Everything will work, and the GC will be better at releasing memory than it currently is.
Ok, so yes, it needs conservative scanning in this case. Because even in [...] absolutely against pinning, and unions aren't going away, and if the GC would have to go conservative there then so be it. And given that unions are a pretty niche use case, I don't think we should drop the idea of precision just because it can't be 100% precise. So we end up with a few conservatively-scanned edge cases, big deal; the benefits of precision to the whole ecosystem would be enormous in the general case.

Is this a debate about precise vs. non-precise GC or are we just bikeshedding about terminology and technical details? ;) I moved recently and lost track of the GC book for a while so I've been refreshing my terminology. :P

-- // Adam Wilson // quiet.dlang.dev
Mar 13 2016
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:
 Is there an implementation of a conservative moving 
 (compacting) GC out there? I'm not aware of one, but there are 
 a lot of GC's out there. Boehm isn't.
That is impossible, you need to know what is and isn't a pointer to be able to move things around.
Mar 13 2016
next sibling parent Chris Wright <dhasenan gmail.com> writes:
On Sun, 13 Mar 2016 23:46:51 +0000, deadalnix wrote:

 On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:
 Is there an implementation of a conservative moving (compacting) GC out
 there? I'm not aware of one, but there are a lot of GC's out there.
 Boehm isn't.
That is impossible, you need to know what is and isn't a pointer to be able to move things around.
Let's say we do the most precise GC possible for D. We can categorize things into:

* maybe a pointer; it's part of a union that contains a non-pointer element
* definitely a pointer (we know because of stack pointer maps and type information, and there aren't any unions mucking it up)
* definitely not a pointer

We can move anything that has only "definitely a pointer"s pointing at it. We can't move anything that has one or more "maybe a pointer"s pointing at it. I keep saying this...
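A minimal sketch of that classification in code might look like this (illustrative only -- the SlotKind/BlockInfo layout is made up for this example; it is not how druntime actually stores its type information):

enum SlotKind { notAPointer, maybePointer, definitelyPointer }

struct BlockInfo
{
    void*[] slots;       // the words making up the scanned object
    SlotKind[] kinds;    // one tag per word, derived from type info and unions
}

// Visit every reference in a block; anything reachable only through a
// "maybe a pointer" word gets pinned instead of moved.
void classifyReferences(BlockInfo block,
                        void delegate(void* target, bool pinned) visit)
{
    foreach (i, slot; block.slots)
    {
        final switch (block.kinds[i])
        {
            case SlotKind.notAPointer:
                break;                          // plain data, skip
            case SlotKind.maybePointer:
                if (slot) visit(slot, true);    // conservative: treat as a root, pin the target
                break;
            case SlotKind.definitelyPointer:
                if (slot) visit(slot, false);   // precise: the target is free to move
                break;
        }
    }
}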
Mar 13 2016
prev sibling parent Adam Wilson <flyboynw gmail.com> writes:
deadalnix wrote:
 On Sunday, 13 March 2016 at 23:34:44 UTC, Adam Wilson wrote:
 Is there an implementation of a conservative moving (compacting) GC
 out there? I'm not aware of one, but there are a lot of GC's out
 there. Boehm isn't.
That is impossible, you need to know what is and isn't a pointer to be able to move things around.
It was a rhetorical question. I was just leaving the door open for someone to surprise me. ;-) -- // Adam Wilson // quiet.dlang.dev
Mar 13 2016
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:
 Is this a debate about precise vs. non-precise GC or are we just
 bikeshedding about terminology and technical details?
You made a large number of assertions about garbage collection and they were almost all wrong. It's not about minutiae -- D can't reasonably have a precise collector, and you were saying that nothing else useful can happen before you make the GC precise. A non-precise collector can and often should be concurrent, and you were saying this was useless. These are big things. They affect prioritization of major projects. Getting them wrong could mislead a GSoC student to the point of not accomplishing much when they could have plucked lower-hanging fruit successfully.
Mar 13 2016
parent reply Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sun, 13 Mar 2016 16:34:44 -0700, Adam Wilson wrote:
 Is this a debate about precise vs. non-precise GC or are we just
 bikeshedding about terminology and technical details?
You made a large number of assertions about garbage collection and they were almost all wrong. It's not about minutiae -- D can't reasonably have a precise collector, and you were saying that nothing else useful can happen before you make the GC precise. A non-precise collector can and often should be concurrent, and you were saying this was useless. These are big things. They affect prioritization of major projects. Getting them wrong could mislead a GSoC student to the point of not accomplishing much when they could have plucked lower-hanging fruit successfully.
Erm. That seems a bit extreme. You are asserting that D can't have a precise garbage collector, yet you have presented only one case where the compiler cannot provide definitive information about the classification of a pointer: unions. Ok, so unions, a relatively low-usage scenario with a valid work-around (assume it's a pointer). We can teach the compiler and GC how to pin C-style pointers, so that's not it. Lastly, Rainer seemed to think a precise GC could be done, and he then went and did it ... so "can't reasonably have a precise collector" is a factually incorrect assertion.

My GC lingo may be rusty, and I will admit to using superlatives incorrectly myself, but based on the existing evidence my assertions are hardly "almost all wrong". I'll clean up my lingo and retool my language if you'll tone down the superlatives.

Of course GSoC students need guidance; I've been a GSoC mentor for DLang before. If I didn't think that a precise GC was not only extremely important to D but also within the ability of the student, I would not have pushed for it. "Low-hanging fruit" is the reason D's quality is so spotty. We now have a chance with GSoC to get some quality time dedicated to a pain point, and we should utilize that time to work on foundational issues. Then we (the community) can flame each other to death about the best algorithms to achieve the performance we desire. That is assuming we haven't scared him off.

-- // Adam Wilson // quiet.dlang.dev
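For what it's worth, explicit pinning around a C call could look something like this -- a purely hypothetical interface, sketched only to show the shape of the idea; gcPin/gcUnpin and c_process do not exist in druntime or in any real C library:

// Hypothetical pinning hooks a moving GC would have to expose.
void gcPin(void* p)   { /* a real GC would add p's block to its pinned set */ }
void gcUnpin(void* p) { /* ...and remove it again here */ }

// Stand-in for some C function that holds on to a raw pointer.
extern (C) void c_process(ubyte* data, size_t len);

void callIntoC(ubyte[] buffer)
{
    gcPin(buffer.ptr);                 // the C side sees a raw pointer, so the block must not move
    scope (exit) gcUnpin(buffer.ptr);  // movable again once the call returns
    c_process(buffer.ptr, buffer.length);
}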
Mar 13 2016
parent reply thedeemon <dlang thedeemon.com> writes:
On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:
 Lastly, Rainer seemed to think a precise GC could be done, and 
 he then went and did it ... so "can't reasonably have a precise 
 collector" is a factually incorrect assertion.
IIRC, Rainer called it "mostly precise", and for a good reason. If we're ok with calling a partially conservative GC a precise one, then go on. ;) As I understand Rainer's GC works in VisualD now and it did solve the problem of leaks he had with the default GC, so there is certainly benefit in pursuing preciseness, even non-100% one.

I think continuing Rainer's work toward "even more precise" GC, and adding important optimizations like allocations without a global lock and parallel scanning would be a great project, and a realistic one.

As for moving and generational GC, I have big doubts one can do it without breaking 90% of old code written in D (passing pointers to C [...]) with write barriers everywhere. Now D competes in speed with C++, with your proposed changes it will only try to compete with Java [...] be lost. On the other hand I might be overestimating write-barrier costs, please feel free to prove me wrong.

Fun part: if you make a moving GC and try to automate pinning and let the compiler do it instead of requiring programmers to rewrite most of their D code, you'll probably end up with a "partially moving GC" where half of the heap is pinned, some objects are moving here and there, and GC rarely releases memory back to OS because of those pinned objects.
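For readers who haven't met the term: a write barrier is a small piece of code the compiler inserts at every pointer store so the collector can see which memory was mutated. A minimal card-marking sketch of the idea (illustrative only -- gcCardTable, gcHeapBase and gcWriteBarrier are invented names, not anything that exists in druntime):

enum CARD_SHIFT = 9;            // one "card" covers 512 bytes of heap
__gshared void* gcHeapBase;     // start of the GC heap
__gshared ubyte[] gcCardTable;  // one byte per card, set up by the GC

// The compiler would lower every pointer store, e.g. `obj.next = other;`,
// into a call like gcWriteBarrier(cast(void**) &obj.next, other).
void gcWriteBarrier(void** slot, void* newValue)
{
    *slot = newValue;                                              // the actual store
    auto card = (cast(size_t) slot - cast(size_t) gcHeapBase) >> CARD_SHIFT;
    if (card < gcCardTable.length)
        gcCardTable[card] = 1;   // a generational/concurrent GC re-scans dirty cards later
}

The per-store overhead (a subtraction, a shift, and a byte write) is exactly the cost being argued about here.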
 My GC lingo may be rusty, and I will admit to using 
 superlatives incorrectly myself, but based on the existing 
 evidence my assertions are hardly "almost all wrong".
They did look so, sorry. Additional explanations helped.
Mar 13 2016
parent reply Adam Wilson <flyboynw gmail.com> writes:
thedeemon wrote:
 On Monday, 14 March 2016 at 01:38:50 UTC, Adam Wilson wrote:
 Lastly, Rainer seemed to think a precise GC could be done, and he then
 went and did it ... so "can't reasonably have a precise collector" is
 a factually incorrect assertion.
IIRC, Rainer called it "mostly precise", and for a good reason. If we're ok with calling a partially conservative GC a precise one, then go on. ;) As I understand Rainer's GC works in VisualD now and it did solve the problem of leaks he had with the default GC, so there is certainly benefit in pursuing preciseness, even non-100% one.
[...] conservative. ;-)
 I think continuing Rainer's work toward "even more precise" GC, and
 adding important optimizations like allocations without a global lock
 and parallel scanning would be a great project, and a realistic one.
On this we are in complete agreement.
 As for moving and generational GC, I have big doubts one can do it
 without breaking 90% of old code written in D (passing pointers to C
 [...]) with write barriers everywhere. Now D competes in speed with C++,
 with your proposed changes it will only try to compete with Java [...]
 be lost. On the other hand I might be overestimating write-barrier
 costs, please feel free to prove me wrong.
Well, conclusively proving the cost of write barriers would require profiling. But I think we should be able to optimize the use of write barriers to reduce the impact.

Here is the thing: nobody wants to admit it, but we have lost the battle with C++, although to be honest, I don't think we ever had a prayer of winning it in the first place. They want C++ with the "warts" removed, for some personally distinct set of "warts"; they're not really interested in D for itself. Maybe ... why don't instead of trying to compete with everybody else, we do our own thing, and do it very well. As long as we're just another "me too" operation people will treat us like we are. Let's focus on being the best D language out there. And D needs a GC, so let's build the best damn garbage collector the natively-compiled world has ever seen.
 Fun part: if you make a moving GC and try to automate pinning and let
 the compiler do it instead of requiring programmers to rewrite most of
 their D code, you'll probably end up with a "partially moving GC" where
 half of the heap is pinned, some objects are moving here and there, and
 GC rarely releases memory back to OS because of those pinned objects.
Yea, this is true, although I think getting most of the way there is going to be very useful in its own right. And it would open the possibility of experimenting with different algorithms and strategies. Right now we're pretty limited in the types of algorithms we can use; precision offers us more options and more experiments. Let's get the foundation in place, then we can go nuts.
 My GC lingo may be rusty, and I will admit to using superlatives
 incorrectly myself, but based on the existing evidence my assertions
 are hardly "almost all wrong".
They did look so, sorry. Additional explanations helped.
Thank you. :) -- // Adam Wilson // quiet.dlang.dev
Mar 13 2016
parent reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
I haven't had power for a couple of days, but it looks like the 
discussion has gone along pretty ok. After reading everything, I 
think I'm inclined to agree with Adam and the main focus of my 
proposal will be a precise GC (or as precise as we can get). I'll 
definitely need some guidance, but I'll be learning everything I 
can about GC's and precision until the start of the project.

On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:
 Maybe ... why don't instead of trying to compete with everybody 
 else, we do our own thing, and do it very well. As long as 
 we're just another "me too" operation people will treat us like 
 we are. Let's focus on being the best D language out there. And 
 D needs a GC, so let's build the best damn garbage collector 
 the natively-compiled world has ever seen.
That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it gets accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Mar 14 2016
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:
 I haven't had power for a couple of days, but it looks like the 
 discussion has gone along pretty ok.
I would characterize it as very interesting, although I know very little about how GCs are implemented. I have a question for the more knowledgeable... There was some discussion about the difficulty of handling unions and being able to ensure that pointers are pointers in a precise GC. Is there less work involved in getting the GC to be precise for @safe code? Does the current GC take this into account?
Mar 14 2016
prev sibling next sibling parent Adam Wilson <flyboynw gmail.com> writes:
Jeremy DeHaan wrote:
 I haven't had power for a couple of days, but it looks like the
 discussion has gone along pretty ok. After reading everything, I think
 I'm inclined to agree with Adam and the main focus of my proposal will
 be a precise GC (or as precise as we can get). I'll definitely need some
 guidance, but I'll be learning everything I can about GC's and precision
 until the start of the project.
Awesome! It's a complex topic, but you'll get broad exposure to all corners of the D ecosystem; it's the kind of experience that you can apply to all aspects of software engineering.
 On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:
 Maybe ... why don't instead of trying to compete with everybody else,
 we do our own thing, and do it very well. As long as we're just
 another "me too" operation people will treat us like we are. Let's
 focus on being the best D language out there. And D needs a GC, so
 let's build the best damn garbage collector the natively-compiled
 world has ever seen.
That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it get's accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Will do. And given the nature of the work you want to do, I think (personal opinion follows) your application will sail through the process. The application window opened today, so my first recommendation would be to get started on your proposal as soon as possible, if you haven't already. GSoC has a basic outline of what you'll need to include here: http://write.flossmanuals.net/gsocstudentguide/writing-a-proposal/ I'd like to review the proposal before you submit it. And welcome aboard the crazy-train known as GSoC; it's a lot of work but it'll be buckets of fun! :) -- // Adam Wilson // quiet.dlang.dev
Mar 14 2016
prev sibling next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 15.03.2016 02:34, Jeremy DeHaan wrote:
 I haven't had power for a couple of days, but it looks like the
 discussion has gone along pretty ok. After reading everything, I think
 I'm inclined to agree with Adam and the main focus of my proposal will
 be a precise GC (or as precise as we can get). I'll definitely need some
 guidance, but I'll be learning everything I can about GC's and precision
 until the start of the project.

 On Monday, 14 March 2016 at 05:28:13 UTC, Adam Wilson wrote:
 Maybe ... why don't instead of trying to compete with everybody else,
 we do our own thing, and do it very well. As long as we're just
 another "me too" operation people will treat us like we are. Let's
 focus on being the best D language out there. And D needs a GC, so
 let's build the best damn garbage collector the natively-compiled
 world has ever seen.
That is what I hope to work towards. Hopefully next year I can work on making a generational GC, or something else depending on how much time I could dedicate between when this GSoC is finished and the next begins. Adam, can you reach out to Craig and let him know you're willing to mentor this project if it get's accepted? I talked with him a few days ago via email and he said that someone would need to take it on but he wasn't sure who.
Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts:

The precise GC used by Visual D is not only heap-precise, but also has type information to scan the data and TLS segments precisely. This is done with a patched dmd, now based on version 2.066, and is not part of the PR. It uses the solution that I presented at DConf 2013: http://dconf.org/2013/talks/schuetze.pdf

Creating information for stack and registers seems like a lot of work with uncertain return: apart from having to describe your code in very high detail, there is usually no guarantee that you can unwind the stack properly, especially when calling through C/C++ functions that you have no control of.

A moving collector is possible IMO, but it needs write barriers, and the idea of adding these explicitly has met quite some resistance, especially by Walter. Write barriers are also necessary for incremental collections. It changes the way you can interact with C/C++, because it's no longer good enough to just keep a pointer used by C visible to the GC; you have to pin it explicitly.

You can have very coarse barriers by using the page protection mechanism of the hardware; that's what I implemented for Windows, see http://rainers.github.io/visuald/druntime/concurrentgc.html. This is similar to Sociomantic's concurrent GC using fork. It scans in the background, but not with multiple threads in parallel. Unfortunately, this work has been rotting for almost 2 years now, but I hope to get back to it some day...
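To make the "very coarse barriers" idea concrete, here is a rough sketch of how page protection can stand in for per-store write barriers (POSIX-only, no error handling; the names are invented for illustration and this is not the actual VisualD/druntime implementation -- in particular, wiring the fault handler up via sigaction or Windows exception handling is omitted):

import core.sys.posix.sys.mman : mprotect, PROT_READ, PROT_WRITE;

enum PAGE_SIZE = 4096;

__gshared void*   heapBase;
__gshared size_t  heapPages;
__gshared ubyte[] dirty;    // one flag per heap page, 1 = written during marking

void startConcurrentMark(void* base, size_t length)
{
    heapBase  = base;
    heapPages = length / PAGE_SIZE;
    dirty     = new ubyte[heapPages];
    mprotect(base, length, PROT_READ);   // any write into the heap now faults
}

// Called by the platform-specific fault handler with the faulting address.
// Returns true if the fault belonged to the GC heap and has been handled.
bool handleWriteFault(void* faultAddr)
{
    auto addr = cast(size_t) faultAddr;
    auto base = cast(size_t) heapBase;
    if (addr < base || (addr - base) / PAGE_SIZE >= heapPages)
        return false;                    // not our heap; let the fault propagate

    auto page = (addr - base) / PAGE_SIZE;
    dirty[page] = 1;                     // this page must be re-scanned before marking ends
    mprotect(cast(void*) (base + page * PAGE_SIZE), PAGE_SIZE, PROT_READ | PROT_WRITE);
    return true;                         // the interrupted write is simply retried
}

The upside is that unmodified user code needs no compiler-inserted barriers at all; the downside is that the granularity is a whole page, which is why this only approximates a true generational barrier.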
Mar 18 2016
parent reply Jeremy deHaan <dehaan.jeremiah gmail.com> writes:
On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:
 On 15.03.2016 02:34, Jeremy DeHaan wrote:
[...]
Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]
Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...
Mar 18 2016
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 18.03.2016 22:04, Jeremy deHaan wrote:
 On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:
 On 15.03.2016 02:34, Jeremy DeHaan wrote:
 [...]
Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]
Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...
Well, there are a number of unfinished parts left:

- as far as I understand, the precise GC PR https://github.com/D-Programming-Language/druntime/pull/1022 is currently stalled because it doesn't yield better overall performance than the current GC in most situations. The reasoning is that it should be able to compensate for the additional time during allocation (saving pointer information) by improved scanning, using that information to skip non-pointers. That didn't work out yet, though.

- last time I checked, generating RTInfo (which contains the pointer info) was a bit unreliable, see https://github.com/D-Programming-Language/dmd/pull/2480 and https://github.com/D-Programming-Language/dmd/pull/3958.

- I only implemented the DATA and TLS section TypeInfo emission for the OMF backend; this needs to be ported to other platforms. Martin Nowak recently considered just emitting pointer locations. This would make the scanning function a bit simpler, but might need a bit more memory in the binary.

Regarding the concurrent GC: I consider my implementation a prototype with rough edges and a lot of optimization opportunities. I'm not sure whether page protection is good enough to implement a generational GC, but it might still be possible to take advantage of the information that a page hasn't been written to. Judging from https://blog.golang.org/go15gc, Go 1.5 seems to be using a similar GC (concurrent mark-and-sweep), though with proper write barriers. I guess there is a lot of stuff that can be used from their experience.
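To make the "skip non-pointers" point a bit more concrete, here is a minimal sketch of scanning with a per-type pointer bitmap in the spirit of RTInfo (the names and the bitmap layout are invented for this example and do not match druntime's actual encoding):

struct PointerBitmap
{
    size_t sizeInWords;   // object size in pointer-sized words
    size_t[] bits;        // one bit per word, set = "this word may hold a pointer"
}

enum BITS_PER_WORD = 8 * size_t.sizeof;

bool mayBePointer(const ref PointerBitmap map, size_t word)
{
    return ((map.bits[word / BITS_PER_WORD] >> (word % BITS_PER_WORD)) & 1) != 0;
}

// Scan one object precisely: only words flagged in the bitmap are treated as
// potential references; ints, floats and padding are skipped outright.
void scanPrecisely(void* obj, const ref PointerBitmap map,
                   void delegate(void* candidate) mark)
{
    auto words = cast(void**) obj;
    foreach (i; 0 .. map.sizeInWords)
    {
        if (!mayBePointer(map, i))
            continue;
        if (auto p = words[i])
            mark(p);      // a real GC would also range-check p against its pools
    }
}

The trade-off described above is that this bitmap has to be produced and stored when the object is allocated, which is where the extra allocation cost comes from.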
 so what would you suggest be my main focus?
Given that 32-bit applications are becoming legacy, and false pointers are not a common problem in 64-bit processes (they do happen eventually, though), I suspect that concurrency of the GC would make a much larger impact on the D language than preciseness. A good target should be to reduce the stop-the-world time to something acceptable for interactive programs, i.e. well below 50ms.
Mar 19 2016
parent Adam Wilson <flyboynw gmail.com> writes:
Rainer Schuetze wrote:
 On 18.03.2016 22:04, Jeremy deHaan wrote:
 On Friday, 18 March 2016 at 16:41:21 UTC, Rainer Schuetze wrote:
 On 15.03.2016 02:34, Jeremy DeHaan wrote:
 [...]
Being always way behind reading the forum these days, I'm a little late and have not read all the messages in this thread thoroughly. Here are some thoughts: [...]
Thank you for the feedback. I'm still working on my proposal so nothing is set in stone just yet. I'm very interested in working on the GC for this GSoC, so what would you suggest be my main focus? It sounds like you already have a GC that is more or less what I was planning on implementing...
Well, there are a number of unfinished parts left: - as far as I understand the precise GC PR https://github.com/D-Programming-Language/druntime/pull/1022 is currently stalled because it doesn't yield better overall performance than the current GC in most situations. The reasoning is that it should be able compensate the additional time during allocation (saving pointer information) by improved scanning using that information to skip non-pointers. That didn't work out yet, though. - last time I checked generating RTInfo (which contains the pointer info) was a bit unreliable, see https://github.com/D-Programming-Language/dmd/pull/2480 and https://github.com/D-Programming-Language/dmd/pull/3958. - I only implemented the DATA and TLS section TypeInfo emission for the OMF backend, this needs to be ported to other platforms. Martin Nowak recently considered to just emit pointer locations. This would make the scanning function a bit simpler, but might need a bit more memory in the binary. Regarding the concurrent GC: I consider my implementation a prototype with rough edges and a lot of optimization opportunities. I'm not sure whether page protection is good enough to implement a generational GC, but it might still be possible to take advantage of the information that a page hasn't been written to. Judging from https://blog.golang.org/go15gc Go 1.5 seems to be using a similar GC (concurrent mark-and-sweep), though with proper write barriers. I guess there is a lot of stuff that can be used from their experience. > so what would you suggest be my main focus? Given that 32-bit applications are becoming legacy, and false pointers are not a common problem in 64-bit processes (they do happen eventually, though) I suspect that concurrency of the GC would make a much larger impact on the D language than preciseness. A good target should be to reduce stop-the-world-time to something acceptable for interactive programs, i.e. well below 50ms.
I would argue that precision is still important, as it enables a compacting GC. While it is technically possible to implement a generational GC without compaction, it's generally not considered wise. And even though 64-bit does not suffer from false pointers as much as 32-bit does, it still will, especially when used in the big-data, analytics, and scientific scenarios that D seems to have made real traction in. While false pointers are hardly a problem for a simple command line app, and probably not even for most Vibe.D servers, there is a significant class of work being done in D today that would be directly affected by them.

-- // Adam Wilson // quiet.dlang.dev
Mar 19 2016
prev sibling parent Craig Dillabaugh <craig.dillabaugh gmail.com> writes:
On Tuesday, 15 March 2016 at 01:34:07 UTC, Jeremy DeHaan wrote:
 I haven't had power for a couple of days, but it looks like the 
 discussion has gone along pretty ok. After reading everything, 
 I think I'm inclined to agree with Adam and the main focus of 
 my proposal will be a precise GC (or as precise as we can get). 
 I'll definitely need some guidance, but I'll be learning 
 everything I can about GC's and precision until the start of 
 the project.

 [...]
In case you didn't already know, Adam is now an official mentor so you are all set to go. Good luck!
Mar 18 2016
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11-Mar-2016 18:13, Jeremy DeHaan wrote:
 Thank you all for the feedback.

 I think I might still need a little more feedback as to what the project
 should actually entail, but here's what it's looking like so far:

 Implement lock free allocation using std.experimental.allocator's
 freelists (SharedFreeList? It was the only thing in the documentation
 that specifically mentions lock free allocation)

 Implement a generational garbage collector

 Implement precise garbage collection (possibly piggybacking off of
 Rainer's work)
Let's not forget another low-hanging fruit, which is parallel marking. Regardless of the kind of collector we use, parallel marking is a must-have in our multicore world, and should be straightforward to implement in D.
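As a toy illustration of what parallel marking can look like with std.parallelism (the Node type below is a stand-in for real heap metadata, not how the druntime GC represents objects):

import core.atomic : cas;
import std.parallelism : parallel;

struct Node
{
    shared bool marked;   // set exactly once, by whichever worker claims the node first
    Node*[] children;     // outgoing references found by the scanner
}

// Trace everything reachable from one root using a local stack.
void markFrom(Node* root)
{
    Node*[] stack = [root];
    while (stack.length)
    {
        auto n = stack[$ - 1];
        stack.length -= 1;
        // claim the node atomically; if another worker already marked it, skip it
        if (n is null || !cas(&n.marked, false, true))
            continue;
        stack ~= n.children;
    }
}

void parallelMark(Node*[] roots)
{
    foreach (root; parallel(roots))   // the root set is split across worker tasks
        markFrom(root);
}

The only synchronization needed is the CAS on the mark flag; a real implementation would also want work stealing, because object graphs are rarely split evenly across the roots.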
 Does anyone think that might be too much to take on or does it sound
 pretty good? I have other garbage collector things I'd like to explore,
 but I they should probably go in the "if time allows" category at this
 point.

     Jeremy
-- Dmitry Olshansky
Mar 13 2016