digitalmars.D - Precise GC
- Walter Bright (23/23) Apr 07 2012 Of course, many of us have been thinking about this for a looong time, a...
- Froglegs (3/3) Apr 07 2012 That sounds cool, perhaps people can have customizable GC for
- Andrei Alexandrescu (7/10) Apr 07 2012 I'm also very excited about this design, and will make time to help with...
- Andrei Alexandrescu (5/14) Apr 07 2012 BTW one exciting thing about both this and the nascent attributes design...
- Antti-Ville Tuunainen (5/7) Apr 14 2012 That would be me.
- Jacob Carlborg (4/10) Apr 14 2012 There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas
- Antti-Ville Tuunainen (3/4) Apr 14 2012 That's the ideas list for proposals. I was asking if anyone else
- Jacob Carlborg (4/8) Apr 15 2012 Aha, I see. The chosen projects have not been announced yet.
- Chad J (3/3) Apr 07 2012 Hey, that sounds awesome. I think I geeked out a bit.
- Walter Bright (2/5) Apr 07 2012 It has nothing to do with reference counting that I can think of.
- Andrei Alexandrescu (3/10) Apr 07 2012 Nevertheless good food for thought. This is all very cool.
- Timon Gehr (2/28) Apr 08 2012 That actually sounds like a pretty awesome idea.
- Timon Gehr (5/6) Apr 08 2012 Make sure that the compiler does not actually rely on the fact that the
- Manu (6/13) Apr 08 2012 This sounds important to me. If it is also possible to do the work with
- Alex Rønne Petersen (6/19) Apr 08 2012 Yes, I agree here. The last thing we need is a huge amount of
- deadalnix (4/27) Apr 09 2012 Nothing prevents the generated function from itself calling other generated
- Manu (10/42) Apr 09 2012 Eh?
- deadalnix (11/20) Apr 09 2012 If you have references to objects, you can't avoid a function call. If
- Manu (9/29) Apr 09 2012 e
- deadalnix (11/16) Apr 10 2012 OK, back to basics.
- Andrei Alexandrescu (8/17) Apr 10 2012 That is correct. The code bloat is equal to that generated if the end
- Kagamin (3/16) Apr 13 2012 What's the problem with virtual calls on ARM?
- Manu (20/33) Apr 13 2012 No other processors have branch prediction units anywhere near
- Kagamin (10/23) Apr 13 2012 Allocation of small aggregated objects usually involves
- Manu (29/31) Apr 14 2012 All depends how much you love object orientation. If you follow the C++
- Robert Jacques (3/36) Apr 14 2012 Is there any reason to assume that the GC is actually using the mark fun...
- James Miller (5/6) Apr 13 2012 Hmmm, I thought we decided that was a good idea, anybody in the know if
- Timon Gehr (3/4) Apr 08 2012 I understand that the stack will still have to be scanned
- Walter Bright (2/6) Apr 08 2012 For now, just treat them conservatively.
- Rainer Schuetze (6/10) Apr 08 2012 I guess the compiler should generate an (anonymous) struct type
- Alex Rønne Petersen (6/17) Apr 08 2012 This sounds sensible to me. No reason closure marking can't be precise
- Steven Schveighoffer (11/38) Apr 09 2012 I think this is a really good idea.
- deadalnix (15/41) Apr 09 2012 This is a good idea. However, it doesn't handle type qualifiers. And
- Walter Bright (2/6) Apr 09 2012 I think this is an orthogonal issue.
- deadalnix (17/26) Apr 10 2012 You mean an allocator/deallocator one?
Of course, many of us have been thinking about this for a looong time, and what is the best way to go about it.

The usual technique is for the compiler to emit some sort of table for each TypeInfo giving the layout of the object, i.e. where the pointers are. The general problem with these is the table is non-trivial, as it will require things like iterated data blocks, etc. It has to be compressed to save space, and the gc then has to execute a fair amount of code to decode it. It also requires some significant work on the compiler end, leading of course to complexity, rigidity, development bottlenecks, and the usual bugs.

An alternative Andrei and I have been talking about is to put in the TypeInfo a pointer to a function. That function will contain customized code to mark the pointers in an instance of that type. That custom code will be generated by a template defined by the library. All the compiler has to do is stupidly instantiate the template for the type, and insert an address to the generated function. The compiler need know NOTHING about how the marking works.

Even better, as ctRegex has demonstrated, the custom generated code can be very, very fast compared with a runtime table-driven approach. (The slow part will be calling the function indirectly.)

And best of all, the design is pushed out of the compiler into the library, so various schemes can be tried out without needing compiler work.

I think this is an exciting idea, it will enable us to get a precise gc by enabling people to work on it in parallel rather than serially waiting for me.
Apr 07 2012
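To make the scheme concrete, here is a minimal sketch of what such a library marking template might look like, shown only for struct layouts. The names markScan and markSlot are invented for illustration; the real druntime design could differ in every detail.

    void markScan(T)(void* p, scope void delegate(void* ptr) markSlot)
        if (is(T == struct))
    {
        T* obj = cast(T*) p;
        foreach (ref field; (*obj).tupleof)
        {
            alias F = typeof(field);
            static if (is(F == U*, U) || is(F == class))
                markSlot(cast(void*) field);      // report the reference held in this field
            else static if (is(F : E[], E))
                markSlot(cast(void*) field.ptr);  // report the array's data pointer
            else static if (is(F == struct))
                markScan!F(&field, markSlot);     // direct recursion into a known layout
            // plain value fields hold no pointers and are skipped
        }
    }

The compiler's only job would then be to instantiate markScan!T for each type T and store the address of the instantiation in that type's TypeInfo; how the marking actually works stays entirely in the library.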
That sounds cool, perhaps people can have customizable GC for specific applications? Looking forward to D having a precise GC
Apr 07 2012
On 4/7/12 8:56 PM, Walter Bright wrote:
> [snip]
> I think this is an exciting idea, it will enable us to get a precise gc by enabling people to work on it in parallel rather than serially waiting for me.

I'm also very excited about this design, and will make time to help with the library part of the implementation. Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC).

Andrei
Apr 07 2012
On 4/7/12 9:49 PM, Andrei Alexandrescu wrote:
> On 4/7/12 8:56 PM, Walter Bright wrote:
>> [snip]
>> I think this is an exciting idea, it will enable us to get a precise gc by enabling people to work on it in parallel rather than serially waiting for me.
>
> I'm also very excited about this design, and will make time to help with the library part of the implementation. Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC).

BTW one exciting thing about both this and the nascent attributes design is that they integrate wonderfully with the language's generic and generative capabilities.

Andrei
Apr 07 2012
On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu wrote:
> Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC).

That would be me. Just thought I should wave and say hello. Are there other GSoC proposals in the GC area?
Apr 14 2012
On 2012-04-14 19:46, Antti-Ville Tuunainen wrote:
> On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu wrote:
>> Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC).
>
> That would be me. Just thought I should wave and say hello. Are there other GSoC proposals in the GC area?

There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

--
/Jacob Carlborg
Apr 14 2012
On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:
> There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

That's the ideas list for proposals. I was asking if anyone else applied for GSoC using one of those.
Apr 14 2012
On 2012-04-15 04:11, Antti-Ville Tuunainen wrote:
> On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:
>> There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas
>
> That's the ideas list for proposals. I was asking if anyone else applied for GSoC using one of those.

Aha, I see. The chosen projects have not been announced yet.

--
/Jacob Carlborg
Apr 15 2012
Hey, that sounds awesome. I think I geeked out a bit. Would this make it any easier to reference count types that can be statically proven to have no cyclical references?
Apr 07 2012
On 4/7/2012 7:58 PM, Chad J wrote:
> Hey, that sounds awesome. I think I geeked out a bit. Would this make it any easier to reference count types that can be statically proven to have no cyclical references?

It has nothing to do with reference counting that I can think of.
Apr 07 2012
On 4/7/12 9:59 PM, Walter Bright wrote:
> On 4/7/2012 7:58 PM, Chad J wrote:
>> Hey, that sounds awesome. I think I geeked out a bit. Would this make it any easier to reference count types that can be statically proven to have no cyclical references?
>
> It has nothing to do with reference counting that I can think of.

Nevertheless good food for thought. This is all very cool.

Andrei
Apr 07 2012
On 04/08/2012 03:56 AM, Walter Bright wrote:
> [snip]

That actually sounds like a pretty awesome idea.
Apr 08 2012
On 04/08/2012 10:45 AM, Timon Gehr wrote:
> That actually sounds like a pretty awesome idea.

Make sure that the compiler does not actually rely on the fact that the template generates a function. The design should include the possibility of just generating tables. It all should be completely transparent to the compiler, if that is possible.
Apr 08 2012
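For comparison, a rough sketch of the table idea: the same compile-time introspection could emit plain data, here an array of pointer-field offsets, which a GC could walk without calling any generated function. pointerOffsets is an invented name, and a real encoding would also have to cover arrays, nested structs, and so on.

    size_t[] pointerOffsets(T)()
        if (is(T == struct))
    {
        size_t[] offsets;
        foreach (i, member; T.init.tupleof)
        {
            alias F = typeof(member);
            static if (is(F == U*, U) || is(F == class))
                offsets ~= T.tupleof[i].offsetof;  // this word holds a pointer
        }
        return offsets;
    }

    struct Node { Node* next; int value; Node* prev; }
    enum nodeOffsets = pointerOffsets!Node();  // evaluated at compile time: [0, 16] on 64-bit

Whether such a table or a generated function is faster is exactly the architecture question debated below; the point is the same template machinery can produce either.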
On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch> wrote:
> On 04/08/2012 10:45 AM, Timon Gehr wrote:
>> That actually sounds like a pretty awesome idea.
>
> Make sure that the compiler does not actually rely on the fact that the template generates a function. The design should include the possibility of just generating tables. It all should be completely transparent to the compiler, if that is possible.

This sounds important to me. If it is also possible to do the work with generated tables, and not calling thousands of indirect functions in someone's implementation, it would be nice to reserve that possibility. Indirect function calls in hot loops make me very nervous for non-x86 machines.
Apr 08 2012
On 08-04-2012 11:42, Manu wrote:
> On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch> wrote:
>> Make sure that the compiler does not actually rely on the fact that the template generates a function. The design should include the possibility of just generating tables. It all should be completely transparent to the compiler, if that is possible.
>
> This sounds important to me. If it is also possible to do the work with generated tables, and not calling thousands of indirect functions in someone's implementation, it would be nice to reserve that possibility. Indirect function calls in hot loops make me very nervous for non-x86 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

--
- Alex
Apr 08 2012
On 08/04/2012 14:02, Alex Rønne Petersen wrote:
> [snip]
> Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

Nothing prevents the generated function from itself calling other generated functions when things are predictable. That avoids many indirect calls, and it is done purely in the library, which is good (it can be tuned per application/platform).
Apr 09 2012
On 9 April 2012 21:20, deadalnix <deadalnix gmail.com> wrote:
> On 08/04/2012 14:02, Alex Rønne Petersen wrote:
>> [snip]
>
> Nothing prevents the generated function from itself calling other generated functions when things are predictable. That avoids many indirect calls, and it is done purely in the library, which is good (it can be tuned per application/platform).

Eh? Not sure what you mean. The idea is the template would produce a struct/table of data instead of being a pointer to a function; this way the GC could work without calling anything. If the GC was written to assume GC info in a particular format/structure, it could be written without any calls. I'm just saying to leave that as a possibility, and not REQUIRE an indirect function call for every single allocation in the system. Some GC might be able to make better use of that sort of setup.
Apr 09 2012
On 09/04/2012 20:33, Manu wrote:
> Eh? Not sure what you mean. The idea is the template would produce a struct/table of data instead of being a pointer to a function; this way the GC could work without calling anything. If the GC was written to assume GC info in a particular format/structure, it could be written without any calls. I'm just saying to leave that as a possibility, and not REQUIRE an indirect function call for every single allocation in the system. Some GC might be able to make better use of that sort of setup.

If you have references to objects, you can't avoid a function call. If you have something you know at compile time, the generated function can directly call the other function that marks the pointed-to data (or can even do it itself, if you don't fear code bloat) without going back to the GC and its indirect call. So it makes no difference in the number of indirect calls you have, but the struct proposal is a stronger constraint on the GC than the function one.

BTW, starting your answer with « Not sure what you mean. » should have been a red flag.
Apr 09 2012
On 10 April 2012 00:06, deadalnix <deadalnix gmail.com> wrote:
> [snip]
> So it makes no difference in the number of indirect calls you have, but the struct proposal is a stronger constraint on the GC than the function one.

It is, and I still don't follow. I can't imagine there are any indirect function calls, except for the ones introduced by this proposal, where you may register a function to mark the pointers in complex structs. You seem to be suggesting that another one already exists anyway? Where is it? Why is it there?
Apr 09 2012
On 10/04/2012 00:39, Manu wrote:
> It is, and I still don't follow. I can't imagine there are any indirect function calls, except for the ones introduced by this proposal, where you may register a function to mark the pointers in complex structs. You seem to be suggesting that another one already exists anyway? Where is it? Why is it there?

OK, back to basics.

For every type, a function template (let's call it GCscan) will be instantiated to scan it. This function can be ANY code. ANY code includes the possibility for GCscan!A to call GCscan!B directly, without going back to the GC main loop and an indirect call. If it is inlined, you can forget about the function call entirely (and you can force that using a mixin template, for example, but that is likely to generate massive code bloat).

This can't be done for references/pointers to polymorphic types, but for any other type it is an available option, and it can dramatically reduce the number of indirect calls.
Apr 10 2012
On 4/10/12 3:03 AM, deadalnix wrote:
> For every type, a function template (let's call it GCscan) will be instantiated to scan it. This function can be ANY code. ANY code includes the possibility for GCscan!A to call GCscan!B directly, without going back to the GC main loop and an indirect call. If it is inlined, you can forget about the function call entirely (and you can force that using a mixin template, for example, but that is likely to generate massive code bloat).

That is correct. The code bloat is equal to that generated if the end user sat down and wrote by hand appropriate routines for collection for all types.

> This can't be done for references/pointers to polymorphic types, but for any other type it is an available option, and it can dramatically reduce the number of indirect calls.

Indeed. For non-final class member variables, the template will fetch their TypeInfo, from that the pointer to the mark function, and will issue the call through the pointer.

Andrei
Apr 10 2012
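A sketch of the distinction being drawn here, with invented names: fields whose layout is known statically are scanned by a direct (and inlinable) call to the corresponding instantiation, and only references to class objects fall back to a dispatch through the referenced object's TypeInfo.

    void GCscan(T)(ref T obj, scope void delegate(Object o) markViaTypeInfo)
        if (is(T == struct))
    {
        foreach (ref field; obj.tupleof)
        {
            alias F = typeof(field);
            static if (is(F == struct))
                GCscan!F(field, markViaTypeInfo);  // direct call, no indirection
            else static if (is(F == class))
            {
                if (field !is null)
                    markViaTypeInfo(field);        // runtime type unknown: the GC looks up
                                                   // typeid(field)'s mark function pointer
            }
            // raw pointers, arrays, etc. omitted for brevity
        }
    }

This is only an illustration of the argument above, not the actual runtime design.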
On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen wrote:
>> This sounds important to me. If it is also possible to do the work with generated tables, and not calling thousands of indirect functions in someone's implementation, it would be nice to reserve that possibility. Indirect function calls in hot loops make me very nervous for non-x86 machines.
>
> Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

What's the problem with virtual calls on ARM?
Apr 13 2012
On 13 April 2012 15:53, Kagamin <spam here.lot> wrote:
> On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen wrote:
>> [snip]
>> Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.
>
> What's the problem with virtual calls on ARM?

No other processors have branch prediction units anywhere near the sophistication of modern x86. Any call through a function pointer stalls the pipeline, pipelines are getting longer all the time, and PPC has even more associated costs/hazards. Most processors can only perform trivial binary branch prediction around an 'if'. It also places a burden on the icache (unable to prefetch), and of course the dcache, both of which are much less sophisticated than on x86 as well. The compiler can't do anything about code locality (which improves icache usage), since the target is unknown at compile time... there are also pipeline stalls introduced by the sequence of indirect pointer lookups preceding any virtual call.

Virtuals are possibly the worst hazard to modern CPUs, and the hardest to detect/profile, since their cost is evenly spread throughout the entire code base; you can never gauge their true impact on your performance. You also can't easily measure the effect of icache misses on your code; suffice to say, you will have MANY more in virtual-heavy code.

While I'm at it: 'final:' and 'virtual' keywords please ;)
Apr 13 2012
On Friday, 13 April 2012 at 13:54:39 UTC, Manu wrote:
> No other processors have branch prediction units anywhere near the sophistication of modern x86. Any call through a function pointer stalls the pipeline, pipelines are getting longer all the time, and PPC has even more associated costs/hazards. Most processors can only perform trivial binary branch prediction around an 'if'. It also places a burden on the icache (unable to prefetch), and of course the dcache, both of which are much less sophisticated than on x86 as well.

Allocation of small aggregated objects usually involves allocating several equally small objects of different types in a row, so they sit one after another in the heap, and the GC will visit them in a row, each time calling a function different from the previous one. To an x86 processor this would result in constant misprediction: AFAIK an x86 processor caches only one target address per branch (ARM caches a flag?). And the icache should not suffer in either case: once you have prefetched the function, it will remain in the icache and be reused from there the next time.
Apr 13 2012
On 13 April 2012 17:25, Kagamin <spam here.lot> wrote:
> once you have prefetched the function, it will remain in the icache and be reused from there the next time.

It all depends how much you love object orientation. If you follow the C++ book and make loads of classes for everything, you'll thrash the hell out of it. If you only have a couple of different objects, maybe they'll coexist in the icache. The GC is a bit of a special case though, because it runs in a tight loop. That said, the pipeline hazards still exist regardless of the state of the icache. Conventional virtuals are worse, since during the process of executing regular code there's not usually such a tight loop pattern. (Note: I was answering the prior question about virtual functions in general, not as applied to the specific use case of a GC scan.)

The latest 'turbo' ARM chips (Cortex, etc.) may store a branch target table; they are alleged to have improved prediction, but I haven't checked. Prior chips (standard ARM9 and down; note: most non-top-end Androids fall in this category, and all games consoles with ARMs in them) don't technically have branch prediction at all. ARM has conditional execution bits on instructions, so it can filter opcodes based on the state of the flags register. This is a cool design for binary branching, or performing 'select' operations, but it can't do anything to help an indirect jump.

Point is, the GC is the most fundamental performance hazard to D, and I just think it's important to make sure the design is such that it is possible to write a GC loop which can do its job with generated data tables if possible, instead of requiring generated marker functions. It would seem to me that carefully crafted tables of data could technically perform the same function as marker functions, but without any function calls... and the performance characteristics may be better/worse for different architectures.
Apr 14 2012
On Sat, 14 Apr 2012 05:21:08 -0500, Manu <turkeyman gmail.com> wrote:
> [snip]
> It would seem to me that carefully crafted tables of data could technically perform the same function as marker functions, but without any function calls... and the performance characteristics may be better/worse for different architectures.

Is there any reason to assume that the GC is actually using the mark function for marking? The mark function is only there to provide the GC with the information it needs in order to mark. What it actually is is defined by the runtime. So it may just return a bitmask. Or it might _be_ a bitmask (for objects <= 512 bytes). Or a 64-bit index into some internal table. Etc. In fact, real functions only start making sense with very large arrays and compound structs/objects.

My actual concern with this approach is that a DLL compiled with GC A will have problems interacting with a program with GC B.
Apr 14 2012
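A hedged sketch of the bitmask idea (pointerBitmap is an invented name): for small objects, the TypeInfo slot could simply hold a compile-time pointer bitmap, one bit per word, which on a 64-bit target covers objects up to 64 words, i.e. the 512 bytes mentioned above.

    ulong pointerBitmap(T)()
        if (is(T == struct) && T.sizeof <= 64 * size_t.sizeof)
    {
        ulong bits;
        foreach (i, member; T.init.tupleof)
        {
            alias F = typeof(member);
            static if (is(F == U*, U) || is(F == class))
                bits |= 1UL << (T.tupleof[i].offsetof / size_t.sizeof);
        }
        return bits;
    }

    struct Node { Node* next; int value; Node* prev; }
    enum nodeBits = pointerBitmap!Node();  // 0b101: words 0 and 2 hold pointers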
On 2012-04-13 16:54:28 +0300, Manu <turkeyman gmail.com> wrote:
> While I'm at it: 'final:' and 'virtual' keywords please ;)

Hmmm, I thought we decided that was a good idea. Anybody in the know if this is going to happen or not?

--
James Miller
Apr 13 2012
On 04/08/2012 10:45 AM, Timon Gehr wrote:
> That actually sounds like a pretty awesome idea.

I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?
Apr 08 2012
On 4/8/2012 2:21 AM, Timon Gehr wrote:
> I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?

For now, just treat them conservatively.
Apr 08 2012
On 4/8/2012 11:21 AM, Timon Gehr wrote:
> On 04/08/2012 10:45 AM, Timon Gehr wrote:
>> That actually sounds like a pretty awesome idea.
>
> I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?

I guess the compiler should generate an (anonymous) struct type corresponding to the closure data layout. There probably has to be a template for compiler generated structs or classes anyway. This new type could also be used as the type of the context pointer, so a debugger could display the closure variables.
Apr 08 2012
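To illustrate the suggestion: for a closure such as the one below, the compiler could lower the captured variables into an (anonymous) struct whose layout the marking template understands; the struct name here is purely hypothetical.

    int[] delegate() makeClosure()
    {
        int[] data = new int[](10);
        int counter;
        return () { ++counter; return data; };  // captures data and counter on the GC heap
    }

    // Hypothetical compiler-generated context type for the closure above:
    struct __closure_makeClosure
    {
        int[] data;     // holds a pointer the GC must follow
        int   counter;  // no pointers; a precise scanner skips it
    }

Instantiating the marking template for this struct would make closure contexts precisely scannable, and the type could double as the context pointer's type so debuggers can display the captured variables.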
On 08-04-2012 12:07, Rainer Schuetze wrote:
> On 4/8/2012 11:21 AM, Timon Gehr wrote:
>> I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?
>
> I guess the compiler should generate an (anonymous) struct type corresponding to the closure data layout. There probably has to be a template for compiler generated structs or classes anyway. This new type could also be used as the type of the context pointer, so a debugger could display the closure variables.

This sounds sensible to me. No reason closure marking can't be precise if the compiler just emits the relevant type info (pretty much any other

--
- Alex
Apr 08 2012
On Sat, 07 Apr 2012 21:56:09 -0400, Walter Bright <newshound2 digitalmars.com> wrote:
> [snip]
> And best of all, the design is pushed out of the compiler into the library, so various schemes can be tried out without needing compiler work.

I think this is a really good idea.

I would like to go further and propose that there be an arbitrary way to add members to the TypeInfo types using templates. Not sure how it would be implemented, but I don't see why this has to be specific to GCs. Some way to signify "hey compiler, please initialize this member with template X given the type being compiled". This could be a huge bridge between compile-time and runtime type information.

-Steve
Apr 09 2012
On 08/04/2012 03:56, Walter Bright wrote:
> [snip]
> I think this is an exciting idea, it will enable us to get a precise gc by enabling people to work on it in parallel rather than serially waiting for me.

This is a good idea. However, it doesn't handle type qualifiers, and that is important! The D2 type system is designed such that most data is either thread-local or immutable, and only a small amount is shared.

Both thread-local storage and immutability are sources of BIG improvements for the GC. Doing without them is a huge design error. For instance, OCaml's GC is known to perform better than Java's, because in OCaml most data is immutable and the GC takes advantage of this. Immutability means 100% concurrent garbage collection. On the other hand, TLS can be collected independently and only affects the thread that owns the data.

Both are very powerful improvements, and the design you propose as-is cannot provide any means to handle them, which is a big missed opportunity and will be hard to change in the future.
Apr 09 2012
On 4/9/2012 11:30 AM, deadalnix wrote:
> On the other hand, TLS can be collected independently and only affects the thread that owns the data. Both are very powerful improvements, and the design you propose as-is cannot provide any means to handle them, which is a big missed opportunity and will be hard to change in the future.

I think this is an orthogonal issue.
Apr 09 2012
On 09/04/2012 23:27, Walter Bright wrote:
> On 4/9/2012 11:30 AM, deadalnix wrote:
>> [snip]
>
> I think this is an orthogonal issue.

You mean an allocator/deallocator one? I'm not sure.

For instance, concurrent scanning of shared memory will require some magic on reference changes (it can be hooked into the program using page protection). In such a case, you have constraints on what the scanning function can and cannot do. If the function is scanning immutable data, such constraints disappear.

In a similar way, when scanning TLS, you'll want to avoid going into the non-TLS world. This is currently possible only if you go back to the main GC code and trigger the indirect call every single time you encounter a pointer or a reference. That is going to be a performance killer on many architectures.

So this code, one way or another, will need to be aware of the qualifiers. Otherwise it will either require passing every single pointer/reference through an indirect function call, or giving up on optimizations that the type system was designed to allow (in the program in general, not just in the GC).
Apr 10 2012