digitalmars.D - GC Precision
- dsimcha (31/31) Oct 26 2009 I just realized last night that D's templates are probably powerful enou...
- Sean Kelly (2/12) Oct 26 2009 I've thought about it, but not done anything about it. The compiler doe...
- dsimcha (5/17) Oct 26 2009 provide this information, so precise scanning would require a user-level...
- Sean Kelly (2/20) Oct 26 2009 Nope.
- Andrei Alexandrescu (5/27) Oct 26 2009 One question is, is there enough information for stack variables? My
- dsimcha (16/43) Oct 26 2009 That's why I said precise *heap* scanning. This would solve probably 90...
- Andrei Alexandrescu (3/47) Oct 26 2009 Absolutely! I think that's great work. Thanks for clarifying things for ...
- dsimcha (49/49) Oct 26 2009 I've spent some free brain cycles today thinking about this and here's t...
- Leandro Lucarella (35/90) Oct 26 2009 Did you read the thread I posted? What do you think about Fritz's idea o...
- dsimcha (23/83) Oct 26 2009 As far as I can tell, RTTI doesn't have all the stuff that's needed yet ...
- Leandro Lucarella (28/67) Oct 27 2009 I guess you don't. You have a bin size, so you use the bin size. I guess
- Leandro Lucarella (18/45) Oct 26 2009 There is some discussion about precise stack in the thread I cited too.
- Leandro Lucarella (21/57) Oct 26 2009 Maybe you're talking about me. I didn't have the chance to play with thi...
- bearophile (12/15) Oct 29 2009 I agree that here doing something simple now is better than doing nothin...
- Jacob Carlborg (3/18) Oct 29 2009 The current implementation of toHash in Object does that: return
- bearophile (4/6) Oct 29 2009 I agree, such things will have to change when D wants a moving GC.
I just realized last night that D's templates are probably powerful enough now to generate bit masks that can be used for precise GC heap scanning.  I'm halfway (emphasis on halfway) thinking of using this to try to hack the GC and make heap scanning fully precise except for the corner case of unions.  However, this ties into several things that others in the D community are doing, so I want to gauge people's responses and make sure I'm not wasting effort on something that will be useless in 6 months.

1.  Sean, Leonardo, whoever else may be working on GC implementations, have you by any chance broken ground on precise heap scanning already?

2.  Andrei, Walter, how close are we to actually eliminating new from the language?  If all allocations were done by either calling GC.malloc() or using templates that call GC.malloc(), then things would get a lot simpler than if I were to have to hack the compiler to make new pass type info to the GC.

3.  I'm thinking bit masks could be stored as follows:  When getBitMask!(T) is instantiated, it generates an immutable size_t[N].  Element 0 is the size of the array (to allow for storing only the ptr in the GC), element 1 is the size of one instance of the object, in bytes.  The size of the memory block must be a multiple of this.  Elements 2..$ are all of the offsets that should be scanned for pointers.  For example:

struct Foo {
    uint bar;
    void* baz;
}

getBitMask!(Foo);  // [3, 8, 4]

That leaves the problem of where/how to store the pointers to this information in the GC efficiently.  I haven't gotten that far yet, but I remember some concerns have been raised in the past about storing 4 bytes per GC object for a pointer to the bitmask.  For my use cases, where I tend to allocate a relatively small number of relatively large objects, this isn't a problem.  However, in a heavily OO environment, where people allocate tons of tiny objects, it might be.
Oct 26 2009
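As a rough illustration of the kind of template being described, a compile-time offset-table generator might look something like the following in D2.  The name getBitMask and the [length, T.sizeof, offsets...] layout come from the post above; everything else is an assumption for illustration, not druntime code, and it only covers plain structs (nested structs, classes and unions are ignored):

template getBitMask(T)
{
    static if (is(T == struct))
        immutable size_t[] getBitMask = buildMask!T();
    else
        // No interior pointers to report for plain scalar types.
        immutable size_t[] getBitMask = [size_t(2), T.sizeof];
}

private size_t[] buildMask(T)()
{
    size_t[] mask = [size_t(0), T.sizeof];  // placeholder length, fixed up below
    foreach (i, F; typeof(T.tupleof))
    {
        static if (is(F == U*, U) || is(F == class) || is(F == interface))
            mask ~= T.tupleof[i].offsetof;                  // raw pointer or reference field
        else static if (is(F == E[], E))
            mask ~= T.tupleof[i].offsetof + size_t.sizeof;  // the pointer half of a slice
        // A real implementation would recurse into nested structs and treat
        // unions conservatively; this sketch ignores both cases.
    }
    mask[0] = mask.length;
    return mask;
}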
dsimcha Wrote:

> I just realized last night that D's templates are probably powerful enough now to generate bit masks that can be used for precise GC heap scanning. [...]
>
> 1. Sean, Leonardo, whoever else may be working on GC implementations, have you by any chance broken ground on precise heap scanning already?

I've thought about it, but not done anything about it.  The compiler doesn't provide this information, so precise scanning would require a user-level call.  You'll also have to deal with arrays of structs, by the way.
Oct 26 2009
== Quote from Sean Kelly (sean invisibleduck.org)'s article
> dsimcha Wrote:
>> [...] 1. Sean, Leonardo, whoever else may be working on GC implementations, have you by any chance broken ground on precise heap scanning already?
> I've thought about it, but not done anything about it.  The compiler doesn't provide this information, so precise scanning would require a user-level call.  You'll also have to deal with arrays of structs, by the way.

Arrays of structs are easy:  Generate a bitmask for one element, and keep reusing that bitmask until the end of the block.  Am I missing something?
Oct 26 2009
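The "reuse the element bitmask across the block" idea translates into a scan loop roughly like the one below.  This is purely illustrative; scanWord, scanTypedBlock and the exact mask layout are assumptions based on the [length, elementSize, offsets...] scheme described in this thread, not existing GC code:

// Hypothetical scanner for a block holding an array of one struct type.
void scanTypedBlock(void* block, size_t blockSize, immutable(size_t)[] mask,
                    scope void delegate(void** slot) scanWord)
{
    immutable elementSize = mask[1];
    auto bytes = cast(ubyte*) block;
    // Walk each whole element in the block, reusing the same offset list.
    for (size_t base = 0; base + elementSize <= blockSize; base += elementSize)
    {
        foreach (offset; mask[2 .. $])
            scanWord(cast(void**)(bytes + base + offset)); // only these slots can hold pointers
    }
}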
dsimcha Wrote:

> == Quote from Sean Kelly (sean invisibleduck.org)'s article
>> I've thought about it, but not done anything about it.  The compiler doesn't provide this information, so precise scanning would require a user-level call.  You'll also have to deal with arrays of structs, by the way.
>
> Arrays of structs are easy:  Generate a bitmask for one element, and keep reusing that bitmask until the end of the block.  Am I missing something?

Nope.
Oct 26 2009
Sean Kelly wrote:

> dsimcha Wrote:
>> Arrays of structs are easy:  Generate a bitmask for one element, and keep reusing that bitmask until the end of the block.  Am I missing something?
>
> Nope.

One question is, is there enough information for stack variables?  My understanding from a while ago was that heap data could be reasonably analyzed, but stack data has no info associated with it.

Andrei
Oct 26 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
> One question is, is there enough information for stack variables?  My understanding from a while ago was that heap data could be reasonably analyzed, but stack data has no info associated with it.

That's why I said precise *heap* scanning.  This would solve probably 90+% of the problem w/ false pointers without requiring any changes with major ripple effects, i.e. only druntime, not the compiler, would need to be hacked.  Admittedly, though, unless you came up w/ some pinning scheme for stack variables and unions, it still wouldn't allow a moving GC.

I personally am much more interested in a decent solution to our GC woes now than a perfect one at some point indefinitely far into the future.  Right now, when working with programs that use more than maybe 100-200 MB of memory, false pointers become such a problem that the GC is almost useless, yet all kinds of library code still uses the GC heap, which is why I resisted the idea of removing GC.free() so strongly.  As I see it, the biggest problem is false pointers, with the fact that every allocation requires a lock in a close second.  These are the low-hanging fruit.  A moving GC, one that doesn't stop the world on collection, and one that's fully precise including stack would be nice, but they're several orders of magnitude less important and would also have more ripple effects.
Oct 26 2009
dsimcha wrote:

> That's why I said precise *heap* scanning.  This would solve probably 90+% of the problem w/ false pointers without requiring any changes with major ripple effects, i.e. only druntime, not the compiler, would need to be hacked. [...]

Absolutely!  I think that's great work.  Thanks for clarifying things for me.

Andrei
Oct 26 2009
I've spent some free brain cycles today thinking about this and here's the scheme I have in mind.  If anyone thinks this could be improved in a way that would not have substantial ripple effects throughout the compiler/language (because then it might never actually get implemented) let me know.

1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0, size_t* bitmask = null).  A null bitmask means use the old-school behavior and either scan everything or don't scan anything based on the NO_SCAN bit.  IMHO plain old conservative scanning must be supported to allow for untyped memory blocks to be allocated, because requiring every memory block to have a type associated with it is an unacceptable limitation in a systems language.  For now, the only way to get precise scanning would be to call malloc directly or use a template that does.  Eventually new would either be fixed to instantiate the bitMask!(T) template or eliminated entirely.  Since using precise scanning by writing a template that calls GC.malloc() is a lot easier than hacking the compiler, this may be a catalyst for getting rid of new.

2.  Some concern has been expressed in the past about the possibility of using 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this concern is misplaced because in any program that takes up enough memory for space efficiency to matter, false pointers waste more space.  Furthermore, if you're programming for a toaster oven, elevator controller, etc., you probably aren't going to use the standard out-of-the-box GC anyhow.  However, I've thought of ways to mitigate this and have come up with the following:

A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.  This is a no-brainer and will prevent any overhead on, for example, arrays of floats.

B.  Add another attributes bit to the GC called something like NEEDS_BITMASK.  This bit would be set iff an object mixes pointers and non-pointers.  If it's not set, no bitmask pointer would be stored.  However, the overhead of an extra bit may or may not be worth it.

3.  The bitmask pointer would be stored at the end of every GC-allocated block for which a bitmask pointer is stored.  The reason for using the end of the block instead of the beginning is just implementation simplicity:  That way, finding the beginning of a block would work the same whether or not we have a bitmask pointer.

4.  The bitmask would be a size_t[] created at compile time by a template and stored in the static data segment.  Its layout would be [length of array, T.sizeof, offsets that need to be scanned].  For example, if you have something like:

struct Foo {
    uint bar;
    void* ptr;
}

On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit machine, it would be [3, 16, 8].  The reason the size of the array is stored in the array is so that we can get away with storing a single ptr in each memory block instead of a pointer and a length.

5.  To store information about pinning, we simply use the high-order bits of the pointer offsets.  1 means pinned, 0 means not pinned.  This means that, for any type T, T.sizeof can't be bigger than size_t.max / 2.  I think this is a fairly minor limitation.
Oct 26 2009
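To make the proposed interface concrete, a typed allocation helper sitting on top of the extended malloc could look roughly like this.  This is hypothetical: the three-argument GC.malloc does not exist in druntime, typedAlloc is a made-up name, and getBitMask refers to the illustrative template sketched earlier in the thread rather than real code:

import core.memory : GC;

T* typedAlloc(T)()
{
    static if (getBitMask!(T).length > 2)
    {
        // The type has pointer members at known offsets: hand the GC the
        // static offset table so only those words get scanned.
        return cast(T*) GC.malloc(T.sizeof, 0, cast(size_t*) getBitMask!(T).ptr);
    }
    else
    {
        // No pointers at all: the existing NO_SCAN attribute already covers this.
        return cast(T*) GC.malloc(T.sizeof, GC.BlkAttr.NO_SCAN);
    }
}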
dsimcha, on October 26 at 23:05 you wrote:

> 1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0, size_t* bitmask = null). [...] Since using precise scanning by writing a template that calls GC.malloc() is a lot easier than hacking the compiler, this may be a catalyst for getting rid of new.

Did you read the thread I posted?  What do you think about Frits's idea on how to encode the pointer information?

I'm not very familiar with TypeInfo, but wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly if it can get the pointer information itself, instead of translating that information?  I think if that's possible it will keep the GC interface simpler.

> 2.  Some concern has been expressed in the past about the possibility of using 4 bytes per block in overhead to store pointers to bitmasks.  IMHO this concern is misplaced because in any program that takes up enough memory for space efficiency to matter, false pointers waste more space. [...]

This seems reasonable, but I don't see why we can't use a more efficient way to store this information in the GC.  Implementation simplicity (to have a better GC now instead of a perfect GC at some point in a far future) is a good enough reason :)  I'm just curious if you found any flaws in the scheme proposed by Frits or if you just want a simpler implementation.

> A.  Store a pointer to the bitmask iff the NO_SCAN bit is not set.  This is a no-brainer and will prevent any overhead on, for example, arrays of floats.

Good idea.

> B.  Add another attributes bit to the GC called something like NEEDS_BITMASK.  This bit would be set iff an object mixes pointers and non-pointers.  If it's not set, no bitmask pointer would be stored.  However, the overhead of an extra bit may or may not be worth it.

You mean for the special case where all the attributes of an object are pointers?  I think this should be rare enough, so I doubt the utility, but maybe I'm missing some common case.

> 3.  The bitmask pointer would be stored at the end of every GC-allocated block for which a bitmask pointer is stored.  The reason for using the end of the block instead of the beginning is just implementation simplicity:  That way, finding the beginning of a block would work the same whether or not we have a bitmask pointer.

Did you even consider storing this information outside the memory block (like the flags)?  I think storing it in the memory block can be annoying if it is not stored always, because now your fixed-sized blocks are not fixed.  It might be very easy to overcome this, but maybe thinking about the other option is worth it.

> 4.  The bitmask would be a size_t[] created at compile time by a template and stored in the static data segment.  Its layout would be [length of array, T.sizeof, offsets that need to be scanned].

Again, I wonder if this information can't still be obtained from the TypeInfo.

> 5.  To store information about pinning, we simply use the high-order bits of the pointer offsets.  1 means pinned, 0 means not pinned. [...]

I like that Frits's proposal also added information about weak pointers; this might fix another big gap in D's memory management.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Agitamos toda la zona de la paleta al horno, vemos que una luz nos
atraviesa toda la zona intestinal...
	-- Sidharta Kiwi
Oct 26 2009
== Quote from Leandro Lucarella (llucax gmail.com)'s article
> Did you read the thread I posted?  What do you think about Frits's idea on how to encode the pointer information?  I'm not very familiar with TypeInfo, but wouldn't it be more natural to pass the TypeInfo to GC.malloc() directly if it can get the pointer information itself, instead of translating that information?  I think if that's possible it will keep the GC interface simpler.

As far as I can tell, RTTI doesn't have all the stuff that's needed yet for structs and adding it would require hacking the compiler.  Frankly, I want this to be simple enough to actually get implemented as opposed to just being talked about and deadlocking on everyone waiting on everyone else to do something.

> This seems reasonable, but I don't see why we can't use a more efficient way to store this information in the GC.  Implementation simplicity (to have a better GC now instead of a perfect GC at some point in a far future) is a good enough reason :)  I'm just curious if you found any flaws in the scheme proposed by Frits or if you just want a simpler implementation.

Two things:  Implementation simplicity is one.  As I've been alluding to, worse but working in practice is better than better but existing only on paper.  The other is that I don't understand, in Frits's approach, how you store the size of the object so you know how many bits to interpret as the mask.

> You mean for the special case where all the attributes of an object are pointers?  I think this should be rare enough, so I doubt the utility, but maybe I'm missing some common case.

You're probably right.  The only common one I can think of is arrays of classes.  For an array, the 4 bytes of overhead is usually negligible.

> Did you even consider storing this information outside the memory block (like the flags)?  I think storing it in the memory block can be annoying if it is not stored always, because now your fixed-sized blocks are not fixed.  It might be very easy to overcome this, but maybe thinking about the other option is worth it.

The flags in the GC are designed to store single bits apparently.  Also, as far as weak refs, we could use another high-order bit for that.  I don't think anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.  The question is, how many high-order bits do we reserve?  BTW, I'm not going to actually implement weak references unless I see an easy way to do it, we're only talking about reserving bits here.

I guess the bottom line is that IMHO precise heap scanning is really low-hanging fruit where the bang for the buck would be huge.  Improving the GC in other ways raises some nasty issues, but in this case I really think the perfect is the enemy of the good.  I'm not guaranteeing anything, but I think I might be able to have precise heap scanning up and running in a release or two if I really keep it simple and stupid.
Oct 26 2009
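Packing pin/weak flags into the high bits of each offset entry, as discussed above, could be done along these lines.  A sketch only; the constant and function names are made up here, not taken from the thread:

// Hypothetical encoding of per-offset flags in the high bits of a size_t entry.
enum size_t PINNED_BIT  = cast(size_t) 1 << (size_t.sizeof * 8 - 1);
enum size_t WEAK_BIT    = cast(size_t) 1 << (size_t.sizeof * 8 - 2);
enum size_t OFFSET_MASK = ~(PINNED_BIT | WEAK_BIT);

size_t encodeOffset(size_t offset, bool pinned, bool weak)
{
    assert(offset <= OFFSET_MASK, "T.sizeof would exceed the representable offset range");
    return offset | (pinned ? PINNED_BIT : 0) | (weak ? WEAK_BIT : 0);
}

size_t rawOffset(size_t entry) { return entry & OFFSET_MASK; }
bool   isPinned(size_t entry)  { return (entry & PINNED_BIT) != 0; }
bool   isWeak(size_t entry)    { return (entry & WEAK_BIT) != 0; }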
dsimcha, on October 27 at 01:34 you wrote:

> Two things:  Implementation simplicity is one.  As I've been alluding to, worse but working in practice is better than better but existing only on paper.

Fair enough.

> The other is that I don't understand, in Frits's approach, how you store the size of the object so you know how many bits to interpret as the mask.

I guess you don't.  You have a bin size, so you use the bin size.  I guess you just mark unused words as being non-pointers, so they won't get scanned. ...  I guess that doesn't work for arrays though.  I remember some discussion about how to handle arrays, but I can't find it now...

> I don't think anyone cares if (on 32-bit) T.sizeof is limited to about a gigabyte.  The question is, how many high-order bits do we reserve?

I think at first sight it sounds reasonable, but I'm a little afraid of it; it can sound like "640K ought to be enough for anybody" ;)  I know it's almost impossible to have a T.sizeof larger than 1GiB, but what if you hit that?  I guess since you won't be creating more than one or two objects of that size (3 at most), it's pretty reasonable to say "manage that memory yourself".

> The flags in the GC are designed to store single bits apparently.  BTW, I'm not going to actually implement weak references unless I see an easy way to do it, we're only talking about reserving bits here.

I wasn't talking about bitmaps, I was talking more generally.  Even if you want to store a pointer to the array describing the offsets of the pointers, you could store that pointer in a separate memory block (not in the block reserved for the object itself).  Again, I didn't think much about it, it was just something that popped into my mind.  I guess it's a little irrelevant just now; it should be moderately easy to change in the future if storing it elsewhere proves to be better.

> I guess the bottom line is that IMHO precise heap scanning is really low-hanging fruit where the bang for the buck would be huge.  Improving the GC in other ways raises some nasty issues, but in this case I really think the perfect is the enemy of the good.  I'm not guaranteeing anything, but I think I might be able to have precise heap scanning up and running in a release or two if I really keep it simple and stupid.

Agreed.  I would very much like to know how things go.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Just because you're paranoid, don't mean they're not after you.
Oct 27 2009
Andrei Alexandrescu, on October 26 at 11:01 you wrote:

> One question is, is there enough information for stack variables?  My understanding from a while ago was that heap data could be reasonably analyzed, but stack data has no info associated with it.

There is some discussion about precise stack scanning in the thread I cited too.  The stack can also be made precise by adding a shadow stack (like a TypeInfo for the stack), but it's costly and it needs compiler support (LLVM has some mechanisms to generate this stack information AFAIK, but I never played with it).  The problem with the stack is that you still have to interact with C, which has no stack information, so it's more debatable how much the extra complexity would pay off.

I agree with David that it's much more reasonable to make the heap precise first and then see how things are going from there.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
hypocrite opportunist
don't infect me with your poison
Oct 26 2009
dsimcha, on October 26 at 13:08 you wrote:

> I just realized last night that D's templates are probably powerful enough now to generate bit masks that can be used for precise GC heap scanning. [...]
>
> 1. Sean, Leonardo, whoever else may be working on GC implementations, have you by any chance broken ground on precise heap scanning already?

Maybe you're talking about me.  I didn't have the chance to play with this yet.  My main goal is to make the collector run concurrently with the mutator, but I have been a little busy lately, so I haven't made many advances yet.  I would like to play with adding preciseness to the GC too if I have the time.

> 2. Andrei, Walter, how close are we to actually eliminating new from the language?  If all allocations were done by either calling GC.malloc() or using templates that call GC.malloc(), then things would get a lot simpler than if I were to have to hack the compiler to make new pass type info to the GC.

The runtime is already receiving the type information on the allocated object when new is used AFAIK, but this information is not propagated to gc_malloc().  So it shouldn't be too hard to add type information to the GC.  There was some interesting discussion about this some time ago:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87831

> 3. I'm thinking bit masks could be stored as follows:  When getBitMask!(T) is instantiated, it generates an immutable size_t[N]. [...] That leaves the problem of where/how to store the pointers to this information in the GC efficiently. [...]

In the discussion I mentioned, Frits van Bommel proposed a reasonable way to encode the information efficiently:

http://www.digitalmars.com/d/archives/digitalmars/D/Std_Phobos_2_and_logging_library_87794.html#N87968

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
"The Guinness Book of Records" holds the record for being the most
stolen book in public libraries
Oct 26 2009
dsimcha:

> A moving GC, one that doesn't stop the world on collection, and one that's fully precise including stack would be nice, but they're several orders of magnitude less important and would also have more ripple effects.

I agree that here doing something simple now is better than doing nothing or doing something very refined in an unknown future.  And in the future things may be improved.

In D, objects are always managed by reference, and I think that most programs don't alter or cast such references to something else (objects allocated on memory specified by the programmer, and scoped objects allocated on the stack, may be excluded from this).  So I think that it can be safe to move objects, to compact the heap.

So you may have 5 memory zones:

- C heap.  (The type system of the D compiler may see the C-heap pointers and D-heap pointers as two different types, as I have proposed in the past.  So you need casts if you want to mix them, and the compiler can use the D moving heap in a safer way.)

- Pinned D heap for everything that can't be moved, like structs managed by pointers (eventually a D compiler can infer at compile time that some structs too may be moved around, because their pointer is used only in clean ways; you don't need to introduce struct references for this).  I think SafeD modules will not use this heap a lot (but they can use unsafe modules that may use pinned objects.  Is D safety transitive?  I think it is not, so from a SafeD module you can call and use an unsafe module).

- Old object generation managed with compaction (I think there's no need for a permanent-objects zone in D).

- Two "from" and "to" zones for the young generation, which don't use a true compaction strategy; young objects bounce between them.

- New-generation Eden where new object allocations happen, managed as a memory arena.

All this has the disadvantage of requiring a more complex GC, and probably requiring 2-4 times more RAM at runtime.  It hopefully has the advantage of allowing new programmers, who have learnt Java at university, to program almost like in Java.  (I have found a not-synthetic Java benchmark program that, converted to D, is something like 18 times slower on LDC.  I'll put the code on my site in the following days.)

Bye,
bearophile
Oct 29 2009
On 10/29/09 11:47, bearophile wrote:

> In D, objects are always managed by reference, and I think that most programs don't alter or cast such references to something else (objects allocated on memory specified by the programmer, and scoped objects allocated on the stack, may be excluded from this).  So I think that it can be safe to move objects, to compact the heap.

The current implementation of toHash in Object does that:

return cast(hash_t)cast(void*)this;
Oct 29 2009
Jacob Carlborg:

> The current implementation of toHash in Object does that:
> return cast(hash_t)cast(void*)this;

I agree, such things will have to change when D wants a moving GC.

Bye,
bearophile
Oct 29 2009
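One hypothetical way to decouple an object's hash from its address, so a moving collector would not invalidate hashes already stored in hash tables, is to hand each object a stable id the first time it is hashed.  This is not something proposed in the thread, just an illustration of why the address-based default quoted above is a problem; the exact toHash signature varies between druntime versions, and the lazy init below is not race-free:

// Illustrative only: a toHash that stays stable even if the GC relocates the object.
class Stable
{
    private static shared size_t counter;  // process-wide id source
    private size_t id;                     // assigned lazily, never changes afterwards

    override size_t toHash() @trusted nothrow
    {
        import core.atomic : atomicOp;
        if (id == 0)
            id = atomicOp!"+="(counter, 1); // ids start at 1, so 0 means "not yet assigned"
        return id;
    }
}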