digitalmars.D - std.allocator: false pointers
- Andrei Alexandrescu (24/24) May 02 2014 Hello,
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (7/33) May 02 2014 If destructors are still going to be called (see other thread),
- Andrei Alexandrescu (3/6) May 02 2014 Yah, forgot to mention this trick is only applicable to what I call "the...
- Orvid King via Digitalmars-d (9/33) May 02 2014 Well, in a 64-bit address space, the false pointer issue is almost
- Andrei Alexandrescu (4/11) May 02 2014 Those can be accommodated I think.
- safety0ff (13/20) May 03 2014 False pointers don't only cause memory consumption, you're
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/15) May 04 2014 Which means a precise collector, which means faster stack meta
- Steven Schveighoffer (13/36) May 02 2014 False pointers are less of a problem in 64-bit code, but you can run int...
- Steven Schveighoffer (24/27) May 02 2014 Sorry, if it was already *unmarked* (or marked as garbage).
- Andrei Alexandrescu (5/11) May 02 2014 Yah, understood. Unfortunately I just realized that would require either...
- Steven Schveighoffer (4/17) May 02 2014 What is the problem with keeping the bits together?
- Andrei Alexandrescu (3/20) May 02 2014 More implementation (I have a BitVector type but not a KBitsVector!k
- Steven Schveighoffer (7/11) May 02 2014 Given a bitvector type, a 2bitvector type can be implemented on top of i...
- Andrei Alexandrescu (7/18) May 02 2014 If speed is no issue, sure :o). My intuition is that the TwoBitVector
- Steven Schveighoffer (8/29) May 02 2014 Well, you are looking for one bit set (free), or another bit set
- David Gileadi (3/5) May 02 2014 Heh, however it's implemented, TwoBitVector's very name implies that
- Andrei Alexandrescu (4/14) May 02 2014 That's a great idea. Thanks!
Hello, I'm currently doing a nice optimization in the tracing code: I use the same bit (per block) to indicate "this memory is allocated" and "this memory is marked as used during tracing". The way the trick works is, at the beginning of a collection the GC marks all of its memory as deallocated (sic!). Then, it traces through pointers and marks BACK as allocated the memory that's actually used. At the end of tracing, there's no need to do anything - what's used stays used, the rest is free by default. This is unlike more traditional GCs, which use a SEPARATE bit to mean "mark during tracing". At the beginning of the collection, these "mark" bits are set to 0. Then collection proceeds and marks with 1 all blocks that are actually used. As the last step, collection deallocates all blocks that were marked with 0 and were previously allocated. So the optimization consumes less memory and saves one pass. It does have a disadvantage, however. Consider a false pointer. It will claim that a block that's free is actually occupied, and the implementation can't distinguish because it conflates the "mark" and the "allocated" bit together. So it's possible that at the end of collection there are blocks allocated that previously weren't. The optimization is therefore sensitive to false pointers. Thoughts? How bad are false pointers (assuming we fix globals, which only leaves the stack and registers)? Andrei
May 02 2014
On Friday, 2 May 2014 at 15:55:06 UTC, Andrei Alexandrescu wrote:Hello, I'm currently doing a nice optimization in the tracing code: I use the same bit (per block) to indicate "this memory is allocated" and "this memory is marked as used during tracing". The way the trick works is, at the beginning of a collection the GC marks all of its memory as deallocated (sic!). Then, it traces through pointers and marks BACK as allocated the memory that's actually used. At the end of tracing, there's no need to do anything - what's used stays used, the rest is free by default. This is unlike more traditional GCs, which use a SEPARATE bit to mean "mark during tracing". At the beginning of the collection, these "mark" bits are set to 0. Then collection proceeds and marks with 1 all blocks that are actually used. As the last step, collection deallocates all blocks that were marked with 0 and were previously allocated. So the optimization consumes less memory and saves one pass. It does have a disadvantage, however. Consider a false pointer. It will claim that a block that's free is actually occupied, and the implementation can't distinguish because it conflates the "mark" and the "allocated" bit together. So it's possible that at the end of collection there are blocks allocated that previously weren't. The optimization is therefore sensitive to false pointers. Thoughts? How bad are false pointers (assuming we fix globals, which only leaves the stack and registers)?If destructors are still going to be called (see other thread), this trick is dangerous, because the resurrected objects might later on be destroyed again (double free). I'm aware that this is still about untyped allocators, but if they are going to be used as a basis for typed allocators, things like this need to be considered already at this stage.
May 02 2014
On 5/2/14, 10:12 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:If destructors are still going to be called (see other thread), this trick is dangerous, because the resurrected objects might later on be destroyed again (double free).Yah, forgot to mention this trick is only applicable to what I call "the passive heap". Thanks! -- Andrei
May 02 2014
Well, in a 64-bit address space, the false pointer issue is almost mute, the issue comes in when you try to apply this design to 32-bit, where the false pointer issue is more prevelent. Is the volume of memory saved by this really worth it? Another thing to consider is that this makes it impossible to pre-allocate blocks of varying sizes for absurdly fast allocations via atomic linked lists, in most cases literally a single `lock cmpxchg`. On 5/2/14, Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:Hello, I'm currently doing a nice optimization in the tracing code: I use the same bit (per block) to indicate "this memory is allocated" and "this memory is marked as used during tracing". The way the trick works is, at the beginning of a collection the GC marks all of its memory as deallocated (sic!). Then, it traces through pointers and marks BACK as allocated the memory that's actually used. At the end of tracing, there's no need to do anything - what's used stays used, the rest is free by default. This is unlike more traditional GCs, which use a SEPARATE bit to mean "mark during tracing". At the beginning of the collection, these "mark" bits are set to 0. Then collection proceeds and marks with 1 all blocks that are actually used. As the last step, collection deallocates all blocks that were marked with 0 and were previously allocated. So the optimization consumes less memory and saves one pass. It does have a disadvantage, however. Consider a false pointer. It will claim that a block that's free is actually occupied, and the implementation can't distinguish because it conflates the "mark" and the "allocated" bit together. So it's possible that at the end of collection there are blocks allocated that previously weren't. The optimization is therefore sensitive to false pointers. Thoughts? How bad are false pointers (assuming we fix globals, which only leaves the stack and registers)? Andrei
May 02 2014
On 5/2/14, 10:15 AM, Orvid King via Digitalmars-d wrote:Well, in a 64-bit address space, the false pointer issue is almost mute, the issue comes in when you try to apply this design to 32-bit, where the false pointer issue is more prevelent. Is the volume of memory saved by this really worth it?It's the time savings that are most important.Another thing to consider is that this makes it impossible to pre-allocate blocks of varying sizes for absurdly fast allocations via atomic linked lists, in most cases literally a single `lock cmpxchg`.Those can be accommodated I think. Andrei
May 02 2014
On Friday, 2 May 2014 at 17:15:20 UTC, Orvid King via Digitalmars-d wrote:Well, in a 64-bit address space, the false pointer issue is almost mute, the issue comes in when you try to apply this design to 32-bit, where the false pointer issue is more prevelent. Is the volume of memory saved by this really worth it?False pointers don't only cause memory consumption, you're forgetting that the GC will repeatedly scan memory held by false pointers at each collection*. This adds time to each collection and further increases the risk of other false pointers. False pointers can make many reasonable looking D programs behave unexpectedly when run in a 32-bit environment (to the point that I consider that they can "break" programs.) I think false pointers must be addressed to make claims that D is well-behaved on 32-bit systems. * Unless it is marked NO_SCAN of course, but this is not likely the common case.
May 03 2014
On Saturday, 3 May 2014 at 23:41:02 UTC, safety0ff wrote:That assumes that the heap is located at a high address.Well, in a 64-bit address space, the false pointer issue is almost mute, the issue comes in when you try to apply this design to 32-bit,I think false pointers must be addressed to make claims that D is well-behaved on 32-bit systems.Which means a precise collector, which means faster stack meta info access, which means faster exception handling? Then use a single pass mark-and-don't-sweep collector/allocator? (invert the meaning of the bit for every other full collection cycle) A conservative collector will never be acceptable in a language which is primarily GC based. It is something you can bolt on to a RC system to collrct cycles, IMHO.
May 04 2014
On Fri, 02 May 2014 11:55:11 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Hello, I'm currently doing a nice optimization in the tracing code: I use the same bit (per block) to indicate "this memory is allocated" and "this memory is marked as used during tracing". The way the trick works is, at the beginning of a collection the GC marks all of its memory as deallocated (sic!). Then, it traces through pointers and marks BACK as allocated the memory that's actually used. At the end of tracing, there's no need to do anything - what's used stays used, the rest is free by default. This is unlike more traditional GCs, which use a SEPARATE bit to mean "mark during tracing". At the beginning of the collection, these "mark" bits are set to 0. Then collection proceeds and marks with 1 all blocks that are actually used. As the last step, collection deallocates all blocks that were marked with 0 and were previously allocated. So the optimization consumes less memory and saves one pass. It does have a disadvantage, however. Consider a false pointer. It will claim that a block that's free is actually occupied, and the implementation can't distinguish because it conflates the "mark" and the "allocated" bit together. So it's possible that at the end of collection there are blocks allocated that previously weren't. The optimization is therefore sensitive to false pointers. Thoughts? How bad are false pointers (assuming we fix globals, which only leaves the stack and registers)?False pointers are less of a problem in 64-bit code, but you can run into worse issues. If you are not zeroing the memory when deallocating, then if it mysteriously comes alive again, it has the ghost of what could be a pointer to other code. Your blocks are more likely to resurrect once one of them resurrects. Why not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked. This doesn't save on the bit space, but I think the savings there is minimal anyway. However, it does allow the final pass to be saved. -Steve
May 02 2014
On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:Why not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked.Sorry, if it was already *unmarked* (or marked as garbage). essentially: enum GCState {free, allocated, garbage} GCState memoryBlocks[]; fullCollect() { foreach(ref st; memoryBlocks) { final switch(st) { case GCState.free: break; case GCState.allocated: st = GCState.garbage; break; case GCState.garbage: st = GCState.free; break; } } ... // run mark/sweep setting garbage to allocated for reachable blocks }
May 02 2014
On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- AndreiWhy not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked.Sorry, if it was already *unmarked* (or marked as garbage).
May 02 2014
On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:What is the problem with keeping the bits together? -SteveOn Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- AndreiWhy not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked.Sorry, if it was already *unmarked* (or marked as garbage).
May 02 2014
On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- AndreiOn 5/2/14, 10:33 AM, Steven Schveighoffer wrote:What is the problem with keeping the bits together?On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- AndreiWhy not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked.Sorry, if it was already *unmarked* (or marked as garbage).
May 02 2014
On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:Given a bitvector type, a 2bitvector type can be implemented on top of it. If one bit is "free", and another is "garbage", you just have to look for any set bits for free blocks. Yes, you have to look through 2x as much memory, but only until you find a free block. -SteveWhat is the problem with keeping the bits together?More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- Andrei
May 02 2014
On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:If speed is no issue, sure :o). My intuition is that the TwoBitVector would need certain primitives from BitVector to work well.On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:Given a bitvector type, a 2bitvector type can be implemented on top of it.What is the problem with keeping the bits together?More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- AndreiIf one bit is "free", and another is "garbage", you just have to look for any set bits for free blocks. Yes, you have to look through 2x as much memory, but only until you find a free block.Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a "hole" of contiguous zeros (cheap) instead of a checkered pattern (expensive). I'll think about it. Andrei
May 02 2014
On Fri, 02 May 2014 14:56:00 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:Well, you are looking for one bit set (free), or another bit set (garbage). So the pattern may not be uniform. What you probably want to do is to store the bits close together, but probably not *right* together. That way you can use logic-or to search for bits. -SteveOn Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:If speed is no issue, sure :o). My intuition is that the TwoBitVector would need certain primitives from BitVector to work well.On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:Given a bitvector type, a 2bitvector type can be implemented on top of it.What is the problem with keeping the bits together?More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- AndreiIf one bit is "free", and another is "garbage", you just have to look for any set bits for free blocks. Yes, you have to look through 2x as much memory, but only until you find a free block.Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a "hole" of contiguous zeros (cheap) instead of a checkered pattern (expensive). I'll think about it.
May 02 2014
On 5/2/14, 11:56 AM, Andrei Alexandrescu wrote:If speed is no issue, sure :o). My intuition is that the TwoBitVector would need certain primitives from BitVector to work well.Heh, however it's implemented, TwoBitVector's very name implies that it's cheap to use ;)
May 02 2014
On 5/2/14, 10:26 AM, Steven Schveighoffer wrote:False pointers are less of a problem in 64-bit code, but you can run into worse issues. If you are not zeroing the memory when deallocating, then if it mysteriously comes alive again, it has the ghost of what could be a pointer to other code. Your blocks are more likely to resurrect once one of them resurrects.Good point.Why not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked. This doesn't save on the bit space, but I think the savings there is minimal anyway. However, it does allow the final pass to be saved.That's a great idea. Thanks! Andrei
May 02 2014