digitalmars.D - gc and Object.toHash()
- Brian Hammond (13/13) May 17 2004 From http://www.digitalmars.com/d/garbage.html
-
Stewart Gordon
(13/16)
May 18 2004
- Brian Hammond (17/30) May 18 2004 In my case no.. I was simply using Object's toHash() as a convenient way...
- Norbert Nemec (8/13) May 18 2004 B.t.w: is this "moving around" described more clearly anywhere in the
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (12/21) May 18 2004 The current GC implementation doesn't support it.
- Walter (6/14) May 23 2004 Yes.
- Norbert Nemec (10/29) May 23 2004 What's the core idea behind it? I don't doubt that it can be done someho...
- Ben Hinkle (9/40) May 23 2004 Here's my guess:
- Brian Hammond (9/49) May 23 2004 So Walter, if I understand correctly, the current Object.toHash() implem...
- Walter (9/17) May 25 2004 implementation
- Norbert Nemec (13/21) May 24 2004 I don't agree: the question should not be what kind of data you move, bu...
- Ben Hinkle (8/32) May 24 2004 Agreed. I was thinking any ambiguous object reference from outside the
- Kevin Bealer (22/54) May 24 2004 Removing or adding such a pointer is simple, if the list is changed to a...
- Norbert Nemec (14/20) May 24 2004 That might really be an idea! Object references on the stack usually eit...
- Kevin Bealer (26/46) May 25 2004 Other ideas:
- Walter (8/19) May 23 2004 collector
From http://www.digitalmars.com/d/garbage.html "Using pointer values to compute a hash function. A copying garbage collector can arbitrarily move objects around in memory, thus invalidating the computed hash value." Why then does Object's toHash() method default implementation do exactly that? uint toHash() { return (uint)(void *)this; } I was relying on the default toHash() for the value of a key in an associative array. I guess I should provide my own hash function then? Thanks! Brian
May 17 2004
Brian Hammond <d at brianhammond dot com> wrote: <snip>Why then does Object's toHash() method default implementation do exactly that?<snip>I was relying on the default toHash() for the value of a key in an associative array. I guess I should provide my own hash function then?Always a good idea. Especially if your class has a sense of equality over and above that of being the very same object. But you're right, Object.toHash is being a bit naughty there. But is there a better idea? Maybe it's only meant to be a makeshift, but still.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 18 2004
In my case no.. I was simply using Object's toHash() as a convenient way to have fast lookups (via an associative array) for collections of basic object types like a delegate. A contrived example: a class that allows delegates to be registered/unregistered as observers. alias void delegate(subject) obs_dg; class subject { void register(obs_dg dg) { observers_[dg.toHash()] = dg; } void unregister(obs_dg dg) { delete observers_[dg.toHash()]; } //... private obs_dg[uint] observers_; } The key can change due to the GC however so I'll have to use something else like a straight array and have O(n) lookups. For small n, who cares really. Thanks, BrianI was relying on the default toHash() for the value of a key in an associative array. I guess I should provide my own hash function then?Always a good idea. Especially if your class has a sense of equality over and above that of being the very same object.But you're right, Object.toHash is being a bit naughty there. But is there a better idea? Maybe it's only meant to be a makeshift, but still.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 18 2004
Brian Hammond wrote:From http://www.digitalmars.com/d/garbage.html "Using pointer values to compute a hash function. A copying garbage collector can arbitrarily move objects around in memory, thus invalidating the computed hash value."B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object? I would assume that this takes quite a bit of intelligence, definitely more than the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?
May 18 2004
On Tue, 18 May 2004 16:53:43 +0200, Norbert Nemec <Norbert.Nemec gmx.de> wrote:B.t.w: is this "moving around" described more clearly anywhere in the documentation?The current GC implementation doesn't support it.What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object?Yes, that's how it works.I would assume that this takes quite a bit of intelligence, definitely more than the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?Walter, once mentioned that he had written such a GC. I don't know how he did it, maybe using something like a dictornary, or a special reference memory pool, so you know where to start the scan to adjust addresses. Maybe the MMU could be used to help too. Just wild guessing... -- Robert M. Münch Management & IT Freelancer http://www.robertmuench.de
May 18 2004
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c8d81o$12m4$1 digitaldaemon.com...B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object?Yes.I would assume that this takes quite a bit of intelligence, definitelymorethan the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.
May 23 2004
Walter wrote:"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c8d81o$12m4$1 digitaldaemon.com...What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object?Yes.I would assume that this takes quite a bit of intelligence, definitelymorethan the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.
May 23 2004
Norbert Nemec wrote:Walter wrote:Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. -Ben"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c8d81o$12m4$1 digitaldaemon.com...What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object?Yes.I would assume that this takes quite a bit of intelligence, definitelymorethan the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.
May 23 2004
So Walter, if I understand correctly, the current Object.toHash() implementation is valid with the current garbage collector but would be broken using a new, copying collector. If and when the GC impl. is updated, will Object.toHash() be updated as well? I'd imagine yes, Object.toHash() will always work correctly else it should be abstract :) Just wondering if I need to rewrite code now, when the GC impl. changes, or never. Thanks. Brian In article <c8r874$2tvd$1 digitaldaemon.com>, Ben Hinkle says...Norbert Nemec wrote:Walter wrote:Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. -Ben"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c8d81o$12m4$1 digitaldaemon.com...What's the core idea behind it? I don't doubt that it can be done somehow, but I wonder how it efficient it would be. Of course, if the GC knows exactly what data is a pointer, it can go through the complete memory and readjust every pointer referencing the memory that has to be moved. Going through the whole memory should not even be the problem in efficiency, but how should the GC know what is a pointer and what is some other data that by chance has the same binary value as a valid pointer? For a plain GC, this is no problem, since misinterpreting random data as pointers will at worst keep alive some data that actually is dead.B.t.w: is this "moving around" described more clearly anywhere in the documentation? What exactly happens if the GC moves an object that was allocated by a call to "new"? Does it adjust all references pointing anywhere into that object?Yes.I would assume that this takes quite a bit of intelligence, definitelymorethan the Boehm GC has. Is there actually any GC out there, that does move objects for defragmentation or some other purpose?I've seen a lot of material saying this cannot be done with a language that manipulates pointers like C, C++ and D do, but there is a way to do it.
May 23 2004
"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote in message news:c8s0hv$v1n$1 digitaldaemon.com...So Walter, if I understand correctly, the current Object.toHash()implementationis valid with the current garbage collector but would be broken using anew,copying collector.Yes, that's right.If and when the GC impl. is updated, will Object.toHash() be updated as well? I'd imagine yes, Object.toHash() will always workcorrectlyelse it should be abstract :) Just wondering if I need to rewrite code now, when the GC impl. changes,ornever. Thanks.To be safe, right now I'd implement your own toHash() for your derived types.
May 25 2004
Ben Hinkle wrote:Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved.I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function. I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.
May 24 2004
Norbert Nemec wrote:Ben Hinkle wrote:Agreed. I was thinking any ambiguous object reference from outside the object heap would make the object unmovable but I never considered that only garbage would be moveable then :-) Maybe pure object references on the stack (or maybe everywhere... who knows) can be stored in a giant list. That would make the size of a reference bigger since it stores a pointer to the next reference but it would make traversing the set of references easy.Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved.I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function.I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.
May 24 2004
In article <c8sri0$28a7$1 digitaldaemon.com>, Ben Hinkle says...Norbert Nemec wrote:Removing or adding such a pointer is simple, if the list is changed to a doubly linked list.. but the overhead is annoying for single threaded, and much worse for multithreaded, programs. You could also use a bitmap of stack frames. I would guess the solution involves sectioning memory into parts that are "fully understood" and "partially understood". Fully understood memory is D objects where the pointers and non-pointers are clearly known from the code. During a sweep, objects are classified as dead, moveable, or fixed. Dead objects are removed because no pointer is found to them. Moveable objects have pointers, but only from "fully understood" memory. "Fixed" objects are all objects in the partial heap, plus "fully understood" objects that have a pointer or *possible* pointer, from partially understood memory. Since most code will not use pointers. In practice, it doesn't hurt much to leave these objects where they are and move other objects around them. This is similar to the "defrag" of a FAT filesystem. You can't move files that are open, so you leave them in place. You get 99% of the benefit. This technique is partly based on a cultural distinction. In C, programmers expect that they know at least as much as the language does about what memory layout is. The language helps out, but is only a servant. D programmers will have different expectations; the language runtime controls the heap and knows enough to do its own accounting. It's like an armed shopkeeper. KevinBen Hinkle wrote:Agreed. I was thinking any ambiguous object reference from outside the object heap would make the object unmovable but I never considered that only garbage would be moveable then :-) Maybe pure object references on the stack (or maybe everywhere... who knows) can be stored in a giant list. That would make the size of a reference bigger since it stores a pointer to the next reference but it would make traversing the set of references easy.Here's my guess: if the GC can tell what is an object reference (for example by allocating from special heaps that store only objects) then it can get at the class info and the class info can store a mask that tells the GC which fields can be updated if the referant is moved. That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved.I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function.I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.
May 24 2004
Kevin Bealer wrote:I would guess the solution involves sectioning memory into parts that are "fully understood" and "partially understood". Fully understood memory is D objects where the pointers and non-pointers are clearly known from the code. ...That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem.
May 24 2004
In article <c8t1s1$2hv7$1 digitaldaemon.com>, Norbert Nemec says...Kevin Bealer wrote:Other ideas: The GC could prefer to collect when the stack is "shallow" relative to recent activity. This would cause collections at the end of "natural breaks" in processing. The stack itself could be "well understood" as follows: each function/method has a stack frame with a certain layout. A bitmap of that layout, i.e. "1" bits mean this-is-a-reference, and "0" bits mean "this is piece-of-data", would be created at compile time. When the GC "took over" it would determine which bitmap applied to each frame. If you think about it, a struct is like a stack frame, is like an object's data section. Programs have code and data elements; you can move "local variables" in an object into and out of the object body, to make different parts of it more or less persistent. So, everything that can be done with an object can be done with a stack frame and vice versa. Languages like ruby even allow "closures" where the stack frame stays around after the program exits because a nested function still has references to variables in it. This (potentially) requires garbage collection of the stack frame, which is not fast. It also means (depending on implementation) there really is no stack: just a lot of floating objects that serve as contexts for each other. There are some neat writeups on this. I don't remember where, but it was on the page for "parrot", the perl "assembler language". Slightly mind blowing, if you ask me. KevinI would guess the solution involves sectioning memory into parts that are "fully understood" and "partially understood". Fully understood memory is D objects where the pointers and non-pointers are clearly known from the code. ...That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem.
May 25 2004
"Brian Hammond" <d at brianhammond dot comBrian_member pathlink.com> wrote in message news:c8b56i$pro$1 digitaldaemon.com...From http://www.digitalmars.com/d/garbage.html "Using pointer values to compute a hash function. A copying garbagecollectorcan arbitrarily move objects around in memory, thus invalidating thecomputedhash value." Why then does Object's toHash() method default implementation do exactlythat?uint toHash() { return (uint)(void *)this; } I was relying on the default toHash() for the value of a key in anassociativearray. I guess I should provide my own hash function then?You're right, Object.toHash() is fundamentally broken, and this break will show up if and when a D implementation gets a copying garbage collector.
May 23 2004