digitalmars.D - GC, the simple solution
- Frank Benoit (16/16) Jun 04 2006 Another garbage collector thread :)
- Daniel Keep (55/74) Jun 04 2006 "GC, the simple solution"
- Frank Benoit (40/58) Jun 04 2006 In nearly every other GC implementation you will need also a read or
- Daniel Keep (80/147) Jun 04 2006 I guess we'll just have to disagree, then :)
- Lars Ivar Igesund (10/29) Jun 05 2006 Well, actually, the current GC is kinda bad with cycles. Try to make a
- Larry Evans (33/46) Jun 05 2006 be valid
- Daniel Keep (25/94) Jun 08 2006 Sorry for the late (and short) reply: exams breathing down my neck and
- Lionello Lunesu (10/15) Jun 04 2006 This is not such a good thing! You don't really know who releases the la...
- Frank Benoit (15/40) Jun 04 2006 Where is the disadvantage? If you release concurrent in different
- Lionello Lunesu (23/50) Jun 04 2006 There are many resources that must be released by the thread by which th...
- Frank Benoit (7/20) Jun 04 2006 This is not true for the phobos GC. The GC runs in the thread which
- Lars Ivar Igesund (25/44) Jun 04 2006 Not entirely true in the general case, Frank ;)
- Frank Benoit (5/22) Jun 05 2006 In codesize yes, but in heap size?
- Lars Ivar Igesund (10/18) Jun 05 2006 The reason is that a delete might trigger an unknown additional amount o...
- Sean Kelly (8/15) Jun 05 2006 And some allocators colaesce adjacent free blocks aggressively (ie.
- Walter Bright (28/30) Jun 04 2006 Reference counting has its advantages, but some severe disadvantages:
- Frank Benoit (38/66) Jun 04 2006 Ok, i see now that my "best and simplest" solution is neiter of both :)
- Daniel Keep (44/68) Jun 04 2006 Just wanted to point out that if I remember correctly (and I *could* be
- Lionello Lunesu (3/9) Jun 04 2006 The thread doing the ++ apparently has a pointer to the object without a...
- Daniel Keep (12/26) Jun 04 2006 *slaps forehead*
- Frank Benoit (3/11) Jun 05 2006 MS=mark&sweep :)
- Walter Bright (31/110) Jun 04 2006 I'm not throwing it away so fast, I've thought about it for several
- Frank Benoit (21/76) Jun 05 2006 Yes, and the compiler can catch these cases. With incr/decr the
- Walter Bright (13/48) Jun 05 2006 It can only do it if it has full source available. It also won't work
- Kevin Bealer (15/24) Jun 07 2006 Correct me if I'm wrong, but while this is true for M + S, doesn't RC /k...
- Bruno Medeiros (12/22) Jun 07 2006 Are you sure? It says in
- Sean Kelly (3/23) Jun 07 2006 These are synchronizing operations.
- Bruno Medeiros (7/33) Jun 10 2006 By "That requires synchronizing." I thought Walter meant stuff with
- Sean Kelly (15/44) Jun 10 2006 For the most part. There are really two issues here. First, the
- Bruno Medeiros (9/57) Jun 12 2006 Ah, I see. I had recently read about relative out-of-order execution
- Sean Kelly (51/57) Jun 13 2006 A single CPU is allowed to do whatever it wants so long as it can fool
- Daniel Keep (8/78) Jun 14 2006 Cool; learn something new every day. Thanks for the informative post.
- Bruno Medeiros (8/78) Jun 14 2006 Nice post!
- Sean Kelly (13/16) Jun 14 2006 comp.programming.threads is worth keeping an eye on, though the jargon
- Lionello Lunesu (5/5) Jun 14 2006 Interesting indeed!
- Sean Kelly (10/15) Jun 14 2006 It's been a while since I read that article, but I think D already has
- Lionello Lunesu (6/21) Jun 15 2006 ...that leaves only atomic operations? Or would "volatile" or
- Sean Kelly (33/54) Jun 15 2006 "volatile" is actually intended to address compiler optimizations in a
- Lionello Lunesu (18/45) Jun 16 2006 So I suppose "volatile" could be extended to include memory locking as w...
- Sean Kelly (18/48) Jun 16 2006 Yes it could, though for now I prefer it the way it is as it offers more...
- Don Clugston (5/70) Jun 16 2006 I'd really like to see this included in the standard distribution.
- Sean Kelly (6/10) Jun 16 2006 Mango contains some of Doug Lea's containers and there may be one there....
- Bruno Medeiros (16/36) Jun 17 2006 Well, that's the thing, I just want to have some general knowledge about...
- Sean Kelly (7/38) Jun 17 2006 Sadly, I don't know of any more basic sources of information on this
- Sean Kelly (25/35) Jun 05 2006 For what it's worth, incremental GC is similar to refcounting in that
- Jeff Parsons (13/17) Jun 07 2006 My _only_ significant beef with D so far as been the unbound time for a
- Sean Kelly (15/23) Jun 08 2006 The spec doesn't contain any language forbidding such an approach, but I...
- Larry Evans (18/32) Jun 08 2006 [snip]
- Walter Bright (8/13) Jun 08 2006 If you're looking for a *guarantee*, then you cannot even use malloc().
- Sean Kelly (10/23) Jun 08 2006 For what it's worth, I have read papers describing hard real-time
- Walter Bright (5/11) Jun 09 2006 Yup. It gets the job done.
- Derek Parnell (9/15) Jun 08 2006 It seems obvious that if one doesn't want the Garbage Collector to colle...
- Sean Kelly (7/19) Jun 08 2006 This gets sticky if one wants to use strings in D though, as there's no
- Walter Bright (17/36) Jun 09 2006 You can use D arrays. Just allocate them using malloc (or statically
- Sean Kelly (22/59) Jun 09 2006 So then I'd just manually set the ptr and len members? Sounds
- Walter Bright (9/36) Jun 09 2006 int *p = cast(int *)calloc(dim, sizeof(int));
- David Gileadi (12/22) Jun 09 2006 It seems that a fairly common desire is to use D for writing games.
- John Reimer (16/40) Jun 09 2006 Didn't Walter point out that that, in the current gc implementation, you...
- Dave (6/52) Jun 09 2006 Yea, in the current GC, allocation is the only thing that may implicitly...
- Chad J (32/45) Jun 12 2006 This is what I'm screaming.
- renox (14/18) Jun 04 2006 Uh, what's your definition of "best"?
Another garbage collector thread :)

The best garbage collector I can think of is a reference counting one. To implement this:

* Every object needs a reference counter field.
* Code using references needs code for incrementing/decrementing the counters.
* Destructors are called immediately when the counter goes to zero.

What you get is a precise, incremental, realtime garbage collector.

* All logic behaviour is deterministic.
* Member objects are still valid while a destructor is called => no more dispose/close patterns.
* Implement games and realtime apps without worrying about lags.
* Implement secure applications without worrying about GC attacks via valid reference-like data values.
* Consumes only the needed memory.
* No more need for the auto object.
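Spelled out as code, the proposal amounts to something like the following hand-rolled sketch; the struct and method names are invented for illustration, and the compiler would insert the retain/release calls itself:

import std.c.stdlib;   // malloc/calloc/free (core.stdc.stdlib in later D)

// Hand-rolled sketch of the proposed scheme: a counted handle with
// explicit retain/release where the compiler would insert them.
struct RefCounted(T)
{
    T* data;          // the managed payload
    size_t* count;    // its shared reference counter

    static RefCounted create()
    {
        RefCounted r;
        r.data  = cast(T*) calloc(1, T.sizeof);
        r.count = cast(size_t*) malloc(size_t.sizeof);
        *r.count = 1;
        return r;
    }

    void retain() { ++*count; }

    void release()
    {
        if (--*count == 0)
        {
            free(data);    // a real compiler would run T's dtor first
            free(count);
        }
    }
}

An assignment "r2 = r1" then becomes the three steps the thread keeps returning to: r1.retain(); r2.release(); r2 = r1;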
Jun 04 2006
"GC, the simple solution" No such thing. Summary: * Ref counting by default: good grief no! * Ref counting as an option: good grief yes! * Is there a perfect solution for memory management: don't be ridiculous. Frank Benoit wrote:Another garbage collector thread :) The best garbage collector I can think about, is a reference counting one. To implement this: * Every object needs a reference counter field. * Code using references, need to have code for incrementing/decrementing the counters. * Destructors are called immediatly if the counter goes zero. What you get is a precise, incremental, realtime garbage collector. * all logic behaviour is deterministic * Member objects are still valid, while a destructor is called. => no more dispose/close patterns * implement games and realtime apps without worry about lags * implement secure application without worry about GC attack by valid refence data values. * consume only the needed memory * no more need for the auto objectWhat you also get that you might not want: * because references are on the object, you can't have array slices * it's also incompatible with pointers: if the pointer is to a member of an object, how on earth do you update the refcount then? * slower in the immediate sense (incref & decref calls, plus it'll mean less code in L1 and L2 CPU cache, which can kill performance in some architectures) * requires all objects to have destructors to decref child objects (implicit or otherwise) * without auto, makes it impossible to ensure an object's destruction when you leave scope: you can leak references * inability to free cycles * inability to even *run* destructors where cycles are involved, because there is no order in which they can logically be executed and still be valid Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler. Python, however, does not suffer the cycle problem. Know why? Because they have a *mark & sweep* garbage collector to clean up the cycles. Seriously, I don't want to see the current GC replaced by a reference counting scheme. The current system is just fine for the majority of applications. Where it falls down are real-time systems which can't allow for the non-deterministic GC pauses, and where you want to keep your memory profile to a minimum. Anything that has pauses of activity can just run a background GC thread to keep the memory footprint down. To this end, I think implementing an incremental or concurrent GC should be the next port of call for optimising D's memory management. As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually. However, I do think that having built-in *support* for Reference counting would be fantastic. The less that stuff like this is left in the hands of the programmer, the better. What I think we all need to keep in mind is that NO memory management scheme is good for all cases. There will always be some weird corner case which doesn't work well with our scheme of choice, and that we don't give a rat's about, but is VERY important to someone else. So far D does automatic GC (the best default, IMHO [1]), manual management, and RAII. Throw optional ref counting in to the mix, and I think we'll have pretty much 99% of cases covered. -- Daniel [1] The best default since it's unobtrusive, and *safe*: unless you're doing strange things, it's very hard to leave dangling references with a GC. 
This allows people to get on with writing the damn program, and worry about managing memory later (if ever). -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
In nearly every other GC implementation you will also need a read or write barrier. With reference counting you have a write barrier. Yes, the imprecise, non-incremental, "stop-the-world", non-concurrent, non-generational, conservative Phobos GC implementation is an exception. But a write barrier alone doesn't make ref counting ridiculous. Well, you are right, it is not the solution for each and everything, but I think it is a big step in the right direction. And a mark&sweep GC should be the option.

> * because references are on the object, you can't have array slices

I don't understand that point.

> * it's also incompatible with pointers: if the pointer is to a member
> of an object, how on earth do you update the refcount then?

In the actual implementation of the GC, every object's memory range is registered by the GC. With that you can get the object's refcounter address.

> * slower in the immediate sense (incref & decref calls, plus the
> extra code crowds the L1 and L2 CPU cache, which can kill performance
> on some architectures)

Performance has two sides: throughput is one thing, worst latency the other. We can think about techniques to optimize away unnecessary increments/decrements. BTW, all incremental GCs will also need read and/or write barriers, which have the same performance issues.

> * requires all objects to have destructors to decref child objects
> (implicit or otherwise)

Yes, all objects having child objects. But if you think destructor calls are a performance issue, you will get problems with mark&sweep as well.

> * without auto, makes it impossible to ensure an object's destruction
> when you leave scope: you can leak references

How can that happen, if we have them native? Well, ok, e.g. if you store the reference in a static variable, then it is not freed. But this is really a programmer's bug. I think an advanced look at it will find solutions. First we have to decide to go for ref counting :)

> * inability to free cycles
> * inability to even *run* destructors where cycles are involved,
> because there is no order in which they can logically be executed and
> still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep algorithm to detect such cycles in the debug version, called on request. So if you know your program, you can avoid cycles, or you can break them before releasing the last reference.

Doing a realtime GC seems to be impossible. The existing "realtime GCs" seem to me to be incremental ones, which stop the world for a short, limited time. And that time goes into your worst latency. So if I want to do things with a latency of less than 100us, there is no chance to implement a GC. Ref counting would be a solution, but this can only work if the whole program does not depend on "stop-the-world".

> As it stands, you can implement reference counting yourself: you just
> need to write an allocator, and call the incref and decref functions
> manually. However, I do think that having built-in *support* for
> reference counting would be fantastic. The less that stuff like this
> is left in the hands of the programmer, the better.

No, I can't:
* The dereference operator cannot be overridden.
* Existing classes, native arrays etc. do not work like this.

I do not need a few classes managed with ref counting. I want to avoid the "stop the world", and Phobos needs it. Where is the proof that ref counting is so slow? In mark&sweep you have to run over the whole memory. If memory is low, you have to do that often. That is really slow then.
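The write barrier mentioned here is the one hook the compiler would have to insert on every reference store. A minimal sketch, assuming each managed class carries a hidden counter field; none of these names exist in Phobos:

// Hypothetical compiler-inserted write barrier for the store
// "*slot = newVal;". Assumes every managed class has a "refs" field.
void writeBarrier(T)(T* slot, T newVal)
{
    if (newVal !is null)
        ++newVal.refs;                  // protect the new target first
    T old = *slot;
    *slot = newVal;
    if (old !is null && --old.refs == 0)
        delete old;                     // dtor runs immediately
}

Incrementing before decrementing keeps the barrier correct even when *slot and newVal already refer to the same object.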
Jun 04 2006
Frank Benoit wrote:In nearly every other GC implementation you will need also a read or write barrier. With reference counting you have a write barriere. Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent, non-generational, conservative Phobos GC implementation is an exception. But a write barrier alone doesn't make ref counting ridiculous. Well you are right, it is not the solution for all and everything, but I think a big step in the right direction. And mark&sweep GC should be the option.I guess we'll just have to disagree, then :) Like I said before, I think the current GC should be the default since, even if it's not the best in terms of latency, it's the *safest*. There's no need for someone coding under it to worry about things like cycles, etc. By having refcounting an option, we can say to programmers: "turn this on if you like, but watch your back for cycles!" <joking>Plus, the current system gives us bitchin' benchmarks with wc! ;)</joking>An array slice is a pointer and a length. The problem is the "pointer" part:* because references are on the object, you can't have array slicesDon't understand that point.Aaah, ok: you're putting the refcount somewhere in the allocated chunk. Sorry, I thought you meant hide it as a member on objects. However, I imagine that this is still going to do horrible things to pointer operations in terms of speed. Every time you move or copy a pointer, you'd have to ask the GC "ok, find which pool this pointer is located in, then inc/dec the refcount."* it's also incompatible with pointers: if the pointer is to a member of an object, how on earth do you update the refcount then?In the actual implememtation of the gc, every object's memory range is registered by the gc. With that you can get the objects refcounter address.I concede that, as well as that the current GC has (probably) the worst all-at-once latency you can get.* slower in the immediate sense (incref & decref calls, plus it'll mean less code in L1 and L2 CPU cache, which can kill performance in some architectures)Performance has two sides. Throughput is one thing, worst latency the other. We can think about techniques to optimize unecessary increments/decrements. BTW all incremental GCs will also need read and/or write barriers, which will have the same perfomance issues.Well, I don't *need* destructors in most general objects under mark&sweep. It's only when they're holding some external resource, and that's when I usually manage them with auto or scoped destruction. ... which would admittedly be easier with refcounting, if a little bit slower :)* requires all objects to have destructors to decref child objects (implicit or otherwise)Yes, all objects having child objects. But if you think, that destructor calls are a performance issue, you will get problems with the mark&sweep also.What I mean is that if I say "auto Foo x = ..." then I know that come hell or high water, D is going to destroy that object at the end of scope. Plus, because of auto rules, I *cannot* leak a reference to that object: D simply won't let me. As for the static variable case, it's no different to the current implementation: you save a reference to something in a global or static variable, and it's never going to die :)* without auto, makes it impossible to ensure an object's destruction when you leave scope: you can leak referencesHow can that happen, if we have them native? Well ok, e.g. if you store the reference to a static variable, then it is not freed. But this is really a programmers bug. 
But I think that is something an advanced look at it will find solutions. First we have to decide to go for ref counting :)The problem I have is that even if I *do* "know" my program, I don't want to have to worry about this. I just want to write my damn code! This is one of the great advantages of D: if you don't give a rat's about memory management, you don't have to. That said, D *does* have one advantage: at compile time, the compiler can determine if it's possible for an object to indirectly refer to itself. Using this, it can insert something like this into the decref: That way, you know that cycles will get checked for *eventually*.* inability to free cycles * inability to even *run* destructors where cycles are involved, because there is no order in which they can logically be executed and still be validRight, here a manual ref=null is needed. And perhaps a mark&sweep algorithm to detect such cycles in the debug version, called on request. So if you know you program, you can avoid cycles, or you can break them before releasing the last reference.Doing a realtime GC seems to be not possible.Oh, you'll get no argument from me THERE. :)The existing "realtime GC" seems to me, incremental ones, which stop the world for a short limited time. And that time goes into your worst latency. So if I want to do things with a latency less 100us, there is no chance to implement a GC. Ref counting whould be a solution, but this can only work if the whole program does not depend on "Stop-the-world".I saw an interesting paper once that was talking about using a tri-colour mark&sweep collector that was fully concurrent: sounded heinously complicated to write, but could run a full collect in the background of your app. Buggered if I can find the damn thing again, though...MyRef.value Yeah, it's not that great, but at least it's *possible* :)As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually. However, I do think that having built-in *support* for Reference counting would be fantastic. The less that stuff like this is left in the hands of the programmer, the better.No, i can't. * The dereference operator cannot be overwritten.* Existing classes, native arrays etc. does not work like this.Fair enough.I do not need a few classes managed with ref counting. I want to avoid the "Stop the world". And phobos needs it.I don't think phobos "needs" it. What I think would help immensely was if phobos was easier to rebuild with different compiler flags, collector, etc. Would you be happy if it was trivial to recompile phobos using refcounting instead of the current GC, and then use that?Where is the proof that ref counting is so slow? In mark&sweep you have to run over the whole memory. If the memory is low, you have to do that often. This is really slow than.I don't have any proof on me, although knowing the internet I'm sure I could find some "proof" without too much trouble :P And yes; if memory is low, m&s will probably take a long time. Actually, I should find out how long that is. Would be interesting to know exactly how long it takes to collect 512 meg of garbage :P Personally, I don't mind how D's default collector is implemented, so long as it is efficient in the general case, and I don't have to know anything about the way it works in order to write code. That said, I'd still like to see D support collectors other than the base GC. 
Ideally, I think D should support at least: That way, we can all have our cake, be it chocolate cake, fruit cake or cheesecake. Plus, we even get to *eat* it :) -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
Daniel Keep wrote:

>> In nearly every other GC implementation you will also need a read or
>> write barrier. With reference counting you have a write barrier.
[snip]
>
> I guess we'll just have to disagree, then :)
>
> Like I said before, I think the current GC should be the default
> since, even if it's not the best in terms of latency, it's the
> *safest*. There's no need for someone coding under it to worry about
> things like cycles, etc.

Well, actually, the current GC is kinda bad with cycles. Try to make a circular linked list where you keep a single additional reference to one of the elements. Then remove the reference and collect. In theory this memory should now "just" be lost/leaked, but you might also experience crashes. I think Chris Miller had some interesting revelations on the subject.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Jun 05 2006
On 06/04/2006 05:40 AM, Daniel Keep wrote:"GC, the simple solution"[snip]* inability to free cycles * inability to even *run* destructors where cycles are involved, because there is no order in which they can logically be executed and stillbe valid I think someone else in the thread mentioned this as a programmer error. However, assuming the programmer intended this, then this programmer must expect that the MS GC would break the cycle somewhere before calling the destructors and he must not care where the cycle was broken, because otherwise he would have used a weak pointer to close the cycle where he wanted the break to occur. Anyway, code for doing this GC breaking of cycles in RC is at: http://tinyurl.com/reuzl See code after comment: * Breaks all arcs in cycle_arcs. As mentioned in the code comments, this is Christopher's method for collecting cycles and suffers a delay since it keeps all smart ptrs in a list and processes the entire list. Other methods don't keep this list but do a local mark-sweep at each rc decrement with obvious time penalties. A compromise just uses a queue which, when it reaches a certain size, plays the same role as Christopher's list of all smart pointers.Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler. Python, however, does not suffer the cycle problem. Know why? Because they have a *mark & sweep* garbage collector to clean up thecycles. Then it must not automatically call destructors since then it would suffer the 2nd problem above.[snip]As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually. However, I do think that having built-in *support* for Reference counting would be fantastic. The less that stuff like this is left in the hands of the programmer, the better.Smart pointers would be a great help here, but since D doesn't allow user definition of the assignment operator: http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html that's not easy. I'm a newbie, so what's the best way to implement something like a smart pointer without having to manually: increment rc1; decrement rc2; assign the pointer to its target location;
Jun 05 2006
Sorry for the late (and short) reply: exams breathing down my neck and all :) Larry Evans wrote:On 06/04/2006 05:40 AM, Daniel Keep wrote:I haven't read the above link since I really don't need to be distracted any more than I already am at the moment. I'll read that link later on once I'm not quite so swamped :) However, I will say this: if you have a cycle in Python, it will clean it up normally PROVIDED destructors are not involved. If they are, then it just moves the whole cycle to a dead object list, since there is no safe way to break the cycle. At that point, it's up to the programmer to deal with it."GC, the simple solution"[snip]* inability to free cycles * inability to even *run* destructors where cycles are involved, because there is no order in which they can logically be executed and still bevalid I think someone else in the thread mentioned this as a programmer error. However, assuming the programmer intended this, then this programmer must expect that the MS GC would break the cycle somewhere before calling the destructors and he must not care where the cycle was broken, because otherwise he would have used a weak pointer to close the cycle where he wanted the break to occur. Anyway, code for doing this GC breaking of cycles in RC is at: http://tinyurl.com/reuzl See code after comment: * Breaks all arcs in cycle_arcs. As mentioned in the code comments, this is Christopher's method for collecting cycles and suffers a delay since it keeps all smart ptrs in a list and processes the entire list. Other methods don't keep this list but do a local mark-sweep at each rc decrement with obvious time penalties. A compromise just uses a queue which, when it reaches a certain size, plays the same role as Christopher's list of all smart pointers.Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler. Python, however, does not suffer the cycle problem. Know why? Because they have a *mark & sweep* garbage collector to clean up thecycles. Then it must not automatically call destructors since then it would suffer the 2nd problem above.Not really, sorry. Your best bet might be with a preprocessor. I've had plans to write an AST-based preprocessor for a while, but that's off in the far, far future. Maybe give Build 3.0 a look? In that case, you'd want to write a macro like: ==> (rc1).incref(); (rc2).decref(); rc2 = (rc1); Again, sorry for the terse reply. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/[snip]As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually. However, I do think that having built-in *support* for Reference counting would be fantastic. The less that stuff like this is left in the hands of the programmer, the better.Smart pointers would be a great help here, but since D doesn't allow user definition of the assignment operator: http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html that's not easy. I'm a newbie, so what's the best way to implement something like a smart pointer without having to manually: increment rc1; decrement rc2; assign the pointer to its target location;
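Lacking macros, the closest plain-D equivalent is a free function that bundles the three steps. A sketch under the assumption of a hypothetical RcPtr type (none of these names are a library API):

// Daniel's macro written as a plain D function, since opAssign cannot
// be overloaded.
struct RcPtr(T)
{
    T* target;
    size_t* count;

    void incref()
    {
        if (count) ++*count;
    }

    void decref()
    {
        if (count && --*count == 0)
        {
            delete target;   // last owner: free the payload
            delete count;
        }
    }
}

// The three manual steps bundled into one call: assign(&rc2, rc1);
void assign(T)(RcPtr!(T)* dst, RcPtr!(T) src)
{
    src.incref();   // increment first, so *dst == src stays safe
    dst.decref();
    *dst = src;
}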
Jun 08 2006
> * Destructors are called immediately when the counter goes to zero.

This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.

> * All logic behaviour is deterministic.

See above.

> * Member objects are still valid while a destructor is called.
> => no more dispose/close patterns

When an object is being destructed, the references it has to other objects are released. So in the destructor of object B, object A might already have released about half its references.

> * Consumes only the needed memory.

What about circular references? They'll never get released, no clean-up.

L.
Jun 04 2006
Lionello Lunesu wrote:

>> * Destructors are called immediately when the counter goes to zero.
>
> This is not such a good thing! You don't really know who releases the
> last reference, so you don't really know when the object is deleted.
> If the object is being referenced from multiple threads, you don't
> know in what thread the destructor will run.

Where is the disadvantage? If you release concurrently in different threads, you need some kind of synchronization; then you know it. But this is no issue of ref counting. If you release a reference under the Phobos GC, the object is collected "perhaps" the next time, if there are no valid integer values "pointing" to your object. And after collecting, the sequence of destructor calls is not defined.

>> * All logic behaviour is deterministic.
>
> See above.

See above.

>> * Member objects are still valid while a destructor is called.
>> => no more dispose/close patterns
>
> When an object is being destructed, the references it has to other
> objects are released. So in the destructor of object B, object A
> might already have released about half its references.

If object B's dtor is called, its member variables are valid, and so is the object A. B.~this() can do something with A. After the dtor has finished, the reference to A is nulled, and A is perhaps also deleted. Seems perfect to me.

>> * Consumes only the needed memory.
>
> What about circular references? They'll never get released, no
> clean-up.

Yes, they need a solution. Perhaps a fallback mark&sweep collector, to check for cycles.
Jun 04 2006
"Frank Benoit" <keinfarbton nospam.xyz> wrote in message news:e5uojm$18ji$1 digitaldaemon.com...Lionello Lunesu schrieb:There are many resources that must be released by the thread by which they were acquired. There's not much synchronisation can do. If the last reference just happens to be released in a thread other than the one that created the object, then the resource will be released in the wrong thread. As I've understood from Boehm's paper, one advantages of a GC is to fix this issue: a GC collect will call finalizers for the objects that were created in that thread. It's like keeping an object from being destructed until the thread that created the object calls malloc (which is were the GC does its thing). Anyway, Boehm's paper is a good read. He (she?) also addresses the problems with reference counting. http://www.hpl.hp.com/personal/Hans_Boehm/gc/Where is the disadvantage? If you release concurrent in different threads you need some kind of syncronization. Than you know it. But this is no issue of ref counting. If you release a reference in the phobos GC, the object is collected "perhaps" the next time. If there are no valid integer values, "pointing"to your object. And after collecting, the sequence of destructor calls is not defined.* Destructors are called immediatly if the counter goes zero.This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.I meant my example like this: object A has a reference to object B. During the destruction of A, the references it has to other objects are released, causing some objects being destructed, like B. Now B might (indirectly) use data from A, but A's in the middle of destruction. What I mean to tell with all this is that you still have to be carefull with your destructors. Especially when using a strong-pointer template. You can't always tell what state other objects are in in a destructor. They might be half way during destruction, having released some, but not all, of their references. L.If object B Dtor is called, its member variables are valid, and valid is the object A. The B.~this() can do something with A. After the Dtor is finished, the reference to A is nulled, and A is perhaps also deleted. Seams perfect to me.* Member objects are still valid, while a destructor is called. => no more dispose/close patternsWhen an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.
Jun 04 2006
> As I've understood from Boehm's paper, one advantage of a GC is to
> fix this issue: a GC collect will call finalizers for the objects
> that were created in that thread. It's like keeping an object from
> being destructed until the thread that created the object calls
> malloc

This is not true for the Phobos GC. The GC runs in the thread which calls new or gc.fullCollect. But the dtors are not called while other threads are running, because the GC pauses all other threads. This is called the "stop-the-world".

> I meant my example like this: object A has a reference to object B.
> During the destruction of A, the references it has to other objects
> are released, causing some objects to be destructed, like B. Now B
> might (indirectly) use data from A, but A is in the middle of
> destruction. What I mean to tell with all this is that you still have
> to be careful with your destructors. You can't always tell what state
> other objects are in when a destructor runs. They might be halfway
> through destruction, having released some, but not all, of their
> references.

Object A cannot be destroyed if it is reachable directly or indirectly; that is what the ref counting is for. So if A.~this() is called, nobody else has a reference to A.
Jun 04 2006
Frank Benoit wrote:

> The best garbage collector I can think of is a reference counting
> one.
[snip]
> * Consumes only the needed memory.
> * No more need for the auto object.

Not entirely true in the general case, Frank ;)

There are always techniques to get around problems with an algorithm, depending on what you want to achieve, but in general, a reference counted GC is said to have these _negative_ properties (paraphrased from Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Jones and Lins):

* The cost of removing the last pointer to an object is unbounded.
* Even if the general latency is good, the total overhead of adjusting references is significantly greater than that of a tracing GC.
* It has a substantial space overhead, making it less useful for smaller heaps.
* Cyclic references (already mentioned).

I think these make RC less suited for the general case, whereas it might fit very well in a tightly controlled system where low latency is required (which I'm well aware that you need ;). IMO, the best general GC (the default) would be one which can handle as many cases as possible, and which is manipulable in those (preferably known) cases which it doesn't handle out of the box. I think one would need to combine features from several GC techniques (possibly also RC) to achieve this.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Jun 04 2006
> * The cost of removing the last pointer to an object is unbounded.

delete can be unbounded? I didn't know that.

> * Even if the general latency is good, the total overhead of
> adjusting references is significantly greater than that of a tracing
> GC.
> * It has a substantial space overhead, making it less useful for
> smaller heaps.

In code size, yes, but in heap size? I thought the tracing GC needs much more heap because it should be called very seldom.

> * Cyclic references (already mentioned).
>
> I think these make RC less suited for the general case, whereas it
> might fit very well in a tightly controlled system where low latency
> is required (which I'm well aware that you need ;). IMO, the best
> general GC (the default) would be one which can handle as many cases
> as possible, and which is manipulable in those (preferably known)
> cases which it doesn't handle out of the box. I think one would need
> to combine features from several GC techniques (possibly also RC) to
> achieve this.

Agreed.
Jun 05 2006
Frank Benoit wrote:

>> * The cost of removing the last pointer to an object is unbounded.
>
> delete can be unbounded? I didn't know that.

The reason is that a delete might trigger an unknown additional amount of decrefs, leading to more deletes.

>> * It has a substantial space overhead, making it less useful for
>> smaller heaps.
>
> In code size, yes, but in heap size? I thought the tracing GC needs
> much more heap because it should be called very seldom.

Hmm, might have paraphrased somewhat wildly there, but the space overhead _is_ considered a drawback, and reducing it will most likely lead to more computation overhead instead.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
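A concrete picture of that unbounded cost, as a sketch: the one assignment that drops the last reference to a list head would have to unravel the entire list before returning.

// Sketch: under RC, the final assignment below would cascade through
// a million decrefs (and dtors) before the next statement runs.
class Node
{
    Node next;
    this(Node n) { next = n; }
}

void unboundedDelete()
{
    Node head;
    for (int i = 0; i < 1_000_000; ++i)
        head = new Node(head);   // build a long chain
    head = null;                 // RC would free all nodes right here
}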
Jun 05 2006
Lars Ivar Igesund wrote:

>>> * The cost of removing the last pointer to an object is unbounded.
>>
>> delete can be unbounded? I didn't know that.
>
> The reason is that a delete might trigger an unknown additional
> amount of decrefs, leading to more deletes.

And some allocators coalesce adjacent free blocks aggressively (ie. during delete). Not to mention the fact that the allocator data is probably protected by a mutex, etc. In general, if the allocator documentation makes no guarantees about progress then you may at least be stuck on a mutex while other (potentially low-priority) threads are making an (unbounded) allocation.

Sean
Jun 05 2006
Frank Benoit wrote:

> The best garbage collector I can think of is a reference counting
> one.

Reference counting has its advantages, but some severe disadvantages:

1) cyclical data structures won't get free'd

2) every pointer copy requires an increment and a corresponding decrement - including when simply passing a reference to a function

3) in a multithreaded app, the incs and decs must be synchronized

4) exception handlers (finally's) must be inserted to handle all the decs so there are no leaks. Contrary to assertions otherwise, there is no such thing as "zero overhead exceptions."

5) in order to support slicing and interior pointers, as well as supporting reference counting on arbitrary allocations of non-object data, a separate "wrapper" object must be allocated for each allocation to be ref counted. This essentially doubles the number of allocations needed.

6) the wrapper object will mean that all pointers will need to be double-dereferenced to access the data

7) fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C

8) ref counting can fragment the heap, thereby consuming more memory, just like the gc can, though the gc typically will consume more memory overall

9) ref counting does not eliminate latency problems, it just reduces them

The proposed C++ shared_ptr<>, which implements ref counting, suffers from all these faults. I haven't seen a head-to-head benchmark of shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> turned out to be a significant loser in terms of both performance and memory consumption.

That said, I'd like for D to optionally support some form of ref counting, as rc is better for managing scarce resources like file handles.
Jun 04 2006
Ok, I see now that my "best and simplest" solution is neither of both :) But I don't want to throw the idea away so fast. Please let's discuss it a bit further.

Let's think about the scenario of having reference counting (RC) and mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in Python. If Python uses this, the idea cannot be so crazy.

> 1) cyclical data structures won't get free'd

Combine RC and MS, so cycles are detected. It should be possible to avoid cycles entirely, if you don't want to have any MS pause.

> 2) every pointer copy requires an increment and a corresponding
> decrement - including when simply passing a reference to a function

This is true for a smart pointer class. But is it really true for compiler-supported RC? If I call a func/method, I hold a valid reference to the object. While this reference is copied down the stack, these are only temporary references, and all of them are released before the call returns to the caller. You can ignore them all for RC. Only if the reference is stored does the ref counter have to be incremented. This is only possible if it is done by the compiler. No smart pointer can do this :)

> 3) in a multithreaded app, the incs and decs must be synchronized

Isn't an atomic inc/dec enough? (I don't know anything about the performance of the asm "lock" instruction.)

> 4) exception handlers (finally's) must be inserted to handle all the
> decs so there are no leaks. Contrary to assertions otherwise, there
> is no such thing as "zero overhead exceptions."

Yes, this is additional overhead in code size and runtime. And additional post-destructors are needed, to null all member refs which weren't nulled by the dtor.

> 5) in order to support slicing and interior pointers, as well as
> supporting reference counting on arbitrary allocations of non-object
> data, a separate "wrapper" object must be allocated for each
> allocation to be ref counted. This essentially doubles the number of
> allocations needed.

The MS GC already has a memory registry. Doesn't it store where each object begins? Let the ref counter be in this registry. This should work with all types, slicing and object member references.

> 6) the wrapper object will mean that all pointers will need to be
> double-dereferenced to access the data

=> 5)

> 7) fixing the compiler to hide all this stuff from the programmer
> will make it difficult to interface cleanly with C

Hm, ok. Here some more manual work is necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?

> 8) ref counting can fragment the heap, thereby consuming more memory,
> just like the gc can, though the gc typically will consume more
> memory overall

Isn't the actual GC implementation also fragmenting the heap? But I don't have any idea on this issue.

> 9) ref counting does not eliminate latency problems, it just reduces
> them

Where do you see the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbounded time to complete. But it does not need to stop all other threads, e.g. hardware interrupts don't need to be disabled.

> The proposed C++ shared_ptr<>, which implements ref counting, suffers
> from all these faults. I haven't seen a head-to-head benchmark of
> shared_ptr<> vs mark/sweep, but I wouldn't be surprised if
> shared_ptr<> turned out to be a significant loser in terms of both
> performance and memory consumption.
>
> That said, I'd like for D to optionally support some form of ref
> counting, as rc is better for managing scarce resources like file
> handles.

As I said, I would be happy to be able to switch the whole runtime system to RC. Losing 50% performance would be acceptable for me if I got latency < 100us.

Frank Benoit
Jun 04 2006
Just a few comments, since you invoked my name :P Frank Benoit wrote:Let's think about the scenario of having reference counting (RC) and mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in Python. If Python uses this, the idea cannot be so crazy.Just wanted to point out that if I remember correctly (and I *could* be wrong), the cycle checking is only done on decrefs, since that's the only time that cycles become a problem.They definitely have to be synched: because in decref you've got to: 1. ref--; 2. if( ref == 0 ) free(ptr); 3. if( obj.canHaveCycles ) registerForCycleCheck(obj); If not more. You then also need the incs synched with that, otherwise you could end up "ref++"ing in the middle of "free(ptr)".3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough? (Don't know anything about the performance of the asm "lock" instruction)MS' GC doesn't use refcounting (I assume you're talking about .NET, btw). In fact, to solve this problem, it uses pinning: """ When the runtime marshaler sees that your code is passing to native code a reference to a managed reference object, it automatically pins the object. What this means is that the object is put in a special state where the garbage collector will neither move the object in memory nor remove the object from memory. ... When the native function returns, the marshaled object is automatically unpinned. Automatic pinning is very convenient, but it raises another question. What happens if the native function caches the pointer for use later on? When the function returns, won't the collector be free to move the object? The answer is yes, and the solution for your code in such a situation is to manually pin the object using the System.Runtime.InteropServices.GCHandle type. """ -- http://msdn.microsoft.com/msdnmag/issues/04/10/NET/7) Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C.hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?Not as badly. The current GC is pretty good at keeping fragmentation to a minimum. Fragmentation happens when you allocate lots of little objects, then free some but not all, and end up with teeny-tiny "holes" in memory that are hard to fill. I think Walter said this because of his comment on using "wrapper" objects to implement the references: you'd end up with lots of "holes" in memory.8) Ref counting can fragment the heap thereby consuming more memory just like the gc can, though the gc typically will consume more memory overall.Isn't the actual GC implementation also fragmenting the heap? But I don't have any idea to this issue.I think he means that it doesn't make latency just disappear; it spreads it over time. It's the old "drop a frog into hot water and it'll jump out, drop it in cold water and then boil it and it won't notice" anecdote. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/9) Ref counting does not eliminated latency problems, it just reduces them.Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete. But it does not need to stop all other threads. e.g. hardware interrupts don't need to be disabled.
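A sketch of what such a synchronized decref could look like in D; the Counted class and the cycle-check hook are hypothetical, and a real runtime would use an atomic instruction rather than a synchronized block:

// Hypothetical refcounted base; "refs" starts at 1 for the creator.
class Counted
{
    size_t refs = 1;
    bool canHaveCycles;
}

// Queue the object for a later cycle scan (stub).
void registerForCycleCheck(Counted obj) { /* ... */ }

void decref(Counted obj)
{
    size_t remaining;
    synchronized          // a real runtime would use a lock'd asm op
    {
        remaining = --obj.refs;
    }
    if (remaining == 0)
        delete obj;       // nobody else can reach it any more
    else if (obj.canHaveCycles)
        registerForCycleCheck(obj);
}

As the follow-ups point out, the matching incref needs no such guard against the free: a thread can only increment through a reference it already owns, so the counter cannot be zero at that moment.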
Jun 04 2006
> They definitely have to be synched: because in decref you've got to:
>
> 1. ref--;
> 2. if( ref == 0 ) free(ptr);
> 3. if( obj.canHaveCycles ) registerForCycleCheck(obj);
>
> If not more. You then also need the incs synched with that, otherwise
> you could end up "ref++"ing in the middle of "free(ptr)".

The thread doing the ++ apparently has a pointer to the object without a reference. How could that ever happen?

L.
Jun 04 2006
Lionello Lunesu wrote:

>> If not more. You then also need the incs synched with that,
>> otherwise you could end up "ref++"ing in the middle of "free(ptr)".
>
> The thread doing the ++ apparently has a pointer to the object
> without a reference. How could that ever happen?

*slaps forehead*

Of course; you're right. I was thinking about race conditions, but obviously got confused. Please refer to my signature. Specifically the "it may not even make sense." part :P

-- Daniel

--
Unlike Knuth, I have neither proven or tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
> If not more. You then also need the incs synched with that, otherwise
> you could end up "ref++"ing in the middle of "free(ptr)".

As already pointed out by Lionello, if the counter goes to zero there is no other manipulator.

>>> 7) fixing the compiler to hide all this stuff from the programmer
>>> will make it difficult to interface cleanly with C
>>
>> Hm, ok. Here some more manual work is necessary. If you give the
>> last reference to a C lib, isn't this a problem for the MS GC also?

MS=mark&sweep :)
Jun 05 2006
Frank Benoit wrote:Ok, i see now that my "best and simplest" solution is neiter of both :) But I don't want to throw the idea away so fast. Please lets discuss it a bit further.I'm not throwing it away so fast, I've thought about it for several years <g>.Let's think about the scenario of having reference counting (RC) and mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in Python. If Python uses this, the idea cannot be so crazy.Maybe, maybe not. Python isn't known for its speed. Java uses gc, not rc.There are ways to avoid the RC cycle problem, but I don't know how expensive they are.1) cyclical data structures won't get free'dCombine RC and MS. So cycles are detected. It should be possible to avoid cycles at all, if you don't want to have any MS pause.Yes.2) every pointer copy requires an increment and a corresponding decrement - including when simply passing a reference to a functionThis is true for a smart pointer class. But is it really true for a compiler supported RC?If I call a func/method, I hold a valid reference to the object. While this reference is copied down the stack, these are only temporary reference. And all of them are release before the call returns to the caller. You can ignore them all for RC.This fails if the passed reference 'escapes'. It can escape by being assigned to a global, being assigned to a class member, being inserted into some data structure that lives longer than the function, and being returned by the function.Only if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)That requires synchronizing.3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough?(Don't know anything about the performance of the asm "lock" instruction)It's slow enough to be noticable.Yes, you can find out the beginning of an arbitrary pointer's memory block. But that isn't a cheap operation.4) exception handlers (finally's) must be inserted to handle all the decs so there are no leaks. Contrary to assertions otherwise, there is no such thing as "zero overhead exceptions."Yes, this is additional overhead in code size and runtime. And there are additional post destructors needed, to set all member refs to null, which wasn't nulled by the dtor.5) in order to support slicing and interior pointers, as well as supporting reference counting on arbitrary allocations of non-object data, a separate "wrapper" object must be allocated for each allocation to be ref counted. This essentially doubles the number of allocations needed.The MS GC allready has a memory registry. Isn't there stored, where an object begins? Let the ref counter be in this registry. This should work with all types and slicing and object member references.Finding the start of the memory chunk will be several times more expensive than the double indirect.6) The wrapper object will mean that all pointers will need to be double-dereferenced to access the data.=> 5.)I don't know anything about the MS GC.7) Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C.hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?Yes. malloc will fragment the heap, too. 
But only GC has the hope of doing compaction.8) Ref counting can fragment the heap thereby consuming more memory just like the gc can, though the gc typically will consume more memory overall.Isn't the actual GC implementation also fragmenting the heap?But I don't have any idea to this issue.That's the same with gc. Only upon an allocation is the time unbound.9) Ref counting does not eliminated latency problems, it just reduces them.Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.But it does not need to stop all other threads. e.g. hardware interrupts don't need to be disabled.GC doesn't require disabling hardware interrupts. It does not need to stop any threads that do not hold references to GC allocated data.Latency cannot be guaranteed even with malloc(). I don't know what constraints you have on the system you're developing. But when I've written real time interrupt code, the ISRs were written to not make any OS calls or any allocations. All they'd do is just gather data and post it to a FIFO buffer. A non-realtime thread would then pull the data out of the buffer and process it. The system would work without dropping data as long as the average (not maximum) processing time per datum was less than the data acquisition rate.The proposed C++ shared_ptr<>, which implements ref counting, suffers from all these faults. I haven't seen a heads up benchmark of shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> turned out to be a significant loser in terms of both performance and memory consumption. That said, I'd like for D to optionally support some form of ref counting, as rc is better for managing scarce resources like file handles.As I said, I would be happy to be able to switch the whole runtime system to RC. If I would loose 50% performance this is acceptable for me if I get latency < 100us.
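The ISR pattern Walter describes can be sketched as a fixed-size single-producer/single-consumer ring buffer; with one ISR writing and one worker reading, no lock is needed. The names are invented, and real code would still need the memory-ordering care discussed elsewhere in this thread:

// Single-producer/single-consumer ring buffer: the ISR only writes
// head, the worker only writes tail, and nothing here allocates.
struct Fifo
{
    int[1024] buf;
    size_t head;   // advanced only by the ISR
    size_t tail;   // advanced only by the worker

    bool push(int sample)        // called from the ISR
    {
        size_t next = (head + 1) % buf.length;
        if (next == tail)
            return false;        // full: drop the sample
        buf[head] = sample;
        head = next;
        return true;
    }

    bool pop(out int sample)     // called from the worker thread
    {
        if (tail == head)
            return false;        // empty
        sample = buf[tail];
        tail = (tail + 1) % buf.length;
        return true;
    }
}

Dropping data when the buffer is full is exactly the policy Walter describes: the system only has to keep up on average, not in the worst case.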
Jun 04 2006
> I'm not throwing it away so fast, I've thought about it for several
> years <g>.

I meant myself, hehe.

>> If I call a func/method, I hold a valid reference to the object.
>> While this reference is copied down the stack, these are only
>> temporary references, and all of them are released before the call
>> returns to the caller. You can ignore them all for RC.
>
> This fails if the passed reference 'escapes'. It can escape by being
> assigned to a global, being assigned to a class member, being
> inserted into some data structure that lives longer than the
> function, and being returned by the function.

Yes, and the compiler can catch these cases. With incrementing/decrementing the parameter reference you win nothing, so it should be possible to optimize them away.

>>> 3) in a multithreaded app, the incs and decs must be synchronized
>>
>> Isn't an atomic inc/dec enough? (I don't know anything about the
>> performance of the asm "lock" instruction.)
>
> That requires synchronizing. It's slow enough to be noticable. Yes,
> you can find out the beginning of an arbitrary pointer's memory
> block. But that isn't a cheap operation.

Yikes, you are absolutely right. So here comes the uglier solution: every pointer is always a struct with the pointer to the data and a pointer to the refcounter.

> I don't know anything about the MS GC.

You do :) MS=mark&sweep

> That's the same with gc. Only upon an allocation is the time unbound.

And additionally, the time is unbounded for the GC fullcollect...

> GC doesn't require disabling hardware interrupts. It does not need to
> stop any threads that do not hold references to GC allocated data.

If you want to have your ISR written in D, you have two choices: a) be sure that you do not move a reference (refb=refa; refa=null;) - this can end up in the GC deleting a living object; if you cannot guarantee this, then b) disable the interrupting code while the GC is running. But that is not acceptable, so go back to a) or have another GC.

> Latency cannot be guaranteed even with malloc(). I don't know what
> constraints you have on the system you're developing. But when I've
> written real time interrupt code, the ISRs were written to not make
> any OS calls or any allocations. All they'd do is just gather data
> and post it to a FIFO buffer. A non-realtime thread would then pull
> the data out of the buffer and process it.

Yes, I will also not make any OS calls, nor will I do a "new". And I can easily check the code for not doing this. I can even add a check to the GC allocator that it is not called in an ISR. But a moving reference can corrupt the running GC, and there is no way to check for that.
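Frank's "uglier solution" is essentially a fat pointer. A sketch of what every reference would become, purely for illustration:

// The "fat pointer": every reference carries its counter along, so no
// registry lookup is needed, at the cost of doubling reference size.
struct FatRef(T)
{
    T* data;
    size_t* count;
}

static assert(FatRef!(int).sizeof == 2 * size_t.sizeof);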
Jun 05 2006
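Frank's "struct of two pointers" idea might be sketched in D like this. It is purely illustrative: the compiler would have to emit the incRef/decRef calls at every copy and scope exit, and (as the thread goes on to note) the counter updates would need to be atomic in a threaded program:

    // Hypothetical fat reference: what would be passed around
    // instead of a raw pointer.
    struct Ref(T)
    {
        T*   data;   // the object itself
        int* count;  // the shared reference counter
    }

    void incRef(T)(Ref!(T) r)
    {
        if (r.count !is null)
            (*r.count)++;    // would have to be atomic when threaded
    }

    void decRef(T)(Ref!(T) r)
    {
        if (r.count !is null && --(*r.count) == 0)
        {
            delete r.data;   // deterministic destruction
            delete r.count;
        }
    }

Walter's reply below points at the cost: the counter is a second allocation per object, and every reference is now two words instead of one.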
Frank Benoit wrote: It can only do it if it has full source available. It also won't work for virtual functions, as it is not known at compile time what code will be called. This fails if the passed reference 'escapes'. It can escape by being assigned to a global, being assigned to a class member, being inserted into some data structure that lives longer than the function, and being returned by the function. Yes, and the compiler can catch these cases. With incr/decr of the parameter reference you win nothing, so it should be possible to optimize them away. So here comes the uglier solution: every pointer is always a struct with the pointer to the data and a pointer to the refcounter. It's the wrapper approach, with two allocations per allocation. I thought you meant MS=Microsoft <g>. No, it isn't a problem, as it scans the C data segment and stack, too. You do :) MS=mark&sweep. I don't know anything about the MS GC. 7) Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C. Hm, ok. Here some more manual work is necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also? The collector only runs during an allocation request - which means you can completely control when a collection can happen. And additionally, the time is unbounded for the GC fullcollect... That's the same with gc. Only upon an allocation is the time unbounded. 9) Ref counting does not eliminate latency problems, it just reduces them. Where do you see the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbounded time to complete. c) code the ISR so it does not reference any gc allocated data. For example, use malloc() for dynamic data the ISR needs. d) make sure any gc references used by the ISR have another reference to them that the gc will scan (such as in the global data segment). If you want to have your ISR written in D, you have two choices: a) be sure that you do not move a reference (refb=refa; refa=null;) - this can end up in the GC deleting a living object. If you cannot guarantee this, then b) disable the interrupting code while the GC is running. But this is not acceptable, so go back to a) or have another GC. But it does not need to stop all other threads, e.g. hardware interrupts don't need to be disabled. GC doesn't require disabling hardware interrupts. It does not need to stop any threads that do not hold references to GC allocated data.
Jun 05 2006
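Options (c) and (d) in sketch form - the names are invented; the point is that ISR-side state either lives outside the gc heap entirely (malloc) or stays pinned through a reference the collector can see:

    import std.c.stdlib;   // malloc/free: invisible to the collector

    // option (c): ISR state allocated with malloc; the collector
    // never scans or frees it, so the ISR may touch it at any time
    struct IsrState { uint count; int lastError; }
    IsrState* isrState;

    // option (d): gc-allocated data the ISR uses keeps one extra
    // reference in the global data segment, which the gc scans
    byte[] isrBuffer;

    void setup()
    {
        isrState = cast(IsrState*) malloc(IsrState.sizeof);
        isrState.count = 0;

        isrBuffer = new byte[4096]; // rooted by the global above, so
                                    // a collection can't free it out
                                    // from under the ISR
    }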
In article <e60vmu$1b6t$1 digitaldaemon.com>, Walter Bright says.....Correct me if I'm wrong, but while this is true for M + S, doesn't RC /kind of/ have the opposite problem? In refcounting systems, allocation would be mostly predictable, a la malloc, but dereferences can trigger a long chain of other dereferences. I say 'kind of' because it is more deterministic than GC - you can always dig deeper to see what is going to be freed by a given dereference. For instance, if you are the last person to let go of a hash table, you have to wait for the whole thing to unravel, along with any objects stored inside. I guess a real-time RC implementation could be made by keeping a 'recycling bin' and only taking apart N objects at a time, i.e. incremental freeing. This would un-guarantee the prompt running of destructors though (which seems to be one of the big rationales for RC in D, right, at least for non-memory objects?) KevinThe collector only runs during an allocation request - which means you can completely control when a collection can happen.And additional the time is unbound for the GC fullcollect...That's the same with gc. Only upon an allocation is the time unbound.9) Ref counting does not eliminated latency problems, it just reduces them.Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.
Jun 07 2006
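Kevin's "recycling bin" could look something like the following - an invented sketch of deferred destruction, where the last release merely queues an object and a bounded amount of reclamation happens per step (a real system would use a preallocated list rather than a growing gc array):

    Object[] recycleBin;  // objects whose refcount reached zero

    void scheduleFree(Object o)
    {
        recycleBin ~= o;  // cheap; no teardown happens here
    }

    // Called periodically (per frame, per allocation, ...):
    // reclaims at most maxWork objects, keeping latency bounded.
    void reclaimSome(size_t maxWork)
    {
        for (size_t i = 0; i < maxWork && recycleBin.length > 0; i++)
        {
            Object o = recycleBin[recycleBin.length - 1];
            recycleBin.length = recycleBin.length - 1;
            delete o;     // children whose counts hit zero get
                          // queued here too, not freed recursively
        }
    }

As Kevin says, the price is exactly the loss of promptly run destructors.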
Walter Bright wrote:Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#DOnly if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)That requires synchronizing.3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough?
Jun 07 2006
Bruno Medeiros wrote:Walter Bright wrote:These are synchronizing operations. SeanAre you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?Only if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)That requires synchronizing.3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough?
Jun 07 2006
Sean Kelly wrote:Bruno Medeiros wrote:By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#DWalter Bright wrote:These are synchronizing operations. SeanAre you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?Only if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)That requires synchronizing.3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough?
Jun 10 2006
Bruno Medeiros wrote:Sean Kelly wrote:For the most part. There are really two issues here. First, the operation must be atomic with respect to other threads. The x86 actually guarantees this for all load/store operations on aligned data <= the bus size. Second, the operations must be observed in the proper order by other threads--ie. the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order. For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved. Again, with x86 you can get away without a LOCK a lot of the time, but not always. So you really must assume that any competing operation has a worst case cost of... well, a lot. SeanBruno Medeiros wrote:By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?Walter Bright wrote:These are synchronizing operations.Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?Only if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)That requires synchronizing.3) in a multithreaded app, the incs and decs must be synchronizedIsn't a atomic inc/dec enough?
Jun 10 2006
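For the x86 case Sean describes, the competing operation is something like the following - a sketch using DMD's inline assembler, assuming x86 and the convention that an integer result is returned in EAX:

    // Atomically increments *p and returns the new value. The LOCK
    // prefix makes the read-modify-write atomic across CPUs; on x86
    // it also acts as a full memory barrier, the expensive part.
    int atomicIncrement(int* p)
    {
        asm
        {
            mov  EDX, p;
            mov  EAX, 1;
            lock;
            xadd [EDX], EAX;  // EAX receives the old value
            inc  EAX;         // old + 1 = the new value, left in EAX
        }
    }

The roughly 70ns figure quoted later in the thread is the cost of that LOCK, not of the addition itself.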
Sean Kelly wrote:[snip]For the most part. There are really two issues here. First, the operation must be atomic with respect to other threads. The x86 actually guarantees this for all load/store operations on aligned data <= the bus size. Second, the operations must be observed in the proper order by other threads--i.e. the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order. For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved. Again, with x86 you can get away without a LOCK a lot of the time, but not always. So you really must assume that any competing operation has a worst case cost of... well, a lot. Sean Ah, I see. I had recently read about out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but that is not the case, not always at least? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 12 2006
Bruno Medeiros wrote:Ah, I see. I had recently read about out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but that is not the case, not always at least?A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner. However, the only information other CPUs in the system have is what they observe on the system bus. Obviously, so long as CPUs aren't sharing data there aren't any problems. But things get sticky when this isn't the case. The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior. Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code. There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary. Now let's suppose you have two threads doing something like this: thread/CPU 1: A = 1; B = 2; thread/CPU 2: if( A == 1 ) { assert( B == 2 ); } Given the order in which A and B were declared, and therefore the order in which the writes occur, this assert may or may not fail. Enter memory barriers. Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode." So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down. The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary. However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other. Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level. The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use). I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago). Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control. Sean
Jun 13 2006
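In code, the usual fix for this kind of example is the publish pattern: write the data, fence, then set a flag. A D sketch follows (DMD inline asm, x86 - where stores are already kept in order, so the SFENCE mostly documents intent; weaker memory models genuinely need it, and the reader side would need a load fence too):

    int data;   // the payload
    int ready;  // the flag, written last

    // writer (thread/CPU 1)
    void publish()
    {
        volatile              // stops *compiler* reordering
        {
            data = 42;
            asm { sfence; }   // stops *CPU* store reordering (SSE)
            ready = 1;
        }
    }

    // reader (thread/CPU 2)
    void consume()
    {
        volatile
        {
            if (ready == 1)
                assert(data == 42); // holds on x86; elsewhere a load
                                    // fence belongs before this read
        }
    }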
Sean Kelly wrote:[snip]Cool; learn something new every day. Thanks for the informative post. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 14 2006
Sean Kelly wrote:[snip]Nice post! Makes me think, how does one keep up with this? I mean, one who isn't (nor wishes to be) a hardware expert, but wants to keep up with the general developments in this area, thus maintaining an overview of it. -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 14 2006
Bruno Medeiros wrote:Makes me think, how does one keep up with this? I mean, one who isn't (nor wishes to be) a hardware expert, but wants to keep up with the general developments in this area, thus maintaining an overview of it.comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times. The C++ committee is also working on a new memory model for C++, so any discussion there may be useful. The easiest way to keep an eye on this is follow the links from Hans Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you could keep an eye on comp.std.c++ as well if you're having trouble staying awake at night. Finally, any paper with Maurice Herlihy's name on it is a useful resource if you want a deeper understanding of some of these ideas and don't mind some research. He's got a website with links to many of his papers, though IIRC some of them are hard to track down without an ACM membership. Sean
Jun 14 2006
Interesting indeed! I'm currently reading Boehm's article "Threads Cannot be Implemented as a Library". I wonder if this means that threads should become primitives of some sort [in D] ? L.
Jun 14 2006
Lionello Lunesu wrote:Interesting indeed! I'm currently reading Boehm's article "Threads Cannot be Implemented as a Library". I wonder if this means that threads should become primitives of some sort [in D] ?It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now. Sean
Jun 14 2006
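Concretely, the facilities Sean lists look like this in D - a trivial sketch:

    int counter;
    Object lock;

    static this() { lock = new Object; } // any object is a monitor

    void bump()
    {
        synchronized (lock)  // mutual exclusion
        {
            counter++;
        }
    }

    bool pending()
    {
        volatile             // lock-free read: the compiler may not
        {                    // cache or reorder accesses in here
            return counter != 0;
        }
    }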
Sean Kelly wrote:Lionello Lunesu wrote:Interesting indeed! I'm currently reading Boehm's article "Threads Cannot be Implemented as a Library". I wonder if this means that threads should become primitives of some sort [in D] ?It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now....that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such? Would be nice if there were a built-in type with overloaded ++x, x++, x=y, all atomic. (I've seen some C++ templates that do this.) L.
Jun 15 2006
Lionello Lunesu wrote:[snip]...that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such?"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.Would be nice if there were a built-in type with overloaded ++x, x++, x=y, all atomic. (I've seen some C++ templates that do this.)It would be nice, but I've yet to see a proposal that I like. The problem is that in addition to pure atomic operations (which is how the x86 typically works by default) lock-free programmers typically want to make an assertion about instruction ordering as well, and built-in semantics don't expose this very cleanly. One could simply have volatile types similar to Java where no optimization is allowed on them at all, but this may be too heavy-handed to satisfy some folks. For now, I think a library solution is a reasonable alternative. Ares has had one for quite a while and it's almost standalone so it could be used with Phobos with very little effort. The documentation is here (DDoc seems to want to generate docs for private template functions, so I apologize for the clutter): http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html And the code itself is here: http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d It currently only works on the x86 and is really intended for API developers, but it does the trick and I'll expand it as needed. For typical use, msync.acq (acquire), msync.rel (release), and msync.none are the flags you're likely to care about. The other more fine-grained flags just alias acq/rel on x86 anyway. In case the docs are confusing, the API calls supported are: load, store, storeIf (i.e. CAS), increment, and decrement. Sean
Jun 15 2006
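To show the shape of that API: the calls below are pieced together from the operation names and msync flags Sean lists, so treat the exact signatures as guesses (the linked atomic.d is authoritative) - essentially pseudocode:

    import std.atomic;  // the Ares module linked above

    int refs = 1;

    void addRef()
    {
        increment!(msync.none)(refs);  // raw atomicity, no ordering
    }

    void release()
    {
        // release semantics: earlier writes become visible to
        // whoever later observes the decremented count
        if (decrement!(msync.rel)(refs) == 0)
        {
            // last reference gone
        }
    }

    bool tryClaim(inout int flag)
    {
        // storeIf is the CAS: store 1 only if the value is still 0
        return storeIf!(msync.acq)(flag, 1, 0);
    }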
Sean Kelly wrote:Lionello Lunesu wrote:...that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such?"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.So I suppose "volatile" could be extended to include memory locking as well! // instructions inside this block are 'lock'ed volatile(atomic) { ++a; a = x; } (or perhaps even "synchronized(memory) {...}", but I think extending volatile makes more sense). Come to think of it, doesn't "synchronized" also imply "volatile"? Or, another possibility would be to add the volatile declaration as in C, but then actually locking all access to the variable: volatile int i; i++; // atomic Seems to me it should be possible to extend D with a built-in and portable construct for these atomic operations.[snip]Thanks, I'll have a look at them. L.
Jun 16 2006
Lionello Lunesu wrote:[snip]So I suppose "volatile" could be extended to include memory locking as well!Yes it could, though for now I prefer it the way it is as it offers more control over exactly what's going on. It helps tremendously that D has such good inline asm support.// instructions inside this block are 'lock'ed volatile(atomic) { ++a; a = x; } (or perhaps even "synchronized(memory) {...}", but I think extending volatile makes more sense). Come to think of it, doesn't "synchronized" also imply "volatile"?Yes. Synchronization mechanisms rely on memory barriers to work properly. In fact, it's been suggested in the past on c.p.t that an empty mutex block: synchronized {} could be used as a high-level sort of bidirectional memory barrier, except for the risk of the compiler optimizing the call away entirely.Or, another possibility would be to add the volatile declaration as in C, but then actually locking all access to the variable: volatile int i; i++; // atomicThis is sort of how Java implements volatile and it wouldn't surprise me if C++ did something similar. Right now, the "volatile" qualifier offers basically no language guarantees for concurrent programming, as it was actually intended for accessing interrupt data IIRC.Seems to me it should be possible to extend D with a built-in and portable construct for these atomic operations.Definitely. The trouble is coming up with a good design :-) I think the C++ team is definitely qualified to do so, but they may have to make some sacrifices in the interest of time. Sean
Jun 16 2006
Sean Kelly wrote:[snip]For now, I think a library solution is a reasonable alternative. Ares has had one for quite a while and it's almost standalone so it could be used with Phobos with very little effort. I'd really like to see this included in the standard distribution. Has anyone made a general-purpose lock-free linked list? Just a low-level list of void * pointers would be fantastic.[snip]
Jun 16 2006
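The request above is usually answered with a Treiber stack, the simplest lock-free linked structure: push and pop retry a CAS on the head. A sketch follows; the cas() is a stand-in implemented with a lock purely so the sketch is self-contained (a real one is a single lock cmpxchg), and the classic ABA problem is deliberately ignored:

    struct Node { void* value; Node* next; }

    Node* head;  // top of the stack

    // Stand-in for an atomic compare-and-swap.
    bool cas(Node** slot, Node* oldVal, Node* newVal)
    {
        synchronized  // placeholder: defeats the lock-freedom
        {
            if (*slot is oldVal) { *slot = newVal; return true; }
            return false;
        }
    }

    void push(Node* n)  // nodes are preallocated by the caller
    {
        Node* old;
        do
        {
            old = head;
            n.next = old;
        } while (!cas(&head, old, n));
    }

    Node* pop()
    {
        Node* old;
        do
        {
            old = head;
            if (old is null)
                return null;  // empty
        } while (!cas(&head, old, old.next)); // ABA-unsafe as written
        return old;
    }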
Don Clugston wrote:I'd really like to see this included in the standard distribution. Has anyone made a general-purpose lock-free linked list? Just a low-level list of void * pointers would be fantastic.Mango contains some of Doug Lea's containers and there may be one there. If not, it shouldn't be tremendously difficult to implement or port one. I'm not sure I have the time right now, but if someone wants to use the atomic package in Ares they're more than welcome to. Sean
Jun 16 2006
Sean Kelly wrote:Bruno Medeiros wrote:Makes me think, how does one keep up with this? I mean, one who isn't (nor wishes to be) a hardware expert, but wants to keep up with the general developments in this area, thus maintaining an overview of it.comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times.[snip]Well, that's the thing, I just want to have some general knowledge about this area of hardware and concurrency/distributed-systems, not become an expert on it. My time to learn new things is limited (and I definitely have no trouble going to sleep :( ), so my priority goes to learning things that I have an immediate need to use or become involved with. If I ever have the need to learn more in-depth I will. You see, concurrency is very important for people interested in server-side programming. But for multimedia programming it's not that important. For now, that is: it seems that with the coming of multicore CPUs, concurrency is becoming more and more of a general software development topic, relevant even for non-intrinsically concurrent apps. -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 17 2006
Bruno Medeiros wrote:Sean Kelly wrote:Sadly, I don't know of any more basic sources of information on this stuff. David Butenhof's "Programming with POSIX Threads" is quite good for the basic concepts, but beyond that it's mostly research papers, wiki topics, etc. But don't hesitate to ask on comp.programming.threads or here if you have a question. SeanBruno Medeiros wrote:Well, that's the thing, I just want to have some general knowledge about this area of hardware and concurrency/distributed-systems, not become an expert on it. My time to learn new things is limited (and I definitely have no trouble going to sleep :( ), so my priority goes for learning things that I have immediate need to use/become-involved. If I ever have the need to learn more in-depth I will. You see, concurrency is very important for people interested in server-side programming. But for multimedia programming, it's not that important. For now that is, as it seems that with the coming of multicore CPUs, concurrency is becoming more and more of a general software development topic, relevant even for non-intrinsically concurrent apps.Makes me think, how does one keep up with this? I mean, one who isn't (nor wishes to be) a hardware expert, but wants to keep up with the general developments in this area, thus maintaining an overview of it.comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times. The C++ committee is also working on a new memory model for C++, so any discussion there may be useful. The easiest way to keep an eye on this is follow the links from Hans Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you could keep an eye on comp.std.c++ as well if you're having trouble staying awake at night. Finally, any paper with Maurice Herlihy's name on it is a useful resource if you want a deeper understanding of some of these ideas and don't mind some research. He's got a website with links to many of his papers, though IIRC some of them are hard to track down without an ACM membership.
Jun 17 2006
For what it's worth, incremental GC is similar to refcounting in that the cost is distributed across pointer use to avoid the need for a costly mark/sweep phase. I've even seen hard-realtime incremental GC implementations, so it's a long-term solution worth considering. That said, I would like to be able to create some sort of smart pointer in D for tracking valuable resources, so it's good to hear that Walter is considering this as well. Frank Benoit wrote: 3) in a multithreaded app, the incs and decs must be synchronized. Isn't an atomic inc/dec enough? (I don't know anything about the performance of the asm "lock" instruction.) Depending on the platform and the synch. requirement, a "lock" type instruction may be necessary. It typically is for non-x86 platforms, but I think you could get away without using one in some cases on x86. The best place to check for optimizations would be the Boost shared_ptr code. IIRC they don't use the Interlocked calls but there are spin locks in some places. The cost of a "lock" instruction is variable based on the hardware, number of CPUs, whether the cache line is shared, etc, but the most reliable estimate I've seen averages locked ops at around 70ns. 9) Ref counting does not eliminate latency problems, it just reduces them. Where do you see the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbounded time to complete. But it does not need to stop all other threads, e.g. hardware interrupts don't need to be disabled. I've seen analyses that suggested reference counting is actually slower than mark/sweep when measured over the run of the program. The advantage is the avoidance of the unbounded mark/sweep phase, with far more expensive pointer modifications instead. As above, I'd almost rather see incremental GC support in D (it typically requires additional code generation on pointer modifications, and may be tricky to sort out for C routines like memset). Sean
Jun 05 2006
Sean Kelly wrote:For what it's worth, incremental GC is similar to refcounting in that the cost is distributed across pointer use to avoid the need for a costly mark/sweep phase. I've even seen hard-realtime incremental GC implementations, so it's a long-term solution worth considering.My _only_ significant beef with D so far has been the unbounded time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it. What I'm really curious about, however, is how important this is to _other_ people - especially Walter. Would it be reasonable to say that it seems D has been designed largely without long-term real-time performance in mind? Sure, there are workarounds, but I haven't seen anything elegant yet that will let me use D for such applications without making my memory management more complicated than it would have been in C++. Is anything in the works?
Jun 07 2006
Jeff Parsons wrote:What I'm really curious about, however, is how important this is to _other_ people - especially Walter. Would it be reasonable to say that it seems D has been designed largely without long-term real-time performance in mind? Sure, there are workarounds, but I haven't seen anything elegant yet that will let me use D for such applications without making my memory management more complicated than it would have been in C++. Is anything in the works?The spec doesn't contain any language forbidding such an approach, but I don't think the popular compilers will support it by default. Certainly not any time soon, at any rate. The "problem" with incremental GC is that it requires extra code generation for all pointer modifications to signal the garbage collector that some re-scanning may be necessary. This has obvious performance implications for small apps that don't ever actually need to collect, but tends to level out as run time increases. And, of course, the potential for progress guarantees (possible with incremental GC, but not at all common) is a huge bonus for realtime programming. That aside, there may be other approaches to garbage collection that would make everyone happy and that don't require additional code generation. If you know of one, please say so. My own knowledge of the topic is limited to what I've read in research papers. Sean
Jun 08 2006
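The "extra code generation" is a write barrier: the compiler routes every pointer store through a helper so the collector learns what changed while a collection is in progress. Schematically, with invented names (a real collector would use a preallocated card table, not a growing array):

    bool collecting;  // set while an incremental collection runs
    void*[] dirty;    // slots the collector must re-scan

    // what the compiler would emit for every `obj.field = value;`
    void writeBarrier(void** slot, void* value)
    {
        if (collecting)
            dirty ~= cast(void*) slot; // revisit this slot later
        *slot = value;                 // the actual store
    }

That per-store tax is why the cost "tends to level out as run time increases": short-lived programs pay it for nothing, while long-running ones buy bounded pauses with it.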
On 06/08/2006 06:04 PM, Sean Kelly wrote:Jeff Parsons wrote:[snip][snip]performance in mind? Sure, there are workarounds, but I haven't seen anything elegant yet that will let me use D for such applications without making my memory management more complicated than it would have been in C++. Is anything in the works?The spec doesn't contain any language forbidding such an approach, but I don't think the popular compilers will support it by default. Certainly not any time soon, at any rate. [snip] That aside, there may be other approaches to garbage collection that would make everyone happy and that don't require additional code generation. If you know of one, please say so. My own knowledge of the topic is limited to what I've read in research papers.You could define a policy pointer which could have as one policy, the current Boehm collector. IOW, any ptr<BW> p(T) on the stack would be a root pointer and each data structure would have a map of ptr<BW> locations. Similarly, there could be ptr<Copy> or ptr<IncBw> or even ptr<deterministic> and ptr<collected> instead of the class [collected] SomeClass{...}; class [deterministic] OtherClass{...} proposed by Andrei here: http://tinyurl.com/o5zpm This, of course, would require cooperation from the compiler, but the same would be true of: class [collected] SomeClass{...}; it's just that some specializations, ptr<collected>, would be predetermined and others, for example, user-defined ones, would not.
Jun 08 2006
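Larry's ptr<BW> in D template clothing might start out like this - a skeleton only, with both policies reduced to stubs:

    struct BW        // Boehm-style: the collector finds the object
    {                // by scanning, so the hooks do nothing
        static void addRef(void* p) { }
        static void release(void* p) { }
    }

    struct Counted   // deterministic, refcounted policy
    {
        static void addRef(void* p) { /* ++count(p) */ }
        static void release(void* p) { /* free when count hits 0 */ }
    }

    struct ptr(T, Policy)
    {
        T* raw;

        void set(T* p)  // the hook a compiler-supported version
        {               // would run on every assignment
            Policy.release(raw);
            raw = p;
            Policy.addRef(raw);
        }
    }

Usage would read ptr!(Node, BW) or ptr!(Node, Counted), mirroring the proposed [collected]/[deterministic] class attributes.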
Jeff Parsons wrote:My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable. The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's. Also, you can use malloc and free in D, in exactly the same way you would in C.
Jun 08 2006
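The preallocation discipline Walter describes, sketched: every byte the real-time section will touch is acquired during setup, and the hot path only hands out what already exists (names invented):

    struct Packet { uint id; byte[256] payload; }

    Packet[] pool;  // all storage acquired before going realtime
    size_t next;

    void setup(size_t n)
    {
        pool = new Packet[n]; // the only allocation; happens before
        next = 0;             // the real-time section is entered
    }

    Packet* acquire()  // callable from the real-time section:
    {                  // constant time, no gc, no malloc
        if (next == pool.length)
            return null;      // budget exhausted - a sizing error
        return &pool[next++];
    }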
Walter Bright wrote:Jeff Parsons wrote:For what it's worth, I have read papers describing hard real-time incremental garbage collectors, but I haven't yet seen one available outside research circles. I'd like to give one a shot at some point, but I've far too many other distractions to know when that will be. If anyone is interested however, some of the papers are available here: http://www.cs.utexas.edu/users/oops/papers.html See papers 11 and 14.My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's.This is certainly the easiest and probably the safest method. Sean
Jun 08 2006
Sean Kelly wrote:Walter Bright wrote:Yup. It gets the job done. Real time threads should be considered a very valuable resource, and as such all possible computation should be removed from them and put into non-realtime threads.The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's.This is certainly the easiest and probably the safest method.
Jun 09 2006
On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's. Also, you can use malloc and free in D, in exactly the same way you would in C.It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-) -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocrity!" 9/06/2006 4:44:56 PM
Jun 08 2006
Derek Parnell wrote:On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:This gets sticky if one wants to use strings in D though, as there's no real way to make them work outside of the garbage collector mechanism. Sure we could use C-style pointers, but it's not a terribly attractive option. I suppose the alternative would be a class with a custom allocator method. SeanThe approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's. Also, you can use malloc and free in D, in exactly the same way you would in C.It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-)
Jun 08 2006
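The custom-allocator route Sean mentions does exist in the language: a class can define its own allocator and deallocator so instances come from malloc rather than the gc heap. A sketch (check the spec for the exact new/delete overload signatures; note that any gc-allocated members would still need rooting):

    import std.c.stdlib;

    class Message
    {
        char[100] text;  // fixed storage, so the whole instance
                         // lives outside the collector

        new(uint size)   // class allocator, used by `new Message`
        {
            void* p = malloc(size);
            assert(p !is null);
            return p;
        }

        delete(void* p)  // matching deallocator, used by `delete`
        {
            free(p);
        }
    }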
Sean Kelly wrote:Derek Parnell wrote:You can use D arrays. Just allocate them using malloc (or statically allocate them). Don't concatenate them. Though I would ask why the real time code was allocating strings? C is very good for writing real time code, and you can write "C"-style code in D, and it will behave exactly like the corresponding C code. You can call printf, strcpy, malloc, etc., which will behave just like C's printf, strcpy, and malloc because they *are* C's functions. It will not do a gc collect unless the code allocates gc memory, and it will ONLY do a gc collect during such an allocation. Heck, my old Empire game was written in C. I renamed the files from .c to .d, did a few minor syntactic edits, and voila! it was now in D. It runs exactly the same. But the bottom line is that a real time thread should not be doing storage allocation. It shouldn't be calling new, or malloc. It shouldn't do storage allocation regardless of whether it is written in D, C, C++, or assembler.On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:This gets sticky if one wants to use strings in D though, as there's no real way to make them work outside of the garbage collector mechanism. Sure we could use C-style pointers, but it's not a terribly attractive option.The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's. Also, you can use malloc and free in D, in exactly the same way you would in C.It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-)
Jun 09 2006
Walter Bright wrote:Sean Kelly wrote:You can use D arrays. Just allocate them using malloc (or statically allocate them). Don't concatenate them.So then I'd just manually set the ptr and len members? Sounds reasonable, provided everyone behaves and uses the copy on write idiom you suggest.Though I would ask why the real time code was allocating strings?I was mostly thinking about this in the context of just how much allocation pretty much must be performed by the GC for applications with special memory requirements (gigabytes of data, for example). It just seemed to have an overlap insofar as this sort of programming is concerned.C is very good for writing real time code, and you can write "C"-style code in D, and it will behave exactly like the corresponding C code. You can call printf, strcpy, malloc, etc., which will behave just like C's printf, strcpy, and malloc because they *are* C's functions. It will not do a gc collect unless the code allocates gc memory, and it will ONLY do a gc collect during such an allocation.At times I've considered extending thread support to allow for the creation of detached threads, but it simply seems too error-prone to be worthwhile. I think you're right in that if someone is doing real-time programming then they shouldn't expect much hand-holding, because the assumptions involved offer too great a risk of unexpected behavior.Heck, my old Empire game was written in C. I renamed the files from .c to .d, did a few minor syntactic edits, and voila! it was now in D. It runs exactly the same.Yup. Though it's worth noting that D has been heavily influenced by your particular programming style. Not at all a bad thing, but it does imply that you are less likely than anyone else to experience code translation problems :-) On a related note, I sometimes wonder if Sid Meier credits you anywhere for effectively creating the genre that's made him so successful ;-)But the bottom line is that a real time thread should not be doing storage allocation. It shouldn't be calling new, or malloc. It shouldn't do storage allocation regardless of whether it is written in D, C, C++, or assembler.True enough. Sean
Jun 09 2006
Sean Kelly wrote:Walter Bright wrote:You can use D arrays. Just allocate them using malloc (or statically allocate them). Don't concatenate them.So then I'd just manually set the ptr and len members? Sounds reasonable, provided everyone behaves and uses the copy on write idiom you suggest.int* p = cast(int*) calloc(dim, int.sizeof); int[] a = p[0 .. dim];[snip]At times I've considered extending thread support to allow for the creation of detached threads, but it simply seems too error-prone to be worthwhile. I think you're right in that if someone is doing real-time programming then they shouldn't expect much hand-holding, because the assumptions involved offer too great a risk of unexpected behavior.I agree that realtime programming is a special type of programming, and that someone doing it should be a more advanced programmer. You can't just take any random code (in any language), make it realtime, and expect it to work properly. It's got to be engineered for the peculiarities of realtime requirements.[snip]Nope :-)
Jun 09 2006
Walter Bright wrote:Jeff Parsons wrote:My _only_ significant beef with D so far has been the unbounded time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it.If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play. This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code. I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice not to have to avoid GC code entirely. As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.
Jun 09 2006
David Gileadi wrote:[snip]Didn't Walter point out that, in the current gc implementation, you just have to be conscientious about when you call 'new' (which appears to initiate a collect) if you want to avoid the gc-initiated pauses? So in game programming, I assume it's just a matter of good organization and planning. You make sure you allocate sufficient memory ahead of time and avoid using 'new' in any location that would cause an ugly pause in the game. Furthermore, you might choose appropriate times for when you want to call a full collect manually. As I see it, the obligation for careful design doesn't disappear when a gc is in place. It certainly makes life easier in many ways. But one still has to be aware of the gc shortcomings and adapt to the situation. I still see it as a tremendous improvement over having to clean up after every memory allocation you make (C/C++). In short, using a gc doesn't mean you don't have to do your homework. :) -JJR
Jun 09 2006
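Something like the following shows the shape of that planning in code. It assumes the D1-era std.gc module really does expose disable, enable, and fullCollect (treat those exact names as an assumption), and renderFrame, betweenLevels, and running are hypothetical placeholders:

    import std.gc; // Phobos GC control module (assumed API)

    bool running = true;

    void renderFrame() { /* per-frame work; may call 'new' */ }
    bool betweenLevels() { return false; } // a moment when a pause is invisible

    void gameLoop()
    {
        while (running)
        {
            std.gc.disable();     // no implicit collection can fire mid-frame;
            renderFrame();        // allocations still succeed, the pool just grows
            std.gc.enable();

            if (betweenLevels())
                std.gc.fullCollect(); // pay the collection cost here instead
        }
    }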
John Reimer wrote:
> David Gileadi wrote:
>> Walter Bright wrote:
>>> Jeff Parsons wrote:
>>>> My _only_ significant beef with D so far has been the unbounded
>>>> time for a garbage collection; therefore, if an incremental garbage
>>>> collector built into D could guarantee me a set maximum time for
>>>> each garbage collection (which I would manually induce as often as
>>>> possible), then unless the efficiency hit is TERRIBLE I would be
>>>> all over it.
>>>
>>> If you're looking for a *guarantee*, then you cannot even use
>>> malloc(). malloc() has unbounded time, and is not predictable.
>>
>> It seems that a fairly common desire is to use D for writing games.
>> Games don't typically have hard real-time requirements, but it also
>> doesn't look good if your screen freezes for a second or two in the
>> middle of play. This is an area where an incremental collector or
>> some kind of reference-counting solution would be handy, since
>> (especially in large programs) it would be difficult to avoid using
>> garbage-collected code. I realize that the programmer has the
>> responsibility to manage memory intelligently, but it would be nice
>> not to have to avoid GC code entirely. As others have stated, it
>> would even be worth giving up some overall performance to gain a
>> smooth playing experience.
>
> Didn't Walter point out that, in the current gc implementation, you
> just have to be conscientious about when you call 'new' (which appears
> to initiate a collect) if you want to avoid the gc-initiated pauses?

Yea, in the current GC, allocation is the only thing that may implicitly initiate a collection.

> So in game programming, I assume it's just a matter of good
> organization and planning: you make sure you allocate sufficient
> memory ahead of time and avoid using 'new' in any location that would
> cause an ugly pause in the game. Furthermore, you might choose
> appropriate times to call a full collect manually.

And D's easy array slicing, along with array operations (like re-initializing an array with 'arr[] = 0;'), makes code that uses pre-allocated buffers a lot easier to write, too.

> As I see it, the obligation for careful design doesn't disappear when
> a gc is in place. It certainly makes life easier in many ways, but one
> still has to be aware of the gc's shortcomings and adapt to the
> situation. I still see it as a tremendous improvement over having to
> clean up after every memory allocation you make (C/C++). In short,
> using a gc doesn't mean you don't have to do your homework. :)
>
> -JJR
Jun 09 2006
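As a small illustration of that point (all names here are hypothetical), a buffer can be allocated once up front and recycled on every call through slicing and array-wise assignment, so the steady-state path never allocates:

    // One buffer allocated up front, reused every frame via slicing.
    float[] mixBuffer;

    static this()
    {
        mixBuffer = new float[4096]; // the only allocation, done once
    }

    float[] mixSamples(size_t count)
    {
        float[] work = mixBuffer[0 .. count]; // slicing: no allocation
        work[] = 0.0f;                        // re-initialize in place
        // ... accumulate channel data into work[] ...
        return work;
    }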
David Gileadi wrote:
> It seems that a fairly common desire is to use D for writing games.
> Games don't typically have hard real-time requirements, but it also
> doesn't look good if your screen freezes for a second or two in the
> middle of play. This is an area where an incremental collector or some
> kind of reference-counting solution would be handy, since (especially
> in large programs) it would be difficult to avoid using
> garbage-collected code. I realize that the programmer has the
> responsibility to manage memory intelligently, but it would be nice
> not to have to avoid GC code entirely. As others have stated, it would
> even be worth giving up some overall performance to gain a smooth
> playing experience.

This is what I'm screaming. It is becoming popular nowadays to allow people to modify games in various ways, one of which is modding the game's code. I don't think you will see much of this happening inside a 3d engine, but the game logic that sits next to the 3d engine and dictates gameplay may be open. This allows people to change things like how the AI behaves, how much damage you take when hit with a certain weapon, how that weapon behaves, how an economy behaves, etc.

Modders may or may not be experienced programmers, so it's probably best to assume they don't have significant experience. This has led plenty of games to use scripting languages for modding, since C++ is tough to learn and not good for this stuff. I think D could do it, though. It would kick butt if D could do this, because it would make the mods run much faster than the interpreted code that would probably be used otherwise, even with something like 50% reference-counting overhead. Automatic memory management is a huge boon for the kind of easy code writing that lets inexperienced coders get in on the action and do constructive stuff. Even without the modding argument, being able to use automatic memory management in some parts of a game would save a lot of development time.

It seems, though, that if you want to do non-pausing games in D today, you just use C-style memory management. Bummer. As David Gileadi said, games don't have strict latency requirements: garbage collection doesn't need to be deterministic, and it doesn't need to finish in under 100 microseconds; it just needs to not be noticeable. For a few years I've messed with RunUO, which is, afaik, written in C# (worldsaves were annoying!). This is doable.

Also, think of all the new D programmers there would be if someone were to release a popular game that exposed D as a scripting language :)
Jun 12 2006
Frank Benoit wrote:
> Another garbage collector thread :)
>
> The best garbage collector I can think of is a reference counting one.

Uh, what's your definition of "best"? The simplest? Then maybe. If you take a look at Java, it doesn't use a reference-counting GC, and there is a good reason for this: AFAIK, modern GCs perform far better than reference-counting GCs. One GC that I remember (no, I'm not an expert) is the mark-copy GC; see "Mark-Copy: Fast copying GC with less space overhead". The only thing reference-counting GCs have going for them is that the scheme is simple and more or less robust, but "best"? Best for what? I doubt very much that the best GC for, say, batch computations is also the best for GUI clients.

Regards,
RenoX
Jun 04 2006
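For concreteness, here is a minimal sketch of the bookkeeping that naive reference counting adds to every copy and release of a reference. The RcInt type and its helpers are invented for illustration; they are not thread-safe, and plain reference counting of this kind also leaks cyclic structures. This per-assignment traffic is precisely the overhead a tracing collector avoids:

    import std.c.stdlib;

    // A reference-counted slot for an int payload.
    struct RcInt
    {
        int* payload;
        size_t* count;
    }

    RcInt rcNew(int value)
    {
        RcInt r;
        r.payload = cast(int*)malloc(int.sizeof);
        *r.payload = value;
        r.count = cast(size_t*)malloc(size_t.sizeof);
        *r.count = 1;
        return r;
    }

    RcInt rcCopy(RcInt r)
    {
        ++*r.count;              // the cost paid on every copy
        return r;
    }

    void rcRelease(RcInt r)
    {
        if (--*r.count == 0)     // the cost paid on every release
        {
            free(r.payload);
            free(r.count);
        }
    }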