digitalmars.D - More precise GC
- bearophile (34/34) Mar 28 2010 This is just a small invention.
- William T. Fnk (2/54) Mar 28 2010 This is a rather ridiculous way of emulating algebraic data types and va...
- bearophile (6/8) Mar 28 2010 I think algebraic data types in Haskell don't allow you to use for examp...
- Robert Jacques (10/65) Mar 28 2010 This would require structs/arrays/etc to contain a header with a vtable,...
- bearophile (5/10) Mar 28 2010 They can't be used, the D specs say that pointers to memory managed by t...
- Robert Jacques (20/34) Mar 28 2010 Yes. What I think you're forgetting is that all compile-time type info i...
- bearophile (11/15) Mar 28 2010 Thank you very much for all your explanations, I didn't know that the si...
- Steven Schveighoffer (10/19) Mar 28 2010 The current GC has a simple "type info" if you will -- contains pointers...
- Robert Jacques (3/24) Mar 28 2010 Also, don't forget that classes have a bunch of runtime type info.
- Steven Schveighoffer (5/16) Mar 29 2010 But the GC doesn't use/need this info (except to call the destructors). ...
- Robert Jacques (5/22) Mar 29 2010 Sorry, my comment was more for a D in general than the GC. GC's in gener...
- Robert Jacques (6/20) Mar 28 2010 Yes, hiding bit flags in pointers is generally a bad idea because it mak...
- William T. Fnk (5/18) Mar 28 2010 Why not? Are you scared of the functional language stigma? D is already ...
This is just a small invention. Enums can decrease the precision of the GC, because fields can be pointers or values, an example: enum NodeType { value, sum, product } struct Node { NodeType type; union { double val; struct { Node* left, right; } } } (Node can be used to build a simple expression tree, that can contain things like: 2.5 * 3.0 + 2.0). In this case the GC has to use the safe strategy of always following the left and right pointers, even when they are a double. At runtime the program usually knows what's inside the enum, if the enum truly contains pointers. This information can be inside a tag as in this example, inside a tagged pointer that points to the struct/enum, or at worst in the code that uses the struct/enum. But generally the compiler is not able to use this information, because a compiler usually doesn't understand the program semantics. So I can think of a new optional standard class/struct/enum method, for example gcfollow() that the garbage collector recognizes when it is present and useful. It gets called with the attribute name and returns true at runtime if the the GC has to follow a pointer/reference: struct Node { NodeType type; union { double val; struct { Node* left, right; } } bool gcfollow(string field)() { return type != NodeType.value; } } Note that the GC will call gcfollow() (here a template for more efficiency) only with the strings of "left" or "right", because "type" and "val" attributes can't contain pointers/references. So this gcfollow() will be instantiated only with "left" or "right", and in both cases it returns true if the node isn't a value, so the GC will not mistake the "val" double in the leaves of this expression tree as a couple of pointers. The GC will follow the left and right pointers only for the internal nodes of the tree. This slows down the GC a little but increases its precision (if gcfollow() has no bugs). If gcfollow() is not present the GC acts as it does now, assuming the pointers are always present. If a struct contains no pointers/references the gcfollow() is never used. If a struct contains another struct, the GC will call gcfollow() of the inner struct too if it contains pointers/references, etc. Bye, bearophile
Mar 28 2010
bearophile Wrote:This is just a small invention. Enums can decrease the precision of the GC, because fields can be pointers or values, an example: enum NodeType { value, sum, product } struct Node { NodeType type; union { double val; struct { Node* left, right; } } } (Node can be used to build a simple expression tree, that can contain things like: 2.5 * 3.0 + 2.0). In this case the GC has to use the safe strategy of always following the left and right pointers, even when they are a double. At runtime the program usually knows what's inside the enum, if the enum truly contains pointers. This information can be inside a tag as in this example, inside a tagged pointer that points to the struct/enum, or at worst in the code that uses the struct/enum. But generally the compiler is not able to use this information, because a compiler usually doesn't understand the program semantics. So I can think of a new optional standard class/struct/enum method, for example gcfollow() that the garbage collector recognizes when it is present and useful. It gets called with the attribute name and returns true at runtime if the the GC has to follow a pointer/reference: struct Node { NodeType type; union { double val; struct { Node* left, right; } } bool gcfollow(string field)() { return type != NodeType.value; } } Note that the GC will call gcfollow() (here a template for more efficiency) only with the strings of "left" or "right", because "type" and "val" attributes can't contain pointers/references. So this gcfollow() will be instantiated only with "left" or "right", and in both cases it returns true if the node isn't a value, so the GC will not mistake the "val" double in the leaves of this expression tree as a couple of pointers. The GC will follow the left and right pointers only for the internal nodes of the tree. This slows down the GC a little but increases its precision (if gcfollow() has no bugs). If gcfollow() is not present the GC acts as it does now, assuming the pointers are always present. If a struct contains no pointers/references the gcfollow() is never used. If a struct contains another struct, the GC will call gcfollow() of the inner struct too if it contains pointers/references, etc. Bye, bearophileThis is a rather ridiculous way of emulating algebraic data types and value-oriented programming. But then again, that kind of features might fit perfectly to D or at least provide some food for thought for further bikeshedding.
Mar 28 2010
William T. Fnk:This is a rather ridiculous way of emulating algebraic data types<I think algebraic data types in Haskell don't allow you to use for example enums with no tags, where the tag is stored in the less significant bit of the pointer that points to the enum. And I think algebraic data types will not be added to D, while normal not-GC-precise enums will be kept in D, so...and value-oriented programming.<I don't know what this is. Bye, bearophile
Mar 28 2010
On Sun, 28 Mar 2010 09:40:10 -0300, bearophile <bearophileHUGS lycos.com> wrote:This is just a small invention. Enums can decrease the precision of the GC, because fields can be pointers or values, an example: enum NodeType { value, sum, product } struct Node { NodeType type; union { double val; struct { Node* left, right; } } } (Node can be used to build a simple expression tree, that can contain things like: 2.5 * 3.0 + 2.0). In this case the GC has to use the safe strategy of always following the left and right pointers, even when they are a double. At runtime the program usually knows what's inside the enum, if the enum truly contains pointers. This information can be inside a tag as in this example, inside a tagged pointer that points to the struct/enum, or at worst in the code that uses the struct/enum. But generally the compiler is not able to use this information, because a compiler usually doesn't understand the program semantics. So I can think of a new optional standard class/struct/enum method, for example gcfollow() that the garbage collector recognizes when it is present and useful. It gets called with the attribute name and returns true at runtime if the the GC has to follow a pointer/reference: struct Node { NodeType type; union { double val; struct { Node* left, right; } } bool gcfollow(string field)() { return type != NodeType.value; } } Note that the GC will call gcfollow() (here a template for more efficiency) only with the strings of "left" or "right", because "type" and "val" attributes can't contain pointers/references. So this gcfollow() will be instantiated only with "left" or "right", and in both cases it returns true if the node isn't a value, so the GC will not mistake the "val" double in the leaves of this expression tree as a couple of pointers. The GC will follow the left and right pointers only for the internal nodes of the tree. This slows down the GC a little but increases its precision (if gcfollow() has no bugs). If gcfollow() is not present the GC acts as it does now, assuming the pointers are always present. If a struct contains no pointers/references the gcfollow() is never used. If a struct contains another struct, the GC will call gcfollow() of the inner struct too if it contains pointers/references, etc. Bye, bearophileThis would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do. You'd probably save memory (the headers cost 16 bytes) and would definitely save collection time simply breaking those unions into things with pointers and things without pointers. In your example, doing that would cost some extra memory, as you'd go from a 16-byte block to a 32-byte block. However, remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.
Mar 28 2010
Robert Jacques:This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do.Do you mean a vtable pointer? Can you explain me why this is necessary?remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile
Mar 28 2010
On Sun, 28 Mar 2010 12:32:22 -0300, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:Yes. What I think you're forgetting is that all compile-time type info is lost at runtime. All the GC knows about Node is that it's 16-bytes, it doesn't have an object finalizer and it contains pointers somewhere. Hopefully, in the future, it'll even know where those pointers are. But that's it. It doesn't know that that 16-byte memory chunk is a Node or that a gcfollow method exists. This is one reason struct destructors don't run when you allocate them on the heap: the GC simply doesn't know about any function to run. Naturally, this also applies to arrays and value types. So if you want to add type specific GC stuff, you have to add type-specific headers to each memory chuck.This would require structs/arrays/etc to contain a header with a vtable, so the GC could know what to do.Do you mean a vtable pointer? Can you explain me why this is necessary?No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game: adding 0-15 to the base pointer with still cause the struct to be properly marked. As for using those bits for the mark colors; A) D supports pointers to bytes, so the GC has to leave those bits alone. B) D's GC isn't precise and has no clue what are pointers and what aren't, so it can't use pointers in this fashion. C) The color bits apply to the object pointed to, not the pointer pointing to it.remember, the GC allocates on 16-byte boundaries so each Node* actually has 4-bits (8 total) in which to hide an enum.They can't be used, the D specs say that pointers to memory managed by the GC can't be used to store flags (so I too was wrong in an answer to another person), probably because they are used by the garbage to color the graph in two or three colors. Bye, bearophile
Mar 28 2010
Robert Jacques:What I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs. What you have explained me means that my gcfollow is useless (it has another minor problem: sometimes the information regarding the contents of the union is not inside the struct fields, so the gcfollow can have troubles in finding such information far away).No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game:This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile
Mar 28 2010
On Sun, 28 Mar 2010 12:50:20 -0400, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -SteveWhat I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.
Mar 28 2010
On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Sun, 28 Mar 2010 12:50:20 -0400, bearophile <bearophileHUGS lycos.com> wrote:Also, don't forget that classes have a bunch of runtime type info.Robert Jacques:The current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary. -SteveWhat I think you're forgetting is that all compile-time type info is lost at runtime. [... etc]<Thank you very much for all your explanations, I didn't know that the situation is so terrible. I suddenly like not-GC languages more :-) I think the compilation of D code must build a data structure that will be used at runtime by the GC to know the type of all variables and pointers, otherwise there's no hope in a GC that works well in long-running programs.
Mar 28 2010
On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu> wrote:On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -SteveThe current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary.Also, don't forget that classes have a bunch of runtime type info.
Mar 29 2010
On Mon, 29 Mar 2010 08:09:09 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Sun, 28 Mar 2010 23:30:32 -0400, Robert Jacques <sandford jhu.edu> wrote:Sorry, my comment was more for a D in general than the GC. GC's in general don't know or need anything beyond a pointer mask and whether to finalize or not.On Sun, 28 Mar 2010 16:16:41 -0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:But the GC doesn't use/need this info (except to call the destructors). At least the mark/sweep portion doesn't. -SteveThe current GC has a simple "type info" if you will -- contains pointers or doesn't contain pointers. It doesn't mean we cannot add to that. In fact, I think dsimcha has provided a way to have precise scanning for heap-allocated types. I don't think a reasonably precise GC is out of the question. However, it may be too much to require the GC to do semantic analysis of enums for unions. Not impossible, but probably not worth the effort and restrictions necessary.Also, don't forget that classes have a bunch of runtime type info.
Mar 29 2010
On Sun, 28 Mar 2010 13:50:20 -0300, bearophile <bearophileHUGS lycos.com> wrote:Robert Jacques:Yes, hiding bit flags in pointers is generally a bad idea because it makes your code dependent on a particular GC/CPU system. Though I do think storing a single bit is going to work on anything greater than a 8-bit system.No, what you can't do is hide flags in high order bits or use tricks like XOR to store two pointers in a single field. The 4 low order bits are fair game:This page: http://www.digitalmars.com/d/2.0/garbage.html Says: Do not take advantage of alignment of pointers to store bit flags in the low order bits: p = cast(void*)(cast(int)p | 1); // error: undefined behavior Bye, bearophile
Mar 28 2010
bearophile Wrote:William T. Fnk:Why not? Are you scared of the functional language stigma? D is already functional enough. It can't possibly get any worse. I can already write pure functional monads with the ugly D2 closure syntax. The language is already ruined in the eyes of practical imperative coders. How I see this is that features like algebraic datatypes are questioning the authority. You have a holy mission to prove that a practical language can be built without such basic features. It indeed can be built, but you need ways to emulate the behavior since it is pretty crucial and common in practical applications. You already emulate it in your code example. And it looks bad. The problem is two-fold, there are newbie users in the community who have no idea what an algebraic data type is. And then there are some more or less arrogant (actually there's only one person whose arrogance exceeds anything I've ever seen) language designers who recently studied the algebraic datatype article from wikipedia. They just don't want to admit they didn't know it before. And it's getting harder and harder to admit that mistake.This is a rather ridiculous way of emulating algebraic data types<I think algebraic data types in Haskell don't allow you to use for example enums with no tags, where the tag is stored in the less significant bit of the pointer that points to the enum. And I think algebraic data types will not be added to D, while normal not-GC-precise enums will be kept in D, so...You need to study more SPJ videos, slides & articles..and value-oriented programming.<I don't know what this is.Bye, bearophile
Mar 28 2010
The problem is two-fold, there are newbie users in the community who have no idea what an algebraic data type is. And then there are some more or less arrogant (actually there's only one person whose arrogance exceeds anything I've ever seen) language designers who recently studied the algebraic datatype article from wikipedia. They just don't want to admit they didn't know it before. And it's getting harder and harder to admit that mistake.How do I emulate untagged unions with an algebraic datatype ? Please enlighten me.
Mar 28 2010
William T. Fnk Wrote:bearophile Wrote:wut's yer point bitch? say it who it is mo'fucker so the nigga can defend his ass. dun fuck around with vague shit like that. geez this group is fucked. sum1 opened them dog pound gatez.William T. Fnk:Why not? Are you scared of the functional language stigma? D is already functional enough. It can't possibly get any worse. I can already write pure functional monads with the ugly D2 closure syntax. The language is already ruined in the eyes of practical imperative coders. How I see this is that features like algebraic datatypes are questioning the authority. You have a holy mission to prove that a practical language can be built without such basic features. It indeed can be built, but you need ways to emulate the behavior since it is pretty crucial and common in practical applications. You already emulate it in your code example. And it looks bad. The problem is two-fold, there are newbie users in the community who have no idea what an algebraic data type is. And then there are some more or less arrogant (actually there's only one person whose arrogance exceeds anything I've ever seen) language designers who recently studied the algebraic datatype article from wikipedia. They just don't want to admit they didn't know it before. And it's getting harder and harder to admit that mistake.This is a rather ridiculous way of emulating algebraic data types<I think algebraic data types in Haskell don't allow you to use for example enums with no tags, where the tag is stored in the less significant bit of the pointer that points to the enum. And I think algebraic data types will not be added to D, while normal not-GC-precise enums will be kept in D, so...
Mar 28 2010