digitalmars.D.learn - How do I find the arity of an Expression? (dmd hacking)
- Chad J (7/7) Nov 29 2009 Given an Expression object in dmd, I'd like to know how many
- Ary Borenszweig (4/12) Nov 30 2009 Where's the Expression object defined in D? I don't think there's such a...
- Ellery Newcomer (2/14) Nov 30 2009 DMD source, eg parse.c, expression.c, etc
- Chad J (2/18) Nov 30 2009 Right. That.
- Ellery Newcomer (17/35) Nov 30 2009 Not that I know anything about DMD outside parse.c, but in expression.h,...
- Ary Borenszweig (3/47) Nov 30 2009 It would be better if dmd's source code implemented the visitor pattern....
- Chad J (96/122) Nov 30 2009 I think I'll reply to both of you in one post since the thoughts are
- Ellery Newcomer (32/72) Nov 30 2009 Personally, I've come to hate this pattern (just from reading DMD src).
- Chad J (45/121) Dec 01 2009 Yeah, I can see that. There seems to be a lot of duplicated patterns
- Ellery Newcomer (5/17) Dec 01 2009 expression.h
- Don (3/31) Dec 03 2009 It works in the version of DMD in svn. Everything in Andrei's book
Given an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - Chad
Nov 29 2009
Chad J wrote:Given an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - ChadWhere's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
On 11/30/2009 03:53 AM, Ary Borenszweig wrote:Chad J wrote:DMD source, eg parse.c, expression.c, etcGiven an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - ChadWhere's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
Ellery Newcomer wrote:On 11/30/2009 03:53 AM, Ary Borenszweig wrote:Right. That.Chad J wrote:DMD source, eg parse.c, expression.c, etcGiven an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - ChadWhere's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
On 11/30/2009 12:32 PM, Chad J wrote:Ellery Newcomer wrote:Not that I know anything about DMD outside parse.c, but in expression.h, there be decls along the lines of struct UnaExp{ Expression* e1; } struct BinExp{ Expression* e1; Expression* e2; } Iterating them would just be a tree walk. That takes care of 90% and ... Oh. Yeah. There are a bunch of special cases. But from what I've seen, iterating over an expression or any part of the AST involves implementing a function (like semantic) at each subclass. I'm betting there is no other way to do this. I love ANTLR. It generates an arbitrary child/sibling AST, and walking it is a piece of cake.On 11/30/2009 03:53 AM, Ary Borenszweig wrote:Right. That.Chad J wrote:DMD source, eg parse.c, expression.c, etcGiven an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - ChadWhere's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
Ellery Newcomer wrote:On 11/30/2009 12:32 PM, Chad J wrote:It would be better if dmd's source code implemented the visitor pattern. Descent's port implements it and it uses it a a lot of places.Ellery Newcomer wrote:Not that I know anything about DMD outside parse.c, but in expression.h, there be decls along the lines of struct UnaExp{ Expression* e1; } struct BinExp{ Expression* e1; Expression* e2; } Iterating them would just be a tree walk. That takes care of 90% and ... Oh. Yeah. There are a bunch of special cases. But from what I've seen, iterating over an expression or any part of the AST involves implementing a function (like semantic) at each subclass. I'm betting there is no other way to do this. I love ANTLR. It generates an arbitrary child/sibling AST, and walking it is a piece of cake.On 11/30/2009 03:53 AM, Ary Borenszweig wrote:Right. That.Chad J wrote:DMD source, eg parse.c, expression.c, etcGiven an Expression object in dmd, I'd like to know how many subexpressions it contains and, even better, iterate over them. I'd like to do this in a general way, without having to create cases for all of the different kinds of Expressions. Is there some way to do this that I've been missing? Thanks, - ChadWhere's the Expression object defined in D? I don't think there's such a thing (I mean, a reflection capability to get an expression out of something).
Nov 30 2009
I think I'll reply to both of you in one post since the thoughts are related. Ary Borenszweig wrote:Ellery Newcomer wrote:D'oh. Yep that's what I was noticing too. Thanks for confirming that.Not that I know anything about DMD outside parse.c, but in expression.h, there be decls along the lines of struct UnaExp{ Expression* e1; } struct BinExp{ Expression* e1; Expression* e2; } Iterating them would just be a tree walk. That takes care of 90% and ... Oh. Yeah. There are a bunch of special cases. But from what I've seen, iterating over an expression or any part of the AST involves implementing a function (like semantic) at each subclass. I'm betting there is no other way to do this.I wonder if Walter would go for it. Is he very receptive of purely architectural patches for dmd? ... If you care for a bit of reading, I'll tell you my story and ping an idea. This is about the property expression rewrite of course. I'd love to just use the current convention in dmd and write the rewrite as a non-recursive function that gets called at every point in the tree whenever someExpr->semantic(sc) is called. However, there's a snag. When doing the property rewrite, the inner-most subexpression will need to generate the outermost temporaries. An example: prop1.prop2 += foo; AFAIK, the compiler sees this as ((prop1).prop2) += foo; or ((prop1()).prop2()) += foo; after resolveProperties is called. So prop1 is the innermost expression. It would be rewritten as this comma expression: (auto t1 = prop1()), (auto t2 = t1.prop2()), (t2 += foo), (t1.prop2(t2)), (prop1(t1)) So t1 is the outermost temporary, and it corresponds to prop1, the innermost expression. This is problematic in dmd's traditional approach, because the semantic pass on prop1 does not have access to it's enclosing expression (as far as I can tell). If it did, I'd just go to the root of the tree, uproot the tree, stick it in a comma expression, and just start nesting: CommaExp // Step 1a for the rewrite. | +--> auto t1 = prop1() | +--> CommaExp // Step 1b . | . +--> CommaExp // Step 2a . | | . | +--> auto t2 = t1.prop2() . | | . | +--> CommaExp // Step 2b . | | . | +--> t2 += foo; // Originally the root. . | | . | +--> t1.prop2(t2); . | . +--> prop1(t1) Then as long as the innermost expression does its rewrite before its parent, then it will work. If it doesn't go first, well I dunno. I don't seem to have this option anyways. I can't find a way to get the root. Perhaps adding a root expression pointer to the Scope (sc) object that gets tossed around semantic would work, but I haven't considered the bookkeeping involved with that. I already started taking a different approach before thinking of that. My plan then is to do the rewrite after the root's expr->semantic(sc) call. It will probably look like expr->doPropertyRewrite(), and be called from statement.c. But in there will be the guts of the function that has a signature like this: Expression *doPropertyRewriteGreedily( Expression *expr, // The current expression being rewritten. Expressions *prolog, // Temporary decls and getter calls go here. Expressions *epilog, // Setter calls go here. int treatAsLvalue // Is this expression mutated by its parent? ) This works out pretty nice since adding the temporaries and such into the prolog and epilog looks very natural. The function just calls itself to walk the AST. The problem is that walking the AST thing. The function has to know about expr's children in order to recurse on them. This is easy in some cases like postfix ++ and --, because that pattern is represented by one subclass of Expression: PostExp. Same for the properties themselves (CallExp). But then there's the more general patterns like all other impure expressions (+=,-=,*=,/=, etc) and all pure expressions ever (+,-,*,/,%,&,|, pure functions, literals, etc). In a lot of those cases the rewrite doesn't care about analyzing the expression beyond a certain point, or at all, and it just wants to forward the children to the next recursion level. That's where I hit this iteration problem and all of the special cases. Now for the idea. Currently I'm planning an approach that's a bit more minimalistic than the visitor pattern; just to solve my immediate problem. I will define these methods in Expression: virtual Expression *getChild(unsigned index); virtual void setChild(unsigned index, Expression* val); virtual unsigned numChildren(); Then I will implement them as needed in the subclasses of Expression. I figure I'd have to write this logic into my function anyways, so I might as well make this usable to others. I've already written a draft of the property rewrite that uses these, so I know they get the job done in principle. I just hope this will all be OK in the patch (or even holds up under debugging). So yeah, if I haven't put you to sleep already, please let me know if I messed up somewhere. - ChadI love ANTLR. It generates an arbitrary child/sibling AST, and walking it is a piece of cake.It would be better if dmd's source code implemented the visitor pattern. Descent's port implements it and it uses it a a lot of places.
Nov 30 2009
On 11/30/2009 07:59 PM, Chad J wrote:If you care for a bit of reading, I'll tell you my story and ping an idea. This is about the property expression rewrite of course. I'd love to just use the current convention in dmd and write the rewrite as a non-recursive function that gets called at every point in the tree whenever someExpr->semantic(sc) is called. However, there's a snag.Personally, I've come to hate this pattern (just from reading DMD src). It seems like the antithesis of code reuse.When doing the property rewrite, the inner-most subexpression will need to generate the outermost temporaries. An example: prop1.prop2 += foo; AFAIK, the compiler sees this as ((prop1).prop2) += foo; or ((prop1()).prop2()) += foo; after resolveProperties is called. So prop1 is the innermost expression. It would be rewritten as this comma expression: (auto t1 = prop1()), (auto t2 = t1.prop2()), (t2 += foo), (t1.prop2(t2)), (prop1(t1))This is an interesting problem. I like it. Is this rewrite condoned by the powers that be? It'd be fun to implement if I ever get this far in my semantic analyzer.So t1 is the outermost temporary, and it corresponds to prop1, the innermost expression. This is problematic in dmd's traditional approach, because the semantic pass on prop1 does not have access to it's enclosing expression (as far as I can tell). If it did, I'd just go to the root of the tree, uproot the tree, stick it in a comma expression, and just start nesting:I'm not convinced you need access to the enclosing expression to make this rewrite happen. Working it out on paper, it appears to be a matter of a few AST copies, keeping track of your t's, and building your comma tree (in two directions simultaneously) as you traverse the AST via eg semantic. Here's some chicken scratch that more or less illustrates what I envision going on (I can't attest that DMD's ASTs look exactly like this, but I assume they're similar): http://personal.utulsa.edu/~ellery-newcomer/scribble.png How much effort it would entail is another matter though. I would assume you could just add a field or two to struct Expression for pushing information up the tree. It seems like copying functionality is already there. Pushing 't' information across the tree would probably propagate to each special case and be the most annoying part. But basically, every time you come across a property, you generate a new symbol and pair of pre/post trees. Then at the top, you make a single tree copy with the last symbol generated.Now for the idea. Currently I'm planning an approach that's a bit more minimalistic than the visitor pattern; just to solve my immediate problem. I will define these methods in Expression: virtual Expression *getChild(unsigned index); virtual void setChild(unsigned index, Expression* val); virtual unsigned numChildren();I can imagine this would be welcome regardless of how you set about your rewrite procedure. I came across a need for something like this as I was building my semantic analyzer a while back. Can't wait for school to get out so I can get back to work.Then I will implement them as needed in the subclasses of Expression. I figure I'd have to write this logic into my function anyways, so I might as well make this usable to others. I've already written a draft of the property rewrite that uses these, so I know they get the job done in principle. I just hope this will all be OK in the patch (or even holds up under debugging). So yeah, if I haven't put you to sleep already, please let me know if I messed up somewhere.I do have one nitpick: auto t1 = p1(); isn't an expression, it's a declaration. That could potentially make things difficult for you, particularly when your property assignment is deep within an expression tree.- Chad
Nov 30 2009
Ellery Newcomer wrote:On 11/30/2009 07:59 PM, Chad J wrote:Yeah, I can see that. There seems to be a lot of duplicated patterns and things done by convention. That makes it a little more difficult to figure out what is going on. (It's done this way... but *why*?) I just don't want to alienate Walter from my patch by defying the style. It may be inevitable in a minor way though.This is about the property expression rewrite of course. I'd love to just use the current convention in dmd and write the rewrite as a non-recursive function that gets called at every point in the tree whenever someExpr->semantic(sc) is called. However, there's a snag.Personally, I've come to hate this pattern (just from reading DMD src). It seems like the antithesis of code reuse.No guarantees, but a lot of promise. http://erdani.com/d/thermopylae.pdf On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the bottom: notice how Andrei seems to be hedging on properties working correctly. Now I can envision that behavior with array length being hacked in for special cases. That would make the book example work but lack generality since it wouldn't work for arbitrary expressions and maybe not user-defined properties either. Walter has already swung for explicit properties that require syntax addition to the language (and annotations at the same time!) : http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=16862 I think if he'll go for explicit properties, he'll want them to work correctly too. Think about it... even if the compiler knows that the appearance of some variable in an expression is actually a property, how is it going to figure out whether or not the setter should be called and what it's going to pass to the setter? I conjecture that this property expression rewrite is quite necessary for properties to work unsurprisingly in arbitrary side-effectful expressions. Whether the properties are implicit or explicit is unrelated, and affects other things instead. So Walter is going to have to accept this or he will be in for some nasty surprises when he tries to make his explicit properties work. (Of course he could always decide that properties aren't worth it. That would be unfortunate.) http://prowiki.org/wiki4d/wiki.cgi?DocComments/Property For more info, if you haven't seen it already.When doing the property rewrite, the inner-most subexpression will need to generate the outermost temporaries. An example: prop1.prop2 += foo; AFAIK, the compiler sees this as ((prop1).prop2) += foo; or ((prop1()).prop2()) += foo; after resolveProperties is called. So prop1 is the innermost expression. It would be rewritten as this comma expression: (auto t1 = prop1()), (auto t2 = t1.prop2()), (t2 += foo), (t1.prop2(t2)), (prop1(t1))This is an interesting problem. I like it. Is this rewrite condoned by the powers that be? It'd be fun to implement if I ever get this far in my semantic analyzer.Yeah. I think you're right.So t1 is the outermost temporary, and it corresponds to prop1, the innermost expression. This is problematic in dmd's traditional approach, because the semantic pass on prop1 does not have access to it's enclosing expression (as far as I can tell). If it did, I'd just go to the root of the tree, uproot the tree, stick it in a comma expression, and just start nesting:I'm not convinced you need access to the enclosing expression to make this rewrite happen. Working it out on paper, it appears to be a matter of a few AST copies, keeping track of your t's, and building your comma tree (in two directions simultaneously) as you traverse the AST via eg semantic. Here's some chicken scratch that more or less illustrates what I envision going on (I can't attest that DMD's ASTs look exactly like this, but I assume they're similar): http://personal.utulsa.edu/~ellery-newcomer/scribble.png How much effort it would entail is another matter though. I would assume you could just add a field or two to struct Expression for pushing information up the tree. It seems like copying functionality is already there. Pushing 't' information across the tree would probably propagate to each special case and be the most annoying part. But basically, every time you come across a property, you generate a new symbol and pair of pre/post trees. Then at the top, you make a single tree copy with the last symbol generated.Oh my. I just tried out a comma expression with declarations in some D code and dmd didn't like it at all. In hindsight I should've checked that a long time ago. I did this though because I saw it happening in other parts of expression.c. Maybe I missed some details. Or maybe the backend will be fine with stuff like this. I wonder. Yeah, if I can't put declaration expressions into other expressions, then things may get ugly. I'd probably have to put more code into statement.c at least. Eh, I'll know how things work out soon enough.So yeah, if I haven't put you to sleep already, please let me know if I messed up somewhere.I do have one nitpick: auto t1 = p1(); isn't an expression, it's a declaration. That could potentially make things difficult for you, particularly when your property assignment is deep within an expression tree.
Dec 01 2009
On 12/01/2009 02:35 PM, Chad J wrote:No guarantees, but a lot of promise. http://erdani.com/d/thermopylae.pdf On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the bottom: notice how Andrei seems to be hedging on properties working correctly.Oh goodie. We're going to get arr.length += 4; to actually work.I did this though because I saw it happening in other parts of expression.c. Maybe I missed some details. Or maybe the backend will be fine with stuff like this. I wonder. Yeah, if I can't put declaration expressions into other expressions, then things may get ugly. I'd probably have to put more code into statement.c at least. Eh, I'll know how things work out soon enough.expression.h struct DeclarationExp looks like you'll be fine and what I said applies really only to the parser.
Dec 01 2009
Ellery Newcomer wrote:On 12/01/2009 02:35 PM, Chad J wrote:It works in the version of DMD in svn. Everything in Andrei's book should be working now.No guarantees, but a lot of promise. http://erdani.com/d/thermopylae.pdf On page 114 of the draft, 14 of the pdf, in section 4.1.10, at the bottom: notice how Andrei seems to be hedging on properties working correctly.Oh goodie. We're going to get arr.length += 4; to actually work.I did this though because I saw it happening in other parts of expression.c. Maybe I missed some details. Or maybe the backend will be fine with stuff like this. I wonder. Yeah, if I can't put declaration expressions into other expressions, then things may get ugly. I'd probably have to put more code into statement.c at least. Eh, I'll know how things work out soon enough.expression.h struct DeclarationExp looks like you'll be fine and what I said applies really only to the parser.
Dec 03 2009