digitalmars.D - Walter - Discrepancy between EqualExpression spec and implementation
- Jarrett Billingsley (24/24) May 28 2006 I'm writing a small scripting language with a syntax based on D, and as
- Jarrett Billingsley (5/10) May 28 2006 I just realized that most of the parsing functions follow this same patt...
- Unknown W. Brackets (12/25) May 28 2006 Forgive me if I am off-base here, but the following code:
- Jarrett Billingsley (6/9) May 28 2006 I'm thinking about it. It's very weird, but I think I've got it. I thi...
- xs0 (10/11) May 29 2006 Heh, I think it's quite the opposite, with recursive descent you can
- renox (11/22) Jun 02 2006 Urgh, if I see someone coding like this, let's just say that I won't be
- Unknown W. Brackets (10/23) Jun 02 2006 I have been known to revert checkins for code line that, haha.
- James Dunne (104/119) May 29 2006 I've taken a serious look into this too :) Here's my answer:
- Jarrett Billingsley (14/23) May 29 2006 Ahhhahaha... I see. That's kind of what my mind was working towards whe...
- Walter Bright (29/59) Jun 07 2006 It's implemented as (simplified):
- Jarrett Billingsley (5/6) Jun 07 2006 Right, I noticed that when James replied about it; I looked at the
- Lionello Lunesu (7/13) Jun 16 2006 I think it would be a lot more logical (and useful) if that code were
I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated. For EqualExpressions, the following grammar is given: EqualExpression: RelExpression EqualExpression == RelExpression EqualExpression != RelExpression EqualExpression is RelExpression EqualExpression !is RelExpression Notice that the left-hand operand of the operators is an EqualExpression. However, in the code (parse.c line 4365), instead of using parseEqualExp() to get the left-hand operand as I would have expected, it instead uses parseRelExp(). Meaning that the code implements the following grammar instead: EqualExpression: RelExpression RelExpression == RelExpression RelExpression != RelExpression RelExpression is RelExpression RelExpression !is RelExpression I'm pretty sure only Walter can answer this one!
May 28 2006
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:e5d40i$vp4$1 digitaldaemon.com...I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().
May 28 2006
Forgive me if I am off-base here, but the following code: int main() { int i = 0; bool b = i == 0 == true !is false; return 0; } Compiles without errors. This would suggest to me that it is being parsed according to the spec, roughly. Perhaps it's just not following that exact BNF (but implementing the same difference, e.g. outside EqualExpression)? -[Unknown]"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:e5d40i$vp4$1 digitaldaemon.com...I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().
May 28 2006
"Unknown W. Brackets" <unknown simplemachines.org> wrote in message news:e5djuj$1dik$1 digitaldaemon.com...This would suggest to me that it is being parsed according to the spec, roughly. Perhaps it's just not following that exact BNF (but implementing the same difference, e.g. outside EqualExpression)?I'm thinking about it. It's very weird, but I think I've got it. I think all Walter's done with this code is factored out a recursive step; in this case, calling parseEqualExp() is the same as calling parseRelExp(). Recursive descent parsers are weird.
May 28 2006
Recursive descent parsers are weird.Heh, I think it's quite the opposite, with recursive descent you can actually figure out what's going on. Have you seen a yacc-generated parser (it's not recursive descent but "shift/reduce", LALR or whatever that's called)? Now that's really weird: http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.tab http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.c Of course, the input file is much more readable, but you can't compare C++ code in one case with grammar definition code in the other case, can you? :) xs0
May 29 2006
Unknown W. Brackets wrote:Forgive me if I am off-base here, but the following code: int main() { int i = 0; bool b = i == 0 == true !is false; return 0; } Compiles without errors.Urgh, if I see someone coding like this, let's just say that I won't be polite.. At some point I wonder if it wouldn't be better if in the language all the operators shouldn't have the same priority, just to force the programmers to use parenthesis! The only downside I see is a big pitfall for the beginners and perhaps less opportunity for optimisation by the compiler depending of the implementation.. Regards, RenoX
Jun 02 2006
I have been known to revert checkins for code line that, haha. Still, I like that I can write something like this: x = original_x * cos(angle) - original_y * sin(angle); y = original_y * cos(angle) + original_x * sin(angle); And know it will work as I expect. For this reason, I love operator precedence. But I agree that "x == y == z" is just wrong. I even take it farther and can't stand when people do "x = y = 0;", even if it is fine code. Everyone has their own preferences. -[Unknown]Urgh, if I see someone coding like this, let's just say that I won't be polite.. At some point I wonder if it wouldn't be better if in the language all the operators shouldn't have the same priority, just to force the programmers to use parenthesis! The only downside I see is a big pitfall for the beginners and perhaps less opportunity for optimisation by the compiler depending of the implementation.. Regards, RenoX
Jun 02 2006
Jarrett Billingsley wrote:"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:e5d40i$vp4$1 digitaldaemon.com...I've taken a serious look into this too :) Here's my answer: See the while loops implemented in most of those functions? Grammar rule: ------------- OrOrExpression: AndAndExpression OrOrExpression || AndAndExpression DMD implementation: ------------------- Expression *Parser::parseOrOrExp() { Expression *e; Expression *e2; Loc loc = this->loc; e = parseAndAndExp(); while (token.value == TOKoror) { nextToken(); e2 = parseAndAndExp(); e = new OrOrExp(loc, e, e2); } return e; } So, on the first iteration we either get an AndAndExpression alone, or an OrOrExpression with the left expression being an AndAndExpression and the right expression being an AndAndExpression. The left-recursive descent rule 'OrOrExpression || AndAndExpression' means to loop over the left side until a subsequent token is NOT the '||' token. Thus, we can only get an OrOrExpression out of two AndAndExpressions next to each other, separated by the '||' token, or just an AndAndExpression alone. So, the code does implement the spec correctly. All the other expression rules follow this pattern. As to the exceptional case of AssignExpression: Grammar rule: ------------- AssignExpression: ConditionalExpression ConditionalExpression = AssignExpression ConditionalExpression += AssignExpression ConditionalExpression -= AssignExpression ConditionalExpression *= AssignExpression ConditionalExpression /= AssignExpression ConditionalExpression %= AssignExpression ConditionalExpression &= AssignExpression ConditionalExpression |= AssignExpression ConditionalExpression ^= AssignExpression ConditionalExpression ~= AssignExpression ConditionalExpression <<= AssignExpression ConditionalExpression >>= AssignExpression ConditionalExpression >>>= AssignExpression DMD implementation: (I've expanded out the #define shortcut for readability) ------------------- Expression *Parser::parseAssignExp() { Expression *e; Expression *e2; Loc loc; e = parseCondExp(); while (1) { loc = this->loc; switch (token.value) { case TOKassign: nextToken(); e2 = parseAssignExp(); e = new AssignExp(loc,e,e2); continue; case TOKaddass: nextToken(); e2 = parseAssignExp(); e = new AddAssignExp(loc,e,e2); continue; case TOKminass: nextToken(); e2 = parseAssignExp(); e = new MinAssignExp(loc,e,e2); continue; case TOKmulass: nextToken(); e2 = parseAssignExp(); e = new MulAssignExp(loc,e,e2); continue; case TOKdivass: nextToken(); e2 = parseAssignExp(); e = new DivAssignExp(loc,e,e2); continue; case TOKmodass: nextToken(); e2 = parseAssignExp(); e = new ModAssignExp(loc,e,e2); continue; case TOKandass: nextToken(); e2 = parseAssignExp(); e = new AndAssignExp(loc,e,e2); continue; case TOKorass: nextToken(); e2 = parseAssignExp(); e = new OrAssignExp(loc,e,e2); continue; case TOKxorass: nextToken(); e2 = parseAssignExp(); e = new XorAssignExp(loc,e,e2); continue; case TOKshlass: nextToken(); e2 = parseAssignExp(); e = new ShlAssignExp(loc,e,e2); continue; case TOKshrass: nextToken(); e2 = parseAssignExp(); e = new ShrAssignExp(loc,e,e2); continue; case TOKushrass: nextToken(); e2 = parseAssignExp(); e = new UshrAssignExp(loc,e,e2); continue; case TOKcatass: nextToken(); e2 = parseAssignExp(); e = new CatAssignExp(loc,e,e2); continue; default: break; } break; } return e; } Here, we see that e2 is always evaluated as an AssignExpression, thus this rule is right-recursive, not left-recursive as the other pattern. Again, note the while loop here. I'm not sure how to make it clearer, but if you have any questions don't hesitate to ask. P.S. - I've also used this code pattern as a template for my own language parser and it does work quite well. -- Regards, James DunneI'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.I just realized that most of the parsing functions follow this same pattern (of not following the spec). The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().
May 29 2006
"James Dunne" <james.jdunne gmail.com> wrote in message news:e5g01q$1l9r$1 digitaldaemon.com...The left-recursive descent rule 'OrOrExpression || AndAndExpression' means to loop over the left side until a subsequent token is NOT the '||' token. Thus, we can only get an OrOrExpression out of two AndAndExpressions next to each other, separated by the '||' token, or just an AndAndExpression alone. So, the code does implement the spec correctly. All the other expression rules follow this pattern.Ahhhahaha... I see. That's kind of what my mind was working towards when I was thinking about it.Here, we see that e2 is always evaluated as an AssignExpression, thus this rule is right-recursive, not left-recursive as the other pattern. Again, note the while loop here.I thought it might have something to do with left- and right-associativity, as it's only the assignment operators which are right-associative (and the postfix operators too, but you don't see that with the way the D parser is structured). I used a different form of expression parsing before which required that you have the associativity of each operator known beforehand (along with the precedence, as using different associativity changed how you parsed the LHS and RHS of an operator, just like in the recursive descent method). The other method was a bit more terse and mathematical, but had the advantage of smaller code. But recursive descent is just.. cool, if a bit weird.
May 29 2006
Jarrett Billingsley wrote:I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated. For EqualExpressions, the following grammar is given: EqualExpression: RelExpression EqualExpression == RelExpression EqualExpression != RelExpression EqualExpression is RelExpression EqualExpression !is RelExpression Notice that the left-hand operand of the operators is an EqualExpression. However, in the code (parse.c line 4365), instead of using parseEqualExp() to get the left-hand operand as I would have expected, it instead uses parseRelExp(). Meaning that the code implements the following grammar instead: EqualExpression: RelExpression RelExpression == RelExpression RelExpression != RelExpression RelExpression is RelExpression RelExpression !is RelExpression I'm pretty sure only Walter can answer this one!It's implemented as (simplified): Expression *Parser::parseEqualExp() { Expression *e; Expression *e2; e = parseRelExp(); while (1) { enum TOK value = token.value; switch (value) { case TOKequal: case TOKnotequal: nextToken(); e2 = parseRelExp(); e = new EqualExp(value, loc, e, e2); continue; default: break; } break; } return e; } which means that: a == b == c == d is parsed as: (((a == b) == c) == d) which I think you'll see matches the grammar. The while loop makes it left-associative.
Jun 07 2006
"Walter Bright" <newshound digitalmars.com> wrote in message news:e6606q$2t33$1 digitaldaemon.com...The while loop makes it left-associative.Right, I noticed that when James replied about it; I looked at the associativity for all the C operators and noticed that only the left-associative operators were implemented this way.
Jun 07 2006
Walter Bright wrote:which means that: a == b == c == d is parsed as: (((a == b) == c) == d) which I think you'll see matches the grammar. The while loop makes it left-associative.I think it would be a lot more logical (and useful) if that code were interpreted as: (a==b) && (b==c) && (c==d) //a==b==c==d Similarly, a < b < c could be the same as (a < b) && (b < c) //a<b<c L.
Jun 16 2006