digitalmars.D - D grammar as LL(1) grammar ( OT)
- BLS (8/8) Sep 15 2007 Sorry for beeing a bit off topic, anyway:
- BCS (8/20) Sep 15 2007 I'm working on an automatic conversion to LL for a recursive decent pars...
- Matti Niemenmaa (6/9) Sep 16 2007 Impossible due to allowing if-else without curly brackets:
- Aziz K. (8/9) Sep 16 2007 Unfortunately not :/. Take this case for example:
- Jascha Wetzel (20/32) Sep 16 2007 D needs arbitrary lookahead, that means you can write D code that can be...
- BCS (13/18) Sep 16 2007 In general that may be true (I'd have to check) but often a specific gra...
- Jascha Wetzel (17/44) Sep 17 2007 if you make sure, that the parser still generates the same parse tree as...
- BCS (8/58) Sep 17 2007 I'm sort of hoping to pull the grammar right out of the docs (I'm
- Manfred Nowak (7/9) Sep 18 2007 This indicates only that your parsing system might accept a super set
- Jascha Wetzel (5/16) Sep 18 2007 definitely. the grammar in the docs is a lot more general than the DMD
- BLS (4/16) Sep 16 2007 Thank you folks,
Sorry for beeing a bit off topic, anyway: Do you see a chance to describe D grammar as LL(1) grammar.. Please notice that I am a more DB developer than a language-designer means keep your answere as simple as possible :) all hints are welcome. Thanks Bjoern Well, at least one guy did it for *Erlang*, as you can see here: http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
Sep 15 2007
Reply to bls,Sorry for beeing a bit off topic, anyway: Do you see a chance to describe D grammar as LL(1) grammar.. Please notice that I am a more DB developer than a language-designer means keep your answere as simple as possible :) all hints are welcome. Thanks Bjoern Well, at least one guy did it for *Erlang*, as you can see here: http://platform.netbeans.org/articles/nbm_interview_caoyuan.htmlI'm working on an automatic conversion to LL for a recursive decent parser (not look ahead 1 but...) I expect to have a mostly working system in the near future (a month, maybe less, maybe much less, depends on homework ;) I'm hoping to auto extract the grammar from the doc files, auto convert to LL, auto munge to get it working for my system ("now, now, cut up your grammar into small pieces, young man. Don't gulp it done like a animal" <G>) I plan on sharing it all eventually.
Sep 15 2007
BLS wrote:Sorry for beeing a bit off topic, anyway: Do you see a chance to describe D grammar as LL(1) grammar..Impossible due to allowing if-else without curly brackets: http://en.wikipedia.org/wiki/Dangling_else Explanation here: http://compilers.iecc.com/comparch/article/95-06-033 -- E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 16 2007
Hello Björn, BLS wrote:Do you see a chance to describe D grammar as LL(1) grammar..Unfortunately not :/. Take this case for example: auto dg = (int a){ return 0; }; You would have to peek behind the closing parenthesis for an opening curly bracket to know if you have to parse this as a delegate. Regards, Aziz
Sep 16 2007
BLS wrote:Sorry for beeing a bit off topic, anyway: Do you see a chance to describe D grammar as LL(1) grammar.. Please notice that I am a more DB developer than a language-designer means keep your answere as simple as possible :) all hints are welcome. Thanks Bjoern Well, at least one guy did it for *Erlang*, as you can see here: http://platform.netbeans.org/articles/nbm_interview_caoyuan.htmlD needs arbitrary lookahead, that means you can write D code that can be parsed with LL(1), some that needs LL(173) and so on. the parser has to be able to parse any of that and thus may not have a fixed length lookahead. For LL parsers that generally means that you have to use backtracking. Since DMD uses a hand-written parser, it can optimize the lookahead, such that not the whole parser has to work it's way until the point where it can decide what to parse, but only a simplified function that quickly scans for the relevant token. E.g. there is a function called peekPastParen (or similar) that just finds the first token after a parenthesis and is therefore really fast. If you generate an LL parser automatically, it's very difficult to apply such optimizations. The resulting parser will be considerably slower. That's the reason i chose to write an LR parser generator to parse D. LR parsers use the information provided by a lookahead symbol much more efficiently. That is, LR(1) can parse more languages than LL(1), although both are just using a lookahead of length 1. Furthermore, LR parsers consider multiple potential parse trees at the same time, throwing away the wrong guesses, once the necessary tokens get available. LL parsers have to check each potential parse tree separately.
Sep 16 2007
Reply to Jascha,Furthermore, LR parsers consider multiple potential parse trees at the same time, throwing away the wrong guesses, once the necessary tokens get available. LL parsers have to check each potential parse tree separately.In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.
Sep 16 2007
BCS wrote:Reply to Jascha,if you make sure, that the parser still generates the same parse tree as for the unchanged grammar, that'd be cool. i wonder how this approach will do for the (large) D grammar. take a look at http://seatd.mainia.de/grammar.html if you like. it's my version of the D grammar from the docs with the errors ironed out (currently, i'm not aware of any more, that is). it might save you some time. like the version from the docs, it uses left recursion, though. but that you would have to change anyway. that grammar successfully parses tango, phobos, derelict, blade, dfl, core32 and more. what i'd love to do is to have a benchmark some day, to let the different parsing techniques compete. i'd patch the DMD source, such that it can be run as a standalone parser and it'll represent the hand-written LL team. my seatd parser will run for the automated LR team and it'll be fun if you would throw your parser in as the automated LL team :)Furthermore, LR parsers consider multiple potential parse trees at the same time, throwing away the wrong guesses, once the necessary tokens get available. LL parsers have to check each potential parse tree separately.In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.
Sep 17 2007
Jascha Wetzel wrote:BCS wrote:Well, when I get the time, I'll known because I plan to do exactly thatReply to Jascha,if you make sure, that the parser still generates the same parse tree as for the unchanged grammar, that'd be cool. i wonder how this approach will do for the (large) D grammar.Furthermore, LR parsers consider multiple potential parse trees at the same time, throwing away the wrong guesses, once the necessary tokens get available. LL parsers have to check each potential parse tree separately.In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.take a look at http://seatd.mainia.de/grammar.html if you like. it's my version of the D grammar from the docs with the errors ironed out (currently, i'm not aware of any more, that is). it might save you some time.I'm sort of hoping to pull the grammar right out of the docs (I'm thinking of building a set of DDoc macros that will generate it from the the original source) this would have the advantage of giving me an excuse to rag on Walter about the error in the docs.like the version from the docs, it uses left recursion, though. but that you would have to change anyway. that grammar successfully parses tango, phobos, derelict, blade, dfl, core32 and more. what i'd love to do is to have a benchmark some day, to let the different parsing techniques compete. i'd patch the DMD source, such that it can be run as a standalone parser and it'll represent the hand-written LL team.DMD is LL? cool.my seatd parser will run for the automated LR team and it'll be fun if you would throw your parser in as the automated LL team :)Again, given time I expect to have that.
Sep 17 2007
Jascha Wetzel wrotethat grammar successfully parses tango, phobos, derelict, blade, dfl, core32 and more.This indicates only that your parsing system might accept a super set of the D language. From looking at the grammar I conclude that your parsing system might accept sources that are not accepted by DMD. BTW do you have some papers on the "tdfa" you use? -manfred
Sep 18 2007
Manfred Nowak wrote:Jascha Wetzel wrotedefinitely. the grammar in the docs is a lot more general than the DMD parser. i'm narrowing it down more and more. also, i'm going chase the parser through DStress, which also includes negative test cases.that grammar successfully parses tango, phobos, derelict, blade, dfl, core32 and more.This indicates only that your parsing system might accept a super set of the D language. From looking at the grammar I conclude that your parsing system might accept sources that are not accepted by DMD.BTW do you have some papers on the "tdfa" you use?http://laurikari.net/ville/spire2000-tnfa.pdf
Sep 18 2007
Thank you folks, ...these info's are bad news for my project. Have to think about it. Bjoern BLS schrieb:Sorry for beeing a bit off topic, anyway: Do you see a chance to describe D grammar as LL(1) grammar.. Please notice that I am a more DB developer than a language-designer means keep your answere as simple as possible :) all hints are welcome. Thanks Bjoern Well, at least one guy did it for *Erlang*, as you can see here: http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
Sep 16 2007