www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D grammar as LL(1) grammar ( OT)

reply BLS <nanali nospam-wanadoo.fr> writes:
Sorry for beeing a bit off topic, anyway:

Do you see a chance to describe D grammar as LL(1) grammar..

Please notice that I am a more DB developer than a language-designer 
means keep your answere as simple as possible :)

all hints are welcome.
Thanks Bjoern

Well, at least one guy did it for *Erlang*, as you can see here:
http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
Sep 15 2007
next sibling parent BCS <ao pathlink.com> writes:
Reply to bls,

 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
 
I'm working on an automatic conversion to LL for a recursive decent parser (not look ahead 1 but...) I expect to have a mostly working system in the near future (a month, maybe less, maybe much less, depends on homework ;) I'm hoping to auto extract the grammar from the doc files, auto convert to LL, auto munge to get it working for my system ("now, now, cut up your grammar into small pieces, young man. Don't gulp it done like a animal" <G>) I plan on sharing it all eventually.
Sep 15 2007
prev sibling next sibling parent Matti Niemenmaa <see_signature for.real.address> writes:
BLS wrote:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
Impossible due to allowing if-else without curly brackets: http://en.wikipedia.org/wiki/Dangling_else Explanation here: http://compilers.iecc.com/comparch/article/95-06-033 -- E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 16 2007
prev sibling next sibling parent "Aziz K." <aziz.kerim gmail.com> writes:
Hello Björn,

BLS wrote:
 Do you see a chance to describe D grammar as LL(1) grammar..
Unfortunately not :/. Take this case for example: auto dg = (int a){ return 0; }; You would have to peek behind the closing parenthesis for an opening curly bracket to know if you have to parse this as a delegate. Regards, Aziz
Sep 16 2007
prev sibling next sibling parent reply Jascha Wetzel <firstname mainia.de> writes:
BLS wrote:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer 
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
D needs arbitrary lookahead, that means you can write D code that can be parsed with LL(1), some that needs LL(173) and so on. the parser has to be able to parse any of that and thus may not have a fixed length lookahead. For LL parsers that generally means that you have to use backtracking. Since DMD uses a hand-written parser, it can optimize the lookahead, such that not the whole parser has to work it's way until the point where it can decide what to parse, but only a simplified function that quickly scans for the relevant token. E.g. there is a function called peekPastParen (or similar) that just finds the first token after a parenthesis and is therefore really fast. If you generate an LL parser automatically, it's very difficult to apply such optimizations. The resulting parser will be considerably slower. That's the reason i chose to write an LR parser generator to parse D. LR parsers use the information provided by a lookahead symbol much more efficiently. That is, LR(1) can parse more languages than LL(1), although both are just using a lookahead of length 1. Furthermore, LR parsers consider multiple potential parse trees at the same time, throwing away the wrong guesses, once the necessary tokens get available. LL parsers have to check each potential parse tree separately.
Sep 16 2007
parent reply BCS <ao pathlink.com> writes:
Reply to Jascha,


 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.
 
In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.
Sep 16 2007
parent reply Jascha Wetzel <firstname mainia.de> writes:
BCS wrote:
 Reply to Jascha,
 
 
 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.
In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.
if you make sure, that the parser still generates the same parse tree as for the unchanged grammar, that'd be cool. i wonder how this approach will do for the (large) D grammar. take a look at http://seatd.mainia.de/grammar.html if you like. it's my version of the D grammar from the docs with the errors ironed out (currently, i'm not aware of any more, that is). it might save you some time. like the version from the docs, it uses left recursion, though. but that you would have to change anyway. that grammar successfully parses tango, phobos, derelict, blade, dfl, core32 and more. what i'd love to do is to have a benchmark some day, to let the different parsing techniques compete. i'd patch the DMD source, such that it can be run as a standalone parser and it'll represent the hand-written LL team. my seatd parser will run for the automated LR team and it'll be fun if you would throw your parser in as the automated LL team :)
Sep 17 2007
next sibling parent BCS <BCS pathlink.com> writes:
Jascha Wetzel wrote:
 BCS wrote:
 
 Reply to Jascha,


 Furthermore, LR parsers consider multiple potential parse trees at the
 same time, throwing away the wrong guesses, once the necessary tokens
 get available. LL parsers have to check each potential parse tree
 separately.
In general that may be true (I'd have to check) but often a specific grammar can be rearranges to (partly) avoid that, particularly the parts that appear in several trees. for instance: A ::= B c C | B d D | B e E ; can be rearranged as A ::= B A_ ; A_ ::= c C | d D | e E ; This at a minimum can avoid reparsing the B part of each option. Admittedly this is actually changing the grammar that is being parsed (so I can't claim to be doing a better job with the original grammar) but these can be done automatically.
if you make sure, that the parser still generates the same parse tree as for the unchanged grammar, that'd be cool. i wonder how this approach will do for the (large) D grammar.
Well, when I get the time, I'll known because I plan to do exactly that
 take a look at http://seatd.mainia.de/grammar.html if you like. it's my 
 version of the D grammar from the docs with the errors ironed out 
 (currently, i'm not aware of any more, that is). it might save you some 
 time.
I'm sort of hoping to pull the grammar right out of the docs (I'm thinking of building a set of DDoc macros that will generate it from the the original source) this would have the advantage of giving me an excuse to rag on Walter about the error in the docs.
 like the version from the docs, it uses left recursion, though. but that 
 you would have to change anyway. that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.
 
 what i'd love to do is to have a benchmark some day, to let the 
 different parsing techniques compete. i'd patch the DMD source, such 
 that it can be run as a standalone parser and it'll represent the 
 hand-written LL team.
DMD is LL? cool.
 my seatd parser will run for the automated LR team
 and it'll be fun if you would throw your parser in as the automated LL 
 team :)
Again, given time I expect to have that.
Sep 17 2007
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Jascha Wetzel wrote

 that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.
This indicates only that your parsing system might accept a super set of the D language. From looking at the grammar I conclude that your parsing system might accept sources that are not accepted by DMD. BTW do you have some papers on the "tdfa" you use? -manfred
Sep 18 2007
parent Jascha Wetzel <firstname mainia.de> writes:
Manfred Nowak wrote:
 Jascha Wetzel wrote
 
 that grammar successfully parses tango, 
 phobos, derelict, blade, dfl, core32 and more.
This indicates only that your parsing system might accept a super set of the D language. From looking at the grammar I conclude that your parsing system might accept sources that are not accepted by DMD.
definitely. the grammar in the docs is a lot more general than the DMD parser. i'm narrowing it down more and more. also, i'm going chase the parser through DStress, which also includes negative test cases.
 BTW do you have some papers on the "tdfa" you use?
http://laurikari.net/ville/spire2000-tnfa.pdf
Sep 18 2007
prev sibling parent BLS <nanali nospam-wanadoo.fr> writes:
Thank you folks,
...these info's are bad news for my project. Have to think about it.

Bjoern

BLS schrieb:
 Sorry for beeing a bit off topic, anyway:
 
 Do you see a chance to describe D grammar as LL(1) grammar..
 
 Please notice that I am a more DB developer than a language-designer 
 means keep your answere as simple as possible :)
 
 all hints are welcome.
 Thanks Bjoern
 
 Well, at least one guy did it for *Erlang*, as you can see here:
 http://platform.netbeans.org/articles/nbm_interview_caoyuan.html
Sep 16 2007