digitalmars.D - Lemon Parser Generator
- G.Vidal (15/15) Mar 13 2005 Request to the D programmers:
- Ilya Minkov (22/45) Mar 16 2005 It is a LALR(1) parser generator - parsing D with it should be quite
- G.Vidal (5/54) Mar 16 2005 Who said I wanted to parse D ???
- Ilya Minkov (25/28) Mar 17 2005 I had both assumptions in mind: that one might want to parse D, or that
Request to the D programmers: I'm not too convinced about DGrammar. There's an excellent parser generator, similar to Yacc but much improved, called Lemon: http://www.hwaci.com/sw/lemon/ There you can find the source code and the template file used to generate the parser. Someone should modify both to produce D code. I would do it myself if I had time, but I don't. It doesn't seem very complicated for any good C/D programmer (a mere C->D translation of the template, plus some modification of lemon.c to emit const int declarations instead of the #defines; it should take about a day). Anyone interested? Thanks, GV
Mar 13 2005
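To make the "const int instead of the #defines" part of the request concrete, here is a minimal C++ sketch of a token-code emitter with a switchable output style. It is not taken from lemon.c: the names `Target` and `emitTokenCodes`, and the numbering from 1 in list order, are assumptions made only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Output style for the generated token-code header (hypothetical switch).
enum class Target { C, D };

// Render one definition per terminal symbol.
// Assumption: codes start at 1 and follow the order of the list.
std::string emitTokenCodes(const std::vector<std::string>& tokens, Target target) {
    std::ostringstream out;
    int code = 1;
    for (const std::string& name : tokens) {
        if (target == Target::C)
            out << "#define " << name << " " << code << "\n";        // C preprocessor constant
        else
            out << "const int " << name << " = " << code << ";\n";   // D has no preprocessor
        ++code;
    }
    return out.str();
}
```

Calling `emitTokenCodes({"PLUS", "MINUS"}, Target::D)` returns two lines, `const int PLUS = 1;` and `const int MINUS = 2;`. The larger part of such a port would presumably be translating the parser template itself, which this sketch does not attempt.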
It is a LALR(1) parser generator - parsing D with it should be quite hard, though far from impossible. Besides, since actions are called bottom-up, there is not much you can do while parsing apart from building a representation of the source. An LL recursive descent parser generator would be a better fit. Although LL grammars are generally a less powerful class than LALR, they fit D and many other synthetic languages perfectly. The output parallels the grammar and is hence very easy to debug. You can perform meaningful actions during the parse; for example, most source-based tools can very easily be made single-pass. You can choose different subgrammars and actions depending on the left context. There are two interesting LL generators: ANTLR, which is particularly good since it generates multi-token prediction automatically, and COCO/R, which is quite simple and available in many programming languages. ANTLR, however, also has facilities for output in different languages and, if I remember correctly, even some program transformation facilities. I've been hacking on COCO/R for C and C++ (I'm using it in a C++ project), and I'll make a D-outputting version in about a month. I already attempted a port once but failed, since I lacked some knowledge of its workings and the time to figure them out, but I think I would succeed this time. -eye
Mar 16 2005
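To illustrate the claim that recursive descent output parallels the grammar and allows single-pass actions, here is a small hand-written C++ sketch. The grammar, the class name `Calc`, and its helpers are made up for this example; this is not output of ANTLR or COCO/R. Each grammar rule becomes one function, and the value is computed during the parse with no tree being built.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Grammar (hypothetical, for illustration only):
//   expr   = term   { ("+" | "-") term } .
//   term   = factor { ("*" | "/") factor } .
//   factor = number | "(" expr ")" .
class Calc {
public:
    explicit Calc(const std::string& text) : src(text) {}

    // Parse the whole input and return its value in a single pass.
    long run() {
        long v = expr();
        skipSpace();
        if (pos != src.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string src;
    std::size_t pos = 0;

    void skipSpace() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // One-character lookahead: consume c if it is the next non-space character.
    bool accept(char c) {
        skipSpace();
        if (pos < src.size() && src[pos] == c) { ++pos; return true; }
        return false;
    }

    long expr() {                       // expr = term { ("+" | "-") term }
        long v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    long term() {                       // term = factor { ("*" | "/") factor }
        long v = factor();
        for (;;) {
            if (accept('*')) v *= factor();
            else if (accept('/')) {
                long d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            }
            else return v;
        }
    }

    long factor() {                     // factor = number | "(" expr ")"
        if (accept('(')) {
            long v = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skipSpace();
        if (pos >= src.size() || !std::isdigit(static_cast<unsigned char>(src[pos])))
            throw std::runtime_error("expected number");
        long v = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }
};
```

For example, `Calc("(1 + 2) * 3").run()` evaluates the input while parsing it. Because each function knows which rule it is in, the left context is available at every point, which is the property an LALR action lacks until the reduction happens.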
Who said I wanted to parse D??? You misunderstood. I just want to make the generator produce parsers IN D, to parse a language of my own.
On Wed, 16 Mar 2005 14:28:47 +0100, Ilya Minkov wrote: It is a LALR(1) parser generator - parsing D with it should be quite hard, though far from impossible. [...]
Mar 16 2005
G.Vidal wrote: Who said I wanted to parse D ??? You misunderstood. I just want to make the generator produce parsers IN D, to parse a language of my own.
I had both assumptions in mind: that one might want to parse D, or that one might want to parse a newly designed task-specific language. Naturally, there is no language better than D for writing any sort of compiler. A token maps wonderfully to D's array slice/string semantics. Provided you only slice and never copy, a token can carry both its value and its source position in one: to determine the file and line number, you look at the memory ranges the files are loaded into to find the file, and then scan for line ends up to the starting address of the token. You only need to do this when outputting errors and need not concern yourself with it during normal parsing, hence better performance. If you want to parse a language over which you have total control, it would be even more fun to do it with an LL parser generator than with LALR, because the only thing you need to do to resolve a conflict is to provide a distinctive left context, i.e. prepend a keyword. The design of Pascal largely stems from its being implemented ad hoc with a recursive descent parser that generated target code without any explicit intermediate representation. Not exactly a flexible approach, but an easy start, and one can introduce complications later on. By contrast, with an LALR parser you need to introduce the complications right at the beginning, and more than just one. However, I won't code it before Breakpoint (25-28 March), nor before an exam on April 7th or 8th. And after that, I'm not so sure. Of course, you had better not rely on me but make yourself a tool you like. -eye
Mar 17 2005
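The token-as-slice idea above can be sketched roughly as follows. This is C++17 rather than D, with `std::string_view` standing in for a D array slice; the names `SourceFile`, `Location`, `contains`, and `locate` are hypothetical. A token stores no position at all; file and line are recovered from its address only when an error has to be reported.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// A token is only a view into the loaded source buffer; nothing is copied.
using Token = std::string_view;

struct SourceFile {
    std::string_view name;  // hypothetical file label
    std::string_view text;  // whole file contents, kept alive by the caller
};

struct Location {
    std::string_view file;
    std::size_t line;       // 1-based line number
};

// True if the token's bytes lie inside the file's buffer.
bool contains(const SourceFile& f, Token tok) {
    std::less_equal<const char*> le;  // well-defined order even across unrelated buffers
    const char* begin = f.text.data();
    const char* end = begin + f.text.size();
    return le(begin, tok.data()) && le(tok.data() + tok.size(), end);
}

// Called only when reporting an error: find the file by address range,
// then count line ends up to the token's starting address.
bool locate(const std::vector<SourceFile>& files, Token tok, Location& out) {
    for (const SourceFile& f : files) {
        if (!contains(f, tok)) continue;
        std::size_t offset = static_cast<std::size_t>(tok.data() - f.text.data());
        std::size_t line = 1;
        for (std::size_t i = 0; i < offset; ++i)
            if (f.text[i] == '\n') ++line;
        out = Location{f.name, line};
        return true;
    }
    return false;
}
```

A lexer built this way would hand out tokens via `text.substr(start, length)` and never touch line counters during normal parsing. The cost is a linear scan per reported error, which seems an acceptable trade when errors are rare; the sketch also assumes every source buffer outlives all tokens cut from it.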