digitalmars.D - Lemon Parser Generator

"G.Vidal" <gyvidal wanadoo.fr> writes:
Request to the D programmers:

I'm not too convinced about DGrammar.
There's an excellent parser generator, similar to Yacc but really
improved, called Lemon:

http://www.hwaci.com/sw/lemon/

In there you can find the source code and the template file used to
generate the parser.

Someone should modify both to produce D code. I would do it myself if I
had time, but I don't. It doesn't seem very complicated for any good
C/D programmer: a straightforward C->D translation of the template, plus
some changes to lemon.c so that it emits const int declarations instead
of the #defines. It should take about a day.
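
The #define-to-const-int step can be sketched as a small text filter. This
is only an illustration in Python, not a patch to lemon.c: it assumes the
generated header consists of lines of the form "#define NAME number", and
the function name defines_to_d_consts is made up for the example.

```python
import re

# Assumed input shape: "#define NAME 123" (a name followed by an integer).
DEFINE_RE = re.compile(r"^\s*#define\s+([A-Za-z_]\w*)\s+(\d+)\s*$")

def defines_to_d_consts(header_text):
    """Rewrite integer #define lines as D 'const int' declarations.

    Lines that do not match the assumed shape are passed through unchanged.
    """
    out = []
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            out.append("const int %s = %s;" % (m.group(1), m.group(2)))
        else:
            out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    sample = "#define PLUS                            1\n" \
             "#define MINUS                           2\n"
    print(defines_to_d_consts(sample))
```

A real port would presumably change the code in lemon.c that writes these
lines rather than post-process its output, but the mapping is the same.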

Anyone interested?

Thanks

GV
Mar 13 2005
Ilya Minkov <minkov cs.tum.edu> writes:
It is an LALR(1) parser generator. Parsing D with it should be quite
hard, though far from impossible. Besides, since actions are called
bottom-up, there is not much you can do while parsing apart from
building a representation of the source.

An LL recursive descent parser generator would be a better fit. Although
LL grammars are generally a less powerful class than LALR, LL fits D and
many other synthetic languages perfectly. The generated code parallels
the grammar and is hence very easy to debug. You can perform sensible
actions during parsing; for example, most source-based tools can very
easily be made single-pass. You can choose different subgrammars and
actions depending on the left context. There are two interesting LL
generators: ANTLR, which is particularly good because it generates
multi-token prediction automatically, and COCO/R, which is quite simple
and available in many programming languages. ANTLR, however, has
facilities for output in different languages and, if I remember
correctly, even some program transformation facilities.
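
To illustrate how recursive descent code parallels the grammar and runs
its actions in a single pass, here is a hand-written sketch in Python. It
is not output from ANTLR or COCO/R. The toy language (statements starting
with the keywords "let" and "print") and all names are invented for the
example. Each grammar rule becomes one method, and the leading keyword
selects the subgrammar.

```python
import re

class Parser:
    """Recursive descent parser for a made-up toy language.

    program := stmt*
    stmt    := "let" NAME "=" expr ";" | "print" expr ";"
    expr    := term (("+" | "-") term)*
    term    := NUMBER | NAME | "(" expr ")"
    """

    def __init__(self, text):
        # Numbers, identifiers, or single punctuation characters.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[^\s\w]", text)
        self.pos = 0
        self.env = {}      # variables defined so far
        self.output = []   # values produced by "print"

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, wanted=None):
        tok = self.peek()
        if tok is None or (wanted is not None and tok != wanted):
            raise SyntaxError("expected %r, got %r" % (wanted, tok))
        self.pos += 1
        return tok

    def program(self):
        while self.peek() is not None:
            self.stmt()
        return self.output

    def stmt(self):
        # The leading keyword is the left context that picks the rule.
        keyword = self.expect()
        if keyword == "let":
            name = self.expect()
            if not name.isidentifier():
                raise SyntaxError("expected a name, got %r" % name)
            self.expect("=")
            self.env[name] = self.expr()      # action runs while parsing
        elif keyword == "print":
            self.output.append(self.expr())   # action runs while parsing
        else:
            raise SyntaxError("unknown statement keyword %r" % keyword)
        self.expect(";")

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.expect() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        tok = self.expect()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            self.expect(")")
            return value
        if tok in self.env:
            return self.env[tok]
        raise SyntaxError("unexpected token %r" % tok)

if __name__ == "__main__":
    print(Parser("let x = 2; let y = x + 40; print y - (x + 1);").program())  # [39]
```

Values are computed as the input is read, with no intermediate tree, which
is the single-pass style described above.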

I've been hacking COCO/R for C and C++ (I'm using it in a C++ project),
and I'll make a D-outputting version in about a month. I already
attempted a port once, but failed because I lacked some knowledge of its
workings and the time to figure them out. I think I would succeed this
time.

-eye

Mar 16 2005
"G.Vidal" <gyvidal wanadoo.fr> writes:
Who said I wanted to parse D?
You misunderstood.
I just want to make the generator produce parsers IN D, to parse a
language of my own.



On Wed, 16 Mar 2005 14:28:47 +0100, Ilya Minkov wrote:

 
Mar 16 2005
Ilya Minkov <minkov cs.tum.edu> writes:
G.Vidal wrote:
 Who said I wanted to parse D ??? You misunderstood. I just want to 
 make the generator produce parsers IN D, to parse a language of my 
 own.
I had both assumptions in mind: that one might want to parse D, or that
one might want to parse a newly designed task-specific language.

Naturally, there is no language better than D to write any sort of
compiler. A token maps wonderfully to D's array slice/string semantics.
Provided you only slice and never copy, a token carries both its value
and its source position. To determine the file and line number, you look
at the memory ranges the files are loaded into, which gives you the
file, and then scan for line ends up to the starting address of the
token. You only need to do this when reporting errors and need not
concern yourself with it during normal parsing, hence better
performance.

If you want to parse a language over which you have total control, it
would be even more fun to do it with an LL parser generator than with
LALR, because the only thing you need to do to resolve a conflict is to
provide a distinctive left context, i.e. prepend a keyword. The design
of Pascal largely stems from its being implemented ad hoc with a
recursive descent parser that generates the target code without any
explicit intermediate representation. Not exactly a flexible way, but an
easy start, and one can introduce complications later on. With an LALR
parser, by contrast, you need to introduce the complications right at
the beginning, and more than just one.

However, I won't code it before Breakpoint (25-28 March), nor before an
exam on April 7th or 8th. And after that, I'm not so sure. Of course,
you had better not rely on me but make yourself a tool you like.

-eye
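
The lazy position lookup described above can be sketched roughly as
follows. This is an approximation in Python, where slicing a string
copies, so a (start, end) offset pair into one shared buffer stands in
for a D slice, and the buffer stands in for the memory the files are
loaded into. SourceMap and its methods are hypothetical names.

```python
import bisect

class SourceMap:
    """All loaded files live in one buffer; tokens are (start, end) offsets."""

    def __init__(self):
        self.buffer = ""
        self.starts = []   # offset at which each file begins, ascending
        self.names = []    # file name for each entry in self.starts

    def load(self, name, text):
        """Append a file to the buffer and return its starting offset."""
        start = len(self.buffer)
        self.starts.append(start)
        self.names.append(name)
        self.buffer += text
        return start

    def text(self, token):
        """The token's value, recovered from its offsets."""
        start, end = token
        return self.buffer[start:end]

    def locate(self, token):
        """Return (file name, 1-based line). Only needed for error messages."""
        start, _ = token
        # Find the file whose range contains the token's start offset.
        index = bisect.bisect_right(self.starts, start) - 1
        file_start = self.starts[index]
        # Count line ends between the start of that file and the token.
        line = self.buffer.count("\n", file_start, start) + 1
        return self.names[index], line

if __name__ == "__main__":
    sm = SourceMap()
    sm.load("a.src", "let x\nlet y\n")
    sm.load("b.src", "print\n  oops\n")
    start = sm.buffer.index("oops")
    token = (start, start + len("oops"))
    print(sm.text(token), sm.locate(token))
```

The parser itself never stores file names or line numbers; locate() does
the scan only when an error has to be reported.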
Mar 17 2005