D - D grammar (yacc format)

Sammy (39/39) Jan 28 2004 I began to write a yacc style grammar for D.

C (5/44) Jan 28 2004 Wow, this would be awesome! Think you'll end up using somethign besides
Ben Hinkle (7/46) Jan 28 2004 Interestingly GCC's C++ frontend switched from using a yacc generated pa...

Sammy (16/21) Jan 28 2004 parser
Sammy (15/20) Jan 28 2004 parser
Sammy (36/53) Jan 28 2004 parser

"Sammy" <none here.com> writes:

I began to write a yacc style grammar for D.

I am using the parse.c from the dmd source.
Each function is being translated to a piece of
the grammar. Each piece is added to the source
file using some tag ( /** Grammar )

Eg:

/** Grammar
 *
 * %token "^" TOKxor
 *
 * XorExp : AndExp
 *           | XorExp "^" AndExp
 *           ;
 *
 **/

Expression *Parser::parseXorExp()
{   Expression *e;
    Expression *e2;
    Loc loc = this->loc;

    e = parseAndExp();
    while (token.value == TOKxor)
    {
 nextToken();
 e2 = parseAndExp();
 e = new XorExp(loc, e, e2);
    }
    return e;
}

I dont know if the final grammar will be LALR, nor
if it will be unabiguous.

If everything is ok, i will use bison to process the grammar.
if not, there are other parser generators who can
handle anbiguous and/or non-lalr grammars
( eg: btyacc, accent, msta, D'Parser )

I would like to see the grammar maintained ( i can do that )
and distributed with DMD ( i cant do that :-) .

I also have a lexer (made with flex), who needs to be
updated.

What do you think ?

Jan 28 2004

"C" <dont respond.com> writes:

Wow, this would be awesome!  Think you'll end up using somethign besides
bison though , D'Parser looks really cool.

C
"Sammy" <none here.com> wrote in message
news:bv941u$1e1e$1 digitaldaemon.com...
 I began to write a yacc style grammar for D.

 I am using the parse.c from the dmd source.
 Each function is being translated to a piece of
 the grammar. Each piece is added to the source
 file using some tag ( /** Grammar )

 Eg:

 /** Grammar
  *
  * %token "^" TOKxor
  *
  * XorExp : AndExp
  *           | XorExp "^" AndExp
  *           ;
  *
  **/

 Expression *Parser::parseXorExp()
 {   Expression *e;
     Expression *e2;
     Loc loc = this->loc;

     e = parseAndExp();
     while (token.value == TOKxor)
     {
  nextToken();
  e2 = parseAndExp();
  e = new XorExp(loc, e, e2);
     }
     return e;
 }

 I dont know if the final grammar will be LALR, nor
 if it will be unabiguous.

 If everything is ok, i will use bison to process the grammar.
 if not, there are other parser generators who can
 handle anbiguous and/or non-lalr grammars
 ( eg: btyacc, accent, msta, D'Parser )

 I would like to see the grammar maintained ( i can do that )
 and distributed with DMD ( i cant do that :-) .

 I also have a lexer (made with flex), who needs to be
 updated.

 What do you think ?

Jan 28 2004

"Ben Hinkle" <bhinkle4 juno.com> writes:

Interestingly GCC's C++ frontend switched from using a yacc generated parser
to a hand-written parser in GCC-3.4. I don't know all the reasons why but
you could probably Google around to find out why. Personally I find the 3.4
hand-written parser is easier to understand.

-Ben

"Sammy" <none here.com> wrote in message
news:bv941u$1e1e$1 digitaldaemon.com...
 I began to write a yacc style grammar for D.

 I am using the parse.c from the dmd source.
 Each function is being translated to a piece of
 the grammar. Each piece is added to the source
 file using some tag ( /** Grammar )

 Eg:

 /** Grammar
  *
  * %token "^" TOKxor
  *
  * XorExp : AndExp
  *           | XorExp "^" AndExp
  *           ;
  *
  **/

 Expression *Parser::parseXorExp()
 {   Expression *e;
     Expression *e2;
     Loc loc = this->loc;

     e = parseAndExp();
     while (token.value == TOKxor)
     {
  nextToken();
  e2 = parseAndExp();
  e = new XorExp(loc, e, e2);
     }
     return e;
 }

 I dont know if the final grammar will be LALR, nor
 if it will be unabiguous.

 If everything is ok, i will use bison to process the grammar.
 if not, there are other parser generators who can
 handle anbiguous and/or non-lalr grammars
 ( eg: btyacc, accent, msta, D'Parser )

 I would like to see the grammar maintained ( i can do that )
 and distributed with DMD ( i cant do that :-) .

 I also have a lexer (made with flex), who needs to be
 updated.

 What do you think ?

Jan 28 2004

"Sammy" <none here.com> writes:

"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

parser
 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

3.4
 hand-written parser is easier to understand.

 -Ben

I am not sure, and i dont remember where i read this,
but I think its because the parser is faster, simpler and
as you said, easier to understand ( and possibly to implement ).

Implement the parser, any way you want.
But if you want to have a clear picture of the language,
i think you need a grammar.

If you use EBNF, instead of pure BNF, you will end up with
a grammar fitting on 2 or 3 pages instead of the 67 pages
of the file parser.c

So you are right, but i still find its still and interesting
exercice.

- S

Jan 28 2004

"Sammy" <none here.com> writes:

"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

parser
 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

3.4
 hand-written parser is easier to understand.

 -Ben

Found this short explaination on :

     http://gnu.essentkabel.com/software/gcc/gcc-3.4/changes.html

GCC 3.4 Release Series. Changes, New Features, and Fixes

C++
A hand-written recursive-descent C++ parser has replaced the YACC-derived
C++ parser from previous GCC releases. The new parser contains much improved
infrastructure needed for better parsing of C++ source codes, handling of
extensions, and clean separation (where possible) between proper semantics
analysis and parsing. The new parser fixes many bugs that were found in the
old parser.

- S

Jan 28 2004

"Sammy" <none here.com> writes:

"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:bv9etc$21ib$1 digitaldaemon.com...
 Interestingly GCC's C++ frontend switched from using a yacc generated

parser
 to a hand-written parser in GCC-3.4. I don't know all the reasons why but
 you could probably Google around to find out why. Personally I find the

3.4
 hand-written parser is easier to understand.

 -Ben

Sorry, to post again.
All those explanation tells that its a very good choice
to use a hand written parser...

From: http://compilers.iecc.com/comparch/article/02-01-022
----------------------------------------------------------------------
Re: C++ parsing : what's new ?
From comp.compilers

From:  Zack Weinberg <zackw panix.com>
Newsgroups:  comp.compilers
Date:  4 Jan 2002 00:09:35 -0500
Organization:  PANIX -- Public Access Networks Corp.
References:  01-12-179
Keywords:  C++, parse
Posted-Date:  04 Jan 2002 00:09:35 EST

Peter H. Froehlich <pfroehli ics.uci.edu> writes:
On Saturday, December 29, 2001, at 10:30 , Travis Nixon wrote:

 "Martin von Loewis" <loewis informatik.hu-berlin.de> wrote in message
 The gcc grammar is currently being rewritten, from a bison-based one
 to a hand-written recursive-descent parser.

 Is there a discussion online anywhere about the reasons for doing
 this?

As I recall the motivation is performance. The people doing the
rewrite expect the recursive descent parser to be way faster than
the table-driven one they have now. However, I sure would *not*
want to write a parser like that for a grammar as hairy as
C/C++. :-)


The motivations are correctness, performance, and better error
recovery, in that order.


The existing gcc grammar for C++ does not interpret all language
constructs correctly; this has been judged unfixable while continuing
to use yacc. There are already horrible kludges to get certain
constructs mostly right (look at cp/spew.c, if you want to know) that
become unnecessary with a recursive descent parser.


A hand-rolled recursive descent parser has a lot more information
accessible to it. This should permit it to make better backtracking
decisions. Bad backtracking is the major cause of bad performance in
the existing parser. It will certainly permit better error recovery.


Note that this approach is known to work - the Edison Design Group
(EDG)'s C/C++ front end uses a hand-written recursive descent parser,
and it does all the above things better than g++. As someone who's
worked on both front ends, I can also say that I found it _easier_ to
work with the recursive descent parser.


zw

Jan 28 2004

D Programming

C/C++ Programming

Other

D - D grammar (yacc format)