
D.gnu - lexer for flex

C.R.Chafer <blackmarlin nospam.asean-mail.com> writes:
I have written a flex lexer for D (source attached).
Please point out any bugs.

(bison parser to follow when written)

C 2002/6/11
Jun 11 2002
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
"C.R.Chafer" <blackmarlin nospam.asean-mail.com> wrote in message
news:ae5bmk$2u97$1 digitaldaemon.com...
 I have written a flex lexer for d - source attached
 please point out any bugs

 (bison parser to follow when written)
It seems you have started on what I did too :-) However, I ran into problems with the declaration grammar. Bison does not do the N-token look-ahead that the D grammar seems to require, at least if "parse.c" is to be translated more or less directly. The culprits are the Parse::is...() methods in "parse.c", which make the grammar LALR(N). I have not figured out any way to solve this. Another problem is that you cannot easily distinguish type names from other names, as you can in C.

If you find your way around these problems, I would very much like to hear about it. I stopped for these reasons. I did the statements and expressions, though, and the grammar attached to this mail might help you or someone else get started. I guess TOK_TYPENAME should not be defined as a token as I did, but I did not know better.

After once having problems porting a flex-generated lexer to an EBCDIC platform, I haven't used flex since, so I don't have a .l file like you. But if you substitute the token names, it should do.

Regards,
Martin M. Pedersen

[Attachment: grammar.y (uuencoded; contents omitted)]
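
For readers unfamiliar with the look-ahead problem described in the message above, here is a minimal Python sketch of what a hand-written parser can do and a one-token LALR(1) parser cannot: scan ahead an unbounded number of tokens before committing to a rule. This is not the logic of the Parse::is...() methods in "parse.c". The token representation, the function names `skip_type_suffixes` and `looks_like_declaration`, and the simplified rule (identifier, optional `*` or `[...]` suffixes, then another identifier) are all assumptions made for illustration.

```python
# Illustrative sketch only: peek ahead an arbitrary number of tokens
# before choosing between "declaration" and "expression". The rule used
# here is a deliberately simplified assumption, not the D grammar.

def skip_type_suffixes(tokens, pos):
    """Skip any run of '*' and '[' ... ']' suffixes starting at pos.

    Returns the index of the first token after the suffixes, or None if
    a '[' is never closed.
    """
    while pos < len(tokens):
        if tokens[pos] == "*":
            pos += 1
        elif tokens[pos] == "[":
            depth = 0
            while pos < len(tokens):
                if tokens[pos] == "[":
                    depth += 1
                elif tokens[pos] == "]":
                    depth -= 1
                    if depth == 0:
                        break
                pos += 1
            else:
                return None  # ran out of tokens inside brackets
            pos += 1  # step past the closing ']'
        else:
            break
    return pos


def looks_like_declaration(tokens, pos=0):
    """Peek ahead, without consuming tokens, to guess the statement kind.

    Assumed rule: identifier, optional suffixes, then another identifier.
    """
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        return False
    after = skip_type_suffixes(tokens, pos + 1)
    if after is None or after >= len(tokens):
        return False
    return tokens[after].isidentifier()


# 'Foo[][]* x;' : the deciding 'x' sits six tokens past 'Foo'.
print(looks_like_declaration(["Foo", "[", "]", "[", "]", "*", "x", ";"]))
# 'a[i] = b;' starts the same way but is an expression.
print(looks_like_declaration(["a", "[", "i", "]", "=", "b", ";"]))
```

Note that this sketch also accepts `a * b;` as a declaration, although it could equally be a multiplication. Settling that case requires knowing whether `a` names a type, which is the second problem raised in the message.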
Jun 11 2002
"Walter" <walter digitalmars.com> writes:
Why not just use lexer.c and parse.c more or less directly?
Jun 11 2002
C.R.Chafer <blackmarlin nospam.asean-mail.com> writes:
Walter wrote:

 Why not just use lexer.c and parse.c more or less directly?
No offence, but they are written in C++, which will not interface to BTL and would probably be a pain to interface to GCC. And anyway, the best way to understand a language fully is to write a compiler for it.

PS: Thanks to MMP for the parser yacc source. My version mainly consists of rebuilding a grammar I wrote some time ago to parse oomic; I just have some shift/reduce conflicts to resolve.

C 2002/6/12
Jun 12 2002
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
"Walter" <walter digitalmars.com> wrote in message
news:ae60k2$iq5$1 digitaldaemon.com...
 Why not just use lexer.c and parse.c more or less directly?
My reasons are the same as those described by C.R.Chafer: portability and understanding. Also, using a tool like bison documents the grammar, makes it easier to maintain and experiment with, and pinpoints ambiguities in the grammar.

Regards,
Martin M. Pedersen
Jun 12 2002
Erik de Castro Lopo <nospam mega-nerd.com> writes:
"Martin M. Pedersen" wrote:
 
 It seems you have started what I did too :-) However, I ran into problems
 with the declaration grammar. The problem is that bison does not do N
 look-ahead that seems to be required by the D grammar - at least if
 "parse.c" is to be translated more or less directly. The problem is the
 Parse::is...() methods in "parse.c" that makes the grammar LALR(N). I have
 not figured out any way to solve this problem. Another problem is that you
 cannot easily distinguish type names from other names as in C. If you find
 your way around these problems, I would very much like to hear about it. I
 stopped for these reasons.
Bison is also able to generate GLR parsers (since 1.50) via the "%glr-parser" option. Have you looked at using this?

Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
Fundamentalist : Someone who is colour blind and yet wants
everyone else to see the world with the same lack of colour.
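
As a rough illustration of why a GLR parser helps here, the following Python sketch tries every candidate rule on the same input and keeps the parses that succeed. That is the idea behind GLR's "split on a conflict, drop the dead ends" behaviour. It is a toy under stated assumptions, not how bison implements GLR: a real GLR parser shares work between branches, whereas this version runs the rules one after another. The two toy rules and the names `parse_declaration`, `parse_expression`, and `all_parses` are hypothetical.

```python
# Illustrative sketch only: run each candidate rule over the same tokens
# and collect every parse that succeeds.

def parse_declaration(tokens):
    """Assumed toy rule: IDENT '*'* IDENT ';'"""
    if not tokens or not tokens[0].isidentifier():
        return None
    pos = 1
    while pos < len(tokens) and tokens[pos] == "*":
        pos += 1
    rest = tokens[pos:]
    if len(rest) == 2 and rest[0].isidentifier() and rest[1] == ";":
        # (kind, type name, pointer depth, declared name)
        return ("decl", tokens[0], pos - 1, rest[0])
    return None


def parse_expression(tokens):
    """Assumed toy rule: IDENT (('*' | '+') IDENT)* ';'"""
    if len(tokens) < 2 or tokens[-1] != ";":
        return None
    body = tokens[:-1]
    if len(body) % 2 == 0:
        return None  # operands and operators must alternate
    for i, tok in enumerate(body):
        if i % 2 == 0 and not tok.isidentifier():
            return None
        if i % 2 == 1 and tok not in ("*", "+"):
            return None
    return ("expr", body)


def all_parses(tokens):
    """Return every successful parse; more than one means ambiguity."""
    results = []
    for rule in (parse_declaration, parse_expression):
        tree = rule(tokens)
        if tree is not None:
            results.append(tree)
    return results


print(all_parses(["a", "*", "b", ";"]))       # both rules succeed
print(all_parses(["a", "+", "b", ";"]))       # only the expression rule
print(all_parses(["T", "*", "*", "p", ";"]))  # only the declaration rule
```

When more than one parse survives, as for `a * b;`, something still has to choose between them. In bison's GLR mode that is what the `%dprec` and `%merge` declarations are for; in this sketch it would be a symbol-table check on whether `a` names a type.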
Jul 09 2004
"Nicolas Lehuen" <nicolas.lehuen thecrmcompany.com> writes:
I'm no expert in the parsing department, but I used ANTLR
(http://www.antlr.org/), which is an LL(k) parser generator, outputting
[...] or C++. Maybe they could be nudged into generating D code as
well?

Regards,
Nicolas Lehuen

"Erik de Castro Lopo" <nospam mega-nerd.com> wrote in message
news:40EF24E3.B5BE2EB5 mega-nerd.com...
 "Martin M. Pedersen" wrote:

 It seems you have started what I did too :-) However, I ran into problems
 with the declaration grammar. The problem is that bison does not do N
 look-ahead that seems to be required by the D grammar - at least if
 "parse.c" is to be translated more or less directly. The problem is the
 Parse::is...() methods in "parse.c" that makes the grammar LALR(N). I have
 not figured out any way to solve this problem. Another problem is that you
 cannot easily distinguish type names from other names as in C. If you find
 your way around these problems, I would very much like to hear about it. I
 stopped for these reasons.

 Bison is also able to generate GLR parsers (since 1.50) via the
 "%glr-parser" option. Have you looked at using this?
Oct 01 2004
Deja Augustine <deja scratch-ware.net> writes:
Nicolas Lehuen wrote:
 I'm no expert in the parsing department, but I used ANTLR
 (http://www.antlr.org/), which is an LL(k) parser generator, outputting
 [...] or C++. Maybe they could be nudged into generating D code as well?
 
 Regards,
 Nicolas Lehuen
 
 "Erik de Castro Lopo" <nospam mega-nerd.com> wrote in message
 news:40EF24E3.B5BE2EB5 mega-nerd.com...
 
 "Martin M. Pedersen" wrote:

 It seems you have started what I did too :-) However, I ran into problems
 with the declaration grammar. The problem is that bison does not do N
 look-ahead that seems to be required by the D grammar - at least if
 "parse.c" is to be translated more or less directly. The problem is the
 Parse::is...() methods in "parse.c" that makes the grammar LALR(N). I have
 not figured out any way to solve this problem. Another problem is that you
 cannot easily distinguish type names from other names as in C. If you find
 your way around these problems, I would very much like to hear about it. I
 stopped for these reasons.

 Bison is also able to generate GLR parsers (since 1.50) via the
 "%glr-parser" option. Have you looked at using this?
To the best of my knowledge, Andy Friesen posted his D ANTLR grammar on the d.D newsgroup. It might save you some work.

-Deja
Oct 02 2004