digitalmars.D.announce - Pegged, a Parsing Expression Grammar (PEG) generator in D
- Philippe Sigaud (80/80) Mar 10 2012 Hello,
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/80) Mar 10 2012 Admittedly I have not heard of PEGs before, so I'm curious: Is this
- Andrej Mitrovic (13/13) Mar 10 2012 I see you are not the only one who started writing string array
- bearophile (5/7) Mar 10 2012 To avoid that bug:
- Andrei Alexandrescu (20/33) Mar 10 2012 I, too, think this is very significant work! Suggestion for Philippe:
- Philippe Sigaud (11/28) Mar 10 2012 It's already implemented! No need for ';'
- Andrei Alexandrescu (5/10) Mar 10 2012 Great! I think you'd be wise to add the ";", otherwise it's impossible
- Philippe Sigaud (11/14) Mar 10 2012 Complex rules can be multi-line, I'll put more examples of this in the d...
- Philippe Sigaud (5/6) Mar 10 2012 <- ...)
- Philippe Sigaud (8/10) Mar 10 2012 ful
- Andrei Alexandrescu (4/12) Mar 10 2012 Any chance you consider adding AST generator actions as discussed in the...
- Philippe Sigaud (11/13) Mar 10 2012 The AST is automatically produced, and there are already AST actions
- Andrei Alexandrescu (5/18) Mar 10 2012 I was thinking of ANTLR-style operators in which you say where the root
- Philippe Sigaud (25/28) Mar 11 2012 Get it, thanks for the ref!
- Andrei Alexandrescu (7/26) Mar 11 2012 Not getting this, but picture me with an intense look and nodding.
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (7/15) Mar 11 2012 Oh, I should have mentioned I only meant the actual language (ignoring
- Philippe Sigaud (11/16) Mar 11 2012 com>
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/22) Mar 11 2012 Hm, I don't *think* C has such ambiguities but I could well be wrong. In...
- Philippe Sigaud (4/5) Mar 11 2012 any case, if it can handle the non-ambiguous case, that's enough for me.
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/10) Mar 11 2012 Yep, see: http://ssw.jku.at/Coco/ (C.atg)
- Philippe Sigaud (18/31) Mar 10 2012 Yes, I use what I call "Haskell-comma" (I saw it first on Haskell
- Jay Norwood (28/31) Mar 12 2012 I've just read a few articles referenced from this page, and the
- Jay Norwood (4/13) Mar 12 2012 Also in the later paper he did a C parser, so I suppose that is
- Jonathan M Davis (5/9) Mar 10 2012 Cool! PEGs aren't used all that much for language grammars - primarily b...
- Philippe Sigaud (3/11) Mar 10 2012 Is there an EBNF grammar for D somewhere, I mean outside the dlang docs?
- Jonathan M Davis (6/21) Mar 10 2012 Someone may have been trying to write one, but I don't believe that ther...
- Philippe Sigaud (2/7) Mar 10 2012 Hmm. I'll post on D.main and see.
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (4/79) Mar 11 2012 Question: Are the generated parsers, AST nodes, etc classes or structs?
- Philippe Sigaud (3/4) Mar 11 2012 They are structs. See:
- Tom Compton (3/5) Aug 08 2013 Why is this important?
- kraybourne (5/9) Mar 11 2012 Very cool!
- Philippe Sigaud (13/14) Mar 11 2012 space-insensitivity, where might one find this?
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (12/87) Mar 11 2012 By the way, bootstrap.d seems to fail to build at the moment:
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/117) Mar 11 2012 Also, I have sent a pull request to fix the build on 64-bit:
- Philippe Sigaud (2/3) Mar 11 2012 https://github.com/PhilippeSigaud/Pegged/pull/1
- Philippe Sigaud (10/17) Mar 11 2012 Hmm, it compiled for me a few hours ago. I'll see if I broke something
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (4/22) Mar 11 2012 Is bootstrap.d currently essential to actually use Pegged?
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/80) Mar 11 2012 Hm, since ' is used in the grammar of Pegged, how do I express it in my
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/110) Mar 11 2012 Ah, Quote it is. I also noticed there is DoubleQuote. Is " used for
- bls (26/31) Mar 12 2012 Just WOW!
- Dmitry Olshansky (6/20) Mar 13 2012 Maybe A <- B / C. And even then it's not exactly equivalent if the
- bls (38/59) Mar 12 2012 PEG is pretty new to me. Can you elaborate a bit ?
- Dmitry Olshansky (26/87) Mar 13 2012 PEG defines order of alternatives, that is pretty much like a top-down
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (5/105) Mar 13 2012 That was an illustrative example from the Pegged docs. But yeah, you
- Philippe Sigaud (9/18) Mar 13 2012 ld
- Philippe Sigaud (20/37) Mar 13 2012 to
- Dmitry Olshansky (6/39) Mar 14 2012 Hey, wait, I thought there has to be backtrack here, i.e. when second
- Philippe Sigaud (8/14) Mar 16 2012 PEG only do local backtracking in 'or' choices, not in sequences. I
- Dmitry Olshansky (32/47) Mar 17 2012 Ok, let's agree on fact that semantically
- Philippe Sigaud (12/30) Mar 17 2012 It's local, yes. But the pb is with Expr <- A* B C D, when D fails.
- Dmitry Olshansky (33/70) Mar 17 2012 I'd argue they do. As I see it as:
- Philippe Sigaud (17/33) Mar 17 2012 That's what people doing Regex-to-PEG translations do, yes. But it's
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (7/41) Mar 13 2012 The thing is, PEG cannot represent an ambiguous grammar. It is fully
- Philippe Sigaud (7/14) Mar 13 2012 Thanks! Don't be too excited, it's still quite slow as a parser. But
- Tobias Pankrath (6/6) Mar 13 2012 I am impressed. That's a really nice showcase for the D compile
- Jay Norwood (9/10) Mar 14 2012 In reading the D spec I've seen a few instance where there are
- Timon Gehr (3/14) Mar 15 2012 This is not what a parser does. Resolving symbols and assigning types to...
- Martin Nowak (7/16) Mar 22 2012 Just wanted to draw your attention on a pull request of mine.
- Elie Morisse (9/9) Aug 07 2013 Hi Philippe,
Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. Philippe
Mar 10 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeAdmittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C? -- - Alex
Mar 10 2012
I see you are not the only one who started writing string array literals like this: enum PEGCode = grammarCode!( "Grammar <- S Definition+ EOI" ,"Definition <- RuleName Arrow Expression" ,"RuleName <- Identifier>(ParamList?)" ,"Expression <- Sequence (OR Sequence)*" ); IOW comma on the left side. I know it's not a style preference but actually a (unfortunate but needed) technique for avoiding bugs. :) So can you use this to actually parse D code, extract syntax trees and stuff like that? I'm clueless about parsing but it looks very neat. Nice work!
Mar 10 2012
Andrej Mitrovic:IOW comma on the left side. I know it's not a style preference but actually a (unfortunate but needed) technique for avoiding bugs. :)To avoid that bug: http://d.puremagic.com/issues/show_bug.cgi?id=3827 Bye, bearophile
Mar 10 2012
On 3/10/12 5:56 PM, Andrej Mitrovic wrote:I see you are not the only one who started writing string array literals like this: enum PEGCode = grammarCode!( "Grammar<- S Definition+ EOI" ,"Definition<- RuleName Arrow Expression" ,"RuleName<- Identifier>(ParamList?)" ,"Expression<- Sequence (OR Sequence)*" ); IOW comma on the left side. I know it's not a style preference but actually a (unfortunate but needed) technique for avoiding bugs. :) So can you use this to actually parse D code, extract syntax trees and stuff like that? I'm clueless about parsing but it looks very neat. Nice work!I, too, think this is very significant work! Suggestion for Philippe: instead of this: enum PEGCode = grammarCode!( "Grammar <- S Definition+ EOI" ,"Definition <- RuleName Arrow Expression" ,"RuleName <- Identifier>(ParamList?)" ,"Expression <- Sequence (OR Sequence)*" ); how about this: enum PEGCode = grammarCode!(" Grammar <- S Definition+ EOI; Definition <- RuleName Arrow Expression; RuleName <- Identifier>(ParamList?); Expression <- Sequence (OR Sequence)*; "); Splitting on ";" is trivial and makes client code considerably easier to play with. I'll get back to this. Andrei
Mar 10 2012
On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I, too, think this is very significant work! Suggestion for Philippe: instead of this: enum PEGCode =3D grammarCode!( =C2=A0 =C2=A0 "Grammar <- S Definition+ EOI" =C2=A0 =C2=A0,"Definition <- RuleName Arrow Expression" =C2=A0 =C2=A0,"RuleName =C2=A0 <- Identifier>(ParamList?)" =C2=A0 =C2=A0,"Expression <- Sequence (OR Sequence)*" ); how about this: enum PEGCode =3D grammarCode!(" =C2=A0 =C2=A0Grammar <- S Definition+ EOI; =C2=A0 =C2=A0Definition <- RuleName Arrow Expression; =C2=A0 =C2=A0RuleName =C2=A0 <- Identifier>(ParamList?); =C2=A0 =C2=A0Expression <- Sequence (OR Sequence)*; "); Splitting on ";" is trivial and makes client code considerably easier to play with.It's already implemented! No need for ';' The docs are up to date, the internal code also. It's only an old bootstrapping file that Andrej cited that still contains multi-string definitions. I'll correct that, as that means I didn"t push dogfooding far enough. Now, just use one string: "Grammar <- ... Definition <- ... "
Mar 10 2012
On 3/11/12 1:35 AM, Philippe Sigaud wrote:On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Great! I think you'd be wise to add the ";", otherwise it's impossible to split complex rules on more than one line. Alternatively you could have the complement, a line continuation thingie. AndreiSplitting on ";" is trivial and makes client code considerably easier to play with.It's already implemented! No need for ';'
Mar 10 2012
On Sun, Mar 11, 2012 at 08:47, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Great! I think you'd be wise to add the ";", otherwise it's impossible to split complex rules on more than one line. Alternatively you could have the complement, a line continuation thingie.Complex rules can be multi-line, I'll put more examples of this in the docs. Now that Pegged is bootstrapped (the grammar generator was generated by Pegged itself), it's easy to change the grammar. Anyway, the separating if done on the rule definitions (Identifier <- ...) A <- B C / C E <- New Def
Mar 10 2012
On Sun, Mar 11, 2012 at 08:53, Philippe Sigaud <philippe.sigaud gmail.com> wrote:Anyway, the separating if done on the rule definitions =C2=A0(Identifier =<- ...) Aw, I meant, the separation is done on rule definitions. Damn children climbing on me to get more breakfast :-)
Mar 10 2012
On Sun, Mar 11, 2012 at 00:34, Alex R=C3=B8nne Petersen <xtzgzorex gmail.co= m> wrote:Admittedly I have not heard of PEGs before, so I'm curious: Is this power=fulenough to parse a language such as C?I think so. But you'd have to do add some semantic action to deal with typedefs and macros. People parsed Java and Javascript with them. I personnally never used Pegged for more than JSON and custom formats, but I intend to try the D grammar.
Mar 10 2012
On 3/11/12 1:22 AM, Philippe Sigaud wrote:On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen<xtzgzorex gmail.com> wrote:Any chance you consider adding AST generator actions as discussed in the main forum a while ago? AndreiAdmittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?I think so. But you'd have to do add some semantic action to deal with typedefs and macros. People parsed Java and Javascript with them. I personnally never used Pegged for more than JSON and custom formats, but I intend to try the D grammar.
Mar 10 2012
On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Any chance you consider adding AST generator actions as discussed in the main forum a while ago?The AST is automatically produced, and there are already AST actions to simplify / guide its creation. There already is an 'ignore that node' syntax and a 'fuse the captures here'. Also, semantic actions are there, so I think the basis is there, I just need to add a few tree actions : cut children, take only some children. I'll add it today. But maybe you had something else in mind? I'll go and search for the thread you refer to.
Mar 10 2012
On 3/11/12 1:38 AM, Philippe Sigaud wrote:On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I was thinking of ANTLR-style operators in which you say where the root should be and you get to drop unnecessary nodes. (Post on 2012/02/29 10:19AM GMT-0600.) AndreiAny chance you consider adding AST generator actions as discussed in the main forum a while ago?The AST is automatically produced, and there are already AST actions to simplify / guide its creation. There already is an 'ignore that node' syntax and a 'fuse the captures here'. Also, semantic actions are there, so I think the basis is there, I just need to add a few tree actions : cut children, take only some children. I'll add it today. But maybe you had something else in mind? I'll go and search for the thread you refer to.
Mar 10 2012
On Sun, Mar 11, 2012 at 08:51, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I was thinking of ANTLR-style operators in which you say where the root should be and you get to drop unnecessary nodes. (Post on 2012/02/29 10:19AM GMT-0600.)Get it, thanks for the ref! There is an operator to drop unnecessary nodes (':'). Apart from that, a Pegged grammar is a self-contained entity: it automatically cuts nodes coming from other grammars, to simplify the tree (it keeps the matcheds substrings, of course). I plan to code different type of grammars, to play with the way the deal with nodes coming from outside rules and grammar composition. As for the root, the PEG rule is that the first rule in the grammar is the root. But currently, I deliberately made my grammars unsealed, in that the user can call any rule inside the grammar to begin parsing: mixin(grammar(" A <- B C B <- ... C <- ")); A.parse("some input"); // standard way to do it, the parse tree will be rooted on an 'A' node. B.parse("some input"); // also possible, the parse tree will be rooted on a 'B' node. I'll add what I called closed and sealed grammars. I'll also add named grammars and such in the days to come. I can add a root note indication ("unless told otherwise, when asked to parse with grammar G, begin with rule R).
Mar 11 2012
On 3/11/12 3:02 AM, Philippe Sigaud wrote:There is an operator to drop unnecessary nodes (':').Good.Apart from that, a Pegged grammar is a self-contained entity: it automatically cuts nodes coming from other grammars, to simplify the tree (it keeps the matcheds substrings, of course). I plan to code different type of grammars, to play with the way the deal with nodes coming from outside rules and grammar composition.Not getting this, but picture me with an intense look and nodding.As for the root, the PEG rule is that the first rule in the grammar is the root. But currently, I deliberately made my grammars unsealed, in that the user can call any rule inside the grammar to begin parsing: mixin(grammar(" A<- B C B<- ... C<- ")); A.parse("some input"); // standard way to do it, the parse tree will be rooted on an 'A' node. B.parse("some input"); // also possible, the parse tree will be rooted on a 'B' node.That sounds cool, but how do you say that in the construct Expression <- Term "+" Term the "+" should become the root? Andrei
Mar 11 2012
On 11-03-2012 08:22, Philippe Sigaud wrote:On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen<xtzgzorex gmail.com> wrote:Oh, I should have mentioned I only meant the actual language (ignoring the preprocessor). Why do you need semantic actions for typedefs though? Can't you defer resolution of types until after parsing?Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?I think so. But you'd have to do add some semantic action to deal with typedefs and macros.People parsed Java and Javascript with them. I personnally never used Pegged for more than JSON and custom formats, but I intend to try the D grammar.-- - Alex
Mar 11 2012
com> wrote: [Parsing C?]On Sun, Mar 11, 2012 at 00:34, Alex R=C3=B8nne Petersen<xtzgzorex gmail.=the preprocessor). OK. I admit I downloaded the C spec online, but was a bit taken aback by the size of it. mot of it was the definition of the standard library, but still...I think so. But you'd have to do add some semantic action to deal with typedefs and macros.Oh, I should have mentioned I only meant the actual language (ignoringWhy do you need semantic actions for typedefs though? Can't you deferresolution of types until after parsing? Yes, that the way I'd do it. But some people seem to want to do it while parsing. Maybe it blocks some parsing, if the parser encounter an identifier where there should be a type?
Mar 11 2012
On 11-03-2012 18:06, Philippe Sigaud wrote:>> On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen<xtzgzorex gmail.com <mailto:xtzgzorex gmail.com>> wrote: [Parsing C?] >> I think so. But you'd have to do add some semantic action to deal with >> typedefs and macros. > > > Oh, I should have mentioned I only meant the actual language (ignoring the preprocessor). OK. I admit I downloaded the C spec online, but was a bit taken aback by the size of it. mot of it was the definition of the standard library, but still... > Why do you need semantic actions for typedefs though? Can't you defer resolution of types until after parsing? Yes, that the way I'd do it. But some people seem to want to do it while parsing. Maybe it blocks some parsing, if the parser encounter an identifier where there should be a type?Hm, I don't *think* C has such ambiguities but I could well be wrong. In any case, if it can handle the non-ambiguous case, that's enough for me. :) -- - Alex
Mar 11 2012
Hm, I don't *think* C has such ambiguities but I could well be wrong. Inany case, if it can handle the non-ambiguous case, that's enough for me. I wanted to tackle D this week, but I might as well begin with C :) Do you happen to have any handy and readable EBNF grammar for C? At least for D, I have dlang.org.
Mar 11 2012
On 11-03-2012 18:19, Philippe Sigaud wrote:> Hm, I don't *think* C has such ambiguities but I could well be wrong. In any case, if it can handle the non-ambiguous case, that's enough for me. I wanted to tackle D this week, but I might as well begin with C :) Do you happen to have any handy and readable EBNF grammar for C? At least for D, I have dlang.org <http://dlang.org>.Yep, see: http://ssw.jku.at/Coco/ (C.atg) Coco/R is more or less EBNF. -- - Alex
Mar 11 2012
On Sun, Mar 11, 2012 at 00:56, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:I see you are not the only one who started writing string array literals like this: enum PEGCode =3D grammarCode!( =C2=A0 =C2=A0 "Grammar <- S Definition+ EOI" =C2=A0 =C2=A0,"Definition <- RuleName Arrow Expression" =C2=A0 =C2=A0,"RuleName =C2=A0 <- Identifier>(ParamList?)" =C2=A0 =C2=A0,"Expression <- Sequence (OR Sequence)*" ); IOW comma on the left side. I know it's not a style preference but actually a (unfortunate but needed) technique for avoiding bugs. :)Yes, I use what I call "Haskell-comma" (I saw it first on Haskell code). But in the previous sample, that's old code. Now, that should only be; enum code =3D grammar(" Grammar <- S Definition+ EOI Definition <- RuleName Arrow Expression RuleName =C2=A0 <- Identifier>(ParamList?) Expression <- Sequence (OR Sequence)* "); I see I didn't update bootstrap.d, I'll do that.So can you use this to actually parse D code, extract syntax trees and stuff like that? I'm clueless about parsing but it looks very neat.I can partially parse D code, because I only wrote part of the D grammar (see the pegged.examples.dgrammar module), just enough to parse most of template constraints. I intend to complete it and see if that floats. Yes, it extracts syntax trees, at compile-time or runtime.Nice work!Thanks!
Mar 10 2012
I've just read a few articles referenced from this page, and the second link was by someone who had done java 1.5, the second link http://bford.info/packrat/ http://www.romanredz.se/papers/FI2007.pdf It is interesting but that article left me with some questions about the implementation in order to make it useful for my needs. I had done an experiment with mvel expression evaluation in java and gotten good improvements relative to homebrew expression evaluators. However, the mvel expressions are missing the ability to express array operations clearly, which is something that is very clear in D, and my particular need is to enable the user to express array operations. With this pegged embedded parser, it appears to me you could provide a group of context symbols as part of a language definition, similar to providing a list of reserved words, so that the parsing of the user's expression would also validate the symbols used. Also, I've been reading David Simcha's parallel_algorithm.d, here: https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d and in the parallelArrayOp portion, he has presented a way to turn the D array expressions into code that is executed in parallel on multicore systems. That is something I'd want to use, but the manual lexing requirement is a bit clunky, and the rules are unclear to me, so it seems to me a combination with the pegged parser could make that more accessible. Thanks, JayAdmittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?
Mar 12 2012
On Tuesday, 13 March 2012 at 05:25:38 UTC, Jay Norwood wrote:Also in the later paper he did a C parser, so I suppose that is the answer ... http://www.romanredz.se/papers/FI2008.pdfI've just read a few articles referenced from this page, and the second link was by someone who had done java 1.5, the second link http://bford.info/packrat/ http://www.romanredz.se/papers/FI2007.pdfAdmittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?
Mar 12 2012
On Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D.Cool! PEGs aren't used all that much for language grammars - primarily because BNF and EBNF are more traditional I think - but they're definitely cool. And it's great to see a D implementation of a PEG parser. - Jonathan M Davis
Mar 10 2012
On Sun, Mar 11, 2012 at 08:30, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:Is there an EBNF grammar for D somewhere, I mean outside the dlang docs? I vaguely remember someone trying to write one.Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D.Cool! PEGs aren't used all that much for language grammars - primarily because BNF and EBNF are more traditional I think - but they're definitely cool. And it's great to see a D implementation of a PEG parser.
Mar 10 2012
On Sunday, March 11, 2012 08:39:47 Philippe Sigaud wrote:On Sun, Mar 11, 2012 at 08:30, Jonathan M Davis <jmdavisProg gmx.com> wrote:Someone may have been trying to write one, but I don't believe that there's an official one, and I recall someone complaining the the BNF grammar was inaccurate in a number of places, but that may have been fixed now, since Walter did a number of documentation fixes a few weeks back. - Jonathan M DavisOn Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:Is there an EBNF grammar for D somewhere, I mean outside the dlang docs? I vaguely remember someone trying to write one.Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D.Cool! PEGs aren't used all that much for language grammars - primarily because BNF and EBNF are more traditional I think - but they're definitely cool. And it's great to see a D implementation of a PEG parser.
Mar 10 2012
On Sun, Mar 11, 2012 at 08:46, Jonathan M Davis <jmdavisProg gmx.com> wrote:Is there an EBNF grammar for D somewhere, I mean outside the dlang docs?Someone may have been trying to write one, but I don't believe that there's an official one, and I recall someone complaining the the BNF grammar was inaccurate in a number of places, but that may have been fixed now, since Walter did a number of documentation fixes a few weeks back.Hmm. I'll post on D.main and see.
Mar 10 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeQuestion: Are the generated parsers, AST nodes, etc classes or structs? -- - Alex
Mar 11 2012
alex:Question: Are the generated parsers, AST nodes, etc classes or structs?They are structs. See: https://github.com/PhilippeSigaud/Pegged/wiki/Parse-Trees
Mar 11 2012
On Sunday, 11 March 2012 at 12:40:34 UTC, Alex Rønne Petersen wrote:Question: Are the generated parsers, AST nodes, etc classes or structs?Why is this important?
Aug 08 2013
On 3/11/12 24:28 , Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. PhilippeVery cool! Quick question, you mention the ability to opt-out of the space-insensitivity, where might one find this? Thanks!
Mar 11 2012
Quick question, you mention the ability to opt-out of thespace-insensitivity, where might one find this? Yes, undocumented. Use the '>' operator. You know, I introduced space-insensitivity recently, to simplify some rules and it keeps biting me back. For example Line <- (!EOL .)* EOL The catch is, the (!EOL .) sequence accepts spaces (so, line terminators) between the !EOL and the . Crap. So, I keep writing Line <- (!EOL > .)* EOL And I'm more and more convinced that ws a bbad move on my part. Or, at least, give the user a way to opt-out for an entire rule.
Mar 11 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeBy the way, bootstrap.d seems to fail to build at the moment: ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list ../pegged/utils/bootstrap.d(1433): members expected ../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!' ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!' ../pegged/utils/bootstrap.d(1466): unrecognized declaration -- - Alex
Mar 11 2012
On 11-03-2012 16:02, Alex Rønne Petersen wrote:On 11-03-2012 00:28, Philippe Sigaud wrote:Also, I have sent a pull request to fix the build on 64-bit: https://github.com/PhilippeSigaud/Pegged/pull/1 -- - AlexHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeBy the way, bootstrap.d seems to fail to build at the moment: .../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list .../pegged/utils/bootstrap.d(1433): members expected .../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration .../pegged/utils/bootstrap.d(1433): semicolon expected, not '!' .../pegged/utils/bootstrap.d(1433): Declaration expected, not '!' .../pegged/utils/bootstrap.d(1466): unrecognized declaration
Mar 11 2012
Also, I have sent a pull request to fix the build on 64-bit:https://github.com/PhilippeSigaud/Pegged/pull/1 Merged, thanks!
Mar 11 2012
By the way, bootstrap.d seems to fail to build at the moment: ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' followingtemplate argument list../pegged/utils/bootstrap.d(1433): members expected ../pegged/utils/bootstrap.d(1433): { } expected following aggregatedeclaration../pegged/utils/bootstrap.d(1433): semicolon expected, not '!' ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!' ../pegged/utils/bootstrap.d(1466): unrecognized declarationHmm, it compiled for me a few hours ago. I'll see if I broke something while pushing. I'll also try to make the whole grammar-modification process easier. Since users can modify Pegged own grammar, I might as well make that fluid and easy to do. I'll put the Pegged grammar as a string in a separate module and create a function that does the rest: modify the string, it will recompile the entire grammar for you.
Mar 11 2012
On 11-03-2012 18:17, Philippe Sigaud wrote:> By the way, bootstrap.d seems to fail to build at the moment: > > ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list > ../pegged/utils/bootstrap.d(1433): members expected > ../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration > ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!' > ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!' > ../pegged/utils/bootstrap.d(1466): unrecognized declaration Hmm, it compiled for me a few hours ago. I'll see if I broke something while pushing. I'll also try to make the whole grammar-modification process easier. Since users can modify Pegged own grammar, I might as well make that fluid and easy to do. I'll put the Pegged grammar as a string in a separate module and create a function that does the rest: modify the string, it will recompile the entire grammar for you.Is bootstrap.d currently essential to actually use Pegged? -- - Alex
Mar 11 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeHm, since ' is used in the grammar of Pegged, how do I express it in my grammar spec? Is there a predefined rule for it, or is \' supposed to work? -- - Alex
Mar 11 2012
On 12-03-2012 02:27, Alex Rønne Petersen wrote:On 11-03-2012 00:28, Philippe Sigaud wrote:Ah, Quote it is. I also noticed there is DoubleQuote. Is " used for anything in the Pegged grammar? -- - AlexHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time. Usage ----- To use Pegged, just call the `grammar` function with a PEG and mix it in. For example: import pegged.grammar; mixin(grammar(" Expr <- Factor AddExpr* AddExpr <- ('+'/'-') Factor Factor <- Primary MulExpr* MulExpr <- ('*'/'/') Primary Primary <- Parens / Number / Variable / '-' Primary Parens <- '(' Expr ')' Number <~ [0-9]+ Variable <- Identifier ")); This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for basic arithmetic expressions with operator precedence ('*' and '/' bind stronger than '+' or '-'). `Identifier` is a pre-defined parser recognizing your basic C-style identifier. Recursive or mutually recursive rules are OK (no left recursion for now). To use a parser, use the `.parse` method. It will return a parse tree containing the calls to the different rules: // Parsing at compile-time: enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6"); pragma(msg, parseTree1.capture); writeln(parseTree1); // And at runtime too: auto parseTree2 = Expr.parse(" 0 + 123 - 456 "); assert(parseTree2.capture == ["0", "+", "123", "-", "456"]); Features -------- * The complete set of PEG operators are implemented * Pegged can parse its input at compile time and generate a complete parse tree at compile time. In a word: compile-time string (read: D code) transformation and generation. * You can parse at runtime also, you lucky you. * Use a standard and readable PEG syntax as a DSL, not a bunch of templates that hide the parser in noise. * But you can use expression templates if you want, as parsers are all available as such. Pegged is implemented as an expression template, and what's good for the library writer is sure OK for the user too. * Some useful additional operators are there too: a way to discard matches (thus dumping them from the parse tree), to push captures on a stack, to accept matches that are equal to another match * Adding new parsers is easy. * Grammars are composable: you can put different `mixin(grammar(rules));` in a module and then grammars and rules can refer to one another. That way, you can have utility grammars providing their functionalities to other grammars. * That's why Pegged comes with some pre-defined grammars (JSON, etc). * Grammars can be dumped in a file to create a D module. More advanced features, outside the standard PEG perimeter are there to bring more power in the mix: * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The previous rule defines a parametrized parser taking two other parsers (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s. * Named captures: any parser can be named with the `=` operator. The parse tree generated by the parser (so, also its matches) is delivered to the user in the output. Other parsers in the grammar see the named captures too. * Semantic actions can be added to any rule in a grammar. Once a rule has matched, its associated action is called on the rule output and passed as final result to other parsers further up the grammar. Do what you want to the parse tree. If the passed actions are delegates, they can access external variables. PhilippeHm, since ' is used in the grammar of Pegged, how do I express it in my grammar spec? Is there a predefined rule for it, or is \' supposed to work?
Mar 11 2012
On 03/10/2012 03:28 PM, Philippe Sigaud wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / C A = [B]. A <- B? A = {B}. A <- B* Having EBNF expressed in Pegged ... EBNF <- Procuction+ Production <- Identifier '=' Expression '.' Expression <- Term ( '|' Term)* Term <- Factor Factor* Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression }'/ '(' Expression ')' lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)* Literal <- ("'" .+ "'" / '"' .+ '"') Due to the fact that EBNF can be expressed in EBNF it should be possible to parse an arbitrary EBNF file and generate PEG output. What do you think ? BTW.. Is my EBNF PEG description correct ? TIA Bjoern
Mar 12 2012
On 12.03.2012 16:43, bls wrote:On 03/10/2012 03:28 PM, Philippe Sigaud wrote:Maybe A <- B / C. And even then it's not exactly equivalent if the grammar was ambiguous. Imagine: B <- a, C <- aa -- Dmitry OlshanskyHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / C
Mar 13 2012
On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:On 12.03.2012 16:43, bls wrote:PEG is pretty new to me. Can you elaborate a bit ?On 03/10/2012 03:28 PM, Philippe Sigaud wrote:Maybe A <- B / C. And even then it's not exactly equivalent if the grammar was ambiguous. Imagine: B <- a, C <- aaHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / CMy mistake.. cleaned up stuff.. Pegged Wirth EBNF Sequence A <- B C A = BC. B or C A <- B / C A = B|C. Zero or one B A <- B? A = [B]. Zero or more Bs A <- B* A = {B}. One or more Bs A <- B+ Not available PEG description of EBNF EBNF <- Procuction+ Production <- Identifier '=' Expression '.' Expression <- Term ( '|' Term)* Term <- Factor Factor* Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}' / '(' Expression ')' lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)* Literal <- ("'" .+ "'") / ('"' .+ '"') Still not sure if this is correct. Especially : Term <- Factor Factor* Another thing I never really understand is the "production" order, In other words : Why not top down .. Start : lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)* .... End : EBNF <- Procuction+ where End is Root.. TIA, Bjoern
Mar 12 2012
On 12.03.2012 17:45, bls wrote:On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:PEG defines order of alternatives, that is pretty much like a top-down recursive descent parser would parse it. Alternatives are tried from left to right, if first one fails, it tries next and so on. In an example I give B is always picked first and so C is never ever looked at. Somewhat less artificial example: Literal <- IntL| FloatL FloatL <- [0-9]+(.[0-9]+)? IntL <- [0-9]+ If you change it to: Literal <- FloatL| IntL then integer literals would get parsed as floating point.On 12.03.2012 16:43, bls wrote:PEG is pretty new to me. Can you elaborate a bit ?On 03/10/2012 03:28 PM, Philippe Sigaud wrote:Maybe A <- B / C. And even then it's not exactly equivalent if the grammar was ambiguous. Imagine: B <- a, C <- aaHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / CWhy not: Identifier <- [a-zA-Z]+My mistake.. cleaned up stuff.. Pegged Wirth EBNF Sequence A <- B C A = BC. B or C A <- B / C A = B|C. Zero or one B A <- B? A = [B]. Zero or more Bs A <- B* A = {B}. One or more Bs A <- B+ Not available PEG description of EBNF EBNF <- Procuction+ Production <- Identifier '=' Expression '.' Expression <- Term ( '|' Term)* Term <- Factor Factor* Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}' / '(' Expression ')' lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*Literal <- ("'" .+ "'") / ('"' .+ '"')This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.Still not sure if this is correct. Especially : Term <- Factor Factor* Another thing I never really understand is the "production" order, In other words : Why not top down .. Start : lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*.... End : EBNF <- Procuction+ where End is Root..In fact grammars are usually devised the other way around, e.g. Start: Program -> ... Ehm... what the whole program is exactly ? Ok, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Latter grammars get tweaked and extended numerous times. At any rate production order has no effect on the grammar, it's still the same. The only thing of importance is what non-terminal considered final (or start if you are LL-centric).TIA, Bjoern-- Dmitry Olshansky
Mar 13 2012
On 13-03-2012 17:17, Dmitry Olshansky wrote:On 12.03.2012 17:45, bls wrote:That was an illustrative example from the Pegged docs. But yeah, you should just use a range; reads nicer.On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:PEG defines order of alternatives, that is pretty much like a top-down recursive descent parser would parse it. Alternatives are tried from left to right, if first one fails, it tries next and so on. In an example I give B is always picked first and so C is never ever looked at. Somewhat less artificial example: Literal <- IntL| FloatL FloatL <- [0-9]+(.[0-9]+)? IntL <- [0-9]+ If you change it to: Literal <- FloatL| IntL then integer literals would get parsed as floating point.On 12.03.2012 16:43, bls wrote:PEG is pretty new to me. Can you elaborate a bit ?On 03/10/2012 03:28 PM, Philippe Sigaud wrote:Maybe A <- B / C. And even then it's not exactly equivalent if the grammar was ambiguous. Imagine: B <- a, C <- aaHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / CWhy not: Identifier <- [a-zA-Z]+My mistake.. cleaned up stuff.. Pegged Wirth EBNF Sequence A <- B C A = BC. B or C A <- B / C A = B|C. Zero or one B A <- B? A = [B]. Zero or more Bs A <- B* A = {B}. One or more Bs A <- B+ Not available PEG description of EBNF EBNF <- Procuction+ Production <- Identifier '=' Expression '.' Expression <- Term ( '|' Term)* Term <- Factor Factor* Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}' / '(' Expression ')' lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*-- - AlexLiteral <- ("'" .+ "'") / ('"' .+ '"')This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.Still not sure if this is correct. Especially : Term <- Factor Factor* Another thing I never really understand is the "production" order, In other words : Why not top down .. Start : lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*.... End : EBNF <- Procuction+ where End is Root..In fact grammars are usually devised the other way around, e.g. Start: Program -> ... Ehm... what the whole program is exactly ? Ok, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Latter grammars get tweaked and extended numerous times. At any rate production order has no effect on the grammar, it's still the same. The only thing of importance is what non-terminal considered final (or start if you are LL-centric).TIA, Bjoern
Mar 13 2012
On Tue, Mar 13, 2012 at 18:05, Alex R=C3=B8nne Petersen <xtzgzorex gmail.co= m> wrote:ldThat was an illustrative example from the Pegged docs. But yeah, you shou=lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*Why not: Identifier <- [a-zA-Z]+just use a range; reads nicer.The docs are for teaching PEG :) (btw, it's the docs describe C-like identifiers, that's why I chose a longer approach) It's always this 'tension', between inlining and refactoring. [a-zA-Z]+ is shorter and more readable. But If you decide to extend your grammar to UTF-32, it'd be easier to just change the 'letter' rule.
Mar 13 2012
On Tue, Mar 13, 2012 at 17:17, Dmitry Olshansky <dmitry.olsh gmail.com> wro= te:PEG defines order of alternatives, that is pretty much like a top-down recursive descent parser would parse it. Alternatives are tried from left=toright, if first one fails, it tries next and so on.Yes.In an example I give B is always picked first and so C is never ever look=edat.That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's.This is an example were PEG would munch anything till the end, and fail (since """ is not found in an empty string). The standard PEG way to do that is (!EndMarker .)+ EndMarker It's a common enough pattern I tend to define a parameterized rule for that= : Until(Expr) <- (!Expr .)* Expr Gotta love parametrized rules! 15' to code, a neverending stream of joy.Literal <- ("'" .+ "'") / ('"' .+ '"')This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.In fact grammars are usually devised the other way around, e.g. Start: =C2=A0Program -> ... Ehm... what the whole program is exactly ? Ok, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Latter grammars get tweaked and extended numerous times. At any rate production order has no effect on the grammar, it's still the same. The only thing of importance is what non-terminal considered final =(orstart if you are LL-centric).Yes. The PEG standard seem to be that the first rule is the start (root) rule. Pegged let you decide which rule you use for a start. The rest is automatic. I might change that.
Mar 13 2012
On 14.03.2012 2:49, Philippe Sigaud wrote:On Tue, Mar 13, 2012 at 17:17, Dmitry Olshansky<dmitry.olsh gmail.com> wrote:Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails?PEG defines order of alternatives, that is pretty much like a top-down recursive descent parser would parse it. Alternatives are tried from left to right, if first one fails, it tries next and so on.Yes.In an example I give B is always picked first and so C is never ever looked at.That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's.Nice!This is an example were PEG would munch anything till the end, and fail (since """ is not found in an empty string). The standard PEG way to do that is (!EndMarker .)+ EndMarker It's a common enough pattern I tend to define a parameterized rule for that: Until(Expr)<- (!Expr .)* Expr Gotta love parametrized rules! 15' to code, a neverending stream of joy.Literal<- ("'" .+ "'") / ('"' .+ '"')This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.-- Dmitry OlshanskyIn fact grammars are usually devised the other way around, e.g. Start: Program -> ... Ehm... what the whole program is exactly ? Ok, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Latter grammars get tweaked and extended numerous times. At any rate production order has no effect on the grammar, it's still the same. The only thing of importance is what non-terminal considered final (or start if you are LL-centric).Yes. The PEG standard seem to be that the first rule is the start (root) rule. Pegged let you decide which rule you use for a start. The rest is automatic. I might change that.
Mar 14 2012
On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:PEG only do local backtracking in 'or' choices, not in sequences. I think that's because the original author was dealing with packrat parsing and its O(input-size) guarantee. I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex? (nice article btw, I learnt some about regexes)That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's.Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails?
Mar 16 2012
On 17.03.2012 10:59, Philippe Sigaud wrote:On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky<dmitry.olsh gmail.com> wrote:Ok, let's agree on fact that semantically a* is : As <- a As / a and a*? is this: As <- a / a As Now that's local ;)PEG only do local backtracking in 'or' choices, not in sequences. I think that's because the original author was dealing with packrat parsing and its O(input-size) guarantee.That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's.Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails?I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex?There are two fundamental ways to get around the problem, the easiest to implement is to use a stack to record a position in input + number of alternative/production (I call it a PC - program counter) every time there is a "fork" like that. Then when the current path ultimately fails pop stack, reset input and go over again until you either match or empty the stack. That's how to avoid deeeep recursion that happens here. The problematic thing now is combinatorial explosion of a* a* a* .... //simplified, but you get the idea The trick to avoid this sort of crap is to 1) agree that whatever matches first has higher priority (left-most match) that's a simple disambiguation rule. 2) record what positions + pc you place on stack e.g. use a set, or in fact, a bitmap to push every unique combination (pc,index) only once. Voila! Now you have a linear worst-case algorithm with (M*N) complexity where M is number of all possible PCs, and N is number of all possible indexes in input. Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this "memoization".(nice article btw, I learnt some about regexes)Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :) -- Dmitry Olshansky
Mar 17 2012
On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:Ok, let's agree on fact that semantically a* is : As <- a As / a and a*? is this: As <- a / a As Now that's local ;)It's local, yes. But the pb is with Expr <- A* B C D, when D fails. PEG sequences don't backtrack.(snip) Thanks for the explanations. Does std.regex implement this?I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex?There are two fundamental ways to get around the problemNow if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this "memoization".I plan to store indices anyway. I might add this in the future. A read something on using PEGs to parse Java and C and there was no need to do a complete memoization: storing the last parse result as a temporary seemed to be enough.As people said on Reddit, you should make more whooping sounds about CT-regex. That was what wowed me and pushed me to tackle CT-parsing again.(nice article btw, I learnt some about regexes)Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :)
Mar 17 2012
On 17.03.2012 18:11, Philippe Sigaud wrote:On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky<dmitry.olsh gmail.com> wrote:I'd argue they do. As I see it as: Expr <- As B C D / B C D As <- A / A As (or use an epsilon production for As, is it allowed in pegged ?)Ok, let's agree on fact that semantically a* is : As<- a As / a and a*? is this: As<- a / a As Now that's local ;)It's local, yes. But the pb is with Expr<- A* B C D, when D fails. PEG sequences don't backtrack.It does the second way that I haven't told about, and had hard time to implement in C-T :) And the simple version of what I presented (i.e. stack stuff) is used in CT-regex and as fallback in R-T. The problem with doing a bitmap memoization is that regex often used to _search_ things in large inputs. Some compromise and/or thresholds have to be estimated and found. Parsing on the contrary guaranties that the whole input is used or in error, hence bitmap is not wasted.(snip) Thanks for the explanations. Does std.regex implement this?I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex?There are two fundamental ways to get around the problemWell even straight no-memorization is somehow enough, it's just a way to ensure no extra work is done. C & Java are not much of backtracky grammars.Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this "memoization".I plan to store indices anyway. I might add this in the future. A read something on using PEGs to parse Java and C and there was no need to do a complete memoization: storing the last parse result as a temporary seemed to be enough.It's just I'm also well aware of how much luck it (still) takes to fly ;) The asserts that compare C-T vs R-T go off too often for my taste, other then this the memory usage during compile is too high. I recall once dmd had GC re-enabled for brief period, dmd used no more then ~64Mb doing real crazy ct-regex stuff. OK, back to C-T parsing, I have one crazy idea that I can't get away from - add operator precedence grammar into the mix. From what I observe it should integrate into PEG smoothly. Then it would make "military-grade" hybrid that uses operator precedence parsing of expressions and the like. Few more tricks and it may beat some existing parser generators. See this post, where I tried to describe that idea early on: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562 I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same. -- Dmitry OlshanskyAs people said on Reddit, you should make more whooping sounds about CT-regex. That was what wowed me and pushed me to tackle CT-parsing again.(nice article btw, I learnt some about regexes)Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :)
Mar 17 2012
On Sat, Mar 17, 2012 at 15:48, Dmitry Olshansky <dmitry.olsh gmail.com> wro= te:That's what people doing Regex-to-PEG translations do, yes. But it's not the spontaneous behavior of A* B C D in PEG. But that means I could add a switch to transform the expressions that way.PEG sequences don't backtrack.I'd argue they do. As I see it as: Expr <- As B C D / B C D As <- A / A As(or use an epsilon production for As, is it allowed in pegged ?)I called it 'Eps', it's a predefined parser that always matches and consumes nothing. I used the greek epsilon at the beginning (=CE=B5) but thought that many people would shout at this :)OK, back to C-T parsing, I have one crazy idea that I can't get away from=-add operator precedence grammar into the mix. From what I observe it shou=ldintegrate into PEG smoothly. Then it would make "military-grade" hybrid t=hatuses operator precedence parsing of expressions and the like. Few more tricks and it may beat some existing parser generators. See this post, where I tried to describe that idea early on: http://www.digitalmars.com/webnews/newsgroups.php?art_group=3Ddigitalmars=.D&article_id=3D159562 I remember reading this. But I don't feel I'm up to it for now.I might catch spare time to go about adding it myself, the only tricky th=ingis to embed plain semantic actions, as AST generation would be more or le=ssthe same.Cool!
Mar 17 2012
On 12-03-2012 13:43, bls wrote:On 03/10/2012 03:28 PM, Philippe Sigaud wrote:The thing is, PEG cannot represent an ambiguous grammar. It is fully ordered, so it's just plain impossible. That's not to say you can't turn an EBNF grammar into PEG, but it won't necessarily be useful in all cases. -- - AlexHello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wikiJust WOW! Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF Pegged A = BC. A <- B C A = B|C. A <- C / C A = [B]. A <- B? A = {B}. A <- B* Having EBNF expressed in Pegged ... EBNF <- Procuction+ Production <- Identifier '=' Expression '.' Expression <- Term ( '|' Term)* Term <- Factor Factor* Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression }'/ '(' Expression ')' lowerCase <- [a-z] upperCase <- [A-Z] Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)* Literal <- ("'" .+ "'" / '"' .+ '"') Due to the fact that EBNF can be expressed in EBNF it should be possible to parse an arbitrary EBNF file and generate PEG output. What do you think ? BTW.. Is my EBNF PEG description correct ? TIA Bjoern
Mar 13 2012
On Mon, Mar 12, 2012 at 13:43, bls <bizprac orange.fr> wrote:Just WOW!Thanks! Don't be too excited, it's still quite slow as a parser. But that is a fun project :)Nice to have on your WIKI would be a EBNF to PEG sheet. Wirth EBNF =C2=A0 =C2=A0 =C2=A0Pegged A =3D BC. =C2=A0 =C2=A0 =C2=A0 =C2=A0 A <- B C A =3D B|C. =C2=A0 =C2=A0 =C2=A0 =C2=A0A <- C / C A =3D [B]. =C2=A0 =C2=A0 =C2=A0 =C2=A0A <- B? A =3D {B}. =C2=A0 =C2=A0 =C2=A0 =C2=A0A <- B*fact is, I don't know EBNF that much. I basically learned everything I know about parsing or grammars while coding Pegged in February :) I probably made every mistakes in the book. Hey, it's a github public wiki, I guess you can create a new page?
Mar 13 2012
I am impressed. That's a really nice showcase for the D compile time features. Can I use PEG to parse languages like python and haskell where indention matters without preprocessing? Will you make it work with input ranges of dchar? So that I can easily plug in some preprocessing steps?
Mar 13 2012
On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote:* Grammars can be dumped in a file to create a D module.In reading the D spec I've seen a few instance where there are infered items such as auto for variables and various parameters in templates. I presume your D grammar will have to have some rules to be able to infer these as well. It seems like it would not be a big step to output what are the infered proper statements as the .capture output. Is that correct? I think that would be helpful in some cases to be able to view the fully expanded expression.
Mar 14 2012
On 03/15/2012 12:09 AM, Jay Norwood wrote:On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote:This is not what a parser does. Resolving symbols and assigning types to expressions is part of semantic analysis.* Grammars can be dumped in a file to create a D module.In reading the D spec I've seen a few instance where there are infered items such as auto for variables and various parameters in templates. I presume your D grammar will have to have some rules to be able to infer these as well. It seems like it would not be a big step to output what are the infered proper statements as the .capture output. Is that correct? I think that would be helpful in some cases to be able to view the fully expanded expression.
Mar 15 2012
On Sun, 11 Mar 2012 00:28:42 +0100, Philippe Sigaud <philippe.sigaud gmail.com> wrote:Hello, I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D. https://github.com/PhilippeSigaud/Pegged docs: https://github.com/PhilippeSigaud/Pegged/wiki PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar The idea is to give the generator a PEG with the standard syntax. From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time.Just wanted to draw your attention on a pull request of mine. https://github.com/D-Programming-Language/dmd/pull/426 Using an added option (-M|Mf|Md) dmd will gather all mixin strings into a file. This is very helpful for better error messages and debugging.
Mar 22 2012
Hi Philippe, I wrote a Kate(the official text editor of KDE, also used in KDevelop) syntax highlighting file for Pegged: http://www.homo-nebulus.fr/dlang/pegged.xml Screenshot: http://imgur.com/sb3jFqs.png Of course for the syntax highlighting to work the grammar must be in a separate file and included like this in a .d file: mixin(grammar(import("Tales.pgd")));
Aug 07 2013