digitalmars.D - Anyone interested in a Spirit for D?
- Walter Bright (5/5) Oct 18 2006 Along the lines of Don's regexp template metaprograms, is anyone
- Pragma (18/26) Oct 18 2006 Now there's an idea!
- J Duncan (3/33) Oct 18 2006 Yeah Its a good idea, but my first thought was "is that even possible?"
- Walter Bright (15/30) Oct 18 2006 I think the operator overloading aspect of Spirit is only a minor part
- Bill Baxter (19/32) Oct 18 2006 Huh. Very interesting. Here's the example:
- Richard Koch (4/46) Oct 18 2006 all that is cool, but (i know i am the dummy here) readability as in bnf...
- Bill Baxter (15/27) Oct 18 2006 You mean this coco?
- Paolo Invernizzi (8/11) Oct 19 2006 As a Spirit user, I can tell that readability is one of the most
- Walter Bright (2/15) Oct 19 2006 You're probably right.
- Bill Baxter (46/50) Oct 18 2006 Now that would be useful I think.
- Walter Bright (7/25) Oct 18 2006 Yes, it would be. But there's a catastrophic problem with it. Spirit
- Bill Baxter (20/40) Oct 18 2006 But maybe you could allow the user to access those terminals via strings...
- BCS (57/73) Oct 18 2006 How about delegate literals for code snipits??
- Kristian (17/28) Oct 19 2006 Well, couldn't one use arrays to define actions?
- Walter Bright (1/1) Oct 19 2006 Not a bad idea.
- Jacques (45/57) Oct 19 2006 I actually did something similar to this when creating a generic PEG
- BCS (61/69) Oct 19 2006 it's conceivable that in D the above parser could just be created with:
- Karen Lanrap (7/10) Oct 19 2006 I do not see why transferring a tool into a library should reduce any
- Pragma (38/50) Oct 18 2006 From what I gather, that's the major benefit, other than a
- Bill Baxter (38/92) Oct 18 2006 Yes! Sounds like we're thinking along the same lines here. But if
- pragma (15/121) Oct 18 2006 Yes and no. The parser generator has a good deal of flexibility
Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D? http://spirit.sourceforge.net/ http://www.codeproject.com/useritems/spart.asp
Oct 18 2006
Walter Bright wrote:
> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
> http://spirit.sourceforge.net/
> http://www.codeproject.com/useritems/spart.asp

Now there's an idea! Words of caution to follow:

FWIW, I looked into doing this years ago, and didn't get too far. The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override. In particular, not being able to override unary '*' and '!' caused some cosmetic problems.

The only other major hangup I had was not having IFTI so I could instantiate templates transparently. This feature alone could close the gap on most of Spirit's usage of C++ templates. At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside).

Don't get me wrong: I'm not a nay-sayer here. I think this is a very doable and worthwhile suggestion by Walter. Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.

--
- EricAnderton at yahoo
Oct 18 2006
Pragma wrote:
> Walter Bright wrote:
>> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
> Now there's an idea! Words of caution to follow:
> [snip]

Yeah, it's a good idea, but my first thought was "is that even possible?" It won't be Spirit, but a lexer in the, uh, spirit of Spirit :)
Oct 18 2006
Pragma wrote:
> Words of caution to follow: FWIW, I looked into doing this years ago, and didn't get too far. The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override. In particular, not being able to override unary '*' and '!' caused some cosmetic problems.

I think the operator overloading aspect of Spirit is only a minor part of the implementation - in fact, just a pretty shell around it. It could all be done using functional notation.

> The only other major hangup I had was not having IFTI so I could instantiate templates transparently. This feature alone could close the gap on most of Spirit's usage of C++ templates. At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside).

[...] doesn't use operator overloading or even templates.

> Don't get me wrong: I'm not a nay-sayer here. I think this is a very doable and worthwhile suggestion by Walter. Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.

I think it would be a complete rewrite. The reason I'm interested in it for D is that:

1) it's a pretty cool library
2) it's one of Boost's most popular ones
3) it's been touted as a reason why D is no good and C++ roolz
4) it's popular enough to have been a driving force behind improvements in C++ compilers
5) it would surely improve D
6) and last, and most importantly, it's very useful
Oct 18 2006
Walter Bright wrote:
> [...] doesn't use operator overloading or even templates.

Huh. Very interesting. Here's the example:

    // spirit: num_p >> *( ch_p(',') >> num_p)
    Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Though it's definitely not as easy to read, I think I might actually [...] use of operator-overloading is that it can be a real pain to discover things because they don't have real names.

    Seq( Digit, Start( Seq(Ch(','), Digit)))

That doesn't look too bad to me. Still it would rock the world if you could just do:

    parser("digit (',' digit)*");

and have the grammar be verified at compile-time.

> I think it would be a complete rewrite. The reason I'm interested in it for D is that:
> 1) it's a pretty cool library
> 2) it's one of Boost's most popular ones
> 3) it's been touted as a reason why D is no good and C++ roolz
> 4) it's popular enough to have been a driving force behind improvements in C++ compilers
> 5) it would surely improve D
> 6) and last, and most importantly, it's very useful

Excellent reasons.

--bb
Oct 18 2006
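Editor's sketch of the "functional notation" idea from the post above, written in current D rather than the D of 2006. Everything here is an assumption made for illustration: the names `digit`, `ch`, `seq`, `star` and `matches`, and the convention that a parser returns the offset after the match or -1, are hypothetical and are not taken from Spirit or Spart.

```d
import std.ascii : isDigit;

// A parser takes the input and a start offset and returns the offset just
// past the match, or -1 when it does not match (hypothetical convention).
alias Parser = ptrdiff_t delegate(string input, size_t pos);

// Matches exactly one ASCII digit.
Parser digit()
{
    return delegate ptrdiff_t(string s, size_t p) {
        return (p < s.length && isDigit(s[p])) ? cast(ptrdiff_t)(p + 1) : -1;
    };
}

// Matches the single character c.
Parser ch(char c)
{
    return delegate ptrdiff_t(string s, size_t p) {
        return (p < s.length && s[p] == c) ? cast(ptrdiff_t)(p + 1) : -1;
    };
}

// Matches a followed by b.
Parser seq(Parser a, Parser b)
{
    return delegate ptrdiff_t(string s, size_t p) {
        immutable mid = a(s, p);
        return mid < 0 ? -1 : b(s, cast(size_t) mid);
    };
}

// Kleene star: matches a zero or more times, so it never fails.
Parser star(Parser a)
{
    return delegate ptrdiff_t(string s, size_t p) {
        auto cur = p;
        while (true)
        {
            immutable next = a(s, cur);
            // Stop on failure, or on an empty match to avoid looping forever.
            if (next < 0 || cast(size_t) next == cur)
                return cast(ptrdiff_t) cur;
            cur = cast(size_t) next;
        }
    };
}

// True when the parser consumes the whole input.
bool matches(Parser p, string input)
{
    return p(input, 0) == cast(ptrdiff_t) input.length;
}

void main()
{
    // digit (',' digit)*
    auto list = seq(digit(), star(seq(ch(','), digit())));
    assert(matches(list, "1,2,3"));
    assert(!matches(list, "1,,3"));
}
```

Named functions make the grammar discoverable, which is the point made above about operators lacking real names. The cost is the nesting that Spirit hides behind `>>` and `*`.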
Bill Baxter wrote:
> Walter Bright wrote:
>> [...] doesn't use operator overloading or even templates.
> Huh. Very interesting. Here's the example:
>     // spirit: num_p >> *( ch_p(',') >> num_p)
>     Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
> [snip]

all that is cool, but (i know i am the dummy here) readability as in bnf is something that eludes me. better to go for coco?

richard
Oct 18 2006
Richard Koch wrote:
> Bill Baxter wrote:
>>     // spirit: num_p >> *( ch_p(',') >> num_p)
>>     Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
> all that is cool, but (i know i am the dummy here) readability as in bnf is something that eludes me. better to go for coco?

You mean this coco?
http://www.ssw.uni-linz.ac.at/Coco/

Not sure what you mean by coco being more readable than BNF. Coco's grammar looks pretty much like BNF to me.

----- from Taste.atg -----
VarDecl                  (. wchar_t* name; int type; .)
= Type<type>
  Ident<name>            (. tab->NewObj(name, var, type); .)
  { ',' Ident<name>      (. tab->NewObj(name, var, type); .)
  } ';'.
--------------------------

As far as I can tell that's just good old EBNF with some annotations.

    VarDecl ::= Type Ident ("," Ident)* ";"

--bb
Oct 18 2006
Walter Bright wrote:
> [...] doesn't use operator overloading or even templates.
> [snip]
> 3) it's been touted as a reason why D is no good and C++ roolz

As a Spirit user, I can tell you that readability is one of the most important factors. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++. Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

---
Paolo Invernizzi
Oct 19 2006
Paolo Invernizzi wrote:
> Walter Bright wrote:
>> [...] doesn't use operator overloading or even templates.
>> 3) it's been touted as a reason why D is no good and C++ roolz
> As a Spirit user, I can tell that readability is one of the most important factor. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++. Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

You're probably right.
Oct 19 2006
Walter Bright wrote:
> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
> http://spirit.sourceforge.net/

Now that would be useful I think.

Take this example from the Spirit intro of code to make a parser for a list of real numbers:

    r = real_p >> *(ch_p(',') >> real_p);

In EBNF that's just:

    real_number ("," real_number)*

In C++ you have to get creative with the operator overloading there (prefix '*' used to denote the regexp Kleene star, '>>' used to separate tokens).

But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:

    r = make_parser("real_number (',' real_number)*");

I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.

Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary. From the intro it says that its primary use is "extremely small micro-parsers", not a full blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial. So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator? Then just do the parser-generating at runtime.

---

heh heh, this is fun. From one of the code examples:

    typedef
        alternative<alternative<space_parser, sequence<sequence<
            strlit<const char*>, kleene_star<difference<anychar_parser,
            chlit<char> > > >, chlit<char> > >, sequence<sequence<
            strlit<const char*>, kleene_star<difference<anychar_parser,
            strlit<const char*> > > >, strlit<const char*> > >
    skip_t;
    skip_t skip;

That monster type signature was determined by deliberately forcing a compiler error and then copy-pasting the type from the resulting error message. Too funny.

(Note that this was given not as the main way to use the library but as a way to eliminate some of the code bloat all the templates lead to -- another reason to not try to generate the parser at compile-time, but just verify it.)

At any rate the Spirit documentation seems to be rife with juicy comments of the form "yes it looks funky, but we're stuck with C++ here". So it's a good place to get ideas for how to make things better.

--bb
Oct 18 2006
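Editor's sketch of the "verify the grammar at compile time, build the parser at runtime" idea from the post above, in current D. It is only a stand-in: `isWellFormed` checks balanced parentheses and closed quoted literals, not real EBNF, and both `isWellFormed` and `checkedGrammar` are hypothetical names introduced for this example.

```d
// Returns true when parentheses balance and every quoted literal is closed.
// This is a deliberately small stand-in for real grammar validation.
bool isWellFormed(string grammar)
{
    int depth = 0;
    bool inLiteral = false;
    foreach (c; grammar)
    {
        if (c == '\'')
            inLiteral = !inLiteral;       // toggle on every single quote
        else if (!inLiteral)
        {
            if (c == '(')
                ++depth;
            else if (c == ')' && --depth < 0)
                return false;             // closing paren without an opener
        }
    }
    return depth == 0 && !inLiteral;
}

// Hypothetical wrapper: the grammar is a template argument, so the check
// runs through compile-time function evaluation and a malformed string
// stops the build.
template checkedGrammar(string grammar)
{
    static assert(isWellFormed(grammar), "malformed grammar: " ~ grammar);
    enum checkedGrammar = grammar;
}

void main()
{
    enum g = checkedGrammar!"real_number (',' real_number)*";
    static assert(g == "real_number (',' real_number)*");

    // The same function still works at runtime.
    assert(!isWellFormed("real_number (',' real_number*"));
    // enum bad = checkedGrammar!"real_number (',' real_number*"; // would not compile
}
```

The validated string could then be handed to an ordinary runtime parser builder, which keeps template bloat out of the picture while still catching typos in the grammar during compilation.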
Bill Baxter wrote:
> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>     r = make_parser("real_number (',' real_number)*");
> I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary. From the intro it says that it's primary use is "extremely small micro-parsers", not a full blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial. So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance, making archival, maintenance, etc., clumsy.

> At any rate the Spirit documentation seems to be rife with juicy comments of the form "yes it looks funky, but we're stuck with C++ here". So it's a good place to get ideas for how to make things better.

Yup.
Oct 18 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>     r = make_parser("real_number (',' real_number)*");
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

But maybe you could allow the user to access those terminals via strings:

    r.lookup_terminal("real_number").add_action(&func);

or just

    r.add_action("real_number", &func);

>> So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.
> I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.

Hmm. Well if no external tools is the main benefit, then simply making Lex/Yacc (or more appropriately, Enki) into a library should be sufficient. I guess you do need some way to attach code to terminals at runtime, but that's doable via various existing callback mechanisms. The machinery needed is basically the same as signals/slots. You just need to be able to do something like

    connect(ASTreeNode.accept(), mycode);

at runtime. Then you should be able to get this kind of thing to work:

    auto r = make_parser_node("real_number (',' real_number)*");
    r.add_action("real_number", &func);

using nothing but runtime parsing of the grammar to build your AST. No fancy templates needed, except perhaps in adding the callback to &func. That kind of thing could be done in C++ too.

--bb
Oct 18 2006
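Editor's sketch of attaching actions by terminal name at runtime, as proposed in the post above, in current D. The class `ListParser` and its `addAction`/`parse` methods are hypothetical; the parser is hard-wired to the one grammar `real_number (',' real_number)*` instead of being built from the string, so only the name-to-callback lookup is illustrated.

```d
import std.algorithm : splitter;
import std.conv : to, ConvException;

alias Action = void delegate(string text);

// Hypothetical stand-in for a generated parser. It recognises only
//     real_number (',' real_number)*
// and fires the callbacks registered under the name "real_number".
class ListParser
{
    private Action[][string] actions; // terminal name -> callbacks

    void addAction(string terminal, Action action)
    {
        actions[terminal] ~= action;
    }

    private void fire(string terminal, string text)
    {
        if (auto list = terminal in actions)
            foreach (action; *list)
                action(text);
    }

    // Returns true when the whole input is a comma-separated list of reals.
    // Callbacks fire as each number is recognised, so a late failure does
    // not undo earlier calls.
    bool parse(string input)
    {
        if (input.length == 0)
            return false;
        foreach (piece; input.splitter(','))
        {
            try
            {
                piece.to!double; // throws if piece is not a complete number
            }
            catch (ConvException)
            {
                return false;
            }
            fire("real_number", piece);
        }
        return true;
    }
}

void main()
{
    double sum = 0;
    auto r = new ListParser;
    r.addAction("real_number", (string text) { sum += text.to!double; });

    assert(r.parse("1.5,2,0.25"));
    assert(sum == 3.75);
    assert(!r.parse("1.5,,2"));
}
```

The lookup by string is the 'soft' binding discussed elsewhere in the thread: a misspelled terminal name is not a compile error, it simply never fires.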
Walter Bright wrote:
> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>     r = make_parser("real_number (',' real_number)*");
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

How about delegate literals for code snippets??

    template parse(char[] rulename, char[] rule, T, T delegate(T[])[] act)
    {
        // mixin allows this to be used in the scope
        // parse!(rulename)(out T);

        // for rule ==
        //   "AssignExpression | AssignExpression ',' Expression",
        // parse!(rulename) expands to something like this

        // mixin a specialization
        bool parse!(rulename)(out T ret)
        {
            T[2] set;
            if(rule!("AssignExpression")(set[0]))
            {
                ret = act[0](set);
                return true;
            }
            if(rule!("AssignExpression")(set[0]) &&
               rule!("Expression")(set[1]))
            {
                ret = act[1](set);
                return true;
            }
            return false;
        }
    }

    /// used something like this
    class Parser : RootParser
    {
        mixin parse!("Expression",
            "
            AssignExpression |
            AssignExpression ',' Expression
            ",
            Expr,
            [
                (Expr[] e){return e[0];},
                (Expr[] e){return new Expr(e[0], e[1]);}
            ]
        );

        mixin parse!("AssignExpression",
            "
            ConditionalExpression |
            ConditionalExpression '=' AssignExpression |
            ConditionalExpression '+=' AssignExpression
            ",
            Expr,
            [
                (Expr[] e){return e[0];},
                (Expr[] e){return new AssignExper(e[0], e[1]);},
                (Expr[] e){return new AssignExper(e[0], e[1]);}
            ]
        );
    }

I've never used mixins so I most likely have something wrong in there.
Oct 18 2006
On Thu, 19 Oct 2006 00:12:41 +0300, Walter Bright <newshound digitalmars.com> wrote:
> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>     r = make_parser("real_number (',' real_number)*");
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

Well, couldn't one use arrays to define actions? For example:

    r = make_parser("real_number[0] (',' real_number[1])*");

    array[] = {&firstAction,   //[0]
               &generalAction  //[1]
              };
    r.attach(array);

You could also give string ids for actions:

    r = make_parser("real_number[myaction] (',' real_number[real])*");

    array[] = {"myaction", &firstAction,
               "real", &generalAction
              };
    r.attach(array);
Oct 19 2006
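Editor's sketch of the string-id variant suggested above, in current D: pull the bracketed ids out of the grammar string and check that each one has a binding. The helper `actionIds` is a hypothetical name, and it deliberately ignores the case of a '[' inside a quoted literal.

```d
// Collects the bracketed action ids of a grammar string in order of
// appearance. Simplification: brackets inside quoted literals are not
// treated specially, and an unterminated '[' is ignored.
string[] actionIds(string grammar)
{
    string[] ids;
    size_t i = 0;
    while (i < grammar.length)
    {
        if (grammar[i] == '[')
        {
            immutable start = i + 1;
            while (i < grammar.length && grammar[i] != ']')
                ++i;
            if (i == grammar.length)
                break;                     // no closing bracket
            ids ~= grammar[start .. i];
        }
        ++i;
    }
    return ids;
}

void main()
{
    enum grammar = "real_number[myaction] (',' real_number[real])*";

    // The ids can already be extracted while compiling.
    static assert(actionIds(grammar) == ["myaction", "real"]);

    int calls = 0;
    void delegate()[string] table;
    table["myaction"] = () { ++calls; };
    table["real"] = () { ++calls; };

    // Every id used in the grammar must have a binding.
    foreach (id; actionIds(grammar))
        assert((id in table) !is null);

    foreach (id; actionIds(grammar))
        table[id]();
    assert(calls == 2);
}
```

Because the ids are available at compile time, a missing binding could in principle be reported during compilation when the table is also known statically; the check here is done at runtime to keep the sketch short.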
Kristian wrote:
> Well, couldn't one use arrays to define actions? For example:
>     r = make_parser("real_number[0] (',' real_number[1])*");
>     array[] = {&firstAction, //[0]
>                &generalAction //[1]
>               };
>     r.attach(array);
> You could also give string ids for actions:
>     r = make_parser("real_number[myaction] (',' real_number[real])*");
>     array[] = {"myaction", &firstAction, "real", &generalAction };
>     r.attach(array);

I actually did something similar to this when creating a generic PEG parser (http://en.wikipedia.org/wiki/Parsing_Expression_Grammar). I opted to use the square brackets to denote semantic groups. Each semantic group has an associated delegate, which performs an action on the nodes in the group and returns a replacement node for the parse tree.

The following is part of my calculator test that parses and evaluates an expression.

    auto c = new RuleSet();

    // Number.create creates a number object from a string.
    c.addRegex("Number", r"\s*(([\+\-])*([0-9]+)(\.([0-9]*))?(([eE])([\+\-]*)([0-9]+))?)\s*", &Number.create);
    c.addRegex("Identifier", r"\s*([a-zA-Z_]\w*)\s*");
    c.addRule("Statement", "Assignment / Expression");
    c.addRule("Assignment", r"[Identifier '=' Expression]",
        // assign value to variable
        delegate Node(Node[] n)
        {
            // All this casting feels a bit clunky!
            char[] s = (cast(RegexNode)n[0]).match[1];
            vars[s] = (cast(Number)n[2]).eval();
            return new Number(vars[s]);
        });
    c.addRule("Expression", "[Term (('+' / '-') Term)*]",
        // perform addition and subtraction
        delegate Node(Node[] n) { ... });
    c.addRule("Term", "[Factor (('*' / '/') Factor)*]",
        // perform multiplication and division
        delegate Node(Node[] n) { ... });
    c.addRule("Factor", "Number / [Identifier] / ['(' Expression ')']",
        // Get value of identifier
        delegate Node(Node[] n) { ... },
        // 2 is a shortcut for returning 2nd node in semantic group.
        // The expression itself in this case.
        2);

    auto p = new Parser(c);

It's all runtime, as my templating abilities aren't up to the task. A compile-time version will be fun though!
Oct 19 2006
Sorry about that garbled post, let's try that again:

Walter Bright wrote:
> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>     r = make_parser("real_number (',' real_number)*");
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time. That would be pretty cool.
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

How about delegate literals for code snippets??

    template parse(char[] rulename, char[] rule, T, T delegate(T[])[] act)
    {
        // mixin allows this to be used in the scope
        // parse!(rulename)(out T);

        // for rule ==
        //   "AssignExpression | AssignExpression ',' Expression",
        // parse!(rulename) expands to something like this

        // mixin a specialization
        bool parse!(rulename)(out T ret)
        {
            T[2] set;
            if(rule!("AssignExpression")(set[0]))
            {
                ret = act[0](set);
                return true;
            }
            if(rule!("AssignExpression")(set[0]) &&
               rule!("Expression")(set[1]))
            {
                ret = act[1](set);
                return true;
            }
            return false;
        }
    }

    /// used something like this
    class Parser : RootParser
    {
        mixin parse!("Expression",
            "
            AssignExpression |
            AssignExpression ',' Expression
            ",
            Expr,
            [
                (Expr[] e){return e[0];},
                (Expr[] e){return new Expr(e[0], e[1]);}
            ]
        );

        mixin parse!("AssignExpression",
            "
            ConditionalExpression |
            ConditionalExpression '=' AssignExpression |
            ConditionalExpression '+=' AssignExpression
            ",
            Expr,
            [
                (Expr[] e){return e[0];},
                (Expr[] e){return new AssignExper(e[0], e[1]);},
                (Expr[] e){return new AssignExper(e[0], e[1]);}
            ]
        );
    }

I've never used mixins so I most likely have something wrong in there.
Oct 19 2006
Walter Bright wrote:
> I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.

I do not see why transferring a tool into a library should reduce any clumsiness. The process of developing compilers is still more a piece of art than a piece of engineering, and who can propose that the D community is able to establish such an engineering principle by the mere fact that D has templates and mixins?
Oct 19 2006
Bill Baxter wrote:
> [snip]
> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary. From the intro it says that it's primary use is "extremely small micro-parsers", not a full blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial. So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

From what I gather, that's the major benefit, other than a "self-documenting design". All the "prettiness" of using a near-EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what it does and how it functions.

However, the only problem with composing this as an EBNF compile-time parser is that you can't attach actions to arbitrary terminals without some sort of binding lookup. I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support.

But what you're suggesting could just as easily be a compile-time rendition of Enki. It's quite possible to pull off. Especially if you digest the grammar one production at a time so as to side-step any recursion depth limitations when processing the parser templates. :)

    auto grammar = new Parser(
        Production!("Number ::= NumberPart {NumberPart}",
            // binding attached to production ('all' is supplied by default?)
            void function(char[] all){
                writefln("Parsed Number: %s",all);
            }
        ),
        Production!("NumberPart ::= Sep | Digit "),
        Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
        Production!("Sep ::= '_' | ','")
    );

    // call specifying start production
    grammar.parse("Number",myInput);

Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with. But you'll have to accept a 'soft' binding in there someplace, hence you lose the type/name checking benefits of being at compile time.

> I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator? Then just do the parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there. If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*.

(* of course template-based parsers use the call-tree as a parse-tree but that's beside the point)

--
- EricAnderton at yahoo
Oct 18 2006
Pragma wrote:
> Bill Baxter wrote:
>> [snip]
>> So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.
> From what I gather, that's the major benefit, other than a "self-documenting design".
> [snip]
> But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off.

Yes! Sounds like we're thinking along the same lines here. But if Walter's right, that the compile-time verification is not a big deal, then it would be even simpler.

Actually it sounds very similar to the way writing shader code for OpenGL/Direct3D works. You have to compile the code to use it, but conveniently compilation is so fast that you can do it at run-time easily. Or if you prefer, you can still precompile it. What I like to do is set up my IDE to go ahead and precompile my shaders just so I can check for errors at compile time, but then I use the runtime compilation in the end anyway because that makes some things easier -- like modifying the code on the fly. It actually works pretty well I think.

The only difference between shader code and grammar code is that shader code doesn't need to make any callbacks. But callbacks aren't hard.

> Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :)
> [snip code: auto grammar = new Parser( Production!(...) ... ); grammar.parse("Number",myInput);]

That's one way to do it, but I think you could also allow bindings to be attached after the fact:

    auto grammar = new Parser(
        "Number ::= NumberPart {NumberPart}
         NumberPart ::= Sep | Digit
         Digit ::= 0|1|2|3|4|5|6|7|8|9
         Sep ::= '_' | ','");

    grammar.attach("Number",
        // binding attached to production ('all' is supplied by default?)
        void function(char[] all){
            writefln("Parsed Number: %s",all);
        });

This is _exactly_ how parameter binding works in shader code. Just here the value we're binding is a function pointer instead of a texture coordinate or something.

> Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with. But you'll have to accept a 'soft' binding in there someplace, hence you loose the type/name checking benefits of being at compile time.

I'll have to take your word for it. You mean in Enki you can say that Number has to output something convertible to 'real'?

>> I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator? Then just do the parser-generating at runtime.
> Maybe I don't fully understand, but I don't think there's a gain there. If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*.
> (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point)

Yeh, I was just talking crap. I thought maybe you might be able to save some bookkeeping if all you cared about was that the grammar made a valid tree, but didn't care about its output. But probably it's the other way around. Checking validity is the hard part, not making a tree.

--bb
Oct 18 2006
Bill Baxter wrote:
> Pragma wrote:
>> [snip]
>> Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with. But you'll have to accept a 'soft' binding in there someplace, hence you loose the type/name checking benefits of being at compile time.
> I'll have to take your word for it. You mean in Enki you can say that Number has to output something convertible to 'real'?
> [snip]

Yes and no. The parser generator has a good deal of flexibility built-in, including a pseudo-variant type that tries to perform conversions wherever possible. For instance, if we re-wrote the production for Number like so:

    Number
        = real handleNumber(whole,fraction)
        ::= (NumberPart {NumberPart}):whole '.' (NumberPart {NumberPart}):fraction;

... Enki would emit code that binds the chars traversed for 'whole' and 'fraction', and passes those on to a function called 'handleNumber' that returns a real. That return value is passed up the parse chain so that other terminals can bind to it:

    Foobar
        = writeMe(foo)
        ::= Number:foo;

And so on.
Oct 18 2006