digitalmars.D - Writing a language parser in D
- Justin Johansson (4/4) Sep 14 2009 Can D people please recommend suitable tools for generating a parser (in...
- div0 (21/29) Sep 14 2009 I've ported boost::spirit to D. No idea if it does what you want, but I've...
- Bill Baxter (10/29) Sep 14 2009 I'm not seeing the powershell script or test app in that .zip file...
- div0 (19/76) Sep 15 2009 They're both in the top level bit of the zip, build.ps1 & test0.d...
- Justin Johansson (3/8) Sep 14 2009 Thanks for all replies, Ellery, div0, Bill et al.
- div0 (15/27) Sep 15 2009 Yar, that's a bit much for doing in spiritd...
- Ellery Newcomer (8/16) Sep 14 2009 You might have a look at ANTLR. It's an LL(k) or LL(*) (versions) parser
- Trass3r (3/7) Sep 15 2009 Not completely true, there is one, it's just antiquated
- Ellery Newcomer (2/10) Sep 15 2009 That actually works?
- Ellery Newcomer (3/13) Sep 15 2009 Wow. I had a go with it, and it actually does a lot more than I thought
- Justin Johansson (6/12) Sep 14 2009 Hi Nick,
- downs (2/10) Sep 15 2009 In a completely different vein, tools.rd is a simplistic recursive des...
- Justin Johansson (4/15) Sep 15 2009 Hi downs,
- downs (44/59) Sep 16 2009 Well for instance, take the PAD (Pastebin Adventure) component of my IRC...
- Don (3/13) Sep 16 2009 That's a very interesting DSL <g>.
- Justin Johansson (6/90) Sep 16 2009 Hmm, delightful.
- Lutger (3/3) Sep 15 2009 APaGeD can do LL parsers: http://apaged.mainia.de/
- Alexander Bothe (3/11) Sep 15 2009 My D-IDE (written in C#) has an extra DLL which contains an entire D Par...
- BCS (5/17) Sep 15 2009 I've written this:
Can D people please recommend suitable tools for generating a parser (in D) for an LL(1) grammar? There are bound to be much better parser generator tools available nowadays, since my last foray into this area 10+ years ago with YACC. I've heard of tools like Bison, SableCC etc., but apart from the names I know nothing about them. (Note: this question is not about writing a parser for D. It is about writing a parser in D for another language which has an LL(1) grammar.) Thanks in advance for all help. -- Justin Johansson
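For comparison with the generator tools discussed in this thread: an LL(1) grammar can also be turned by hand into a predictive recursive-descent parser, with one function per nonterminal and a single lookahead symbol choosing each production. The sketch below assumes a toy expression grammar (not the grammar Justin mentions later) and uses only Phobos; all names are hypothetical, and for brevity it looks ahead one character and does not skip whitespace.

```d
// Predictive (LL(1)) recursive descent for a toy grammar; hypothetical
// example, not the output of any tool:
//   Expr   := Term (('+' | '-') Term)*
//   Term   := Factor (('*' | '/') Factor)*
//   Factor := digit+ | '(' Expr ')'
// One character of lookahead (peek) is enough to pick every production.
import std.ascii : isDigit;
import std.exception : enforce;

struct Parser
{
    string input;
    size_t pos;

    // Lookahead: the next unread character, or '\0' at end of input.
    char peek() const
    {
        return pos < input.length ? input[pos] : '\0';
    }

    long expr()
    {
        long value = term();
        while (peek() == '+' || peek() == '-')
        {
            immutable op = input[pos++];
            immutable rhs = term();
            value = op == '+' ? value + rhs : value - rhs;
        }
        return value;
    }

    long term()
    {
        long value = factor();
        while (peek() == '*' || peek() == '/')
        {
            immutable op = input[pos++];
            immutable rhs = factor();
            value = op == '*' ? value * rhs : value / rhs;
        }
        return value;
    }

    long factor()
    {
        if (peek() == '(')
        {
            ++pos;
            immutable value = expr();
            enforce(peek() == ')', "expected ')'");
            ++pos;
            return value;
        }
        enforce(isDigit(peek()), "expected a digit or '('");
        long value = 0;
        while (isDigit(peek()))
            value = value * 10 + (input[pos++] - '0');
        return value;
    }
}

// Evaluates a whole expression; leftover input is an error.
long evaluate(string s)
{
    auto p = Parser(s);
    immutable result = p.expr();
    enforce(p.pos == s.length, "unexpected trailing input");
    return result;
}
```

With a real grammar you would normally tokenize first and look ahead one token rather than one character; the character-level version just keeps the example short.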
Sep 14 2009
I've ported boost::spirit to D. No idea if it does what you want, but I've written some fairly complicated grammars with it. It's not a tool, though: you just define your grammar directly in code, which is either a plus or a minus depending on your point of view. Quick intro: http://www.boost.org/doc/libs/1_36_0/libs/spirit/classic/index.html And the D implementation: http://www.sstk.co.uk/spiritd.php -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Sep 14 2009
On Mon, Sep 14, 2009 at 2:46 PM, div0 <div0 users.sourceforge.net> wrote:
> And D implementation: http://www.sstk.co.uk/spiritd.php
I'm not seeing the PowerShell script or test app in that .zip file. I don't really need it; I was just curious what the syntax looked like without any operator overloading. --bb
Sep 14 2009
Bill Baxter wrote:
> I'm not seeing the powershell script or test app in that .zip file. I don't really need it, I was just curious what the syntax looked like without any operator overloading.
They're both in the top level bit of the zip: build.ps1 & test0.d. I've gone for template factory functions at the moment; it's quick and dirty:

rT values = rT.create(
    or(
        or(
            or(
                or(parseReal, boolVal[&_outer.gotBool]),
                parseInt
            ),
            stringVal[&_outer.gotString]
        ),
        arrayValues
    ));

rT fieldName = rT.create(
    lexemeD[
        seq(
            or(alphaP, chP!(chT)('_')),
            star(or(alnumP, chP!(chT)('_')))
        )
    ]);

rT field = rT.create(
    seq(
        seq(fieldName[&_outer.gotFieldName], chP!(chT)(':')),
        values
    ));

One day I may write something to generate the grammar from a string, but I've got way too much other stuff to do at the mo, so that's a low priority.
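To show the combinator idea without spiritd itself, here is a minimal sketch built from plain delegates. It assumes nothing about spiritd's real types: Rule, charIf, ch, alt, seq and star are hypothetical stand-ins for rT, alphaP/alnumP, chP, or, seq and star. fieldName mirrors div0's rule (a letter or underscore, then letters, digits or underscores); the toy version never skips whitespace, so it behaves like the body of a lexemeD.

```d
// Toy parser combinators in the style of div0's snippet, built on plain
// delegates. Hypothetical names; this is not spiritd's API. A Rule that
// succeeds consumes its match from `input`; one that fails leaves
// `input` unchanged.
import std.ascii : isAlpha, isAlphaNum;

alias Rule = bool delegate(ref string input);

// One character satisfying the predicate `pred` (cf. alphaP, alnumP).
Rule charIf(alias pred)()
{
    return delegate bool(ref string input) {
        if (input.length > 0 && pred(input[0]))
        {
            input = input[1 .. $];
            return true;
        }
        return false;
    };
}

// One specific character (cf. chP).
Rule ch(char c)
{
    return delegate bool(ref string input) {
        if (input.length > 0 && input[0] == c)
        {
            input = input[1 .. $];
            return true;
        }
        return false;
    };
}

// Ordered choice: try `a`, and only if it fails try `b` (cf. or).
Rule alt(Rule a, Rule b)
{
    return delegate bool(ref string input) { return a(input) || b(input); };
}

// Sequence: `a` then `b`; restores the input if `b` fails after `a` matched.
Rule seq(Rule a, Rule b)
{
    return delegate bool(ref string input) {
        auto saved = input;
        if (a(input) && b(input))
            return true;
        input = saved;
        return false;
    };
}

// Zero or more repetitions; always succeeds. Assumes `r` consumes input
// whenever it succeeds, otherwise this would loop forever.
Rule star(Rule r)
{
    return delegate bool(ref string input) {
        while (r(input)) {}
        return true;
    };
}

// fieldName := (alpha | '_') (alnum | '_')*
Rule fieldName()
{
    auto underscore = ch('_');
    return seq(alt(charIf!isAlpha(), underscore),
               star(alt(charIf!isAlphaNum(), underscore)));
}
```

Because every rule restores the input on failure, alt needs no bookkeeping of its own; only seq has to undo a partial match.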
Sep 15 2009
> It's not a tool though, you just define your grammar directly in code.
Thanks for all replies, Ellery, div0, Bill et al. There are 101-odd productions in EBNF, so whatever makes it easiest to plug these directly into the tool or engine is probably the path of least resistance for this exercise. <JJ/>
Sep 14 2009
Justin Johansson wrote:
> There's 101 odd productions in EBNF so whatever is the easiest to plug these directly into the tool or engine is probably the road of least resistance for this exercise.
Yar, that's a bit much for doing in spiritd. Leastways not without extra, non-existent plumbing.
Sep 15 2009
You might have a look at ANTLR. It's an LL(k) or LL(*) parser generator, depending on the version. I've found it suitable for writing a parser for D (yes, I know), so it is definitely powerful enough. Currently, there is no implementation of ANTLR that generates D code, so if you really want a pure D parser, look elsewhere. However, I believe it can generate C, which you might be able to link to. If so, I'd say it's your best bet.
Sep 14 2009
Ellery Newcomer wrote:
> Currently, there is no implementation of ANTLR that generates D code, so if you really want a pure D parser, look elsewhere.
Not completely true: there is one, it's just antiquated. http://www.mbutscher.de/antlrd/
Sep 15 2009
Trass3r wrote:
> Not completely true, there is one, it's just antiquated http://www.mbutscher.de/antlrd/
That actually works?
Sep 15 2009
Ellery Newcomer wrote:
> That actually works?
Wow. I had a go with it, and it actually does a lot more than I thought it would. I am so going to dust this off and get it working!
Sep 15 2009
Hi Nick, Thanks. The grammar is already spec'd for LL, so I'm looking for the path of least resistance. I've used GOLD and spoken with its author, Devin Cook, in the past, though. It's rather cool in a way. Still, it's great to see GOLD coming to a screen in the D village. <JJ/>
> If you can't find anything you like for LL and don't mind switching to LALR instead, I've recently released Goldie ( http://www.dsource.org/projects/goldie ) which works in conjunction with GOLD Parser Builder ( http://www.devincook.com/goldparser/ ). If you're familiar with GOLD, Goldie is a GOLD Engine for D. I do plan to add support for LL eventually, but that's kind of a ways off for now.
Sep 14 2009
In a completely different vein, tools.rd is a simplistic recursive descent parser framework implemented at compile time that I've used for most/all of my toy languages. It keeps things trivial: there's no lexing stage; it parses straight from the input string. It's not that well documented, but if you want, give me a simple language description and I can write you a sample parser. It's probably the easiest to use, though: just mix it in from D code :)
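A scannerless recursive-descent parser like the one downs describes reads characters straight from the input string instead of a token stream. The sketch below borrows the bool gotX(ref string st, out T result) convention he explains later in the thread (consume and return true on success; return false and leave st untouched on failure). The function bodies, including this simplified accept and the hypothetical gotAdd rule, are assumptions for illustration, not tools.rd's code.

```d
// Scannerless recursive descent in the "bool gotX(ref string st, out T result)"
// style: every function reads characters straight from the input string.
// These bodies are illustrative assumptions, not tools.rd's implementation.
import std.algorithm.searching : startsWith;
import std.ascii : isDigit;
import std.string : stripLeft;

// Simplified accept: skip leading whitespace, then consume `cmp` if the
// input starts with it. (downs' version compares token by token.)
bool accept(ref string st, string cmp)
{
    auto rest = st.stripLeft;
    if (!rest.startsWith(cmp))
        return false;
    st = rest[cmp.length .. $];
    return true;
}

// Parses an unsigned decimal integer (no overflow check).
bool gotInt(ref string st, out int result)
{
    auto rest = st.stripLeft;
    size_t n = 0;
    while (n < rest.length && isDigit(rest[n]))
        result = result * 10 + (rest[n++] - '0');
    if (n == 0)
        return false; // st untouched, result left at int.init
    st = rest[n .. $];
    return true;
}

// Composes the pieces: "add <int> <int>". Work happens on a scratch copy,
// so a partial match such as "add 2 x" leaves st unchanged.
bool gotAdd(ref string st, out int sum)
{
    auto scratch = st;
    int a, b;
    if (scratch.accept("add") && scratch.gotInt(a) && scratch.gotInt(b))
    {
        sum = a + b;
        st = scratch;
        return true;
    }
    return false;
}
```

The scratch-copy-then-commit step in gotAdd is what lets larger rules be built from smaller ones without any explicit backtracking machinery.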
Sep 15 2009
Hi downs, Thanks for the offer, but since YACC is my prior background I'll probably go with the closest tool, which is the modern variant for LL(1). Still, if you have a small sample to share, I'm sure other D people will be delighted. <JJ/>
Sep 15 2009
Justin Johansson wrote:
> Still if you have a small sample to share I'm sure other D people will be delighted.
Well, for instance, take the PAD (Pastebin Adventure) component of my IRC bot, which can run simple text adventures from a variety of sources, like local Gobby sessions, wikis and (originally) Pastebin.com: http://dsource.org/projects/scrapple/browser/trunk/idc/pad

Let's look at http://dsource.org/projects/scrapple/browser/trunk/idc/pad/engine.d

L175: gotToken
Functions like this form the building blocks of tools.rd parsing. They always have the form "bool gotBlarghle(ref string st, out T result)" and return true if result could be parsed from st, otherwise false (in which case st is not modified). gotToken trivially removes a token from the input text.

L200: bool accept(ref string st, string cmp)
This function is called internally by the parser framework to decide whether st starts with a comparison string, in which case the string is removed and true is returned. accept removes tokens from both strings and compares them until a comparison fails (false, st not modified) or cmp is used up (true).

L230: The first use of the actual parser DSL.

return mixin(gotMatchExpr("s: log"));

This simply matches "log" against the input string s. Nothing fancy.

L282: Not related to the parser but still, IMHO, insanely cool.

const string Table = `
        | bool          | int         | string               | float
--------+---------------+-------------+----------------------+--------
Boolean | b             | b           | b?q{true}p:q{false}p | ø
Integer | i != 0        | i           | Format(i)            | i
String  | s == q{true}p | atoi(s)     | s                    | atof(s)
Float   | ø             | cast(int) f | Format(f)            | f`;

This table contains a conversion matrix from internal types to basic types. Two things are of interest:
1) q{}p is unrolled by .litstring_expand() into nested and escaped ""s. It's a backport of D2 nestable string literals to D1.
2) The table itself. tools.ctfe contains functionality to select rows and columns and to iterate over the table in column-major order. This means the above table can be automatically translated into nested if/switch statements.
L487: A more instructive use of the parser framework.

if (mixin(gotMatchExpr("st: [==$#eq=true$|!=$#neq=true$|<=$#eq=smaller=true$|>=$#eq=greater=true$|<$#smaller=true$|>$#greater=true$] "
  "$dg2 <- genExprMath$"
))) { ... }

Okay, first we have a conditional branch: [a|b|c|d]. This matches each of the possible branches against the input string in turn. Segments in $$ indicate variable matches and/or programmatic reactions. $#eq=smaller=true$ basically translates to "execute eq=smaller=true when this part of the parse string is successfully reached". "$dg2 <- genExprMath$" means "generate dg2 using the genExprMath function"; it is assumed that this function follows the convention bool(ref string, out typeof(dg2)). It hasn't been used in that sample, but "y <- foo/x" means "pass x as an extra parameter to foo".

And that's basically it. :)

Oh, just for fun, here's the unrolled D syntax for the above expression:

(ref string s) {
  auto scratch = s;
  return (
    true
    && (ref string s) {
      auto scratch = s;
      return
           (true && scratch.accept("==") && (((eq=true), true))) && ((s=scratch), true)
        || (((scratch=s), true) && scratch.accept("!=") && (((neq=true), true))) && ((s=scratch), true)
        || (((scratch=s), true) && scratch.accept("<=") && (((eq=smaller=true), true))) && ((s=scratch), true)
        || (((scratch=s), true) && scratch.accept(">=") && (((eq=greater=true), true))) && ((s=scratch), true)
        || (((scratch=s), true) && scratch.accept("<") && (((smaller=true), true))) && ((s=scratch), true)
        || (((scratch=s), true) && scratch.accept(">") && (((greater=true), true))) && ((s = scratch), true);
    }(scratch)
    && ( genExprMath(scratch, dg2 ))
  ) && ((s = scratch), true);
}(st)
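The unrolled code shows the backtracking pattern: each branch of [==|!=|...] works on a scratch copy of the input and assigns it back only if the branch succeeds. The same ordered choice can be sketched by hand as below; Cmp and gotCmp are hypothetical names, and the operator list mirrors downs' example. Because the first successful branch wins, "<=" has to be tried before "<".

```d
// Hand-written version of the ordered choice [==|!=|<=|>=|<|>] from downs'
// example. Each branch starts from a fresh scratch copy of the input and
// only commits (s = scratch) when it succeeds. Cmp and gotCmp are
// hypothetical names.
import std.algorithm.searching : startsWith;

enum Cmp { none, eq, neq, le, ge, lt, gt }

bool gotCmp(ref string s, out Cmp result)
{
    // Branch order matters: with "<" listed before "<=", the input "<="
    // would match "<" and leave a stray "=" behind.
    static immutable string[] ops = ["==", "!=", "<=", ">=", "<", ">"];
    static immutable Cmp[] kinds = [Cmp.eq, Cmp.neq, Cmp.le, Cmp.ge, Cmp.lt, Cmp.gt];

    foreach (i, op; ops)
    {
        auto scratch = s;
        if (scratch.startsWith(op))
        {
            scratch = scratch[op.length .. $];
            result = kinds[i]; // plays the role of the $#...$ action
            s = scratch;       // commit the consumed input
            return true;
        }
    }
    return false; // no branch matched: s unchanged, result == Cmp.none
}
```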
Sep 16 2009
downs wrote:
> L282: Not related to the parser but still, IMHO, insanely cool.
That's a very interesting DSL <g>. Insanely cool, indeed.
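The table trick works because D can run ordinary functions at compile time (CTFE) and compile the string they return with mixin. A minimal sketch of the idea, assuming a made-up one-column conversion table rather than downs' type matrix, and not using tools.ctfe (whose API is not shown in the thread): genConverter reads the rows and emits an if-chain that becomes a real function.

```d
// Sketch of the table-to-code idea (not tools.ctfe's API): a function that
// runs at compile time turns a small text table into D source for an
// if-chain, and a string mixin compiles that source into a real function.
import std.array : split;
import std.string : strip;

// Hypothetical table; each row is "<unit> | <D expression in terms of c>".
enum table = `
celsius    | c
fahrenheit | c * 9 / 5 + 32
kelvin     | c + 273
`;

// Evaluated at compile time because its result is needed by mixin().
string genConverter(string tbl)
{
    string code = "int convert(string unit, int c) {\n";
    foreach (line; tbl.split("\n"))
    {
        auto cells = line.split("|");
        if (cells.length != 2)
            continue; // skip blank lines
        code ~= `  if (unit == "` ~ cells[0].strip ~ `") return `
              ~ cells[1].strip ~ ";\n";
    }
    code ~= `  throw new Exception("unknown unit: " ~ unit);` ~ "\n}\n";
    return code;
}

// Expands to the generated definition of convert().
mixin(genConverter(table));
```

Adding a row to table adds a branch to convert without touching any other code.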
Sep 16 2009
Hmm, delightful. Thanks for sharing. There are obviously some very talented people out there :-) Gotta put this in my input queue for later consumption. JJ
Sep 16 2009
APaGeD can do LL parsers: http://apaged.mainia.de/ There is a fork: http://www.dsource.org/projects/apaged2 Not a recommendation per se, but I just wanted to mention it as an option.
Sep 15 2009
My D-IDE (written in C#) has an extra DLL which contains an entire D Parser... It's based on the SharpDevelop parser and lexer... maybe you can use it: www.alexanderbothe.com/?id=27
Sep 15 2009
Hello Justin, I've written this: http://www.dsource.org/projects/scrapple/browser/trunk/dparser It's a pure compile-time parser generator that takes grammars defined as text and generates a backtracking recursive descent parser.
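A much smaller illustration of the same two ingredients, a grammar written as text and processed at compile time plus backtracking between alternatives: parseRule runs under CTFE (the static assert is checked by the compiler), and matchRule retries each alternative from the same starting position. The names are hypothetical and this is not dparser's interface; words are matched as plain prefixes, with no word-boundary check. The two alternatives share the prefix "if cond then stmt", which one token of lookahead cannot decide but backtracking handles.

```d
// Toy illustration (not dparser's API): a rule written as text is split into
// alternatives at compile time, then matched at run time by trying each
// alternative from the same starting position (backtracking on failure).
import std.algorithm.searching : startsWith;
import std.array : split;
import std.string : stripLeft;

// "a b | a" -> [["a", "b"], ["a"]]: alternatives made of literal words.
string[][] parseRule(string rule)
{
    string[][] alts;
    foreach (alternative; rule.split("|"))
        alts ~= alternative.split(); // no separator: split on whitespace
    return alts;
}

// Both alternatives start with "if cond then stmt", so one token of
// lookahead cannot choose between them.
enum ifRule = parseRule("if cond then stmt else stmt | if cond then stmt");
static assert(ifRule.length == 2 && ifRule[1].length == 4); // compile time

// Matches one alternative word by word; on a mismatch the caller's input
// is left as it was (the scratch copy is discarded).
bool matchSeq(ref string input, const string[] words)
{
    auto scratch = input;
    foreach (w; words)
    {
        scratch = scratch.stripLeft;
        if (!scratch.startsWith(w))
            return false;
        scratch = scratch[w.length .. $];
    }
    input = scratch;
    return true;
}

// Tries each alternative in order; the first full match wins.
bool matchRule(ref string input, const string[][] alts)
{
    foreach (words; alts)
        if (matchSeq(input, words))
            return true;
    return false;
}
```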
Sep 15 2009