digitalmars.D.announce - Goldie Parsing System v0.4 Released - Now for D2
- Nick Sabalausky (48/48) Mar 27 2011 Goldie is a series of flexible open-source parsing tools, including a D2...
- Long Chang (22/70) Mar 27 2011 I try use the gold from trunk, not the release version . It seems
- Nick Sabalausky (12/16) Mar 27 2011 For other people who haven't read my reply to that ticket, this is the t...
- Andrej Mitrovic (3/3) Apr 14 2011 So could your library be used to implement an alternative for HTOD? Or
- Nick Sabalausky (42/45) Apr 15 2011 C code is admittedly a bit tricky because it uses a preprocessor. But in...
- Daniel Gibson (7/16) Apr 15 2011 Why? Just call the preprocessor from your tool or from a wrapping script
- Nick Sabalausky (13/25) Apr 15 2011 If by "your tool" mean a program that uses Goldie to process C code, the...
- Daniel Gibson (2/22) Apr 15 2011 I meant Andrej's hypothetical tool using Goldie to process C code :-)
- Andrej Mitrovic (15/15) Apr 15 2011 I've used your tool yesterday. I used it on a simple C file with the
- Nick Sabalausky (39/54) Apr 15 2011 Like any generalized parsing tool (AFAIK), Goldie doesn't really have a
- Andrej Mitrovic (42/42) Apr 15 2011 What I meant was that code like this will throw if MyType isn't
- Nick Sabalausky (2/53) Apr 15 2011 I'm not at my computer right now, so I can't check, but it sounds like t...
- Nick Sabalausky (37/94) Apr 16 2011 Yea, turns out that grammar just doesn't support using user-defined type...
- Kagamin (2/34) Apr 16 2011 As I understand, <Type> is a type, <Var> is a variable. There should be ...
- Nick Sabalausky (28/67) Apr 16 2011 First of all, the name <Var> up there is misleading. That only refers th...
- Nick Sabalausky (7/38) Apr 16 2011 In other words, we basically have a form of this:
- Kagamin (2/9) Apr 18 2011 A hairy grammar can be used here, anyway goldie's output needs postproce...
- Nick Sabalausky (15/114) Apr 16 2011 Unfortunately, I think this may require LALR(k). Goldie and GOLD are onl...
- Long Chang (19/105) Mar 27 2011 just read you replay, very look forward to the Character Set Optimizatio...
- Nick Sabalausky (3/5) Mar 27 2011 Thanks for your interest :)
Goldie is a series of flexible open-source parsing tools, including a D2 library called GoldieLib. It's compatible with GOLD Parser Builder and can be used together with it, but does not require it. In fact, Goldie can be used as a cross-platform, shell-scripting-compatible alternative to GOLD Parser Builder.

== Links: ==

Main homepage and documentation:
    http://www.semitwist.com/goldie/

Prepackaged downloads:
    http://www.dsource.org/projects/goldie/browser/downloads

The related GOLD Parser Builder:
    http://www.devincook.com/goldparser/

== New in v0.4: ==

    - Switched from D1/Tango to D2/Phobos.
    - New tool: GRMC: Grammar Compiler. Because of this, Goldie no longer requires GOLD Parser Builder.
    - Grammars can be compiled not only from GRMC: Grammar Compiler, but also through the D API, GoldieLib.
    - No longer requires xfBuild or Rebuild.
    - Executable filenames are now prefixed with 'goldie-' to minimize the chance of collisions on the PATH.
    - Many misc changes/improvements to tools, API and documentation.
    - Includes a lexing-only D2 grammar:
      http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm

This D2 grammar does have a few small limitations ATM though, which I've already described here:
    http://www.mail-archive.com/digitalmars-d-learn puremagic.com/msg11491.html
    http://www.mail-archive.com/digitalmars-d-learn puremagic.com/msg11493.html

== Some of Goldie's benefits: (most are thanks to Goldie's compatibility with GOLD Parser Builder) ==

    - Grammars are fully-reusable: No need to create a new grammar for every use and every host language. Many grammars are already available.
    - Grammar-agnostic engine: One lexer/parser engine can be used for all grammars.
    - Engines for nearly any language or platform: A cross-platform D v2.x engine is included via GoldieLib. Engines for many other platforms are also available. New engines are easy to write.
    - Dynamic-Style: Dynamic-style lets you write programs that support user-created grammars.
    - Static-Style: Static-style provides compile-time checks and extra type-safety.
    - Lexing and parsing: Lexing and parsing are defined in the same file and handled by one unified tool.
    - Many tools available.

Goldie is fully-usable and has been tested on both Windows and Linux (it should also work on OSX and any other platform supported by DMD, but has not been tested), although GoldieLib's API is still subject to change.

Goldie is licensed under The zlib/libpng License.
Mar 27 2011
I tried using Goldie from trunk, not the release version. It seems very slow at parsing CSS; please see http://www.dsource.org/projects/goldie/ticket/18 . Are all LALR parsers this slow, or is it a GOLD problem?
Mar 27 2011
"Long Chang" <changlong jkys.info> wrote in message news:mailman.2805.1301240416.4748.digitalmars-d-announce puremagic.com...
> I tried using Goldie from trunk, not the release version. It seems very slow at parsing CSS; please see http://www.dsource.org/projects/goldie/ticket/18 . Are all LALR parsers this slow, or is it a GOLD problem?

For other people who haven't read my reply to that ticket: this is the top priority for Goldie now. I've profiled it, and the biggest bottleneck by far is the (rather stupid and half-assed) way I'm handling character sets in the lexer. (To directly answer the question: it's neither an LALR thing nor a GOLD thing, it's just a temporary Goldie thing.)

Additionally, my plan is to switch from large, infrequent releases to smaller, more frequent ones, so it shouldn't be another 6+ month wait for "v0.5 with a speed boost". This v0.4 version has a lot of big stuff in it (the change from D1 to D2, and the grammar compiler), which is why it took so long.
Mar 27 2011
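Nick doesn't say how the character-set handling will be reworked. One common approach, sketched here purely as an illustration and not as Goldie's actual implementation, is to store each character set as sorted, merged code-point ranges and binary-search them, so a membership test costs O(log r) in the number of ranges instead of a scan over every member character. The class name `CharSet` and the identifier set below are hypothetical.

```python
from bisect import bisect_right


class CharSet:
    """Character set stored as sorted, merged (lo, hi) code-point ranges.

    Illustrative sketch only (not Goldie's code). Membership is a binary
    search over range starts rather than a linear scan of every character.
    """

    def __init__(self, ranges):
        merged = []
        for lo, hi in sorted(ranges):
            if merged and lo <= merged[-1][1] + 1:
                # Overlapping or adjacent range: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self._ranges = merged
        self._starts = [lo for lo, _ in merged]

    def __contains__(self, ch):
        cp = ord(ch)
        # Index of the last range whose start is <= cp (or -1 if none).
        i = bisect_right(self._starts, cp) - 1
        return i >= 0 and cp <= self._ranges[i][1]


# Hypothetical example: characters allowed in a simple CSS-like identifier.
ident_chars = CharSet([
    (ord("a"), ord("z")), (ord("A"), ord("Z")), (ord("0"), ord("9")),
    (ord("-"), ord("-")), (ord("_"), ord("_")),
])
```

A DFA-based lexer may test membership for several sets on every input character, so the cost of this one check is multiplied across the whole input.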
So could your library be used to implement an alternative for HTOD? Or more simply put, could I use this to do (simple) transformations of C code?
Apr 14 2011
"Andrej Mitrovic" <andrej.mitrovich gmail.com> wrote in message news:mailman.3528.1302836832.4748.digitalmars-d-announce puremagic.com...
> So could your library be used to implement an alternative for HTOD? Or more simply put, could I use this to do (simple) transformations of C code?

C code is admittedly a bit tricky because it uses a preprocessor. But in general, yes, Goldie can be used to transform source. The way it would work is like this:

1. Define a grammar for the "input" language. There's an ANSI C grammar here, but I haven't looked at it, so I don't know how good it is:
   http://www.devincook.com/goldparser/grammars/index.htm
   An introduction to the grammar description language is here:
   http://www.semitwist.com/goldie/Start/Grammar/

2. Use Goldie to parse the input. Details here:
   http://www.semitwist.com/goldie/Start/HowToUse/

3. Once Goldie has parsed the input, it will give you a parse tree (structured according to the grammar you used). You can then walk the tree and do whatever you want with it. I don't recommend modifying the parse tree Goldie gives you in-place, since the interface isn't really designed for that right now (though you may still be able to make it work). But you can walk it and either build up your own tree structure, or convert it to text however you want, etc.

Actually, you can even see what the parse tree will look like before writing any code: use the included Goldie Parse tool ( http://www.semitwist.com/goldie/Tools/Parse/ ) to parse a file according to whatever grammar you want to use. It'll save the parse tree as JSON. Then you can inspect the parse tree with this: ( http://www.semitwist.com/goldie/Tools/JsonViewer/ ). But try to use just a small sample file: parse trees tend to get very big, very fast and

Since you're talking about C, you'll probably want to run your original C code through the "preprocess-only" option of a real C compiler. (I *think* DMC will do that.) Then parse the resulting "preprocessed C" files with Goldie. (Although if your goal is an HTOD-like tool, maybe you would need to deal with the original un-preprocessed source directly. If Goldie's grammar language doesn't seem quite up to the task, it probably wouldn't be too hard to just manually make a basic C preprocessor.)

Right now, the grammar description format isn't very good at describing preprocessors (a limitation Goldie inherited from GOLD Parser Builder). But fixing that limitation is one of the things on my TODO list for Goldie.

If you do try this, I'd love to hear how it works out :) Even if you encounter problems, it would be very helpful for me to know. Haven't gotten a whole lot of feedback yet.
Apr 15 2011
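Step 3 of the workflow (walk the tree, then build your own structure or emit text) can be sketched as a plain recursive traversal. The node layout used here (dicts with `name`, optional `children`, and `text` on tokens) is an assumption for illustration only; it is not the JSON format goldie-parse writes, so check real output with the JSON viewer first. The helper names are hypothetical.

```python
def walk(node, visit, depth=0):
    """Depth-first, pre-order traversal calling visit(node, depth).

    Assumed (hypothetical) node shape:
      nonterminal: {"name": "<Var Decl>", "children": [...]}
      token:       {"name": "Id", "text": "x"}
    """
    visit(node, depth)
    for child in node.get("children", []):
        walk(child, visit, depth + 1)


def to_text(node):
    """Rebuild source text from the tokens at the leaves."""
    if "text" in node:
        return node["text"]
    return " ".join(to_text(child) for child in node.get("children", []))


# Hand-written tree for "int x ;" (structure invented for illustration).
tree = {"name": "<Var Decl>", "children": [
    {"name": "<Type>", "children": [{"name": "int", "text": "int"}]},
    {"name": "Id", "text": "x"},
    {"name": ";", "text": ";"},
]}

# Build your own structure: here, a list of declared identifiers.
ids = []


def collect_ids(node, depth):
    if node["name"] == "Id":
        ids.append(node["text"])


walk(tree, collect_ids)

# Or convert to text: here, an indented outline of the tree.
outline = []
walk(tree, lambda node, depth: outline.append("  " * depth + node["name"]))
```

Keeping the traversal separate from the visitor means one walker can serve both the "build your own tree" and the "convert to text" uses.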
On 15.04.2011 09:50, Nick Sabalausky wrote:
> Since you're talking about C, you'll probably want to run your original C code through the "preprocess-only" option of a real C compiler. (I *think* DMC will do that.) Then parse the resulting "preprocessed C" files with Goldie. (Although if your goal is an HTOD-like tool, maybe you would need to deal with the original un-preprocessed source directly.

Why? Just call the preprocessor from your tool or from a wrapping script and go on with the preprocessed C code. That should be much easier and more compatible, because C compilers ought to know how to preprocess correctly.
For GCC the option you're looking for is "-E", btw.

> If Goldie's grammar language doesn't seem quite up to the task, it probably wouldn't be too hard to just manually make a basic C preprocessor.)

Cheers,
- Daniel
Apr 15 2011
"Daniel Gibson" <metalcaedes gmail.com> wrote in message news:io8u12$132q$1 digitalmars.com...
> Why? Just call the preprocessor from your tool or from a wrapping script and go on with the preprocessed C code. That should be much easier and more compatible, because C compilers ought to know how to preprocess correctly.
> For GCC the option you're looking for is "-E", btw.

If by "your tool" you mean a program that uses Goldie to process C code, then yea, that's what I meant. If you meant that Goldie should invoke a C preprocessor directly, that's a bit tricky: Goldie is a generalized parsing tool (sort of like ANTLR or Spirit), so it doesn't really know "OK, this is supposed to be C". It just parses according to whatever grammar it's given. Of course, it's not entirely out of the question to have some sort of system for specifying that a source should have XYZ tool (such as a C preprocessor) invoked on it first, etc., but it's probably easiest if programs using Goldie just invoke whatever other tools they need themselves.

(Sorry if I've still misunderstood - it's late over here ;) )
Apr 15 2011
On 15.04.2011 10:13, Nick Sabalausky wrote:
> If by "your tool" you mean a program that uses Goldie to process C code, then yea, that's what I meant.

I meant Andrej's hypothetical tool using Goldie to process C code :-)
Apr 15 2011
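A minimal sketch of Daniel's suggestion: a wrapper that runs the C preprocessor first and hands the result to the Goldie-based tool. `-E` is the GCC option mentioned in the thread; `-P` (also a GCC option) drops the `# 1 "file.c"` style linemarkers, which an ANSI C grammar would probably reject. The function names are hypothetical, and other compilers such as DMC use different flags.

```python
import subprocess


def preprocess_command(path, cc="gcc"):
    """Build a preprocess-only command line for a GCC-compatible compiler.

    -E stops after preprocessing; -P suppresses linemarker lines.
    """
    return [cc, "-E", "-P", path]


def preprocess(path, cc="gcc"):
    """Run the preprocessor and return the preprocessed C source as text.

    Raises subprocess.CalledProcessError if the compiler reports an error.
    """
    result = subprocess.run(preprocess_command(path, cc),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The returned text could then be saved to a file and fed to the Goldie Parse tool or to a GoldieLib-based program.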
I used your tool yesterday, on a simple C file with the ANSI C grammar from the GOLD website. It does seem to work fine, but yeah, I have to preprocess a C file first (I've spent so much time with D that I'd almost completely forgotten about the C preprocessor).

I've tried a file with your ParseAnything sample. It works OK as long as all the types are defined. If not, I usually get a token exception of some sort. Is this considered the semantic pass stage?

Btw, is there a grammar file for C99? What about C++? I haven't seen one on the GOLD website (well, C++ is a monster, I know..).

I'm also trying to figure out whether to go with the static or dynamic approach (I've looked at your docs). The static examples seem quite complex, but perhaps they're more reliable. I think I'll do a few tryouts with the dynamic style since it looks much easier.

If I get anything done you'll know about it. :)
Apr 15 2011
"Andrej Mitrovic" <andrej.mitrovich gmail.com> wrote in message news:mailman.3531.1302884207.4748.digitalmars-d-announce puremagic.com...
> I've tried a file with your ParseAnything sample. It works OK as long as all the types are defined. If not, I usually get a token exception of some sort. Is this considered the semantic pass stage?

Like any generalized parsing tool (AFAIK), Goldie doesn't really have a semantic stage (because language semantics aren't something that's easily formalized). Probably the C grammar just considers something in your source to be a syntax or grammatical error. (This could be a bug or limitation in the C grammar.)

Goldie currently handles syntax/grammatical errors by throwing a ParseException once it has detected all the errors it can find. The message of the exception is the "filename(line:col): Error: Description of error" message that you'd normally expect a compiler to output. Most of the apps in Goldie catch this exception and just output the message, but I guess I didn't do that in ParseAnything.

Of course, it could also be a bug in either ParseAnything or Goldie. Can you send one of the C files that's getting an error? I'll take a look and see what's going on.

You may want to try "goldie-parse" instead of "goldie-parseAnything" (I really should rename one of them, it's probably confusing). "goldie-parseAnything" is mainly intended as an example of how to use Goldie (like the Calculator examples). "goldie-parse" is the one that outputs JSON.

> Btw, is there a grammar file for C99? What about C++? I haven't seen one on the GOLD website (well, C++ is a monster, I know..).

Not that I'm aware of. But if you know the differences between ANSI C and C99, you should be able to modify the ANSI C grammar and turn it into a C99 one. The grammar description language should be very easy to understand if you're familiar with BNF and regex (in fact, the grammar definition language doesn't even use the barely-readable Perl regex syntax; it uses a far more readable equivalent instead). BTW, a tip on the grammar language: everything enclosed in angle brackets is a nonterminal.

And yea, C++ is a beast. Not only does it have the preprocessor, but what's worse, the parsing depends on the semantics pass. I'd say that any generalized parsing tool that can do C++ properly is doing an *incredibly* damn good job.

> I'm also trying to figure out whether to go with the static or dynamic approach (I've looked at your docs). The static examples seem quite complex, but perhaps they're more reliable. I think I'll do a few tryouts with the dynamic style since it looks much easier.

The general recommendation is to use static whenever you just have one specific grammar you're trying to deal with (because it provides better protection against mistakes). But you're right, the dynamic style may be an easier way to learn Goldie. If you haven't already, you may want to look at the source for the calculator examples. They're both the exact same program, but one does it the static way, and the other does it the dynamic way.

> If I get anything done you'll know about it. :)

Cool, appreciated :)
Apr 15 2011
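The "filename(line:col): ..." format Nick describes (and that appears in the exceptions quoted in this thread) is easy to pick apart if a wrapper script wants to jump to the error location. The helper below is hypothetical, not part of Goldie, and assumes the file name contains no whitespace or parentheses.

```python
import re

# "file(line:col): message". The file part excludes whitespace and
# parentheses, so a leading "exception.d(35):" (no column) is skipped.
LOCATION_RE = re.compile(r"([^\s()]+)\((\d+):(\d+)\):\s*(.*)")


def parse_location(message):
    """Split a 'file(line:col): description' message into its parts.

    Returns (file, line, column, description), or None if no
    file(line:col) location is found in the message.
    """
    m = LOCATION_RE.search(message)
    if m is None:
        return None
    path, line, col, rest = m.groups()
    return path, int(line), int(col), rest
```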
What I meant was that code like this will throw if MyType isn't defined anywhere:

    int main(int x)
    {
        MyType var;
    }

    goldie.exception.UnexpectedTokenException src\goldie\exception.d(35): test.c(3:12): Unexpected Id: 'var'

It looks like valid C /syntax/, except that MyType isn't defined. But this will work:

    struct MyType
    {
        int field;
    };

    int main(int x)
    {
        struct MyType var;
    }

So either Goldie or ParseAnything needs to have all types defined. Maybe this is obvious, but I wouldn't know since I've never used a parser before. :p

Oddly enough, this one will throw:

    typedef struct
    {
        int field;
    } MyType;

    int main(int x)
    {
        MyType var;
    }

    goldie.exception.UnexpectedTokenException src\goldie\exception.d(35): test.c(7:12): Unexpected Id: 'var'

This one will throw as well:

    struct SomeStruct
    {
        int field;
    };

    typedef struct SomeStruct MyType;

    int main(int x)
    {
        MyType var;
    }

    goldie.exception.UnexpectedTokenException src\goldie\exception.d(35): test.c(13:12): Unexpected Id: 'myvar'

Isn't typedef a part of ANSI C?
Apr 15 2011
Andrej Mitrovic Wrote:
> Isn't typedef a part of ANSI C?

I'm not at my computer right now, so I can't check, but it sounds like the grammar follows the really old C style of requiring structs to be declared with "struct StructName varName". Apparently it doesn't take into account the possibility of typedefs being used to eliminate that. When I get home I'll check; I think it may be an easy change to the grammar.
Apr 15 2011
"Nick Sabalausky" <a a.a> wrote in message news:ioanmi$82c$1 digitalmars.com...
> I'm not at my computer right now, so I can't check, but it sounds like the grammar follows the really old C style of requiring structs to be declared with "struct StructName varName". Apparently it doesn't take into account the possibility of typedefs being used to eliminate that. When I get home I'll check; I think it may be an easy change to the grammar.

Yea, turns out that grammar just doesn't support using user-defined types without preceding them with "struct", "union", or "enum". You can see that here:

    <Var Decl> ::= <Mod> <Type> <Var> <Var List> ';'
                 | <Type> <Var> <Var List> ';'
                 | <Mod> <Var> <Var List> ';'

    <Mod> ::= extern
            | static
            | register
            | auto
            | volatile
            | const

    <Type> ::= <Base> <Pointers>

    <Base> ::= <Sign> <Scalar>   ! Ie, the built-ins like char, signed int, etc...
             | struct Id
             | struct '{' <Struct Def> '}'
             | union Id
             | union '{' <Struct Def> '}'
             | enum Id

So when you use "MyType" instead of "struct MyType", it sees "MyType", assumes it's a variable since it doesn't match any of the <Type> forms above, and then barfs on "var" because "variable1 variable2" isn't valid C code.

Normally, you'd just add another form to <Base> (i.e., add a line after "| enum Id" that says "| Id"). Except, the problem is... C is notorious for types and variables being ambiguous with each other. So the distinction pretty much has to be made in the semantic phase (i.e., outside of the formal grammar). But this grammar seems to be trying to make that distinction anyway. So trying to fix it by simply adding "<Base> ::= Id" leads to ambiguity problems with types versus variables/expressions. That's probably why they didn't extend the grammar that far: their "separation of type and variable" approach doesn't really work for C.

I'll have to think a bit about how best to adjust it. You can also check the GOLD mailing lists here to see if anyone has another C grammar:
http://www.devincook.com/goldparser/contact.htm
Apr 16 2011
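A widely used workaround for exactly this type/variable ambiguity, usually called the "lexer hack" and not something proposed in this thread, is to have the lexer remember names introduced by `typedef` and emit a distinct TypeName token for them. A grammar could then say `<Base> ::= TypeName` without clashing with `<Value> ::= Id`. The toy lexer below is a sketch under heavy assumptions: a tiny keyword set, no scopes, no comments or literals, and hypothetical token-kind names.

```python
import re

KEYWORDS = {"typedef", "struct", "union", "enum",
            "int", "char", "void", "float", "double"}


def lex(source):
    """Toy C lexer applying the 'lexer hack'; returns (kind, text) pairs.

    Identifiers previously declared with typedef come out as TypeName
    tokens instead of Id. Punctuation uses its own text as its kind.
    """
    typedef_names = set()
    tokens = []
    in_typedef = False  # inside a 'typedef ... ;' declaration
    depth = 0           # brace depth within that declaration
    last_id = None      # most recent plain identifier seen
    for text in re.findall(r"[A-Za-z_]\w*|\S", source):
        if text in KEYWORDS:
            kind = "Keyword"
            if text == "typedef":
                in_typedef, depth, last_id = True, 0, None
        elif text in typedef_names:
            kind = "TypeName"
        elif text[0].isalpha() or text[0] == "_":
            kind = "Id"
            last_id = text
        else:
            kind = text
            if text == "{":
                depth += 1
            elif text == "}":
                depth -= 1
            elif text == ";" and in_typedef and depth == 0:
                # End of the typedef: its last identifier is the new type.
                if last_id is not None:
                    typedef_names.add(last_id)
                in_typedef = False
        tokens.append((kind, text))
    return tokens
```

Without a preceding typedef, `MyType var;` still lexes as two plain Ids, which is the situation the quoted error messages show.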
Nick Sabalausky Wrote:
> Except, the problem is... C is notorious for types and variables being ambiguous with each other.

As I understand, <Type> is a type, <Var> is a variable. There should be no problem here.
Apr 16 2011
"Kagamin" <spam here.lot> wrote in message news:iod552$rbe$1 digitalmars.com...
> As I understand, <Type> is a type, <Var> is a variable. There should be no problem here.

First of all, the name <Var> up there is misleading. That only refers to the "name of the variable" in the variable's declaration. When actually *using* a variable, that's a <Value>, which is defined like this:

    <Value> ::= OctLiteral
              | HexLiteral
              | DecLiteral
              | StringLiteral
              | CharLiteral
              | FloatLiteral
              | Id '(' <Expr> ')'   ! Function call
              | Id '(' ')'          ! Function call
              | Id                  ! Use a variable
              | '(' <Expr> ')'

So we have a situation like this:

    <Type>  ::= <Base>
    <Base>  ::= Id
    <Value> ::= Id

So when the parser encounters an Id, how does it know whether to reduce it to a <Base> or a <Value>? Since they can both appear in the same place (e.g., immediately after a left curly brace, such as at the start of a function body), there's no way to tell.

Worse, suppose it comes across this:

    x*y

If x is a variable, then that's a multiplication. If x is a type, then it's a pointer declaration. Is it supposed to be a multiplication or a declaration? Could be either. They're both permitted in the same place.
Apr 16 2011
"Nick Sabalausky" <a a.a> wrote in message news:iod6fn$tch$1 digitalmars.com...
> So we have a situation like this:
>
>     <Type>  ::= <Base>
>     <Base>  ::= Id
>     <Value> ::= Id
>
> So when the parser encounters an Id, how does it know whether to reduce it to a <Base> or a <Value>?

In other words, we basically have a form of this:

    <A> ::= <B> | <C>
    <B> ::= X
    <C> ::= X

Can't be done. No way to tell if X is <B> or <C>.
Apr 16 2011
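The `<A> ::= <B> | <C>` pattern can be spotted mechanically: look for a terminal that more than one nonterminal derives through a single-symbol rule. The sketch below only flags candidates; whether they actually collide depends on whether both nonterminals can occur at the same point in the parse, which a real LALR table builder works out. Rules are given as Python dicts, with nonterminals written in angle brackets as in GOLD grammars; the function name is hypothetical.

```python
from collections import defaultdict


def unit_conflicts(rules):
    """Find terminals that more than one nonterminal derives directly.

    rules maps a nonterminal to a list of alternatives, each a list of
    symbols; names starting with '<' are nonterminals. Returns a dict
    {terminal: sorted list of nonterminals} for each shared terminal.
    These are candidate reduce/reduce conflicts, not confirmed ones.
    """
    derivers = defaultdict(set)
    for lhs, alternatives in rules.items():
        for alt in alternatives:
            if len(alt) == 1 and not alt[0].startswith("<"):
                derivers[alt[0]].add(lhs)
    return {t: sorted(nts) for t, nts in derivers.items() if len(nts) > 1}


# The abstract form from the thread.
grammar = {
    "<A>": [["<B>"], ["<C>"]],
    "<B>": [["X"]],
    "<C>": [["X"]],
}

# A fragment of the C case, with "<Base> ::= Id" added.
c_fragment = {
    "<Base>": [["struct", "Id"], ["Id"]],
    "<Value>": [["DecLiteral"], ["Id"], ["(", "<Expr>", ")"]],
}
```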
Nick Sabalausky Wrote:
> In other words, we basically have a form of this:
>
>     <A> ::= <B> | <C>
>     <B> ::= X
>     <C> ::= X
>
> Can't be done. No way to tell if X is <B> or <C>.

A hairy grammar can be used here; anyway, Goldie's output needs postprocessing, right?
Apr 18 2011
"Nick Sabalausky" <a a.a> wrote in message news:iobh9o$1d04$1 digitalmars.com...
> So trying to fix it by simply adding "<Base> ::= Id" leads to ambiguity problems with types versus variables/expressions.

Unfortunately, I think this may require LALR(k). Goldie and GOLD are only LALR(1) right now. I had been under the impression that LALR(1) was sufficient because, according to the oh-so-useful-in-the-real-world formal literature, any LR(k) grammar can *technically* be converted into a *cough* "equivalent" LR(1) grammar. But not only is the algorithm to do this hidden behind the academic ivory wall, word on the street is that the resulting grammar is gigantic and bears little or no resemblance to the original structure (and is therefore essentially useless in the real world).

Seems I'm gonna have to add some backtracking or stack-cloning to Goldie, probably along with some sort of cycle detection. (I think I'm starting to understand why Walter said he doesn't like to bother with parser generators, unngh...)
Apr 16 2011
Just read your reply; very much looking forward to the Character Set Optimization. And thank you for doing such a useful project.

On 3/27/11, Long Chang <changlong jkys.info> wrote:

I tried Goldie from trunk, not the release version. It seems very slow at parsing CSS; please see http://www.dsource.org/projects/goldie/ticket/18. Are all LALR parsers this slow, or is it a Goldie problem?

On Sun, Mar 27, 2011 at 4:11 PM, Nick Sabalausky <a a.a> wrote:

Goldie is a series of flexible open-source parsing tools, including a D2 library called GoldieLib. It's compatible with GOLD Parser Builder and can be used together with it, but does not require it. In fact, Goldie can be used as a cross-platform, shell-scripting-compatible alternative to GOLD Parser Builder.

== Links: ==

Main homepage and documentation:
    http://www.semitwist.com/goldie/

Prepackaged downloads:
    http://www.dsource.org/projects/goldie/browser/downloads

The related GOLD Parser Builder:
    http://www.devincook.com/goldparser/

== New in v0.4: ==

    - Switched from D1/Tango to D2/Phobos.
    - New tool: GRMC: Grammar Compiler. Because of this, Goldie no longer requires GOLD Parser Builder.
    - Grammars can be compiled not only from GRMC: Grammar Compiler, but also through the D API, GoldieLib.
    - No longer requires xfBuild or Rebuild.
    - Executable filenames are now prefixed with 'goldie-' to minimize the chance of collisions on the PATH.
    - Many misc changes/improvements to tools, API and documentation.
    - Includes a lexing-only D2 grammar: http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm

This D2 grammar does have a few small limitations ATM though, which I've already described here:
    http://www.mail-archive.com/digitalmars-d-learn puremagic.com/msg11491.html
    http://www.mail-archive.com/digitalmars-d-learn puremagic.com/msg11493.html

== Some of Goldie's benefits: (most are thanks to Goldie's compatibility with GOLD Parser Builder) ==

    - Grammars are fully-reusable: No need to create a new grammar for every use and every host language. Many grammars are already available.
    - Grammar-agnostic engine: One lexer/parser engine can be used for all grammars.
    - Engines for nearly any language or platform: A cross-platform D v2.x engine is included via GoldieLib. Engines for many other platforms are also available. New engines are easy to write.
    - Dynamic-Style: Dynamic-style lets you write programs that support user-created grammars.
    - Static-Style: Static-style provides compile-time checks and extra type-safety.
    - Lexing and parsing: Lexing and parsing are defined in the same file and handled by one unified tool.
    - Many tools available.

Goldie is fully-usable and has been tested on both Windows and Linux (it should also work on OSX and any other platform supported by DMD, but has not been tested), although GoldieLib's API is still subject to change. Goldie is licensed under The zlib/libpng License.
Mar 27 2011
"Long Chang" <changlong jkys.info> wrote in message news:mailman.2806.1301240916.4748.digitalmars-d-announce puremagic.com...just read you replay, very look forward to the Character Set Optimization. and thank you to done such a useful project .Thanks for your interest :)
Mar 27 2011