digitalmars.D - Current D grammar
- Manfred Nowak (9/9) Jun 14 2015 With my favorite LALR-CompilerCompiler I analyzed the current D grammar....
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/5) Jun 14 2015 Isn't the parser LL(*) or something? You could try ANTLR which
- Manfred Nowak (5/6) Jun 15 2015 Yes. Antlr generates an ALL(*)-parser with around 3000 states after
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (2/5) Jun 15 2015 Yikes, LL(*) should be O( n^2 ).
- Iain Buclaw via Digitalmars-d (13/22) Jun 15 2015 remained,
- Manfred Nowak (5/6) Jun 16 2015 If the grammar is truely generated automatically, then there are some
- Nick Sabalausky (20/28) Jun 16 2015 Although I don't recall details, when I attempted it a few years back, I...
With my favorite LALR-CompilerCompiler I analyzed the current D grammar. After preliminary eliminating the RR-conflicts around 1800 states remained, from which around 100 states are still contaminated with SR-conflicts. Two possibilities: 1) Those ambiguities are not in DMD but introduced by excerpting the grammar from DMD 2) Those ambiguities do exist in DMD. Which one do you prefer? Is the grammar usable? -manfred
Jun 14 2015
On Monday, 15 June 2015 at 00:05:59 UTC, Manfred Nowak wrote:With my favorite LALR-CompilerCompiler I analyzed the current D grammar. After preliminary eliminating the RR-conflicts aroundIsn't the parser LL(*) or something? You could try ANTLR which generate LL(*) parsers.
Jun 14 2015
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= wrote:try ANTLR which generate LL(*) parsersYes. Antlr generates an ALL(*)-parser with around 3000 states after eliminating some left-recursion from the grammar. But runtime is O(n^4) for this type of parsers. -manfred
Jun 15 2015
On Monday, 15 June 2015 at 23:13:31 UTC, Manfred Nowak wrote:Yes. Antlr generates an ALL(*)-parser with around 3000 states after eliminating some left-recursion from the grammar. But runtime is O(n^4) for this type of parsersYikes, LL(*) should be O( n^2 ).
Jun 15 2015
On 15 Jun 2015 02:10, "Manfred Nowak via Digitalmars-d" < digitalmars-d puremagic.com> wrote:With my favorite LALR-CompilerCompiler I analyzed the current D grammar. After preliminary eliminating the RR-conflicts around 1800 statesremained,from which around 100 states are still contaminated with SR-conflicts. Two possibilities: 1) Those ambiguities are not in DMD but introduced by excerpting the grammar from DMD 2) Those ambiguities do exist in DMD. Which one do you prefer? Is the grammar usable? -manfredThe most complete grammar I'm aware of is: https://github.com/Hackerpilot/DGrammar I've done a basic LALR grammar (expressions only so far) with YACC, but it depends on a two tier lexer in order to resolve interpolating identifiers and dots before applying grammar rules. This I've found is the best way to get over most SR errors and allow (PostExpression).ident to be accepted, among others to be accepted correctly. https://github.com/ibuclaw/gdb/blob/dlang/gdb/d-exp.y Regards Iain
Jun 15 2015
Iain Buclaw via Digitalmars-d wrote:https://github.com/Hackerpilot/DGrammarIf the grammar is truely generated automatically, then there are some quirks in it. For example is declarator defined as a list of attributes only. -manfred
Jun 16 2015
On 06/14/2015 08:05 PM, Manfred Nowak wrote:With my favorite LALR-CompilerCompiler I analyzed the current D grammar. After preliminary eliminating the RR-conflicts around 1800 states remained, from which around 100 states are still contaminated with SR-conflicts. Two possibilities: 1) Those ambiguities are not in DMD but introduced by excerpting the grammar from DMD 2) Those ambiguities do exist in DMD. Which one do you prefer? Is the grammar usable?Although I don't recall details, when I attempted it a few years back, I came to the conclusion that getting a correct grammar for D (and likely other non-trivial C-family languages) that was realistically useful is essentially impossible in LALR(1). I'm sure you'd be able to do it if you upgraded to GLF though (or maybe LALR(k) might work, though I'm not sure). The problem I found with LR is that totally separate branches of the grammar will easily conflict with each other (whereas in LL separate branches are completely independent). That leads to tree patterns that work fine in LL but make LR/LALR barf unless you do major contortions to get around the conflicts. GLR looked like it should be able to overcome those conflicts with ease (although I don't know how the efficiency would compare to LL). With LR and LALR, you can *theoretically* you can get around all those conflicts for the academic sake of "Does this input satisfy the grammar or not? Yes or no?". But again, it requires contorting the grammar. And unfortunately, the more you have to contort it, the less useful the parse tree becomes for real-world purposes beyond just "The input satisfies/doesn't satisfy the grammar".
Jun 16 2015