www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Current D grammar

reply Manfred Nowak <svv1999 hotmail.com> writes:
With my favorite LALR-CompilerCompiler I analyzed the current D grammar. 
After preliminary eliminating the RR-conflicts around 1800 states remained, 
from which around 100 states are still contaminated with SR-conflicts.

Two possibilities:
1) Those ambiguities are not in DMD but introduced by excerpting the 
grammar from DMD
2) Those ambiguities do exist in DMD.

Which one do you prefer? Is the grammar usable?

-manfred
Jun 14 2015
next sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 15 June 2015 at 00:05:59 UTC, Manfred Nowak wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D 
 grammar. After preliminary eliminating the RR-conflicts around
Isn't the parser LL(*) or something? You could try ANTLR which generate LL(*) parsers.
Jun 14 2015
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= wrote:

 try ANTLR which generate LL(*) parsers
Yes. Antlr generates an ALL(*)-parser with around 3000 states after eliminating some left-recursion from the grammar. But runtime is O(n^4) for this type of parsers. -manfred
Jun 15 2015
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Monday, 15 June 2015 at 23:13:31 UTC, Manfred Nowak wrote:
 Yes. Antlr generates an ALL(*)-parser with around 3000 states 
 after eliminating some left-recursion from the grammar. But 
 runtime is O(n^4) for this type of parsers
Yikes, LL(*) should be O( n^2 ).
Jun 15 2015
prev sibling next sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 15 Jun 2015 02:10, "Manfred Nowak via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D grammar.
 After preliminary eliminating the RR-conflicts around 1800 states
remained,
 from which around 100 states are still contaminated with SR-conflicts.

 Two possibilities:
 1) Those ambiguities are not in DMD but introduced by excerpting the
 grammar from DMD
 2) Those ambiguities do exist in DMD.

 Which one do you prefer? Is the grammar usable?

 -manfred
The most complete grammar I'm aware of is: https://github.com/Hackerpilot/DGrammar I've done a basic LALR grammar (expressions only so far) with YACC, but it depends on a two tier lexer in order to resolve interpolating identifiers and dots before applying grammar rules. This I've found is the best way to get over most SR errors and allow (PostExpression).ident to be accepted, among others to be accepted correctly. https://github.com/ibuclaw/gdb/blob/dlang/gdb/d-exp.y Regards Iain
Jun 15 2015
parent Manfred Nowak <svv1999 hotmail.com> writes:
Iain Buclaw via Digitalmars-d wrote:

 https://github.com/Hackerpilot/DGrammar
If the grammar is truely generated automatically, then there are some quirks in it. For example is declarator defined as a list of attributes only. -manfred
Jun 16 2015
prev sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 06/14/2015 08:05 PM, Manfred Nowak wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D grammar.
 After preliminary eliminating the RR-conflicts around 1800 states remained,
 from which around 100 states are still contaminated with SR-conflicts.

 Two possibilities:
 1) Those ambiguities are not in DMD but introduced by excerpting the
 grammar from DMD
 2) Those ambiguities do exist in DMD.

 Which one do you prefer? Is the grammar usable?
Although I don't recall details, when I attempted it a few years back, I came to the conclusion that getting a correct grammar for D (and likely other non-trivial C-family languages) that was realistically useful is essentially impossible in LALR(1). I'm sure you'd be able to do it if you upgraded to GLF though (or maybe LALR(k) might work, though I'm not sure). The problem I found with LR is that totally separate branches of the grammar will easily conflict with each other (whereas in LL separate branches are completely independent). That leads to tree patterns that work fine in LL but make LR/LALR barf unless you do major contortions to get around the conflicts. GLR looked like it should be able to overcome those conflicts with ease (although I don't know how the efficiency would compare to LL). With LR and LALR, you can *theoretically* you can get around all those conflicts for the academic sake of "Does this input satisfy the grammar or not? Yes or no?". But again, it requires contorting the grammar. And unfortunately, the more you have to contort it, the less useful the parse tree becomes for real-world purposes beyond just "The input satisfies/doesn't satisfy the grammar".
Jun 16 2015