digitalmars.D - Current D grammar

Manfred Nowak (9/9) Jun 14 2015 With my favorite LALR-CompilerCompiler I analyzed the current D grammar....

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (3/5) Jun 14 2015 Isn't the parser LL(*) or something? You could try ANTLR which

Manfred Nowak (5/6) Jun 15 2015 Yes. Antlr generates an ALL(*)-parser with around 3000 states after

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (2/5) Jun 15 2015 Yikes, LL(*) should be O( n^2 ).

Iain Buclaw via Digitalmars-d (13/22) Jun 15 2015 remained,

Manfred Nowak (5/6) Jun 16 2015 If the grammar is truely generated automatically, then there are some

Nick Sabalausky (20/28) Jun 16 2015 Although I don't recall details, when I attempted it a few years back, I...

Manfred Nowak <svv1999 hotmail.com> writes:

With my favorite LALR-CompilerCompiler I analyzed the current D grammar. 
After preliminary eliminating the RR-conflicts around 1800 states remained, 
from which around 100 states are still contaminated with SR-conflicts.

Two possibilities:
1) Those ambiguities are not in DMD but introduced by excerpting the 
grammar from DMD
2) Those ambiguities do exist in DMD.

Which one do you prefer? Is the grammar usable?

-manfred

Jun 14 2015

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Monday, 15 June 2015 at 00:05:59 UTC, Manfred Nowak wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D 
 grammar. After preliminary eliminating the RR-conflicts around

Isn't the parser LL(*) or something? You could try ANTLR which 
generate LL(*) parsers.

Jun 14 2015

Manfred Nowak <svv1999 hotmail.com> writes:

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= wrote:

 try ANTLR which generate LL(*) parsers

Yes. Antlr generates an ALL(*)-parser with around 3000 states after 
eliminating some left-recursion from the grammar. But runtime is O(n^4) for  
this type of parsers.

-manfred

Jun 15 2015

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Monday, 15 June 2015 at 23:13:31 UTC, Manfred Nowak wrote:
 Yes. Antlr generates an ALL(*)-parser with around 3000 states 
 after eliminating some left-recursion from the grammar. But 
 runtime is O(n^4) for this type of parsers

Yikes, LL(*) should be O( n^2 ).

Jun 15 2015

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 15 Jun 2015 02:10, "Manfred Nowak via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D grammar.
 After preliminary eliminating the RR-conflicts around 1800 states

remained,
 from which around 100 states are still contaminated with SR-conflicts.

 Two possibilities:
 1) Those ambiguities are not in DMD but introduced by excerpting the
 grammar from DMD
 2) Those ambiguities do exist in DMD.

 Which one do you prefer? Is the grammar usable?

 -manfred

The most complete grammar I'm aware of is:

https://github.com/Hackerpilot/DGrammar

I've done a basic LALR grammar (expressions only so far) with YACC, but it
depends on a two tier lexer in order to resolve interpolating identifiers
and dots before applying grammar rules.  This I've found is the best way to
get over most SR errors and allow (PostExpression).ident to be accepted,
among others to be accepted correctly.

https://github.com/ibuclaw/gdb/blob/dlang/gdb/d-exp.y

Regards
Iain

Jun 15 2015

Manfred Nowak <svv1999 hotmail.com> writes:

Iain Buclaw via Digitalmars-d wrote:

 https://github.com/Hackerpilot/DGrammar

If the grammar is truely generated automatically, then there are some 
quirks in it. For example is declarator defined as a list of attributes 
only.

-manfred

Jun 16 2015

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 06/14/2015 08:05 PM, Manfred Nowak wrote:
 With my favorite LALR-CompilerCompiler I analyzed the current D grammar.
 After preliminary eliminating the RR-conflicts around 1800 states remained,
 from which around 100 states are still contaminated with SR-conflicts.

 Two possibilities:
 1) Those ambiguities are not in DMD but introduced by excerpting the
 grammar from DMD
 2) Those ambiguities do exist in DMD.

 Which one do you prefer? Is the grammar usable?

Although I don't recall details, when I attempted it a few years back, I 
came to the conclusion that getting a correct grammar for D (and likely 
other non-trivial C-family languages) that was realistically useful is 
essentially impossible in LALR(1). I'm sure you'd be able to do it if 
you upgraded to GLF though (or maybe LALR(k) might work, though I'm not 
sure).

The problem I found with LR is that totally separate branches of the 
grammar will easily conflict with each other (whereas in LL separate 
branches are completely independent). That leads to tree patterns that 
work fine in LL but make LR/LALR barf unless you do major contortions to 
get around the conflicts.

GLR looked like it should be able to overcome those conflicts with ease 
(although I don't know how the efficiency would compare to LL).

With LR and LALR, you can *theoretically* you can get around all those 
conflicts for the academic sake of "Does this input satisfy the grammar 
or not? Yes or no?". But again, it requires contorting the grammar. And 
unfortunately, the more you have to contort it, the less useful the 
parse tree becomes for real-world purposes beyond just "The input 
satisfies/doesn't satisfy the grammar".

Jun 16 2015

D Programming

C/C++ Programming

Other

digitalmars.D - Current D grammar