digitalmars.D - D grammar overhaul

Trass3r (14/20) Mar 30 2011 Original discussion forked in the D grammar thread, so I'm opening a new...

KennyTM~ (3/4) Mar 30 2011 Property:
Rainer Schuetze (20/30) Mar 31 2011 I'm not yet friend with git, but this might be a chance to get used to

Trass3r (5/18) Mar 31 2011 Yep, you fork the original repo and apply your changes. Once you think y...

Jonathan M Davis (11/23) Mar 31 2011 I would point out though that pull requests are done by branch, so if yo...
Brad Roberts (3/28) Mar 31 2011 Um.. no. Github allows you to pick both a subrange of a branch. Take a...

Trass3r (1/9) Apr 04 2011 https://github.com/Trass3r/d-programming-language.org/commit/ae532130f03...

Jonathan M Davis (8/39) Mar 31 2011 I stand corrected. From everything that I've seen (or remembered - I'm n...
Rainer Schuetze (28/34) Apr 02 2011 Before going into details with the grammar, I'd like to clarify one

Trass3r (7/15) Apr 07 2011 Well we would have to introduce C++ as a keyword just for that purpose.

Rainer Schuetze (13/24) Apr 07 2011 It does not have to be a keyword, but the parser can just verify there

Trass3r (5/8) Apr 07 2011 Hmm what if you have something like Templ!is(T:int)

Rainer Schuetze (7/19) Apr 08 2011 This is not allowed, template arguments without parenthesis are limited

Trass3r <un known.com> writes:

Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
The D grammar is in dire need of an overhaul.

I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.

Alongside missing things (like  attributes I think) and smaller mistakes parts
of it like the whole declarations subgrammar are overly complex
and allow strange constructs, partly cause C style arrays and function pointers
are still in there even though considered harmful and thus deprecated.
(see http://www.digitalmars.com/d/2.0/declaration.html)

Furthermore function declarations are obscurely woven into all those
productions.
So, correct me if I'm wrong but the current grammar even allows monsters like
auto ubyte[] ([] bar []) (T)(int i) const if(someCodition)  = 2 , foo = 5 ;


Also, like Rainer proposed, BasicType2 should be somehow pulled out of
Declarator to get rid of the
DeclaratorInitializer-DeclaratorIdentifierList-DeclaratorIdentifier construct
which CMIIW only exists to disallow multiple types: int a, *b;


 isn't AutoDeclaration already covered by Decl -> StorageClasses Decl and so on?


Ok it's just an misleading name. It covers not only declarations with 'auto'
but all Declarations where no type but only a StorageClass is given: shared a =
5;


 Also there are 3 different instances of mixin: MixinExpression, MixinStatement
and MixinDeclaration. Even though they all come down to the same thing.

 I think, the different mixin variants are ok, because you might want to 
 generate different AST for these.

Well I can imagine that MixinStatement is necessary because if you have an
ExpStatement with a MixinExpression the result is probably discarded (just like
'a' is discarded in a = 5;).
But why do you need another MixinDeclaration?


 Modifiers like shared are repeated, e.g. in BasicType, Attribute,
TypeSpecialization. 

 Yes the confusion of const/shared/immutable etc in
Attribute,StorageClass,Types,etc causes a lot of headaches.

Mar 30 2011

KennyTM~ <kennytm gmail.com> writes:

On Mar 30, 11 20:18, Trass3r wrote:
 Alongside missing things (like  attributes I think)

Property:
	  Identifier

Mar 30 2011

Rainer Schuetze <r.sagitario gmx.de> writes:

Trass3r wrote:
 Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
 The D grammar is in dire need of an overhaul.
 
 I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
 I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.

I'm not yet friend with git, but this might be a chance to get used to 
it. How can I contribute patches? I guess this would be with pull 
requests, so I'll need a github repository, too? The discussion page for 
pull requests looks like a nice feature...

I'm not sure though it is a good place to keep an overview of the 
grammar in a readable style. The grammar in the ddoc source is sometimes 
rather polluted with formatting information and links.

As a starter, here are some syntactic problems in the ddoc sources that 
cause the generation of grammar.txt to fail:

- in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR)
- in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc 
are used for formatting without having to do with the grammar. The boxes 
they generate don't make much sense anyway, so they can just be removed.
- MulExpression is only in $(V1), so it does not show for D2

I've attached the patch for these.

 Also there are 3 different instances of mixin: MixinExpression, MixinStatement
and MixinDeclaration. Even though they all come down to the same thing.

 I think, the different mixin variants are ok, because you might want to 
 generate different AST for these.

 Well I can imagine that MixinStatement is necessary because if you have an
ExpStatement with a MixinExpression the result is probably discarded (just like
'a' is discarded in a = 5;).
 But why do you need another MixinDeclaration?

MixinDeclaration expands to declarations, while MixinStatement expands 
to statements, so the difference is not interesting during parsing, but 
might be important in the semantic pass (which restarts parsing for the 
string).

Mar 31 2011

Trass3r <un known.com> writes:

Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull 
 requests, so I'll need a github repository, too? The discussion page for 
 pull requests looks like a nice feature...

Yep, you fork the original repo and apply your changes. Once you think you're
somewhat done you can open a pull request.
I think you can even comment on single lines in commits.

The advantage of this approach is that your changes don't appear as a single
huge diff but as atomic changesets (as long as the author groups changes that
belong together in single commits)

 As a starter, here are some syntactic problems in the ddoc sources that 
 cause the generation of grammar.txt to fail:
 
 - in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR)
 - in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc 
 are used for formatting without having to do with the grammar. The boxes 
 they generate don't make much sense anyway, so they can just be removed.
 - MulExpression is only in $(V1), so it does not show for D2
 
 I've attached the patch for these

Nice, I already incorporated some of your changes from the wiki. Will push soon
so you can have a look what's already done.

Mar 31 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page for
 pull requests looks like a nice feature...

 
 Yep, you fork the original repo and apply your changes. Once you think
 you're somewhat done you can open a pull request. I think you can even
 comment on single lines in commits.
 
 The advantage of this approach is that your changes don't appear as a
 single huge diff but as atomic changesets (as long as the author groups
 changes that belong together in single commits)

I would point out though that pull requests are done by branch, so if you have 
commits that you don't want to be grouped with others or put in the pull 
request, then they need to be on a separate branch. If you go and make several 
commits and then only want some of them pulled, that doesn't work so well. 
It's technically feasible via git, but it's more of a pain, and it's not how 
github itself manages the process. So, typically, you'd make a set of changes 
on a branch created for those changes - be they one commit or several. Then 
you do a pull request on that branch. Unrelated changes would be on other 
branches.

- Jonathan M Davis

Mar 31 2011

Brad Roberts <braddr puremagic.com> writes:

On 3/31/2011 10:34 AM, Jonathan M Davis wrote:
 On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page for
 pull requests looks like a nice feature...

 Yep, you fork the original repo and apply your changes. Once you think
 you're somewhat done you can open a pull request. I think you can even
 comment on single lines in commits.

 The advantage of this approach is that your changes don't appear as a
 single huge diff but as atomic changesets (as long as the author groups
 changes that belong together in single commits)

 
 I would point out though that pull requests are done by branch, so if you have 
 commits that you don't want to be grouped with others or put in the pull 
 request, then they need to be on a separate branch. If you go and make several 
 commits and then only want some of them pulled, that doesn't work so well. 
 It's technically feasible via git, but it's more of a pain, and it's not how 
 github itself manages the process. So, typically, you'd make a set of changes 
 on a branch created for those changes - be they one commit or several. Then 
 you do a pull request on that branch. Unrelated changes would be on other 
 branches.
 
 - Jonathan M Davis

Um.. no.  Github allows you to pick both a subrange of a branch.  Take a look
at the help docs:

   http://help.github.com/pull-requests/

Mar 31 2011

Trass3r <un known.com> writes:

 As a starter, here are some syntactic problems in the ddoc sources that
 cause the generation of grammar.txt to fail:

 - in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR)
 - in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc
 are used for formatting without having to do with the grammar. The boxes
 they generate don't make much sense anyway, so they can just be removed.
 - MulExpression is only in $(V1), so it does not show for D2

 I've attached the patch for these.

https://github.com/Trass3r/d-programming-language.org/commit/ae532130f0374079c93a2c8f0609f04705dd8f82

Apr 04 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-03-31 11:04, Brad Roberts wrote:
 On 3/31/2011 10:34 AM, Jonathan M Davis wrote:
 On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page
 for pull requests looks like a nice feature...

 
 Yep, you fork the original repo and apply your changes. Once you think
 you're somewhat done you can open a pull request. I think you can even
 comment on single lines in commits.
 
 The advantage of this approach is that your changes don't appear as a
 single huge diff but as atomic changesets (as long as the author groups
 changes that belong together in single commits)

 
 I would point out though that pull requests are done by branch, so if you
 have commits that you don't want to be grouped with others or put in the
 pull request, then they need to be on a separate branch. If you go and
 make several commits and then only want some of them pulled, that
 doesn't work so well. It's technically feasible via git, but it's more
 of a pain, and it's not how github itself manages the process. So,
 typically, you'd make a set of changes on a branch created for those
 changes - be they one commit or several. Then you do a pull request on
 that branch. Unrelated changes would be on other branches.
 
 - Jonathan M Davis

 
 Um.. no.  Github allows you to pick both a subrange of a branch.  Take a
 look at the help docs:
 
    http://help.github.com/pull-requests/

I stand corrected. From everything that I've seen (or remembered - I'm not 
sure if I'd read that page before or not), it always takes the whole branch, 
but apparently, you can tell it to do otherwise. That's certainly not the 
default though. Then again, I wouldn't generally be very comfortable with 
someone grabbing just portions of a branch like that. I'm too paranoid about 
likely breaks or merge conflicts, I guess. Still, good to know.

- Jonathan M Davis

Mar 31 2011

Rainer Schuetze <r.sagitario gmx.de> writes:

Before going into details with the grammar, I'd like to clarify one 
thing regarding the lexer, that also has some influence on the grammar:
Do we want to allow spaces and comments between these tokens?

- ! in
- ! is
- extern ( C ++ )
-   Property (safe, trusted, disable, etc)

here is some example code which also shows an ambiguity introduced by 
combining these from seperate tokens:

extern(C
//
++) void foo();

  /* comment inside
property */ safe int bar() { return 0; }

void main()
{
	int[int] a;
	struct S { int x; }
	S s;
	int b;

	if (1 ! /* still an infix operator? */ in a) {}
	if (null ! /* still an infix operator? */ is null) {}
	
//	if (s.x !in a) {} /* does not compile: 
http://d.puremagic.com/issues/show_bug.cgi?id=5785 */
	if (b !in a) {} // but this compiles!?
}



Trass3r wrote:
 Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
 The D grammar is in dire need of an overhaul.
 
 I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
 I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.

Apr 02 2011

Trass3r <un known.com> writes:

Am 02.04.2011, 10:11 Uhr, schrieb Rainer Schuetze <r.sagitario gmx.de>:
 Do we want to allow spaces and comments between these tokens?
 - extern ( C ++ )

Well we would have to introduce C++ as a keyword just for that purpose.
Not worth it.

 -   Property (safe, trusted, disable, etc)

safe, trusted etc. aren't keywords.
dmd handles it as (' ' identifier) and that's exactly what the grammar  
should be to support user-defined attributes in the future.

 extern(C
 //
 ++) void foo();

   /* comment inside
 property */ safe int bar() { return 0; }

Would someone really do bullshit like that?

Apr 07 2011

Rainer Schuetze <r.sagitario gmx.de> writes:

Trass3r wrote:
 Am 02.04.2011, 10:11 Uhr, schrieb Rainer Schuetze <r.sagitario gmx.de>:
 Do we want to allow spaces and comments between these tokens?
 - extern ( C ++ )

 Well we would have to introduce C++ as a keyword just for that purpose.
 Not worth it.

It does not have to be a keyword, but the parser can just verify there 
are no spaces, if a lexer token keeps that information. But I agree, not 
worth changing anything.

 
 -   Property (safe, trusted, disable, etc)

 safe, trusted etc. aren't keywords.
 dmd handles it as (' ' identifier) and that's exactly what the grammar 
 should be to support user-defined attributes in the future.
 

Although it is not the current behaviour, I think it would be cleaner if 
the lexer would generate a token "Property" that always starts with ' ' 
and continues as an identifer. This would replace the current token " ", 
simply moving the rule "Property: ' ' Identifier" from the parser into 
the lexer.

This still includes any user-defined attributes for future extensions.

I hope, we agree on !in and !is, as the current lexing introduces the 
shown parsing problems. But !is needs to be added to the IsExpression 
rules then.

Apr 07 2011

Trass3r <un known.com> writes:

 I hope, we agree on !in and !is, as the current lexing introduces the  
 shown parsing problems. But !is needs to be added to the IsExpression  
 rules then.

Hmm what if you have something like Templ!is(T:int)

Is http://d.puremagic.com/issues/show_bug.cgi?id=5785 fixable with grammar  
changes other that !is being one token?
I haven't had a look yet how UFCS is expressed and how it could affect  
this.

Apr 07 2011

Rainer Schuetze <r.sagitario gmx.de> writes:

Trass3r wrote:
 I hope, we agree on !in and !is, as the current lexing introduces the 
 shown parsing problems. But !is needs to be added to the IsExpression 
 rules then.

 
 Hmm what if you have something like Templ!is(T:int)

This is not allowed, template arguments without parenthesis are limited 
to a specific list of single tokens.

 
 Is http://d.puremagic.com/issues/show_bug.cgi?id=5785 fixable with 
 grammar changes other that !is being one token?

    if (s.x !in a) {}
    if (b !in a) {} // but this compiles!?


The second line works because it is special cased for PrimaryExpression 
in the dmd parser by checking whether '!' is followed by 'in' or 'is', 
while PostfixExpression only checks 'is'. I've added a patch to the 
bugreport.

 I haven't had a look yet how UFCS is expressed and how it could affect 
 this.

Apr 08 2011

D Programming

C/C++ Programming

Other

digitalmars.D - D grammar overhaul