www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D grammar overhaul

reply Trass3r <un known.com> writes:
Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
The D grammar is in dire need of an overhaul.

I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.

Alongside missing things (like  attributes I think) and smaller mistakes parts
of it like the whole declarations subgrammar are overly complex
and allow strange constructs, partly cause C style arrays and function pointers
are still in there even though considered harmful and thus deprecated.
(see http://www.digitalmars.com/d/2.0/declaration.html)

Furthermore function declarations are obscurely woven into all those
productions.
So, correct me if I'm wrong but the current grammar even allows monsters like
auto ubyte[] ([] bar []) (T)(int i) const if(someCodition)  = 2 , foo = 5 ;


Also, like Rainer proposed, BasicType2 should be somehow pulled out of
Declarator to get rid of the
DeclaratorInitializer-DeclaratorIdentifierList-DeclaratorIdentifier construct
which CMIIW only exists to disallow multiple types: int a, *b;


 isn't AutoDeclaration already covered by Decl -> StorageClasses Decl and so on?
Ok it's just an misleading name. It covers not only declarations with 'auto' but all Declarations where no type but only a StorageClass is given: shared a = 5;
 Also there are 3 different instances of mixin: MixinExpression, MixinStatement
and MixinDeclaration. Even though they all come down to the same thing.
I think, the different mixin variants are ok, because you might want to generate different AST for these.
Well I can imagine that MixinStatement is necessary because if you have an ExpStatement with a MixinExpression the result is probably discarded (just like 'a' is discarded in a = 5;). But why do you need another MixinDeclaration?
 Modifiers like shared are repeated, e.g. in BasicType, Attribute,
TypeSpecialization. 
Yes the confusion of const/shared/immutable etc in Attribute,StorageClass,Types,etc causes a lot of headaches.
Mar 30 2011
next sibling parent KennyTM~ <kennytm gmail.com> writes:
On Mar 30, 11 20:18, Trass3r wrote:
 Alongside missing things (like  attributes I think)
Property: Identifier
Mar 30 2011
prev sibling next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
Trass3r wrote:
 Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
 The D grammar is in dire need of an overhaul.
 
 I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
 I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.
I'm not yet friend with git, but this might be a chance to get used to it. How can I contribute patches? I guess this would be with pull requests, so I'll need a github repository, too? The discussion page for pull requests looks like a nice feature... I'm not sure though it is a good place to keep an overview of the grammar in a readable style. The grammar in the ddoc source is sometimes rather polluted with formatting information and links. As a starter, here are some syntactic problems in the ddoc sources that cause the generation of grammar.txt to fail: - in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR) - in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc are used for formatting without having to do with the grammar. The boxes they generate don't make much sense anyway, so they can just be removed. - MulExpression is only in $(V1), so it does not show for D2 I've attached the patch for these.
 Also there are 3 different instances of mixin: MixinExpression, MixinStatement
and MixinDeclaration. Even though they all come down to the same thing.
I think, the different mixin variants are ok, because you might want to generate different AST for these.
Well I can imagine that MixinStatement is necessary because if you have an ExpStatement with a MixinExpression the result is probably discarded (just like 'a' is discarded in a = 5;). But why do you need another MixinDeclaration?
MixinDeclaration expands to declarations, while MixinStatement expands to statements, so the difference is not interesting during parsing, but might be important in the semantic pass (which restarts parsing for the string).
Mar 31 2011
next sibling parent reply Trass3r <un known.com> writes:
Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull 
 requests, so I'll need a github repository, too? The discussion page for 
 pull requests looks like a nice feature...
Yep, you fork the original repo and apply your changes. Once you think you're somewhat done you can open a pull request. I think you can even comment on single lines in commits. The advantage of this approach is that your changes don't appear as a single huge diff but as atomic changesets (as long as the author groups changes that belong together in single commits)
 As a starter, here are some syntactic problems in the ddoc sources that 
 cause the generation of grammar.txt to fail:
 
 - in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR)
 - in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc 
 are used for formatting without having to do with the grammar. The boxes 
 they generate don't make much sense anyway, so they can just be removed.
 - MulExpression is only in $(V1), so it does not show for D2
 
 I've attached the patch for these
Nice, I already incorporated some of your changes from the wiki. Will push soon so you can have a look what's already done.
Mar 31 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page for
 pull requests looks like a nice feature...
Yep, you fork the original repo and apply your changes. Once you think you're somewhat done you can open a pull request. I think you can even comment on single lines in commits. The advantage of this approach is that your changes don't appear as a single huge diff but as atomic changesets (as long as the author groups changes that belong together in single commits)
I would point out though that pull requests are done by branch, so if you have commits that you don't want to be grouped with others or put in the pull request, then they need to be on a separate branch. If you go and make several commits and then only want some of them pulled, that doesn't work so well. It's technically feasible via git, but it's more of a pain, and it's not how github itself manages the process. So, typically, you'd make a set of changes on a branch created for those changes - be they one commit or several. Then you do a pull request on that branch. Unrelated changes would be on other branches. - Jonathan M Davis
Mar 31 2011
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On 3/31/2011 10:34 AM, Jonathan M Davis wrote:
 On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page for
 pull requests looks like a nice feature...
Yep, you fork the original repo and apply your changes. Once you think you're somewhat done you can open a pull request. I think you can even comment on single lines in commits. The advantage of this approach is that your changes don't appear as a single huge diff but as atomic changesets (as long as the author groups changes that belong together in single commits)
I would point out though that pull requests are done by branch, so if you have commits that you don't want to be grouped with others or put in the pull request, then they need to be on a separate branch. If you go and make several commits and then only want some of them pulled, that doesn't work so well. It's technically feasible via git, but it's more of a pain, and it's not how github itself manages the process. So, typically, you'd make a set of changes on a branch created for those changes - be they one commit or several. Then you do a pull request on that branch. Unrelated changes would be on other branches. - Jonathan M Davis
Um.. no. Github allows you to pick both a subrange of a branch. Take a look at the help docs: http://help.github.com/pull-requests/
Mar 31 2011
prev sibling parent Trass3r <un known.com> writes:
 As a starter, here are some syntactic problems in the ddoc sources that
 cause the generation of grammar.txt to fail:

 - in template.dd, $(GNAME Constraint) is not enclosed by $(GRAMMAR)
 - in attribute.dd, $(GRAMMAR $(B const)), $(GRAMMAR $(B override)), etc
 are used for formatting without having to do with the grammar. The boxes
 they generate don't make much sense anyway, so they can just be removed.
 - MulExpression is only in $(V1), so it does not show for D2

 I've attached the patch for these.
https://github.com/Trass3r/d-programming-language.org/commit/ae532130f0374079c93a2c8f0609f04705dd8f82
Apr 04 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-31 11:04, Brad Roberts wrote:
 On 3/31/2011 10:34 AM, Jonathan M Davis wrote:
 On 2011-03-31 02:32, Trass3r wrote:
 Rainer Schuetze Wrote:
 How can I contribute patches? I guess this would be with pull
 requests, so I'll need a github repository, too? The discussion page
 for pull requests looks like a nice feature...
Yep, you fork the original repo and apply your changes. Once you think you're somewhat done you can open a pull request. I think you can even comment on single lines in commits. The advantage of this approach is that your changes don't appear as a single huge diff but as atomic changesets (as long as the author groups changes that belong together in single commits)
I would point out though that pull requests are done by branch, so if you have commits that you don't want to be grouped with others or put in the pull request, then they need to be on a separate branch. If you go and make several commits and then only want some of them pulled, that doesn't work so well. It's technically feasible via git, but it's more of a pain, and it's not how github itself manages the process. So, typically, you'd make a set of changes on a branch created for those changes - be they one commit or several. Then you do a pull request on that branch. Unrelated changes would be on other branches. - Jonathan M Davis
Um.. no. Github allows you to pick both a subrange of a branch. Take a look at the help docs: http://help.github.com/pull-requests/
I stand corrected. From everything that I've seen (or remembered - I'm not sure if I'd read that page before or not), it always takes the whole branch, but apparently, you can tell it to do otherwise. That's certainly not the default though. Then again, I wouldn't generally be very comfortable with someone grabbing just portions of a branch like that. I'm too paranoid about likely breaks or merge conflicts, I guess. Still, good to know. - Jonathan M Davis
Mar 31 2011
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
Before going into details with the grammar, I'd like to clarify one 
thing regarding the lexer, that also has some influence on the grammar:
Do we want to allow spaces and comments between these tokens?

- ! in
- ! is
- extern ( C ++ )
-   Property (safe, trusted, disable, etc)

here is some example code which also shows an ambiguity introduced by 
combining these from seperate tokens:

extern(C
//
++) void foo();

  /* comment inside
property */ safe int bar() { return 0; }

void main()
{
	int[int] a;
	struct S { int x; }
	S s;
	int b;

	if (1 ! /* still an infix operator? */ in a) {}
	if (null ! /* still an infix operator? */ is null) {}
	
//	if (s.x !in a) {} /* does not compile: 
http://d.puremagic.com/issues/show_bug.cgi?id=5785 */
	if (b !in a) {} // but this compiles!?
}



Trass3r wrote:
 Original discussion forked in the D grammar thread, so I'm opening a new one
for this specific issue.
 The D grammar is in dire need of an overhaul.
 
 I suggest we discuss changes here and then put everything into github to issue
a pull request once everything's done.
 I already forked the d-programming-language.org repo to get started by fixing
some mistakes. Hereafter I list some issues that come to my mind.
 
Apr 02 2011
parent reply Trass3r <un known.com> writes:
Am 02.04.2011, 10:11 Uhr, schrieb Rainer Schuetze <r.sagitario gmx.de>:
 Do we want to allow spaces and comments between these tokens?
 - extern ( C ++ )
Well we would have to introduce C++ as a keyword just for that purpose. Not worth it.
 -   Property (safe, trusted, disable, etc)
safe, trusted etc. aren't keywords. dmd handles it as (' ' identifier) and that's exactly what the grammar should be to support user-defined attributes in the future.
 extern(C
 //
 ++) void foo();

   /* comment inside
 property */ safe int bar() { return 0; }
Would someone really do bullshit like that?
Apr 07 2011
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
Trass3r wrote:
 Am 02.04.2011, 10:11 Uhr, schrieb Rainer Schuetze <r.sagitario gmx.de>:
 Do we want to allow spaces and comments between these tokens?
 - extern ( C ++ )
Well we would have to introduce C++ as a keyword just for that purpose. Not worth it.
It does not have to be a keyword, but the parser can just verify there are no spaces, if a lexer token keeps that information. But I agree, not worth changing anything.
 
 -   Property (safe, trusted, disable, etc)
safe, trusted etc. aren't keywords. dmd handles it as (' ' identifier) and that's exactly what the grammar should be to support user-defined attributes in the future.
Although it is not the current behaviour, I think it would be cleaner if the lexer would generate a token "Property" that always starts with ' ' and continues as an identifer. This would replace the current token " ", simply moving the rule "Property: ' ' Identifier" from the parser into the lexer. This still includes any user-defined attributes for future extensions. I hope, we agree on !in and !is, as the current lexing introduces the shown parsing problems. But !is needs to be added to the IsExpression rules then.
Apr 07 2011
parent reply Trass3r <un known.com> writes:
 I hope, we agree on !in and !is, as the current lexing introduces the  
 shown parsing problems. But !is needs to be added to the IsExpression  
 rules then.
Hmm what if you have something like Templ!is(T:int) Is http://d.puremagic.com/issues/show_bug.cgi?id=5785 fixable with grammar changes other that !is being one token? I haven't had a look yet how UFCS is expressed and how it could affect this.
Apr 07 2011
parent Rainer Schuetze <r.sagitario gmx.de> writes:
Trass3r wrote:
 I hope, we agree on !in and !is, as the current lexing introduces the 
 shown parsing problems. But !is needs to be added to the IsExpression 
 rules then.
Hmm what if you have something like Templ!is(T:int)
This is not allowed, template arguments without parenthesis are limited to a specific list of single tokens.
 
 Is http://d.puremagic.com/issues/show_bug.cgi?id=5785 fixable with 
 grammar changes other that !is being one token?
    if (s.x !in a) {}
    if (b !in a) {} // but this compiles!?
The second line works because it is special cased for PrimaryExpression in the dmd parser by checking whether '!' is followed by 'in' or 'is', while PostfixExpression only checks 'is'. I've added a patch to the bugreport.
 I haven't had a look yet how UFCS is expressed and how it could affect 
 this.
Apr 08 2011