digitalmars.D.learn - Attributes (lexical)
- rumbu (27/27) Nov 25 2021 Just playing around with attributes.
- Ola Fosheim Grøstad (3/4) Nov 25 2021 Yes. The lexer just eats whitespace and the parser accepts way
- Elronnd (7/8) Nov 25 2021 @ (12) does exactly what I would expect. @nogc I always assumed
- Dennis (2/4) Nov 25 2021 Where does it say that?
- Rumbu (14/18) Nov 25 2021 Well:
- Ola Fosheim Grøstad (4/6) Nov 25 2021 I think it is easier to just look at the lexer in the dmd source.
- rumbu (13/19) Nov 25 2021 I try to base my reasoning on specification, dmd is not always a
- Ola Fosheim Grøstad (5/9) Nov 25 2021 Alright. I haven't looked at it after the ```importC``` feature
- Dennis (69/86) Nov 25 2021 What it's failing to mention is how in the lexical grammar rules,
- Dennis (6/7) Nov 25 2021 Filed as:
- zjh (2/5) Nov 25 2021 I hate `#`.
Just playing around with attributes. This is valid D code:

```d
@
nogc: //yes, this is @nogc in fact, even some lines are between

@

/* i can put some comments
*/

/** even some documentation
*/

// single line comments also

(12)

// yes, comments and newlines are allowed between attribute and declaration

int x; // @(12) is attached to declaration
```

Is that OK, or is it a lexer bug?

This also works for `#line`, even though the specification tells us that all tokens must be on the same line:

```d
#
//this works
line
/* this too */
12
//this is #line 12
```
Nov 25 2021
On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> Is that OK, or is it a lexer bug?

Yes. The lexer just eats whitespace and the parser accepts way too much.
Nov 25 2021
On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> Is that OK, or is it a lexer bug?

`@(12)` does exactly what I would expect. I always assumed `@nogc` was a single token, but the spec says otherwise. I suppose that makes sense.

`#line` is dicier, as it is not part of the grammar proper; however, the spec describes it as a 'special token sequence', and comments are not tokens, so I think the current behaviour is correct.
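For readers who want to check the attribute behaviour themselves, here is a minimal sketch in D. It assumes a dmd that behaves as described in this thread, where `@` is a separate token and comments or newlines may follow it. The names `f` and `x` are made up for the example.

```d
import std.traits : functionAttributes, FunctionAttribute;

@
/* a comment between the `@` token and the identifier */
nogc void f() {}

@

// comments and blank lines between `@` and the argument list

(12)
int x;

// `@ nogc` was parsed as the @nogc function attribute.
static assert(functionAttributes!f & FunctionAttribute.nogc);

// `@ (12)` attached the value 12 to x as a user-defined attribute.
static assert(__traits(getAttributes, x).length == 1);
static assert(__traits(getAttributes, x)[0] == 12);

void main() {}
```

If the compiler accepts the file, every `static assert` held at compile time, so nothing needs to run.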
Nov 25 2021
On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> This also works for `#line`, even though the specification tells us that all tokens must be on the same line

Where does it say that?
Nov 25 2021
On Thursday, 25 November 2021 at 10:10:25 UTC, Dennis wrote:
> On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
>> This also works for `#line`, even though the specification tells us that all tokens must be on the same line
>
> Where does it say that?

Well:

```
#line IntegerLiteral Filespec? EndOfLine
```

Having EndOfLine at the end means to me that there are no other EOLs in between. Otherwise this syntax should pass, but it does not (latest DMD):

```d
#line 12
"source.d"
```

I am not asking these questions out of thin air. I am trying to write a conforming lexer, and this is one of the ambiguities.
Nov 25 2021
On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
> I am not asking these questions out of thin air. I am trying to write a conforming lexer, and this is one of the ambiguities.

I think it is easier to just look at the lexer in the dmd source. The D language does not really have a proper spec; it is more like an effort to document the implementation.
Nov 25 2021
On Thursday, 25 November 2021 at 11:25:49 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
>> I am not asking these questions out of thin air. I am trying to write a conforming lexer, and this is one of the ambiguities.
>
> I think it is easier to just look at the lexer in the dmd source. The D language does not really have a proper spec; it is more like an effort to document the implementation.

I try to base my reasoning on the specification. dmd is not always a good source of information: the lexer is polluted by old features, and right now by the ImportC feature, since it tries to lex D and C at the same time.

DMD skips the newline if the file was not specified, which is why the "filename" is unexpected on a new line: https://github.com/dlang/dmd/blob/d374003a572fe0c64da4aa4dcc55d894c648514b/src/dmd/lexer.d#L2838

libdparse completely ignores the contents after `#line`, skipping everything until EOL, even an EOF/NUL marker which should end the lexing: https://github.com/dlang-community/libdparse/blob/7112880dae3f25553d96dae53a445c16261de7f9/src/dparse/lexer.d#L1100
Nov 25 2021
On Thursday, 25 November 2021 at 12:16:50 UTC, rumbu wrote:
> I try to base my reasoning on the specification. dmd is not always a good source of information: the lexer is polluted by old features, and right now by the ImportC feature, since it tries to lex D and C at the same time.

Alright. I haven't looked at it since work on the ```importC``` feature started. (The lexer code takes a bit of browsing to get used to, but it isn't all that challenging once you are into it.)
Nov 25 2021
On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
> Well:
> ```
> #line IntegerLiteral Filespec? EndOfLine
> ```
> Having EndOfLine at the end means to me that there are no other EOLs in between. Otherwise this syntax should pass, but it does not (latest DMD):
> ```d
> #line 12
> "source.d"
> ```

The lexical grammar section starts with:

> The source text is decoded from its source representation into Unicode Characters. The Characters are further divided into: WhiteSpace, EndOfLine, Comments, SpecialTokenSequences, and Tokens, with the source terminated by an EndOfFile.

What it fails to mention is that in the lexical grammar rules, spaces denote 'immediate concatenation' of the characters/rules before and after them, e.g.:

```
DecimalDigits:
    DecimalDigit
    DecimalDigit DecimalDigits
```

`3 1 4` is not a single `IntegerLiteral`; it needs to be `314`.

For the parsing grammar, the spec should mention that spaces denote immediate concatenation of *Tokens*, with arbitrary *Comments* and *WhiteSpace* in between. So the rule:

```
AtAttribute:
    @ nogc
```

means: an `@` token, followed by arbitrary comments and whitespace, followed by an identifier token that equals "nogc". That explains your first example.

Regarding this lexical rule:

```
#line IntegerLiteral Filespec? EndOfLine
```

This is already wrong from a lexical standpoint, since it would suggest a SpecialTokenSequence looks like this:

```
#line10"file"
```

The implementation instead sees a `#`, skips *WhiteSpace* and *Comment*s, looks for an identifier token ("line"), and then goes into a custom loop that allows separation by *WhiteSpace* but not *Comment*. The first '\n' is also assumed to be the final *EndOfLine*, which is why this fails:

```
#line 12
"source.d"
```

It thinks it's done after "12".

In conclusion the specification should:
- define the notation used in lexical / parsing grammar blocks
- clearly distinguish lexical / parsing blocks
- fix up the `SpecialTokenSequence` definition (and maybe change dmd as well)

By the way, the parsing grammar defines:

```
LinkageType:
    C
    C++
    D
    Windows
    System
    Objective-C
```

C++ and Objective-C cannot currently be single tokens, so they are actually 2 and 3 tokens respectively, which is why these are allowed:

```D
extern(C ++) void f() {}
extern(Objective - C) void g() {}
```

This should also be fixed in the spec.

> I am not asking these questions out of thin air. I am trying to write a conforming lexer, and this is one of the ambiguities.

That's cool! Are you writing an editor plugin?
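The `#line` behaviour described in this post can be sketched as a small scanner in D. This is a simplified model of what the thread reports, not dmd's actual code. The names `LineDirective`, `skipTrivia` and `scanLineDirective` are hypothetical. It assumes that comments and newlines may appear before the integer, as rumbu's example suggests, and that only spaces and tabs are skipped after it. `\r\n` line endings and integer overflow are ignored for brevity.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isAlphaNum, isDigit;
import std.conv : to;

/// Result of scanning one `#line` sequence (hypothetical type for this sketch).
struct LineDirective
{
    bool ok;      // true if a directive was recognised
    uint line;    // value of the IntegerLiteral
    string file;  // Filespec without its quotes, or null if absent
    size_t end;   // index just past the consumed text
}

/// Skips whitespace, newlines, `//` and `/* */` comments starting at index i.
size_t skipTrivia(string s, size_t i)
{
    while (i < s.length)
    {
        if (s[i] == ' ' || s[i] == '\t' || s[i] == '\r' || s[i] == '\n')
            ++i;
        else if (s[i .. $].startsWith("//"))
        {
            while (i < s.length && s[i] != '\n')
                ++i;
        }
        else if (s[i .. $].startsWith("/*"))
        {
            i += 2;
            while (i + 1 < s.length && s[i .. i + 2] != "*/")
                ++i;
            i = i + 2 <= s.length ? i + 2 : s.length;
        }
        else
            break;
    }
    return i;
}

LineDirective scanLineDirective(string s)
{
    LineDirective fail; // ok == false
    if (s.length == 0 || s[0] != '#')
        return fail;

    // Ordinary token scanning: trivia may separate `#`, `line` and the integer.
    size_t i = skipTrivia(s, 1);
    if (!s[i .. $].startsWith("line"))
        return fail;
    i += 4;
    if (i < s.length && (isAlphaNum(s[i]) || s[i] == '_'))
        return fail; // e.g. `line10` is a different identifier
    i = skipTrivia(s, i);

    immutable start = i;
    while (i < s.length && isDigit(s[i]))
        ++i;
    if (i == start)
        return fail;
    auto result = LineDirective(true, s[start .. i].to!uint);

    // Custom loop: from here only spaces and tabs are skipped, and the
    // first newline is taken as the final EndOfLine.
    while (i < s.length && (s[i] == ' ' || s[i] == '\t'))
        ++i;
    if (i < s.length && s[i] == '"')
    {
        size_t q = i + 1;
        while (q < s.length && s[q] != '"' && s[q] != '\n')
            ++q;
        if (q == s.length || s[q] != '"')
            return fail;
        result.file = s[i + 1 .. q];
        i = q + 1;
        while (i < s.length && (s[i] == ' ' || s[i] == '\t'))
            ++i;
    }
    if (i < s.length && s[i] != '\n')
        return fail; // anything else before EndOfLine is an error
    result.end = i < s.length ? i + 1 : i;
    return result;
}

void main()
{
    // Filespec on the same line is picked up.
    auto a = scanLineDirective("#line 12 \"source.d\"\n");
    assert(a.ok && a.line == 12 && a.file == "source.d");

    // Comments and newlines before the integer are accepted.
    auto b = scanLineDirective("# //this works\nline\n/* this too */\n12\n");
    assert(b.ok && b.line == 12 && b.file is null);

    // Filespec on the next line: the directive already ended after "12".
    string src = "#line 12\n\"source.d\"\n";
    auto c = scanLineDirective(src);
    assert(c.ok && c.file is null && src[c.end .. $] == "\"source.d\"\n");
}
```

In the last case the string literal is left over for the ordinary lexer, which matches the "unexpected filename" error reported above.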
Nov 25 2021
On Thursday, 25 November 2021 at 12:09:55 UTC, Dennis wrote:
> This should also be fixed in the spec.

Filed as:
- Issue 22543 - [spec] grammar blocks use unspecified notation: https://issues.dlang.org/show_bug.cgi?id=22543
- Issue 22544 - [spec] C++ and Objective-C are not single tokens: https://issues.dlang.org/show_bug.cgi?id=22544
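The two-token reading of `C++` can be checked at compile time with a short D sketch. It assumes a dmd that still accepts the spaced form, as reported in this thread. The function names are invented for the example, and Objective-C is left out because it is only available on some platforms.

```d
// `C` and `++` are separate tokens, so whitespace between them is accepted.
extern(C ++) void spaced() {}
extern(C++) void compact() {}

// Both declarations end up with C++ linkage.
static assert(__traits(getLinkage, spaced) == "C++");
static assert(__traits(getLinkage, compact) == "C++");

void main() {}
```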
Nov 25 2021
On Thursday, 25 November 2021 at 08:06:27 UTC, rumbu wrote:
> ```d
> #
> //this works
> line
> ```

I hate `#`.
Nov 25 2021