digitalmars.D - Meaningful identifiers and other multi-token keywords

Quirin Schroll <qs.il.paperinik gmail.com> writes:
D has 4 places in the grammar where meaningful identifiers are
used instead of keywords:
- Pragmas
- Traits
- Linkages
- Scope guards

For pragmas and traits, this is a total non-issue as they have
special and dedicated keywords. For linkages and scope guards, 
there will be rough edges if we make `(Type)` be a well-formed 
`BasicType`. The reason is that `extern(C)` could mean `extern` 
plus the basic type `(C)`, where `C` denotes e.g. a dummy class; 
or `scope (exit) x = 10;` with the intention not to assign `x`, 
but to declare `x` as a `scope` variable of type `exit`. In
general, you could ask, “Why would one write such code?”, and you’d
be right to ask.

The issue is with the arguments to `extern` and `scope` being
identifiers. For linkages, it’s implementation-defined which ones
are supported, and they’re not just identifiers (e.g. `C++` and
`Objective-C`). With scope guards, however, there are only
`exit`, `success`, and `failure`.
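
As a reminder of what the three scope guard identifiers do, here is a minimal sketch. The function name `run` and the `log` array are made up for illustration; only the `scope (...)` statements themselves are language features.

```d
// Hypothetical helper: records which scope guards fire.
string[] run(bool fail)
{
    string[] log;
    try
    {
        // Guards run when the enclosing scope is left,
        // in reverse order of their declaration.
        scope (exit) log ~= "exit";       // always
        scope (success) log ~= "success"; // only on normal exit
        scope (failure) log ~= "failure"; // only when an exception escapes

        if (fail)
            throw new Exception("boom");
    }
    catch (Exception e)
    {
        log ~= "caught";
    }
    return log;
}
```

Calling `run(false)` records `success` and then `exit`; calling `run(true)` records `failure`, `exit`, and then `caught`.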

I want to suggest moving the parsing of scope guards and linkages 
to the lexer, i.e., if the lexer sees `scope`, `(`, any one of 
the identifiers `exit`, `success`, or `failure`, and `)`, that is 
a scope guard and is treated as a single token.
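
A minimal sketch of such a merging step, assuming the input has already been split into a flat array of token strings with whitespace and comments dropped. The function name `mergeScopeGuards` and the string-based token representation are hypothetical; this is not how DMD's lexer is structured.

```d
import std.algorithm.searching : canFind;

// Hypothetical post-lexing pass: fuses `scope`, `(`, guard identifier, `)`
// into a single token such as "scope(exit)".
string[] mergeScopeGuards(const string[] tokens)
{
    static immutable guards = ["exit", "success", "failure"];
    string[] result;
    size_t i = 0;
    while (i < tokens.length)
    {
        if (i + 3 < tokens.length
            && tokens[i] == "scope"
            && tokens[i + 1] == "("
            && guards.canFind(tokens[i + 2])
            && tokens[i + 3] == ")")
        {
            // Four tokens become one.
            result ~= "scope(" ~ tokens[i + 2] ~ ")";
            i += 4;
        }
        else
        {
            // Anything else, including `scope (SomeType)`, passes through.
            result ~= tokens[i];
            ++i;
        }
    }
    return result;
}
```

With this sketch, `scope ( exit )` collapses into one token, while `scope (C) x` keeps its separate tokens and stays available to the parser as a `scope` storage class followed by a parenthesized type.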

The same with `extern(C)` – it will never be seen as anything but 
a linkage. It’s a multi-token keyword.

Possibly, we can handle other cases alike, e.g. `static assert`, 
`static foreach`, and `auto ref`. By all accounts, their meaning 
isn’t derived from composing the semantics of the parts.

What do you think?
Sep 24
Dom DiSc <dominikus scherkl.de> writes:
On Tuesday, 24 September 2024 at 20:37:36 UTC, Quirin Schroll 
wrote:
 I want to suggest moving the parsing of scope guards and 
 linkages to the lexer, i.e., if the lexer sees `scope`, `(`, 
 any one of the identifiers `exit`, `success`, or `failure`, and 
 `)`, that is a scope guard and is treated as a single token.

 The same with `extern(C)` – it will never be seen as anything 
 but a linkage. It’s a multi-token keyword.

 Possibly, we can handle other cases alike, e.g. `static 
 assert`, `static foreach`, and `auto ref`. By all accounts, 
 their meaning isn’t derived from composing the semantics of the 
 parts.

 What do you think?
I think this is a good idea. They are multi-token keywords just to not occupy more words as keywords, but in fact could be treated as single entities.
Sep 24
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 25/09/2024 8:37 AM, Quirin Schroll wrote:
 The same with `extern(C)` – it will never be seen as anything but a 
 linkage. It’s a multi-token keyword.
I'm not sure this one is a good idea. Not all linkages can be handled this way; for example, C++ linkage can carry a namespace. So it moves behavior that currently needs no special casing into a place that would require special casing and would slow things down.

Overall, given how the lexer works, I'm convinced this isn't a path we should go down. It's done the way it is for a reason. I would expect any change in this direction to slow down the lexing of all identifiers for very little value.
Sep 24
Tim <tim.dlang t-online.de> writes:
On Tuesday, 24 September 2024 at 20:37:36 UTC, Quirin Schroll 
wrote:
 I want to suggest moving the parsing of scope guards and 
 linkages to the lexer, i.e., if the lexer sees `scope`, `(`, 
 any one of the identifiers `exit`, `success`, or `failure`, and 
 `)`, that is a scope guard and is treated as a single token.

 The same with `extern(C)` – it will never be seen as anything 
 but a linkage. It’s a multi-token keyword.
I don't think the lexer would be the right place, because the constructs are still multiple tokens. For example, whitespace and comments are allowed in `extern ( C ++ /*comment*/ )`.

Unknown languages in `extern(...)` attributes should also produce errors, so future compilers can add them without breaking code. Consider this example:

```
extern(X) x = 0;
```

Currently `X` is a normal identifier, but in the future it could be another language supported by the compiler. If `(X)` is interpreted as a type, then adding `extern(X)` to the compiler would be a breaking change.

For forward compatibility, it would be best if `extern(...)` and `scope(...)` were always parsed as whole attributes and not as attributes followed by types in parentheses. Unknown languages or scope guard identifiers would then produce errors, so future compilers could add them without breaking code.
Sep 25
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 25 September 2024 at 15:50:20 UTC, Tim wrote:
 On Tuesday, 24 September 2024 at 20:37:36 UTC, Quirin Schroll 
 wrote:
 I want to suggest moving the parsing of scope guards and 
 linkages to the lexer, i.e., if the lexer sees `scope`, `(`, 
 any one of the identifiers `exit`, `success`, or `failure`, 
 and `)`, that is a scope guard and is treated as a single 
 token.

 The same with `extern(C)` – it will never be seen as anything 
 but a linkage. It’s a multi-token keyword.
 I don't think the lexer would be the right place, because the constructs are still multiple tokens. For example, whitespace and comments are allowed in `extern ( C ++ /*comment*/ )`.
The whitespace is not an issue. The comments maybe are. But even if they were, one option would be to just ban comments in linkage attributes and scope guards and not deal with the problem. I mean, who would do that, except for a QA tester?
 Unknown languages in `extern(...)` attributes should also 
 produce errors, so future compilers can add them without 
 breaking code.
Officially, it’s implementation defined what’s supported beyond `D` and `C`, see [here](https://dlang.org/spec/attribute.html#LinkageAttribute). The fact that DMD supports `C++`, `Objective-C`, `System`, and `Windows` is already an extension. Considering C++ namespaces, the syntax is quite flexible. Essentially, any token soup with balanced parentheses is allowed. Maybe C++ was right, there it’s `extern "C"`.
 Consider this example:
 ```
 extern(X) x = 0;
 ```
 Currently `X` is a normal identifier, but in the future it 
 could be another language supported by the compiler. If `(X)` is 
 interpreted as a type, then adding `extern(X)` to the compiler 
 would be a breaking change. For forward compatibility it would 
 be best if `extern(...)` and `scope(...)` are always parsed as 
 whole attributes and not attributes with types in parens. 
 Unknown languages or scope guard identifiers would then produce 
 errors, so future compilers could add them without breaking 
 code.
With Primary Type Syntax, `extern (Type)` can happen by accident, yes. Then, `Type` could happen to be a valid linkage, but even in that case, there’s a high likelihood that there’s a parse error down the line. (In fact, it might be guaranteed; I couldn’t find a way for it not to be.) That is because linkage attributes are not storage classes. Unlike `static`, `ref`, etc., `extern(C)` cannot be used instead of `auto`.

```d
// Current behavior:
alias C = int;
extern (C) x = 0; // Error: basic type expected
extern (C) auto x = 0; // Good, and `C` can’t be the type of `x`
static extern (C) x = 0; // Error: basic type expected
extern (C) static x = 0; // Good, and `C` can’t be the type of `x`, even if it denotes a type

alias Type = int;
extern (Type) x = 0; // Error: Type is not a linkage
extern (Type) auto x = 0; // Error: Type is not a linkage
static extern (Type) x = 0; // Error: Type is not a linkage
extern (Type) static x = 0; // Error: Type is not a linkage

// My implementation:
alias C = int;
extern (C) x = 0; // Error: basic type expected
extern (C) auto x = 0; // Good, and `C` can’t be the type of `x`
static extern (C) x = 0; // Error: basic type expected
extern (C) static x = 0; // Good, but `C` can’t be the type of `x`, even if it denotes a type

alias Type = int;
extern (Type) x = 0; // Error: `Type` is not a linkage
extern (Type) auto x = 0; // Error: `Type` is not a linkage
static extern (Type) x = 0; // Error: `Type` is not a linkage
extern (Type) static x = 0; // Error: `Type` is not a linkage
```

From what I’ve understood in [the attribute spec](https://dlang.org/spec/attribute.html), `extern` marks a symbol as a declaration, whereas without it, the symbol would be a definition. It comes with or implies `export`, or at least implies `static`. So, if needed, one can just put that (or any nothingburger) between `extern` and `(Type)` and be good.

```d
// My implementation
extern export (Type) x; // ok
extern static (Type) y; // ok
extern 0 (Type) z; // ok
```
Sep 26
ryuukk_ <ryuukk.dev gmail.com> writes:
On Tuesday, 24 September 2024 at 20:37:36 UTC, Quirin Schroll 
wrote:
 D has 4 places in the grammar where meaningful identifiers 
 are used instead of keywords:
 - Pragmas
 - Traits
 - Linkages
 - Scope guards

 For pragmas and traits, this is a total non-issue as they have 
 special and dedicated keywords. For linkages and scope guards, 
 there will be rough edges if we make `(Type)` be a well-formed 
 `BasicType`. The reason is that `extern(C)` could mean `extern` 
 plus the basic type `(C)`, where `C` denotes e.g. a dummy 
 class; or `scope (exit) x = 10;` with the intention not to 
 assign `x`, but to declare `x` as a `scope` variable of type 
 `exit`. In general, you could ask: Why would one write such 
 code? and you’d be correct.

 The issue is with the argument to `extern` and `scope` being 
 identifiers. For linkage, it’s implementation defined which 
 ones are supported, and they’re not just identifiers (e.g. 
 `C++` and `Objective-C`), however, with scope guards, there are 
 only `exit`, `success`, and `failure`.

 I want to suggest moving the parsing of scope guards and 
 linkages to the lexer, i.e., if the lexer sees `scope`, `(`, 
 any one of the identifiers `exit`, `success`, or `failure`, and 
 `)`, that is a scope guard and is treated as a single token.

 The same with `extern(C)` – it will never be seen as anything 
 but a linkage. It’s a multi-token keyword.

 Possibly, we can handle other cases alike, e.g. `static 
 assert`, `static foreach`, and `auto ref`. By all accounts, 
 their meaning isn’t derived from composing the semantics of the 
 parts.

 What do you think?
I love scope guards and I use them all the time. However, they are both painful to type and make code ugly to read.

Perhaps `scope(exit)` and `scope(failure)` should be renamed to `defer` and `errDefer`.

That solves your problem, and mine.
Sep 25
Walter Bright <newshound2 digitalmars.com> writes:
On 9/24/2024 1:37 PM, Quirin Schroll wrote:
 What do you think?
It isn't clear what problem is being solved by this.
Sep 27
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 24 September 2024 at 20:37:36 UTC, Quirin Schroll 
wrote:
 D has 4 places in the grammar where meaningful identifiers 
 are used instead of keywords:
 - Pragmas
 - Traits
 - Linkages
 - Scope guards

 [...]
Just a question, what would that mean for backwards compatibility and potential loss of flexibility?
Sep 27