digitalmars.D - DMD as a library - recap, next steps

RazvanN <razvan.nitu1305 gmail.com> writes:
A few years ago, I worked on a project to refactor the dmd 
codebase so that it becomes easier to be used as a library. This 
has the advantage that tools that use dmd-as-a-lib will rely on 
the latest working compiler. From that perspective, I made 
several PRs:

1. Template the parser to remove reliance on modules that 
implement semantic analysis: 
https://github.com/dlang/dmd/pull/6625

2. Create ASTBase, an AST family that contains the minimum 
information to separate the parser from the rest of the compiler: 
https://github.com/dlang/dmd/pull/6836

3. A series of PRs to pull out all the semantic methods from AST 
nodes into visitors or free functions so that AST nodes in the 
compiler will replace the ASTBase family (which is essentially 
duplicated code):
     https://github.com/dlang/dmd/pull/7031
     https://github.com/dlang/dmd/pull/7048
     https://github.com/dlang/dmd/pull/7049
     https://github.com/dlang/dmd/pull/7114
     https://github.com/dlang/dmd/pull/7119
     https://github.com/dlang/dmd/pull/7122

4. I created visitors for semantic time analysis: 
https://github.com/dlang/dmd/pull/7411

At that point I hit some bugs that prevented me from moving 
forward. I started fixing them and after a while I moved to 
another project, which left the dmd-as-a-library project in a 
half-baked state.

What remains to be done is to:

- Further strip the AST nodes of functions that require semantic 
analysis so that we can remove the code duplication in ASTBase
- Refactor dmd to offer a decent interface. Sometimes people 
argue against this point by saying that some changes affect the 
compilation time of the compiler. Honestly, it takes 5 seconds to 
compile dmd; even if we double that time, I say it is worth it if 
we gain a decent interface for dmd-as-a-lib. On this point, Edi 
Staniloiu and I have been working with a bachelor student to 
see what interface is required to be able to use dmd-as-a-lib in 
tools in the ecosystem (like DCD).
- Add some visitors that make it easy for 3rd party tools to use 
compiler features.

In the meantime there were some PRs that regressed the state of 
dmd as a lib, such as this one: 
https://github.com/dlang/dmd/pull/9010 . That PR makes it 
impossible for someone to override the type semantic. This 
moves things in the exact opposite direction of every PR 
showcased in this post and is a showstopper for 
https://github.com/dlang/dmd/pull/11265 moving forward in a 
consistent way.

Edi and I are in a position where we can use bachelor students 
to do the heavy lifting to help this project cross the finish 
line; however, it is critical that the Dlang Foundation 
leadership has a clear direction/vision on this.

So, how do we move forward?

Cheers,
RazvanN
Jun 15 2020
evilrat <evilrat666 gmail.com> writes:
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler.
Nice progress so far, but I would also like to note that there was an annoying change some time last winter that put the whole compiler configuration into private functions. I know this is probably better addressed in dub instead, but dub is just way less maintained than dmd. Simply put, dmd-as-a-library can be incredibly useful for a custom code transformation step by serving as a dmd proxy. However, dub's compiler probing is too strict and relies on CTFE introspection results in standard output, and the change I mentioned currently forces one to copy-paste ~500 lines of option parsing and configuration code so that the proxy is recognized by dub as an actual compiler. (I last checked this in March.)
 On this point, me and Edi Staniloiu have been working with a 
 bachelor student to see what interface is required to be able 
 to use dmd-as-a-lib in tools in the ecosystem (like DCD).
 - Add some visitors that make it easy for 3rd party tools to 
 use compiler features.
This would be awesome; it could even end the "IDE support sucks" complaints. If dmd is able to recompile in memory on code updates and provide compilation database updates in under 500ms, it will be usable enough for LSP servers and other productivity tooling while actually supporting every language feature, including template instance body inspection and UFCS.
Jun 15 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication in 
 ASTBase
The AST without SemA is useless.
 - Refactor dmd to offer a decent interface. Sometimes people 
 argue against this point by saying that some changes affect the 
 compilation time of the compiler. Honestly, it takes 5 seconds 
 to compile dmd, even if we double that time I say it is worth 
 it if we gain a decent interface for dmd-as-a-lib.
No, if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean.
 In the mean time there were some PRs that regressed the state 
 of dmd as a lib, PRs such as this one: 
 https://github.com/dlang/dmd/pull/9010 . That PR makes it 
 impossible for someone to override the type semantic.
Why would you ever need or want to OVERRIDE semantic? If you override semantics, then by definition you are no longer in the same language space.
 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?
First, establish a use case. I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

For example, I want to be able to ask:

For all functions in module a, give me the ones which call a function called malloc, either directly or transitively, as far as you can see from the source code I gave you.
For all functions which only exist as declarations (you don't have the body), create a list and cross-reference it with the call graph of the selection we got before.

For that to work, you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner.

Some kind of C plugin API would be preferred.
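
A minimal sketch of the two queries described above, assuming the call graph has already been extracted from the resolved tree into a plain associative array. The `callersOf` and `declarationsOnly` names and the graph representation are hypothetical and are not part of dmd; a function with no entry in the graph is treated as "declaration only".

```d
import std.algorithm : sort;

/// Names of all functions in `graph` that reach `target`, directly or
/// transitively. `graph` maps a caller name to the names it calls
/// (a hypothetical representation, not a dmd data structure).
string[] callersOf(string[][string] graph, string target)
{
    bool[string] reaches;   // functions known to reach `target`
    bool changed = true;
    while (changed)         // repeat until no new caller is found
    {
        changed = false;
        foreach (caller, callees; graph)
        {
            if (caller in reaches)
                continue;
            foreach (callee; callees)
            {
                if (callee == target || callee in reaches)
                {
                    reaches[caller] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
    auto result = reaches.keys;
    sort(result);
    return result;
}

/// Names that are called somewhere but have no entry in `graph`,
/// i.e. functions whose bodies were never seen.
string[] declarationsOnly(string[][string] graph)
{
    bool[string] seen;
    foreach (callees; graph)
        foreach (callee; callees)
            if (callee !in graph)
                seen[callee] = true;
    auto result = seen.keys;
    sort(result);
    return result;
}

unittest
{
    string[][string] graph = [
        "main": ["parse", "report"],
        "parse": ["allocNode"],
        "allocNode": ["malloc"],
        "report": ["printf"]
    ];
    assert(callersOf(graph, "malloc") == ["allocNode", "main", "parse"]);
    assert(declarationsOnly(graph) == ["malloc", "printf"]);
}
```

The sketch can be exercised with `dmd -main -unittest -run sketch.d`. The hard part that the thread is about, producing the graph from a type-resolved AST, is deliberately left out.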
Jun 16 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication 
 in ASTBase
The AST without SemA is useless.
It is useful for serialization and source generation purposes at least. In other language communities, where they don't have the metaprogramming power of D, they do extensive source code generation, and most of the time you don't need much semantic analysis for that (sometimes you just need a way to do string interpolation on symbol names).

Also, perhaps one of the goals is to completely replace the parser with something more fault-tolerant. For example, see:

https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
https://github.com/tree-sitter/tree-sitter
https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

^^^
I suggest you watch the talks linked from the last page.
 - Refactor dmd to offer a decent interface. Sometimes people 
 argue against this point by saying that some changes affect 
 the compilation time of the compiler. Honestly, it takes 5 
 seconds to compile dmd, even if we double that time I say it 
 is worth it if we gain a decent interface for dmd-as-a-lib.
No, if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean.
I agree. Though if a refactoring of this sort increases the compile time by 2x, we have serious problems elsewhere. I don't expect something like this to cost more than 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile time, e.g. to 0.85x.
 In the mean time there were some PRs that regressed the state 
 of dmd as a lib, PRs such as this one: 
 https://github.com/dlang/dmd/pull/9010 . That PR makes it 
 impossible for someone to override the type semantic.
Why would you ever need or want to OVERRIDE semantic? If you override semantics, then by definition you are no longer in the same language space.
I suggest you try using a language with an excellent language server, such as TypeScript. You'd be amazed how well it works. It is able to make sense of all kinds of broken code (e.g. giving you auto-completion for a function with a missing closing curly brace). Most of these things are completely impossible to do with the current rigid nature of the dmd frontend.

Of course Razvan, Edi and Cristian are in a better position to answer, but I think that the main idea is that they don't want to change the language. Instead, they want to be able to plug code into more parts of the compilation pipeline so the LSP server can be notified when, e.g., the compiler is visiting an overload set. For example, the compiler may stop looking when it finds the best overload match, while for the LSP server you want to display all overloads, as the user may have made a typo and so on. (Please don't read too much into this example, I may have made a mistake, but the general idea is that they need to be able to extract more info from the frontend.)
 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?
First, establish a use case. I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.
The use case is implementing all of this API:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/

Specifically, take a look at the "Language Features" section, e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion
 For Example, I want to be able to ask.

 Forall functions in module a, give me the ones which call a 
 function called malloc,
 either directly or transitively, as far as you can see by the 
 source code I gave you.
 Forall functions which only exist as declarations (you don't 
 have the body), create a list and cross reference it with the 
 call-graph of the selection we've got before.

 For that to work you need to be able to run the compiler until 
 just before code generation and you need to be able to walk 
 that type/identifier-resolved tree in a useful manner.

 Some kind of C plugin api would be preferred.
The set of C developers that want to write a D language server is much smaller than the set of D developers that want to do the same :D But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when frontend changes break LDC and GDC.
Jun 16 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication 
 in ASTBase
The AST without SemA is useless.
BTW, I really don't get this obsession with bad OOP design in DMD. The AST classes should be just pure data. SemA should be functions that operate on this data. I can't find any good reason why one would put the SemA *implementation* inside the AST classes. Just imagine if the constructor of FunctionDeclaration directly output x86 assembly :D

So this is why I think all logic should be moved from the AST classes to other functions/classes/modules.
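
To illustrate the separation argued for above, here is a minimal sketch in which the node classes hold only data and the "semantic" check is a free function. The `Expr`, `IntLit`, `Add` and `Ident` classes and the `undeclared` function are hypothetical and far simpler than dmd's real AST; they only show the shape of the design.

```d
// Hypothetical, heavily simplified AST: nodes carry data only.
abstract class Expr {}

final class IntLit : Expr
{
    long value;
    this(long value) { this.value = value; }
}

final class Ident : Expr
{
    string name;
    this(string name) { this.name = name; }
}

final class Add : Expr
{
    Expr lhs, rhs;
    this(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
}

/// A stand-in for semantic analysis that lives outside the node classes.
/// It returns the identifiers used in `e` that are not in `declared`.
string[] undeclared(Expr e, const bool[string] declared)
{
    if (auto id = cast(Ident) e)
    {
        if (id.name in declared)
            return null;
        return [id.name];
    }
    if (auto add = cast(Add) e)
        return undeclared(add.lhs, declared) ~ undeclared(add.rhs, declared);
    return null;   // literals need no checking
}

unittest
{
    // x + (1 + y), with only x declared
    auto tree = new Add(new Ident("x"), new Add(new IntLit(1), new Ident("y")));
    assert(undeclared(tree, ["x": true]) == ["y"]);
    assert(undeclared(tree, ["x": true, "y": true]).length == 0);
}
```

Because the nodes know nothing about the check, a tool could reuse the same tree with a different analysis, or with none at all, which is the property a parser-only library needs.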
Jun 16 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 [...]
 It is useful for serialization and source generation purposes 
 at least. In other language communities, where they don't have 
 the metaprogramming power of D, they do extensive source code 
 generation, and most of the time you don't need much semantic 
 for that (sometimes you just need a way to do string 
 interpolation on symbol names).
Indeed!
 Also perhaps one of the goals is to completely replace the 
 parser with something more fault-tolerant. For example see:

 https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

 https://github.com/tree-sitter/tree-sitter
 https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

 ^^^
 I suggest you watch the talks linked from the last page.
That is exactly what I had in mind.
 [...]
I agree. Though if a refactoring of this sort increases the compile time by 2x, we have serious problems elsewhere. I don't expect something like this to cost more than 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile time, e.g. to 0.85x.
You are correct. I was exaggerating for the sake of the argument. What I meant was: we are extremely fast, but we are lacking a compiler interface. Small performance regressions are acceptable if they offer a guaranteed benefit with regards to defining a good interface.
 I suggest you try using a language with an excellent language 
 server, such as TypeScript. You'd be amazed how well it works. 
 It is able to make sense of all kinds of broken code (e.g. 
 giving you auto-completion for a function with a missing 
 closing curly brace). Most of these things are completely 
 impossible to do with the current rigid nature of the dmd 
 frontend.

 Of course Razvan, Edi and Cristian are in better position to 
 answer, but I think that the main idea is that they don't want 
 to change the language, but instead they want to be able to 
 plug code in more parts of the compilation pipeline so the LSP 
 can be notified when e.g. the compiler is visiting an overload 
 set. For example, the compiler may stop looking when it finds 
 the best overload match, while for the LSP you want to display 
 overloads, as the user may have made a typo and so on. (Please 
 don't read too much into this example, I may have made a 
 mistake, but the general idea is that they need to be able to 
 extract more info from the frontend.)
You are entirely right. We have replaced all uses of libdparse and other tools that mimic semantic analysis in DCD with dmd as a lib and it works, but it requires that we override current semantic analysis that is done for CallExps to be able to cope with semantic failures on incomplete code.
 [...]
The use case is implementing all of this API:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/

Specifically, take a look at the "Language Features" section, e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion
 [...]
The set of C developers that want to write a D language server is much smaller than the set of D developers that want to do the same :D But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when frontend changes break LDC and GDC.
Thanks for your reply. I feel that we are on the same page on this.
Jun 16 2020
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 16 June 2020 at 09:15:24 UTC, RazvanN wrote:
 Thanks for your reply. I feel that we are on the same page on 
 this.
My pleasure, please keep up the good work on this amazing project!
Jun 16 2020
Jacob Carlborg <doob me.com> writes:
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov 
[ZombineDev] wrote:

 Also perhaps one of the goals is to completely replace the 
 parser with something more fault-tolerant. For example see:

 https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

 https://github.com/tree-sitter/tree-sitter
 https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

 ^^^
 I suggest you watch the talks linked from the last page.
I agree. The Eclipse Java compiler has quite a few "modes" in which you can run it: only run the parser, or run various levels of semantic analysis. It allows you to compile and run code that does not compile. For example, the signature of a function compiles but not the body. The compiler can just replace the body with code that throws an exception. If that function is not called at runtime, it's perfectly fine.

--
/Jacob Carlborg
Jun 16 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication 
 in ASTBase
The AST without SemA is useless.
It is not useless; the fact that libdparse exists and is used as a standalone library is proof of that.
 - Refactor dmd to offer a decent interface. Sometimes people 
 argue against this point by saying that some changes affect 
 the compilation time of the compiler. Honestly, it takes 5 
 seconds to compile dmd, even if we double that time I say it 
 is worth it if we gain a decent interface for dmd-as-a-lib.
No, if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean.
The point here was that minor performance regressions that result from refactorings should not be an obstacle if they offer a clear benefit from a dmd-as-a-lib standpoint.
 In the mean time there were some PRs that regressed the state 
 of dmd as a lib, PRs such as this one: 
 https://github.com/dlang/dmd/pull/9010 . That PR makes it 
 impossible for someone to override the type semantic.
Why would you ever need or want to OVERRIDE semantic? If you override semantics, then by definition you are no longer in the same language space.
 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?
First, establish a use case. I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

For example, I want to be able to ask:

For all functions in module a, give me the ones which call a function called malloc, either directly or transitively, as far as you can see from the source code I gave you.
For all functions which only exist as declarations (you don't have the body), create a list and cross-reference it with the call graph of the selection we got before.

For that to work, you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner.
Analyzing the AST is one scenario, but there are other situations:

1. You want to extend the language with some feature.
2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place. One example here is an auto-complete tool that needs to be able to analyze incomplete code; if you do not override specific semantic methods, by the time you analyze the AST you will have error nodes that prevent you from doing any work.
3. You want to drop certain semantic passes because your tool does not need them.

Ideally we would offer maximum flexibility with dmd-as-a-lib. Currently, you are forced to run the full semantic analysis pass and hope for the best.
 Some kind of C plugin api would be preferred.
I don't understand what you are referring to.
Jun 16 2020
Jacob Carlborg <doob me.com> writes:
On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:

 2. Semantic analysis mutates the AST in a way that makes it 
 impossible for you to reason about what was there in the first 
 place.
Ideally the compiler should be modified to preserve all information through all phases of the compilation. In my tool I had to make quite a bit of extra effort to preserve the information I needed, basically walking the AST twice: once before running the semantic analyzer and once after.

--
/Jacob Carlborg
Jun 16 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 16 June 2020 at 11:34:43 UTC, Jacob Carlborg wrote:
 On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:

 2. Semantic analysis mutates the AST in a way that makes it 
 impossible for you to reason about what was there in the first 
 place.
Ideally the compiler should be modified to preserve all information through all phases of the compilation. In my tool I had to make quite a bit of extra effort to preserve the information I needed, basically walking the AST twice: once before running the semantic analyzer and once after.

--
/Jacob Carlborg
Would it have been easier if you had the ability to override certain portions of the semantic analysis? What we are trying to push forward now is the ability to extend the semantic visitor and override/extend functionality as you wish. However, since some nodes have a lot of code that does semantic on them (CallExp has ~1000 lines of code), you would have to copy-paste a lot of code and modify only what interests you. The advantage is that you perform semantic only once.
Jun 16 2020
Jacob Carlborg <doob me.com> writes:
On 2020-06-17 06:58, RazvanN wrote:

 Would it have been easier if you had the ability to override certain 
 portions of the semantic analysis? 
Yes, definitely.
 What we are trying to push forward 
 now is the ability to extend the semantic visitor and override/extend 
 functionality as you wish, however, since some nodes have a lot of code 
 that does semantic on them (CallExp ~1000 lines of code) you would have 
 to copy paste a lot of code and modify only what interests you.
 The advantage is that you perform semantic only once.
If the interface can be designed so that it's possible to add customization points both before and after the original semantic implementation, and it gives access to the original implementation, that would be a good start. That would suffice for my needs in the current tool. Inheritance is a good example of this:

    class Foo
    {
        void foo() {}
    }

    class Bar : Foo
    {
        override void foo()
        {
            // new code
            super.foo(); // call original implementation
            // new code
        }
    }

The API doesn't need to be inheritance, but it's a good example that shows it is possible to add new code both before and after the original implementation. And you don't need to invoke the original implementation at all if you don't want to.

--
/Jacob Carlborg
Jun 17 2020
Jacob Carlborg <doob me.com> writes:
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:

 Why would you ever need or want to OVERRIDE semantic.
 If you override semantics then by definition you are no longer
 in the same language space.
It's useful to be able to do. It's not up to the compiler developers to come up with every single use case. And just because they cannot come up with a use case doesn't mean there aren't any good ones. If a language or a library could only be used for what the creator could think of, it would probably not be very useful at all.

I have a tool that makes some minor changes to the semantic phase so that it can infer attributes for all functions. Then it outputs all the attributes that can be attached to all functions of a given file.

--
/Jacob Carlborg
Jun 16 2020
WebFreak001 <d.forum webfreak.org> writes:
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler. From that perspective, I 
 made several PRs:

 [...]

 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?

 Cheers,
 RazvanN
Very cool, thanks for all the work! I think, however, that increasing the compilation time significantly is a major no-go, at least when it affects all code, except of course if it allows us to do faster things like incremental compilation.

For architecture ideas you might want to check out how Microsoft implemented their Roslyn compiler platform: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

Just AST and visitors are not the most useful on their own; we have libdparse for this already, which works fine too. Much more interesting is the semantic analysis.

If there is one thing I would want exposed by dmd for anything, be it completion, dynamic linting, navigation, etc., I would really, really want a symbol API. Much like dsymbol: one incremental database of all defined symbols (modules, types, aliases, parameters, template parameters, variables, etc.) with references, definitions, types (of variables and parameters), names and all traits information. This database would contain all symbols in the entire compilation unit, be aware of scopes at any given point and be able to incrementally update by adding/removing or changing files.

The semantic analysis needs to be incremental here too though, so symbols would need some kind of dependency graph for things using mixins or templates. Also it would be difficult to extract information from scopes like `version (Foo)` where version is not Foo.

But just this symbol database and some APIs to query it per location, per file or per symbol name would be enough to implement nearly all features a user would expect from tooling and more.

Otherwise raw token access and AST visitors are all you really need to implement the rest, like formatting, highlighting, static linting and refactorings. It's important that you can somehow recover the whitespace and comments from tokens for refactoring and formatting though!

For other use cases, like a REPL, exposing APIs to the executable generator would also be cool.

So if we have a symbols API I'm happy, and I think that will be the goal of any DCD replacement program too :p

Keep up the good work on this!
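
As a rough sketch of the query surface described above (per name, per location, per-file update), assuming symbols have already been extracted by some front end: the `Symbol` and `SymbolDb` types and all of their fields are hypothetical, not dmd's or dsymbol's API. The incremental semantic analysis and dependency tracking mentioned in the post are not modeled; `update` simply replaces all records for one file.

```d
enum SymbolKind { module_, type, variable, function_ }

// Hypothetical record; field names are illustrative only.
struct Symbol
{
    string name;
    SymbolKind kind;
    string file;
    size_t offset;   // byte offset of the defining identifier in `file`
    size_t length;   // byte length of the defining identifier
}

struct SymbolDb
{
    private Symbol[] symbols;

    /// Replaces everything known about `file` with `fresh`.
    void update(string file, Symbol[] fresh)
    {
        Symbol[] kept;
        foreach (s; symbols)
            if (s.file != file)
                kept ~= s;
        symbols = kept ~ fresh;
    }

    /// All symbols with the given name, across files.
    Symbol[] byName(string name)
    {
        Symbol[] hits;
        foreach (s; symbols)
            if (s.name == name)
                hits ~= s;
        return hits;
    }

    /// Symbols whose defining identifier covers byte `offset` in `file`.
    Symbol[] at(string file, size_t offset)
    {
        Symbol[] hits;
        foreach (s; symbols)
            if (s.file == file && offset >= s.offset
                    && offset < s.offset + s.length)
                hits ~= s;
        return hits;
    }
}

unittest
{
    SymbolDb db;
    db.update("a.d", [Symbol("foo", SymbolKind.function_, "a.d", 5, 3)]);
    db.update("b.d", [Symbol("foo", SymbolKind.variable, "b.d", 0, 3)]);
    assert(db.byName("foo").length == 2);
    assert(db.at("a.d", 6)[0].kind == SymbolKind.function_);

    db.update("a.d", null);   // file a.d no longer defines anything
    assert(db.byName("foo").length == 1);
}
```

A linear scan is used only to keep the sketch short; a real database would index by name and by file, and would need scope information to answer completion queries.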
Jun 16 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 16 June 2020 at 10:48:13 UTC, WebFreak001 wrote:

 very cool, thanks for all the work! I think however increasing 
 the compilation time significantly is a major no-go, at least 
 when it affects all code, except of course if it allows us to 
 do faster things like incremental compilation.
Incremental compilation is out of scope when referring to dmd as a lib. For that we would need to write, from scratch, an implementation that takes care of all the various cases.
 For architecture ideas you might want to check out how 
 Microsoft implemented their Roslyn compiler platform: 
 https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

 Just AST and visitors is not the most useful on its own, we 
 have libdparse for this already which works fine too. Much more 
 interesting is the semantic analysis.
But the whole point of dmd as a lib is to offer the ability to use semantic analysis by using semantic visitors. For example, this PR https://github.com/dlang/dmd/pull/11265 makes it possible to inherit from the semantic visitors used in the compiler and override or extend their functionality.
 If there is one thing I would want exposed by dmd for anything, 
 being completion, dynamic linting, navigation, etc., I would 
 really really want a symbol API. Much like dsymbol one 
 incremental database of all defined symbols (modules, types, 
 aliases, parameters, template parameters, variables, etc.) with 
 references, definitions, types (of variables and parameters), 
 names and all traits information. This database would contain 
 all symbols in the entire compilation unit, be aware of scopes 
 at any given point and be able to incrementally update by 
 adding/removing or changing files.
One step in that direction: https://github.com/dlang/dmd/pull/11092 With that, you can provide a function that pulls out all the symbols in a particular scope. It is not incremental, though.
 The semantic analysis needs to be incremental here too though, 
 so symbols would need some kind of dependency graph for things 
 using mixin or templates. Also it would be difficult to extract 
 information from scopes like `version (Foo)` where version is 
 not Foo.

 But just this symbols database and some APIs to query it per 
 location, per file or per symbol name would be enough to 
 implement nearly all features a user would expect from tooling 
 and more.

 Otherwise raw token access and AST visitors is all you really 
 need to implement the rest like formatting, highlighting, 
 static linting and refactorings. It's important that you can 
 somehow recover the whitespaces and comments from tokens for 
 refactoring and formatting though!

 For other use-cases, like a REPL, exposing APIs to the 
 executable generator would also be cool.

 So if we have a symbols API I'm happy and I think that will be 
 the goal of any DCD replacement program too :p
We are working on this and will soon publish that work.
 Keep up the good work on this!
Jun 16 2020
Jacob Carlborg <doob me.com> writes:
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler. From that perspective, I 
 made several PRs:

 What remains to be done is to:
- Refactor dmd to offer a decent interface. Sometimes people
 argue against this point by saying that some changes affect the 
 compilation time of the compiler. Honestly, it takes 5 seconds 
 to compile dmd, even if we double that time I say it is worth 
 it if we gain a decent interface for dmd-as-a-lib. On this 
 point, me and Edi Staniloiu have been working with a bachelor 
 student to see what interface is required to be able to use 
 dmd-as-a-lib in tools in the ecosystem (like DCD).
 - Add some visitors that make it easy for 3rd party tools to 
 use compiler features.
I don't think the API is the most important part, as long as you can do what you need to do with the library. In my opinion there's a lot of functionality that is missing. For example, I've tried to use DMD as a library to do source code transformation. It falls very short in this area:

* AST nodes without locations
* Locations don't contain an end point/length
* Locations don't contain the buffer offset
* Indirect files are always read from disk. There's no option to make a full compilation purely from memory
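
To show why an offset and a length matter for source transformation, here is a minimal sketch of a rewrite that is only possible when both are known. `SourceSpan` and `replaceSpan` are hypothetical names introduced for illustration; they are not dmd's location type.

```d
// Hypothetical location type carrying what the list above asks for:
// a buffer offset and a length.
struct SourceSpan
{
    string file;
    size_t offset;   // byte offset of the first character
    size_t length;   // number of bytes covered
}

/// Returns `source` with the text covered by `span` replaced by `replacement`.
string replaceSpan(string source, SourceSpan span, string replacement)
{
    assert(span.offset + span.length <= source.length, "span out of bounds");
    return source[0 .. span.offset]
         ~ replacement
         ~ source[span.offset + span.length .. $];
}

unittest
{
    // Rename the identifier `foo`, which starts at byte 4 and is 3 bytes long.
    auto src = "int foo = bar;";
    auto renamed = replaceSpan(src, SourceSpan("a.d", 4, 3), "count");
    assert(renamed == "int count = bar;");
}
```

With only a line and column for the start of a node, the tool would have to re-lex the source to find where the node ends before it could splice in the replacement.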
 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.
I agree, you need a full buy in from the leadership and the compiler developers. I think this will be very difficult.
 So, how do we move forward?
I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics. If you're lucky you can merge changes from upstream easily. Otherwise, if you can't easily sync with upstream, you can treat it as a separate compiler and evolve it on its own. I've already done this, if you're interested in collaborating [1].

[1] https://github.com/jacob-carlborg/ddc

--
/Jacob Carlborg
Jun 16 2020
parent reply RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 16 June 2020 at 12:05:11 UTC, Jacob Carlborg wrote:

 I don't think the API is the most important part, as long as 
 you can do what you need to do with the library. In my opinion 
 there's a lot of functionality that is missing. For example, 
 I've tried to use DMD as a library to do source code 
 transformation. It falls very short in this area:

 * AST nodes without locations
This typically happens when the compiler rewrites segments of the AST. It creates nodes, but doesn't bother with location since that code is not meant to be seen by any user.
 * Locations don't contain an end point/length
 * Locations don't contain the buffer offset
 * Indirect files are always read from disk. There's no option 
 to make a full compilation purely from memory
These are all valid points. I mostly thought about dmd as a lib as a way to analyze the AST and output relevant information (e.g. DCD), not as a tool to modify source code. However, I was expecting that the hdrgen visitor would help with that.
 I agree, you need a full buy in from the leadership and the 
 compiler developers. I think this will be very difficult.
Currently, the dmd as a library project is in a state of limbo. We all agree that it needs to be pushed forward, but we don't know exactly how. This should be a good start for discussions, I guess.
 So, how do we move forward?
I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics. If you're lucky you can merge changes from upstream easily. Otherwise if you can't easily sync with upstream you can treat it as a separate compiler and evolve it on its own. I've already done this, if you're interested in collaborating [1].
I am still hoping that we can work our way with the main compiler, but if things don't work out, then yes, collaborating on your fork is definitely the best alternative.
 [1] https://github.com/jacob-carlborg/ddc

 --
 /Jacob Carlborg
Jun 16 2020
parent Jacob Carlborg <doob me.com> writes:
On 2020-06-17 07:08, RazvanN wrote:

 This typically happens when the compiler rewrites segments of the AST. 
 It creates nodes, but doesn't bother with location since that code is 
 not meant to be seen by any user.
I wasn't referring to those cases. There are other cases I was thinking of:

    @("bar") void foo();

In the above code, the first location will point to, IIRC, the quote symbol (") and not the at sign (@). This is before running any semantic analysis.
 These are all valid points. I mostly thought about dmd as a lib as a way 
 to analyze the AST and output relevant information (e.g. DCD), not as a 
 tool to modify source code, however,
I don't know how far you've come with this, I'm certainly not an expert in this subject, and I've only glanced at the LSP specification. But if the compiler cannot compile all files from memory, the LSP server needs to store the source code in temporary files, or it needs to read the files directly from the project directory; the latter is what DCD is doing. In that case all files, perhaps except the one you're currently editing, need to be saved. That is, you cannot have multiple unsaved files and get the correct result; you'll get stale data instead. Perhaps that's not that common, but you can definitely end up with multiple unsaved files after a global search-and-replace.

When it comes to start and end position and buffer offset, keep in mind that LSP does support modifying the source code with the "rename" feature [1]. If you don't know the buffer offset of a token, how would you know where to make the changes in the buffer? In this case you might get away with what the current compiler supports, because this feature only applies to identifiers and you do know the length of an identifier. But don't you need to know where in the buffer the identifier starts? I guess you could run the lexer again and count the number of bytes, but that seems quite inefficient for something the compiler should support out of the box.

Another feature which seems to depend on start and end position, and possibly buffer offset as well, is the "foldingRange" feature [2].
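
The workaround mentioned for the rename case, recomputing the buffer offset from a line/column pair, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: lines and columns are 1-based and columns count bytes, and the helper names `offset_of` and `rename_at` are hypothetical, not anything dmd or an LSP library provides.

```python
def offset_of(buffer: bytes, line: int, column: int) -> int:
    """Byte offset of a 1-based (line, column) pair; column counts bytes.

    This is the rescan that a stored buffer offset would make unnecessary.
    """
    offset = 0
    for _ in range(line - 1):
        newline = buffer.find(b"\n", offset)
        if newline == -1:
            raise ValueError("line number past end of buffer")
        offset = newline + 1
    return offset + column - 1


def rename_at(buffer: bytes, line: int, column: int, old: bytes, new: bytes) -> bytes:
    """Replace the identifier `old` that starts at (line, column) with `new`."""
    start = offset_of(buffer, line, column)
    end = start + len(old)  # the end is only known because the identifier length is known
    if buffer[start:end] != old:
        raise ValueError("location does not point at the expected identifier")
    return buffer[:start] + new + buffer[end:]


source = b"void foo();\nvoid main() { foo(); }\n"
renamed = rename_at(source, 2, 15, b"foo", b"bar")
```

Every call rescans the buffer from the start to find the line, which is the inefficiency in question. For anything other than an identifier the end of the span could not be derived this way at all.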
 I was expecting that the hdrgen  visitor would help with that.
I haven't looked at the implementation of hdrgen but if the lexer doesn't preserve the information how would hdrgen get access to it?
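
To illustrate what "preserving the information" would mean, here is a minimal Python sketch of a lossless tokenizer: whitespace and comments are kept as tokens with their offsets, so joining the token texts reproduces the input exactly. The token classes are deliberately simplified and are not D's real lexical grammar; `Token` and `lex_lossless` are hypothetical names, and nothing here describes how dmd's lexer or hdrgen actually works.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str    # "comment", "whitespace", "identifier", or "other"
    text: str    # exact source text, never normalized
    offset: int  # start offset into the source string


# Simplified token classes for illustration only.
_TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"
    r"|(?P<whitespace>\s+)"
    r"|(?P<identifier>[A-Za-z_]\w*)"
    r"|(?P<other>.)",
    re.DOTALL,
)


def lex_lossless(source: str) -> List[Token]:
    """Tokenize without discarding anything, so the source can be rebuilt."""
    return [Token(m.lastgroup, m.group(), m.start()) for m in _TOKEN_RE.finditer(source)]


source = "int x; // count\n/* block */ int  y;"
tokens = lex_lossless(source)
rebuilt = "".join(t.text for t in tokens)
```

A tool working on such a token stream can rewrite one identifier and emit everything else verbatim, comments and spacing included. If the lexer drops those tokens, a later pass has nothing to recover them from.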
 I am still hoping that we can work our way with the main compiler
Yeah, I don't want to wait anymore.

[1] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_rename
[2] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_foldingRange

--
/Jacob Carlborg
Jun 17 2020