digitalmars.D - DMD as a library - recap, next steps
- RazvanN (55/55) Jun 15 2020 A few years ago, I worked on a project to refactor the dmd
- evilrat (21/30) Jun 15 2020 Nice progress so far, but I would also like to note that there
- Stefan Koch (24/41) Jun 16 2020 No if we double that time, it's a massive change to the rapid
- Petar Kirov [ZombineDev] (48/90) Jun 16 2020 It is useful for serialization and source generation purposes at
- Petar Kirov [ZombineDev] (10/17) Jun 16 2020 BTW, I really don't get this obsession with bad OOP design in
- RazvanN (18/70) Jun 16 2020 Indeed!
- Petar Kirov [ZombineDev] (2/4) Jun 16 2020 My pleasure, please keep up the good work on this amazing project!
- Jacob Carlborg (11/18) Jun 16 2020 I agree. The Eclipse Java compiler has quite a few "modes" in
- RazvanN (20/62) Jun 16 2020 It is not useless, the fact that libdparse exists and it is used
- Jacob Carlborg (8/11) Jun 16 2020 Ideally the compiler should be modified to preserve all
- RazvanN (9/20) Jun 16 2020 Would it have been easier if you had the ability override certain
- Jacob Carlborg (26/34) Jun 17 2020 If it's possible to design the interface so it's possible to add a
- Jacob Carlborg (13/16) Jun 16 2020 It's useful to be able to do. It's not up the compiler developers
- WebFreak001 (40/53) Jun 16 2020 very cool, thanks for all the work! I think however increasing
- RazvanN (15/54) Jun 16 2020 Incremental compilation is out of discussion when referring to
- Jacob Carlborg (23/44) Jun 16 2020 I don't think API is not the most important part, as long as you
- RazvanN (14/38) Jun 16 2020 This typically happens when the compiler rewrites segments of the
- Jacob Carlborg (38/46) Jun 17 2020 I wasn't referring to those cases. There are other cases I was thinking ...
A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs:

1. Template the parser to remove reliance on modules that implement semantic analysis: https://github.com/dlang/dmd/pull/6625
2. Create ASTBase, an AST family that contains the minimum information to separate the parser from the rest of the compiler: https://github.com/dlang/dmd/pull/6836
3. A series of PRs to pull out all the semantic methods from AST nodes into visitors or free functions so that AST nodes in the compiler will replace the ASTBase family (which is essentially duplicated code):
   https://github.com/dlang/dmd/pull/7031
   https://github.com/dlang/dmd/pull/7048
   https://github.com/dlang/dmd/pull/7049
   https://github.com/dlang/dmd/pull/7114
   https://github.com/dlang/dmd/pull/7119
   https://github.com/dlang/dmd/pull/7122
4. I created visitors for semantic time analysis: https://github.com/dlang/dmd/pull/7411

At that point I hit some bugs that prevented me from moving forward. I started fixing them and after a while I moved to another project, which left the dmd as a library project in a half-baked state.

What remains to be done is to:

- Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
- Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib. On this point, me and Edi Staniloiu have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
- Add some visitors that make it easy for 3rd party tools to use compiler features.

In the meantime, there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic. This is moving things in the exact opposite direction of every PR that was showcased in this post and is a showstopper for this: https://github.com/dlang/dmd/pull/11265 moving forward in a consistent way.

Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this. So, how do we move forward?

Cheers,
RazvanN
Jun 15 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler.

Nice progress so far, but I would also like to note that there was an annoying change some time last winter that puts the whole compiler configuration in private functions. I know this is probably better addressed in dub instead, but it is just way less maintained than dmd.

Simply put, this dmd-as-a-library can be incredibly useful for doing a custom code transformation step by serving as a dmd proxy. However, dub probing is too strict and relies on CTFE introspection results in std output, and the change I mentioned currently forces one to copy-paste ~500 lines of code for option parsing and configuration so it can be recognized by dub as an actual compiler. (Last time I checked was in March.)

> On this point, me and Edi Staniloiu have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
> - Add some visitors that make it easy for 3rd party tools to use compiler features.

This would be awesome, it can even end the "IDE support sucks" complaints. If dmd is able to recompile in memory on code update and provide compilation database updates in under 500ms, it will be usable enough for LSPs and other productivity tooling while actually supporting every language feature, including template instance body inspection and UFCS.
Jun 15 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase

The AST without SemA is useless.

> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.

No, if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean.

> In the mean time there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.

Why would you ever need or want to OVERRIDE semantic? If you override semantics then by definition you are no longer in the same language space.

> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this. So, how do we move forward?

First establish a use case. I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

For example, I want to be able to ask:
- For all functions in module a, give me the ones which call a function called malloc, either directly or transitively, as far as you can see from the source code I gave you.
- For all functions which only exist as declarations (you don't have the body), create a list and cross reference it with the call graph of the selection we've got before.

For that to work you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner. Some kind of C plugin API would be preferred.
Jun 16 2020
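To make the query Stefan Koch describes concrete, here is a minimal sketch of the two questions over an already-extracted call graph. It deliberately does not use any dmd API: `CallGraph`, `callsTransitively`, `callersOf` and `bodylessCallees` are hypothetical names, and the graph is assumed to have been produced by some earlier pass that maps each function with a body to the names it calls directly.

```d
import std.algorithm.sorting : sort;

/// Hypothetical call graph: function with a body -> functions it calls directly.
/// Functions that are only declared (no body) never appear as keys.
alias CallGraph = string[][string];

/// True if `start` calls `target` directly or through other functions.
bool callsTransitively(CallGraph graph, string start, string target)
{
    bool[string] visited;
    string[] stack = [start];
    while (stack.length)
    {
        string current = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        if (current in visited)
            continue; // already explored, also guards against recursion cycles
        visited[current] = true;
        if (auto callees = current in graph)
        {
            foreach (callee; *callees)
            {
                if (callee == target)
                    return true;
                stack ~= callee;
            }
        }
    }
    return false;
}

/// All functions with a body that reach `target`, sorted by name.
string[] callersOf(CallGraph graph, string target)
{
    string[] result;
    foreach (name; graph.byKey)
        if (callsTransitively(graph, name, target))
            result ~= name;
    sort(result);
    return result;
}

/// Callees that have no body in the graph (declarations only), sorted by name.
string[] bodylessCallees(CallGraph graph)
{
    bool[string] seen;
    foreach (callees; graph.byValue)
        foreach (callee; callees)
            if (callee !in graph)
                seen[callee] = true;
    auto result = seen.keys;
    sort(result);
    return result;
}

unittest
{
    CallGraph g;
    g["a"] = ["b"];
    g["b"] = ["malloc"];
    g["c"] = ["puts"];
    assert(callersOf(g, "malloc") == ["a", "b"]);
    assert(bodylessCallees(g) == ["malloc", "puts"]);
}
```

The hard part in practice is the step this sketch assumes away: building the graph requires the identifier-resolved tree after semantic analysis, which is exactly what the post asks dmd-as-a-lib to expose. The example can be exercised with `dmd -unittest -main -run`.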
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
> The AST without SemA is useless.

It is useful for serialization and source generation purposes at least. In other language communities, where they don't have the metaprogramming power of D, they do extensive source code generation, and most of the time you don't need much semantic for that (sometimes you just need a way to do string interpolation on symbol names).

Also, perhaps one of the goals is to completely replace the parser with something more fault-tolerant. For example see:
https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
https://github.com/tree-sitter/tree-sitter
https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md
^^^ I suggest you watch the talks linked from the last page.

>> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.
> No if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean

I agree. Though if a refactoring of this sort increases the compile-time by 2x we have serious problems elsewhere. I don't expect something like this to require more than 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile-time to e.g. 0.85x.

>> In the mean time there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.
> Why would you ever need or want to OVERRIDE semantic. If you override semantics than by definition you are no longer, in the same language space.

I suggest you try using a language with an excellent [...] TypeScript. You'd be amazed how well it works. It is able to make sense of all kinds of broken code (e.g. giving you auto-completion for a function with a missing closing curly brace). Most of these things are completely impossible to do with the current rigid nature of the dmd frontend.

Of course Razvan, Edi and Cristian are in a better position to answer, but I think that the main idea is that they don't want to change the language, but instead they want to be able to plug code into more parts of the compilation pipeline so the LSP can be notified when e.g. the compiler is visiting an overload set. For example, the compiler may stop looking when it finds the best overload match, while for the LSP you want to display overloads, as the user may have made a typo and so on. (Please don't read too much into this example, I may have made a mistake, but the general idea is that they need to be able to extract more info from the frontend.)

>> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this. So, how do we move forward?
> First establish a usecase. I would say that the ability to custom add static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

The use case is implementing all of this API:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/
Specifically, take a look at the "Language Features" section, e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion

> For Example, I want to be able to ask. Forall functions in module a, give me the ones which call a function called malloc, either directly or transitively, as far as you can see by the source code I gave you. Forall functions which only exist as declarations (you don't have the body), create a list and cross reference it with the call-graph of the selection we've got before. For that to work you need to be able to run time compiler until just before code-generation and you need to be able to walk that type/identifer-resolved tree in a useful manner. Some kind of C plugin api would be preferred.

The set of C developers that want to write a D language server is much smaller than the set of D developers that want to do the same :D

But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when frontend changes break LDC and GDC.
Jun 16 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
>> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
>> The AST without SemA is useless.

BTW, I really don't get this obsession with bad OOP design in DMD. The AST classes should be just pure data. SemA should be functions that operate on this data. I can't find any good reason why one would put the SemA *implementation* inside the AST classes. Just imagine if the constructor of the FunctionDeclaration directly outputted x86 assembly :D

So this is why I think all logic should be moved from the AST classes to other functions/classes/modules.
Jun 16 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
>> [...]
> It is useful for serialization and source generation purposes at least. In other language communities, where they don't have the metaprogramming power of D, they do extensive source code generation, and most of the time you don't need much semantic for that (sometimes you just need a way to string interpolation on symbol names).

Indeed!

> Also perhaps one of the goal is to completely replace the parser with something more fault-tolerant. For example see:
> https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
> https://github.com/tree-sitter/tree-sitter
> https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md
> ^^^ I suggest you watch the talks linked from the last page.

That is exactly what I had in mind.

>> [...]
> I agree. Though if a refactoring of this sort increases the compile-time by 2x we have serious problems elsewhere. I don't expect something like this to require more 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile-time to e.g. 0.85x.

You are correct. I was exaggerating for the sake of the argument. What I meant was: we are extremely fast, but we are lacking a compiler interface. Small performance regressions are acceptable if they offer a guaranteed benefit with regards to defining a good interface.

> I suggest you try using a language with an excellent TypeScript. You'd be amazed how well it works. It is able to make sense of all kinds of broken code (e.g. giving you auto-completion for a function with a missing closing curly brace. Most of these things are completely impossible to do with the current rigid nature of the dmd frontend.
> Of course Razvan, Edi and Cristian are in better position to answer, but I think that the main idea is that they don't want to change the language, but instead they want to be able to plug code in more parts of the compilation pipeline so the LSP can be notified when e.g. the compiler is visiting an overload set. For example, the compiler may stop looking when it finds the best overload match, while for the LSP you want to display overloads, as the user may have made a typo and so on. (Please don't read too much into this example, I may have made a mistake, but the general idea is that they need to be able to extract more info from the frontend.)

You are entirely right. We have replaced all uses of libdparse and other tools that mimic semantic analysis in DCD with dmd as a lib and it works, but it requires that we override current semantic analysis that is done for CallExps to be able to cope with semantic failures on incomplete code.

>> [...]
> The use case implementing all of this API:
> https://microsoft.github.io/language-server-protocol/specifications/specification-current/
> Specifically, take a look at the "Language Features" section, e.g.:
> https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion
>> [...]
> The set of C developers that want to write a D langauge server is much smaller than the set of D developers that want to do the same :D But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when changes frontend break LDC and GDC.

Thanks for your reply. I feel that we are on the same page on this.
Jun 16 2020
On Tuesday, 16 June 2020 at 09:15:24 UTC, RazvanN wrote:
> Thanks for your reply. I feel that we are on the same page on this.

My pleasure, please keep up the good work on this amazing project!
Jun 16 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:
> Also perhaps one of the goal is to completely replace the parser with something more fault-tolerant. For example see:
> https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
> https://github.com/tree-sitter/tree-sitter
> https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md
> ^^^ I suggest you watch the talks linked from the last page.

I agree. The Eclipse Java compiler has quite a few "modes" in which you can run it: only run the parser, or run various levels of semantic analysis. It allows you to compile and run code that does not compile. For example, the signature of a function compiles but not the body. The compiler can just replace the body with code that throws an exception. If that function is not called at runtime, it's perfectly fine.

-- /Jacob Carlborg
Jun 16 2020
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
> The AST without SemA is useless.

It is not useless; the fact that libdparse exists and is used as a standalone library is proof of that.

>> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.
> No if we double that time, it's a massive change to the rapid development we can have right now. Try compiling DMD with LDC and you'll see what I mean

The point here was that minor performance regressions that come out of refactorings should not be an obstacle if they offer a clear benefit from a dmd-as-a-lib standpoint.

>> In the mean time there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.
> Why would you ever need or want to OVERRIDE semantic. If you override semantics than by definition you are no longer, in the same language space.

Analyzing the AST is one scenario, but there are other situations:

1. You want to extend the language with some feature.
2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place. One example here is an auto-complete tool that needs to be able to analyze incomplete code; if you do not override specific semantic methods, by the time you analyze the AST you will have error nodes that prevent you from doing any work.
3. You want to drop certain semantic passes because your tool does not need them.

Ideally we would offer maximum flexibility with dmd-as-a-lib. Currently, you are forced to run the full semantic analysis pass and hope for the best.

>> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this. So, how do we move forward?
> First establish a usecase. I would say that the ability to custom add static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do. For Example, I want to be able to ask. Forall functions in module a, give me the ones which call a function called malloc, either directly or transitively, as far as you can see by the source code I gave you. Forall functions which only exist as declarations (you don't have the body), create a list and cross reference it with the call-graph of the selection we've got before. For that to work you need to be able to run time compiler until just before code-generation and you need to be able to walk that type/identifer-resolved tree in a useful manner.
> Some kind of C plugin api would be preferred.

I don't understand what you are referring to.
Jun 16 2020
On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:
> 2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place.

Ideally the compiler should be modified to preserve all information through all phases of the compilation. In my tool I had to make quite a bit of extra effort to preserve the information I needed. Basically walking the AST twice, once before running the semantic analyzer and once after.

-- /Jacob Carlborg
Jun 16 2020
On Tuesday, 16 June 2020 at 11:34:43 UTC, Jacob Carlborg wrote:
> On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:
>> 2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place.
> Ideally the compiler should be modified to preserve all information through all phases of the compilation. In my tool I had to make quite a bit of extra effort to preserve the information I needed. Basically walking the AST twice, once before running the semantic analyzer and once after.
> -- /Jacob Carlborg

Would it have been easier if you had the ability to override certain portions of the semantic analysis? What we are trying to push forward now is the ability to extend the semantic visitor and override/extend functionality as you wish. However, since some nodes have a lot of code that does semantic on them (CallExp has ~1000 lines of code), you would have to copy-paste a lot of code and modify only what interests you. The advantage is that you perform semantic only once.
Jun 16 2020
On 2020-06-17 06:58, RazvanN wrote:
> Would it have been easier if you had the ability override certain portions of the semantic analysis?

Yes, definitely.

> What we are trying to push forward now is the ability to extend the semantic visitor and override/extend functionality as you wish, however, since some nodes have a lot of code that does semantic on them (CallExp ~1000 lines of code) you would have to copy paste a lot of code and modify only what interests you. The advantage is that you perform semantic only once.

If it's possible to design the interface so that customization points can be added both before and after the original semantic implementation, with access to the original implementation, it would be a good start. That would suffice for my needs in the current tool. Inheritance is a good example of this:

    class Foo
    {
        void foo() {}
    }

    class Bar : Foo
    {
        override void foo()
        {
            // new code
            super.foo(); // call original implementation
            // new code
        }
    }

The API doesn't need to be inheritance, but it's a good example that shows that it is possible to add new code both before and after the original implementation. And you don't need to invoke the original implementation at all if you don't want to.

-- /Jacob Carlborg
Jun 17 2020
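Jacob Carlborg notes that the customization points need not be based on inheritance. As one possible illustration, here is a minimal sketch that uses delegates instead. Every name in it (`CallExp` as a toy stand-in, `SemanticHooks`, `originalCallExpSemantic`, `runCallExpSemantic`) is hypothetical and is not part of dmd.

```d
/// Toy stand-in for an AST node; this is NOT dmd's CallExp.
class CallExp
{
    string callee;
    bool hasError;
}

/// Optional customization points around the original implementation.
struct SemanticHooks
{
    /// Runs first; returning false skips the original implementation.
    bool delegate(CallExp) before;
    /// Runs last, whether or not the original implementation ran.
    void delegate(CallExp) after;
}

/// Stand-in for the compiler's own semantic step: flags unresolved calls.
void originalCallExpSemantic(CallExp e)
{
    if (e.callee.length == 0)
        e.hasError = true;
}

/// Runs the hooks around the original implementation.
void runCallExpSemantic(CallExp e, SemanticHooks hooks)
{
    bool runOriginal = true;
    if (hooks.before !is null)
        runOriginal = hooks.before(e);
    if (runOriginal)
        originalCallExpSemantic(e);
    if (hooks.after !is null)
        hooks.after(e);
}

unittest
{
    // A tool that tolerates incomplete code skips the original step,
    // so no error is recorded on the node.
    auto e = new CallExp;
    string[] log;
    SemanticHooks hooks;
    hooks.before = (CallExp x) { log ~= "before"; return false; };
    hooks.after = (CallExp x) { log ~= "after"; };
    runCallExpSemantic(e, hooks);
    assert(!e.hasError);
    assert(log == ["before", "after"]);
}
```

Compared with subclassing a visitor, this shape keeps the original implementation in one place, so a tool does not have to copy the ~1000 lines mentioned for CallExp just to change what happens around them.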
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
> Why would you ever need or want to OVERRIDE semantic. If you override semantics than by definition you are no longer, in the same language space.

It's useful to be able to do. It's not up to the compiler developers to come up with every single use case. And just because they cannot come up with a use case doesn't mean there aren't any good ones. If a language or a library could only be used for what the creator could think of, it would probably not be very useful at all.

I have a tool that makes some minor changes to the semantic phase to allow attributes to be inferred for all functions. Then it outputs all the attributes that can be attached to all functions of a given file.

-- /Jacob Carlborg
Jun 16 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs: [...]
> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this. So, how do we move forward?
> Cheers, RazvanN

Very cool, thanks for all the work! I think however that increasing the compilation time significantly is a major no-go, at least when it affects all code, except of course if it allows us to do faster things like incremental compilation.

For architecture ideas you might want to check out how Microsoft implemented their Roslyn compiler platform: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

Just AST and visitors is not the most useful on its own, we have libdparse for this already which works fine too. Much more interesting is the semantic analysis.

If there is one thing I would want exposed by dmd for anything, be it completion, dynamic linting, navigation, etc., I would really really want a symbol API. Much like dsymbol: one incremental database of all defined symbols (modules, types, aliases, parameters, template parameters, variables, etc.) with references, definitions, types (of variables and parameters), names and all traits information. This database would contain all symbols in the entire compilation unit, be aware of scopes at any given point and be able to incrementally update by adding/removing or changing files.

The semantic analysis needs to be incremental here too though, so symbols would need some kind of dependency graph for things using mixin or templates. Also it would be difficult to extract information from scopes like `version (Foo)` where version is not Foo. But just this symbols database and some APIs to query it per location, per file or per symbol name would be enough to implement nearly all features a user would expect from tooling and more.

Otherwise raw token access and AST visitors is all you really need to implement the rest, like formatting, highlighting, static linting and refactorings. It's important that you can somehow recover the whitespace and comments from tokens for refactoring and formatting though! For other use cases, like a REPL, exposing APIs to the executable generator would also be cool.

So if we have a symbols API I'm happy and I think that will be the goal of any DCD replacement program too :p

Keep up the good work on this!
Jun 16 2020
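The symbol API WebFreak001 asks for could take many shapes. Purely as an illustration, here is a minimal sketch of a per-file symbol database with the "update by file" and "query by name or file" operations from the post. All names (`Symbol`, `SymbolKind`, `SymbolDatabase` and its methods) are hypothetical, nothing here is taken from dmd or dsymbol, and scope awareness, location queries and dependency tracking are deliberately left out.

```d
/// Hypothetical symbol categories, loosely following the list in the post.
enum SymbolKind { module_, type, alias_, variable, parameter, function_ }

/// One defined symbol with its definition site.
struct Symbol
{
    string name;
    SymbolKind kind;
    string file;
    size_t line;
    string type; // textual type for variables and parameters, empty otherwise
}

/// Symbols grouped per file so that one file can be re-indexed on its own.
struct SymbolDatabase
{
    private Symbol[][string] byFile;

    /// Replaces everything known about `file` (the incremental update step).
    void updateFile(string file, Symbol[] symbols)
    {
        byFile[file] = symbols;
    }

    /// Forgets a file that was removed from the project.
    void removeFile(string file)
    {
        byFile.remove(file);
    }

    /// All definitions with the given name, across files.
    Symbol[] findByName(string name)
    {
        Symbol[] result;
        foreach (symbols; byFile.byValue)
            foreach (s; symbols)
                if (s.name == name)
                    result ~= s;
        return result;
    }

    /// All symbols recorded for one file, or null if the file is unknown.
    Symbol[] inFile(string file)
    {
        if (auto p = file in byFile)
            return *p;
        return null;
    }
}

unittest
{
    SymbolDatabase db;
    db.updateFile("a.d", [Symbol("foo", SymbolKind.function_, "a.d", 3, "")]);
    db.updateFile("b.d", [Symbol("bar", SymbolKind.variable, "b.d", 1, "int")]);
    assert(db.findByName("foo").length == 1);
    assert(db.inFile("b.d")[0].type == "int");
    db.removeFile("a.d");
    assert(db.findByName("foo").length == 0);
}
```

Replacing a whole file's symbols is the crudest form of incrementality. As the post points out, mixins and templates would additionally need a dependency graph so that editing one file can invalidate symbols derived from it in other files.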
On Tuesday, 16 June 2020 at 10:48:13 UTC, WebFreak001 wrote:
> very cool, thanks for all the work! I think however increasing the compilation time significantly is a major no-go, at least when it affects all code, expect of course if it allows us to do faster things like incremental compilation.

Incremental compilation is out of discussion when referring to dmd as a lib. For that we would need to start from scratch an implementation that takes care of all the various cases.

> For architecture ideas you might want to check out how Microsoft implemented their Roslyn compiler platform: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview
> Just AST and visitors is not the most useful on its own, we have libdparse for this already which works fine too. Much more interesting is the semantic analysis.

But the whole point of dmd as a lib is to offer the ability to use semantic analysis by using semantic visitors. For example this PR https://github.com/dlang/dmd/pull/11265 enables the ability to inherit semantic visitors that are used in the compiler and override or extend the functionality.

> If there is one thing I would want exposed by dmd for anything, being completion, dynamic linting, navigation, etc., I would really really want a symbol API. Much like dsymbol one incremental database of all defined symbols (modules, types, aliases, parameters, template parameters, variables, etc.) with references, definitions, types (of variables and parameters), names and all traits information. This database would contain all symbols in the entire compilation unit, be aware of scopes at any given point and be able to incrementally update by adding/removing or changing files.

One step in that direction: https://github.com/dlang/dmd/pull/11092
With that, you can provide a function that pulls out all the symbols in a particular scope. It is not incremental, though.

> The semantic analysis needs to be incremental here too though, so symbols would need some kind of dependency graph for things using mixin or templates. Also it would be difficult to extract information from scopes like `version (Foo)` where version is not Foo. But just this symbols database and some APIs to query it per location, per file or per symbol name would be enough to implement nearly all features a user would expect from tooling and more.
> Otherwise raw token access and AST visitors is all you really need to implement the rest like formatting, highlighting, static linting and refactorings. It's important that you can somehow recover the whitespaces and comments from tokens for refactoring and formatting though! For other use-cases, like a REPL, exposing APIs to the executable generator would also be cool.
> So if we have a symbols API I'm happy and I think that will be the goal of any DCD replacement program too :p

We are working on this and will soon publish that work.

> Keep up the good work on this!
Jun 16 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs:
>
> What remains to be done is to:
> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib. On this point, me and Edi Staniloiu have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
> - Add some visitors that make it easy for 3rd party tools to use compiler features.

I don't think the API is the most important part, as long as you can do what you need to do with the library. In my opinion there's a lot of functionality that is missing. For example, I've tried to use DMD as a library to do source code transformation. It falls very short in this area:

* AST nodes without locations
* Locations don't contain an end point/length
* Locations don't contain the buffer offset
* Indirect files are always read from disk. There's no option to make a full compilation purely from memory

> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.

I agree, you need a full buy in from the leadership and the compiler developers. I think this will be very difficult.

> So, how do we move forward?

I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics. If you're lucky you can merge changes from upstream easily. Otherwise if you can't easily sync with upstream you can treat it as a separate compiler and evolve it on its own. I've already done this, if you're interested in collaborating [1].

[1] https://github.com/jacob-carlborg/ddc

--
/Jacob Carlborg
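To make the missing location data listed in this post more concrete, here is a minimal sketch of what a location carrying a buffer offset and a length would enable for source transformation. `SourceRange`, `textOf` and `replaceRange` are hypothetical names made up for this example; they are not dmd's `Loc` and not taken from the ddc fork.

```d
/// Hypothetical location type that also records where a node sits in the
/// buffer. The fields are assumptions for illustration only.
struct SourceRange
{
    string file;
    uint line;     // 1-based line number
    uint column;   // 1-based column, counted in bytes
    size_t offset; // byte offset of the node's first character
    size_t length; // number of bytes the node covers
}

/// The exact source text covered by `r`.
string textOf(string buffer, SourceRange r)
{
    return buffer[r.offset .. r.offset + r.length];
}

/// A new buffer in which the text covered by `r` is replaced.
string replaceRange(string buffer, SourceRange r, string replacement)
{
    return buffer[0 .. r.offset] ~ replacement ~ buffer[r.offset + r.length .. $];
}
```

With offset and length available, a rewrite is a plain slice and concatenation. With only a line and column, the tool first has to work out where the node starts and ends in the buffer.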
Jun 16 2020
On Tuesday, 16 June 2020 at 12:05:11 UTC, Jacob Carlborg wrote:
> I don't think the API is the most important part, as long as you can do what you need to do with the library. In my opinion there's a lot of functionality that is missing. For example, I've tried to use DMD as a library to do source code transformation. It falls very short in this area:
>
> * AST nodes without locations

This typically happens when the compiler rewrites segments of the AST. It creates nodes, but doesn't bother with location since that code is not meant to be seen by any user.

> * Locations don't contain an end point/length
> * Locations don't contain the buffer offset
> * Indirect files are always read from disk. There's no option to make a full compilation purely from memory

These are all valid points. I mostly thought about dmd as a lib as a way to analyze the AST and output relevant information (e.g. DCD), not as a tool to modify source code, however, I was expecting that the hdrgen visitor would help with that.

> I agree, you need a full buy in from the leadership and the compiler developers. I think this will be very difficult.

Currently, the dmd as a library project is in a state of limbo. We all agree that it needs to be pushed forward, but we don't know exactly how. This should be a good start for discussions, I guess.

I am still hoping that we can work our way with the main compiler, but if things don't sort out, yes, collaborating on your fork is definitely the best alternative.

>> So, how do we move forward?
>
> I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics. If you're lucky you can merge changes from upstream easily. Otherwise if you can't easily sync with upstream you can treat it as a separate compiler and evolve it on its own. I've already done this, if you're interested in collaborating [1].
>
> [1] https://github.com/jacob-carlborg/ddc
>
> --
> /Jacob Carlborg
Jun 16 2020
On 2020-06-17 07:08, RazvanN wrote:
> This typically happens when the compiler rewrites segments of the AST. It creates nodes, but doesn't bother with location since that code is not meant to be seen by any user.

I wasn't referring to those cases. There are other cases I was thinking of:

    @("bar") void foo();

In the above code, the first location will point to, IIRC, the quote symbol (") and not the at sign (@). This is before running any semantic analysis.

> These are all valid points. I mostly thought about dmd as a lib as a way to analyze the AST and output relevant information (e.g. DCD), not as a tool to modify source code, however,

I don't know how far you've come with this progress and I'm certainly not an expert in this subject and I've only glanced at the LSP specification. But if the compiler cannot compile all files from memory, the LSP server needs to store the source code in temporary files, or it needs to read the files directly from the project directory. The latter is what DCD is doing. In that case all files, perhaps except the one you're currently editing, need to be saved. That is, you cannot have multiple unsaved files and get the correct result, you'll get stale data instead. Perhaps not that common. But you can definitely end up with multiple unsaved files after a global search-and-replace.

When it comes to start and end position and buffer offset, keep in mind that LSP does support modifying the source code with the "rename" feature [1]. If you don't know the buffer offset of a token, how would you know where to make the changes in the buffer? In this case you might get away with what the current compiler supports because this feature only applies to identifiers and you do know the length of an identifier. But don't you need to know where in the buffer the identifier starts? I guess you could run the lexer again and count the number of bytes. But that seems quite inefficient for something the compiler should support out of the box.

Another feature which seems to depend on start and end position, and possibly buffer offset as well, is the "foldingRange" feature [2].

> I was expecting that the hdrgen visitor would help with that.

I haven't looked at the implementation of hdrgen but if the lexer doesn't preserve the information how would hdrgen get access to it?

> I am still hoping that we can work our way with the main compiler

Yeah, I don't want to wait anymore.

[1] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_rename
[2] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_foldingRange

--
/Jacob Carlborg
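The fallback mentioned in this post, recovering a buffer offset by scanning the source again, can be sketched as follows. This is only an illustration under stated assumptions: positions are 1-based lines and columns, columns are counted in bytes, and `offsetOf` and `renameAt` are hypothetical helpers, not dmd or LSP APIs. It scans for newlines instead of running a real lexer.

```d
/// Byte offset of a 1-based (line, column) position, found by walking the
/// buffer from the start. Cost is linear in the size of the buffer.
size_t offsetOf(string buffer, uint line, uint column)
{
    size_t offset = 0;
    uint currentLine = 1;
    // Advance until the first byte of the requested line.
    while (currentLine < line && offset < buffer.length)
    {
        if (buffer[offset] == '\n')
            ++currentLine;
        ++offset;
    }
    return offset + (column - 1);
}

/// Rename a single identifier occurrence whose start position and old name
/// are known. The end position comes from the identifier's length.
string renameAt(string buffer, uint line, uint column, string oldName, string newName)
{
    immutable start = offsetOf(buffer, line, column);
    assert(buffer[start .. start + oldName.length] == oldName,
        "position does not point at the expected identifier");
    return buffer[0 .. start] ~ newName ~ buffer[start + oldName.length .. $];
}
```

This works for identifiers because their length is known, but every lookup walks the buffer from the start. A location that stored the offset directly would avoid the scan.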
Jun 17 2020