digitalmars.D - DMD as a library

digitalmars.D - DMD as a library - recap, next steps

RazvanN <razvan.nitu1305 gmail.com> writes:

A few years ago, I worked on a project to refactor the dmd 
codebase so that it becomes easier to be used as a library. This 
has the advantage that tools that use dmd-as-a-lib will rely on 
the latest working compiler. From that perspective, I made 
several PRs:

1. Template the parser to remove reliance on modules that 
implement semantic analysis: 
https://github.com/dlang/dmd/pull/6625

2. Create ASTBase, an AST family that contains the minimum 
information to separate the parser from the rest of the compiler: 
https://github.com/dlang/dmd/pull/6836

3. A series of PRs to pull out all the semantic methods from AST 
nodes into visitors or free functions so that AST nodes in the 
compiler will replace the ASTBase family (which is essentially 
duplicated code):
     https://github.com/dlang/dmd/pull/7031
     https://github.com/dlang/dmd/pull/7048
     https://github.com/dlang/dmd/pull/7049
     https://github.com/dlang/dmd/pull/7114
     https://github.com/dlang/dmd/pull/7119
     https://github.com/dlang/dmd/pull/7122

4. I created visitors for semantic time analysis: 
https://github.com/dlang/dmd/pull/7411

At that point I hit some bugs that prevented me from moving
forward. I started fixing them, and after a while I moved to
another project, which left the dmd-as-a-library project in a
half-baked state.

What remains to be done is to:

- Further strip the AST nodes of functions that require semantic 
analysis so that we can remove the code duplication in ASTBase
- Refactor dmd to offer a decent interface. Sometimes people
argue against this point by saying that some changes affect the
compilation time of the compiler. Honestly, it takes 5 seconds to
compile dmd; even if we double that time, I say it is worth it if
we gain a decent interface for dmd-as-a-lib. On this point, Edi
Staniloiu and I have been working with a bachelor student to
see what interface is required to be able to use dmd-as-a-lib in
tools in the ecosystem (like DCD).
- Add some visitors that make it easy for 3rd party tools to use 
compiler features.

In the meantime, some PRs regressed the state of dmd as a lib,
such as this one:
https://github.com/dlang/dmd/pull/9010 . That PR makes it
impossible for someone to override the type semantic. This
moves things in the exact opposite direction of every PR
showcased in this post and is a showstopper for
https://github.com/dlang/dmd/pull/11265 moving forward in a
consistent way.

Edi and I are in a position where we can use bachelor students
to do the heavy lifting to help this project cross the
finish line. However, it is critical that the Dlang Foundation
leadership has a clear direction/vision on this.

So, how do we move forward?

Cheers,
RazvanN

Jun 15 2020

evilrat <evilrat666 gmail.com> writes:

On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler.

Nice progress so far, but I would also like to note that there
was an annoying change some time last winter that put the whole
compiler configuration into private functions.

I know this is probably better addressed in dub instead, but it 
is just way less maintained than dmd.

Simply put, dmd-as-a-library can be incredibly useful for
doing a custom code transformation step by serving as a dmd proxy.
However, dub's compiler probing is too strict and relies on CTFE
introspection results in standard output, and the change I
mentioned currently forces one to copy-paste ~500 lines of option
parsing and configuration code so that the tool is recognized by
dub as an actual compiler.
(I last checked this in March.)

 On this point, me and Edi Staniloiu have been working with a 
 bachelor student to see what interface is required to be able 
 to use dmd-as-a-lib in tools in the ecosystem (like DCD).
 - Add some visitors that make it easy for 3rd party tools to 
 use compiler features.

This would be awesome, it could even end the "IDE support sucks"
complaints.
If dmd is able to recompile in memory on code updates and
provide compilation database updates in under 500ms, it will be
usable enough for LSP servers and other productivity tooling
while actually supporting every language feature, including
template instance body inspection and UFCS.

Jun 15 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication in 
 ASTBase

The AST without SemA is useless.


 - Refactor dmd to offer a decent interface. Sometimes people 
 argue against this point by saying that some changes affect the 
 compilation time of the compiler. Honestly, it takes 5 seconds 
 to compile dmd, even if we double that time I say it is worth 
 it if we gain a decent interface for dmd-as-a-lib.

No, if we double that time, it's a massive change to the rapid
development we can have right now.
Try compiling DMD with LDC and you'll see what I mean.

 In the mean time there were some PRs that regressed the state 
 of dmd as a lib, PRs such as this one: 
 https://github.com/dlang/dmd/pull/9010 . That PR makes it 
 impossible for someone to override the type semantic.

Why would you ever need or want to OVERRIDE semantic?
If you override semantics, then by definition you are no longer
in the same language space.

 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?

First establish a use case.
I would say that the ability to add custom static-analysis passes
should be at the forefront, since that is what I would want a
compiler as a library to do.

For example, I want to be able to ask:

For all functions in module a, give me the ones which call a
function called malloc, either directly or transitively, as far
as you can see from the source code I gave you.
For all functions which only exist as declarations (you don't have
the body), create a list and cross-reference it with the
call graph of the selection we've got before.

For that to work you need to be able to run the compiler until
just before code generation, and you need to be able to walk that
type/identifier-resolved tree in a useful manner.
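
As a rough sketch of the first query, assuming the library could
hand a tool the resolved call graph as a plain map from caller to
callees (the `CallGraph` alias and `callersOf` function are made up
for illustration and are not part of dmd):

```d
import std.algorithm.sorting : sort;
import std.stdio : writeln;

/// Hypothetical call graph: caller name -> directly called functions.
alias CallGraph = string[][string];

/// Sorted names of all functions that reach `target`,
/// either directly or transitively.
string[] callersOf(CallGraph graph, string target)
{
    // Invert the edges: callee -> list of callers.
    string[][string] callers;
    foreach (caller, callees; graph)
        foreach (callee; callees)
            callers[callee] ~= caller;

    // Walk backwards from the target over the inverted edges.
    bool[string] seen;
    string[] work = [target];
    while (work.length)
    {
        auto current = work[$ - 1];
        work = work[0 .. $ - 1];
        if (auto list = current in callers)
            foreach (caller; *list)
                if (caller !in seen)
                {
                    seen[caller] = true;
                    work ~= caller;
                }
    }

    auto result = seen.keys;
    sort(result);
    return result;
}

void main()
{
    CallGraph graph = [
        "a": ["b"],
        "b": ["malloc"],
        "c": ["free"],
        "d": ["a", "c"]
    ];
    writeln(callersOf(graph, "malloc"));
}
```

The hard part, of course, is getting the compiler to produce such
a graph after symbol resolution; the traversal itself is trivial.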

Some kind of C plugin api would be preferred.

Jun 16 2020

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

- Further strip the AST nodes of functions that require
semantic analysis so that we can remove the code duplication
in ASTBase

The AST without SemA is useless.

It is useful for serialization and source generation purposes at
least. In other language communities, where they don't have the
metaprogramming power of D, they do extensive source code
generation, and most of the time you don't need much semantic
analysis for that (sometimes you just need a way to do string
interpolation on symbol names).

Also, perhaps one of the goals is to completely replace the parser
with something more fault-tolerant. For example, see:

https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

https://github.com/tree-sitter/tree-sitter
https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

^^^
I suggest you watch the talks linked from the last page.

- Refactor dmd to offer a decent interface. Sometimes people
argue against this point by saying that some changes affect
the compilation time of the compiler. Honestly, it takes 5
seconds to compile dmd, even if we double that time I say it
is worth it if we gain a decent interface for dmd-as-a-lib.

No if we double that time, it's a massive change to the rapid
development we can have right now.
Try compiling DMD with LDC and you'll see what I mean

I agree. Though if a refactoring of this sort increases the
compile time by 2x, we have serious problems elsewhere. I don't
expect something like this to cost more than 1.15x in the worst
case. And even then we should be able to find many ways to
further decrease the compile time to e.g. 0.85x.

In the mean time there were some PRs that regressed the state
of dmd as a lib, PRs such as this one:
https://github.com/dlang/dmd/pull/9010 . That PR makes it
impossible for someone to override the type semantic.

Why would you ever need or want to OVERRIDE semantic?
If you override semantics, then by definition you are no longer
in the same language space.

I suggest you try using a language with an excellent

You'd be amazed how well it works. It is able to make sense of
all kinds of broken code (e.g. giving you auto-completion for a
function with a missing closing curly brace). Most of these things
are completely impossible to do with the current rigid nature of
the dmd frontend.

Of course Razvan, Edi and Cristian are in better position to
answer, but I think that the main idea is that they don't want to
change the language, but instead they want to be able to plug
code in more parts of the compilation pipeline so the LSP can be
notified when e.g. the compiler is visiting an overload set. For
example, the compiler may stop looking when it finds the best
overload match, while for the LSP you want to display overloads,
as the user may have made a typo and so on. (Please don't read
too much into this example, I may have made a mistake, but the
general idea is that they need to be able to extract more info
from the frontend.)

Me and Edi are in the position where we can use bachelor
students to do the heavy-lifting on helping this project to
cross the finish line, however, it is critical that the Dlang
Foundation leadership has a clear direction/vision on this.

So, how do we move forward?

First establish a usecase.
I would say that the ability to custom add static-analysis
passes should be at the forefront, since that is what I would
want a compiler as a library to do.

The use case is implementing all of this API:

https://microsoft.github.io/language-server-protocol/specifications/specification-current/

Specifically, take a look at the "Language Features" section,
e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion

For Example, I want to be able to ask.

Forall functions in module a, give me the ones which call a
function called malloc,
either directly or transitively, as far as you can see by the
source code I gave you.
Forall functions which only exist as declarations (you don't
have the body), create a list and cross reference it with the
call-graph of the selection we've got before.

For that to work you need to be able to run the compiler until
just before code generation, and you need to be able to walk
that type/identifier-resolved tree in a useful manner.

Some kind of C plugin api would be preferred.

The set of C developers that want to write a D language server is
much smaller than the set of D developers that want to do the
same :D

But I agree on the general point that a well-defined and
versioned API is much needed, just as it sucks when frontend
changes break LDC and GDC.

Jun 16 2020

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication 
 in ASTBase

 The AST without SemA is useless.


BTW, I really don't get this obsession with bad OOP design in
DMD. The AST classes should be just pure data. SemA should be
functions that operate on this data. I can't find any good reason
why one would put the SemA *implementation* inside the AST
classes. Just imagine if the constructor of
FunctionDeclaration directly outputted x86 assembly :D

So this is why I think all logic should be moved from the AST 
classes to other functions/classes/modules.
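
To make that concrete, here is a toy sketch (not dmd's actual
classes; `Expr`, `IntLit`, `Add`, `evaluate` and `toSource` are
invented for illustration) where the nodes carry only data and two
independent passes live outside of them:

```d
import std.conv : to;
import std.stdio : writeln;

// Toy AST: the node classes hold data and nothing else.
abstract class Expr {}

final class IntLit : Expr
{
    long value;
    this(long value) { this.value = value; }
}

final class Add : Expr
{
    Expr lhs, rhs;
    this(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
}

// One pass over the data, written as a free function.
long evaluate(Expr e)
{
    if (auto lit = cast(IntLit) e)
        return lit.value;
    if (auto add = cast(Add) e)
        return evaluate(add.lhs) + evaluate(add.rhs);
    assert(0, "unknown node kind");
}

// A second pass that needs no evaluation logic at all.
string toSource(Expr e)
{
    if (auto lit = cast(IntLit) e)
        return lit.value.to!string;
    if (auto add = cast(Add) e)
        return "(" ~ toSource(add.lhs) ~ " + " ~ toSource(add.rhs) ~ ")";
    assert(0, "unknown node kind");
}

void main()
{
    Expr e = new Add(new IntLit(2), new Add(new IntLit(3), new IntLit(4)));
    writeln(toSource(e), " = ", evaluate(e));
}
```

A tool that only wants `toSource` never has to link or even see
the evaluation code, which is the whole point of the separation.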

Jun 16 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov
[ZombineDev] wrote:
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
[...]

It is useful for serialization and source generation purposes
at least. In other language communities, where they don't have
the metaprogramming power of D, they do extensive source code
generation, and most of the time you don't need much semantic
for that (sometimes you just need a way to string interpolation
on symbol names).

Indeed!

Also, perhaps one of the goals is to completely replace the
parser with something more fault-tolerant. For example, see:

https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

https://github.com/tree-sitter/tree-sitter
https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

^^^
I suggest you watch the talks linked from the last page.

That is exactly what I had in mind.

[...]

You are correct. I was exaggerating for the sake of the argument.
What I meant was: we are extremely fast, but we are lacking a
compiler interface. Small performance regressions are acceptable
if they offer a guaranteed benefit with regards to defining a
good interface.

I suggest you try using a language with an excellent

TypeScript. You'd be amazed how well it works. It is able to
make sense of all kinds of broken code (e.g. giving you
auto-completion for a function with a missing closing curly
brace. Most of these things are completely impossible to do
with the current rigid nature of the dmd frontend.

Of course Razvan, Edi and Cristian are in better position to
answer, but I think that the main idea is that they don't want
to change the language, but instead they want to be able to
plug code in more parts of the compilation pipeline so the LSP
can be notified when e.g. the compiler is visiting an overload
set. For example, the compiler may stop looking when it finds
the best overload match, while for the LSP you want to display
overloads, as the user may have made a typo and so on. (Please
don't read too much into this example, I may have made a
mistake, but the general idea is that they need to be able to
extract more info from the frontend.)

You are entirely right. We have replaced all uses of libdparse and
other tools that mimic semantic analysis in DCD with dmd as a lib,
and it works, but it requires that we override the current
semantic analysis that is done for CallExps to be able to cope
with semantic failures on incomplete code.

[...]

The use case implementing all of this API:

https://microsoft.github.io/language-server-protocol/specifications/specification-current/

Specifically, take a look at the "Language Features" section,
e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion

[...]

The set of C developers that want to write a D language server
is much smaller than the set of D developers that want to do
the same :D

But I agree on the general point that a well-defined and
versioned API is much needed, just like it sucks when changes
frontend break LDC and GDC.

Thanks for your reply. I feel that we are on the same page on
this.

Jun 16 2020

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 16 June 2020 at 09:15:24 UTC, RazvanN wrote:
 Thanks for your reply. I feel that we are on the same page on 
 this.

My pleasure, please keep up the good work on this amazing project!

Jun 16 2020

Jacob Carlborg <doob me.com> writes:

On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov 
[ZombineDev] wrote:

 Also, perhaps one of the goals is to completely replace the
 parser with something more fault-tolerant. For example, see:

 https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

 https://github.com/tree-sitter/tree-sitter
 https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

 ^^^
 I suggest you watch the talks linked from the last page.

I agree. The Eclipse Java compiler has quite a few "modes" in
which you can run it: only run the parser, or run various levels
of semantic analysis. It allows you to compile and run code that
does not compile. For example, the signature of a function
compiles but not the body. The compiler can just replace the body
with code that throws an exception. If that function is not called
at runtime, it's perfectly fine.

--
/Jacob Carlborg

Jun 16 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
 On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

 - Further strip the AST nodes of functions that require 
 semantic analysis so that we can remove the code duplication 
 in ASTBase

 The AST without SemA is useless.

It is not useless; the fact that libdparse exists and is used
as a standalone library is proof of that.

 - Refactor dmd to offer a decent interface. Sometimes people 
 argue against this point by saying that some changes affect 
 the compilation time of the compiler. Honestly, it takes 5 
 seconds to compile dmd, even if we double that time I say it 
 is worth it if we gain a decent interface for dmd-as-a-lib.

 No if we double that time, it's a massive change to the rapid
 development we can have right now.
 Try compiling DMD with LDC and you'll see what I mean

The point here was that minor performance regressions that come
out of refactorings should not be an obstacle if they offer a
clear benefit from a dmd-as-a-lib standpoint.

 In the mean time there were some PRs that regressed the state 
 of dmd as a lib, PRs such as this one: 
 https://github.com/dlang/dmd/pull/9010 . That PR makes it 
 impossible for someone to override the type semantic.

 Why would you ever need or want to OVERRIDE semantic?
 If you override semantics, then by definition you are no longer
 in the same language space.

 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?

 First establish a usecase.
 I would say that the ability to custom add static-analysis 
 passes should be at the forefront, since that is what I would 
 want a compiler as a library to do.

 For Example, I want to be able to ask.

 Forall functions in module a, give me the ones which call a 
 function called malloc,
 either directly or transitively, as far as you can see by the 
 source code I gave you.
 Forall functions which only exist as declarations (you don't 
 have the body), create a list and cross reference it with the 
 call-graph of the selection we've got before.

 For that to work you need to be able to run the compiler until
 just before code generation, and you need to be able to walk
 that type/identifier-resolved tree in a useful manner.

Analyzing the AST is one scenario, but there are other situations:

1. You want to extend the language with some feature
2. Semantic analysis mutates the AST in a way that makes it 
impossible for you to reason about what was there in the first 
place. One example here is an auto-complete tool that needs to be 
able to analyze incomplete code; if you do not override specific 
semantic methods, by the time you analyze the AST you will have 
error nodes that prevent you from doing any work.
3. You want to drop certain semantic passes because your tool
does not need them.

Ideally we would offer maximum flexibility with dmd-as-a-lib. 
Currently, you are forced to run the full semantic analysis pass 
and hope for the best.
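
To illustrate scenario 2 with a toy model (these are not dmd's
classes; `CallExp`, `SemanticVisitor` and `CompletionVisitor` here
are simplified stand-ins invented for this post), a tool could
subclass the stock visitor and keep unresolved calls usable instead
of turning them into error nodes:

```d
import std.stdio : writeln;

// Toy node: just a callee name and an error flag.
class CallExp
{
    string callee;
    bool isError;
    this(string callee) { this.callee = callee; }
}

// Stand-in for the compiler's own semantic visitor.
class SemanticVisitor
{
    bool[string] knownFunctions;

    // Stock behaviour: an unresolved call becomes an error node.
    void visit(CallExp e)
    {
        if (e.callee !in knownFunctions)
            e.isError = true;
    }
}

// A completion tool overrides only the part it cares about.
class CompletionVisitor : SemanticVisitor
{
    string[] unresolved; // partial names to offer completions for

    override void visit(CallExp e)
    {
        if (e.callee in knownFunctions)
            super.visit(e);          // reuse the stock logic
        else
            unresolved ~= e.callee;  // keep the node intact
    }
}

void main()
{
    auto v = new CompletionVisitor;
    v.knownFunctions["writeln"] = true;
    auto e = new CallExp("writ");
    v.visit(e);
    writeln("error node: ", e.isError, ", unresolved: ", v.unresolved);
}
```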

 Some kind of C plugin api would be preferred.

I don't understand what you are referring to.

Jun 16 2020

Jacob Carlborg <doob me.com> writes:

On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:

 2. Semantic analysis mutates the AST in a way that makes it 
 impossible for you to reason about what was there in the first 
 place.

Ideally the compiler should be modified to preserve all 
information through all phases of the compilation.

In my tool I had to make quite a bit of extra effort to preserve 
the information I needed. Basically walking the AST twice, once 
before running the semantic analyzer and once after.
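
Roughly, the two-walk workaround looks like the following toy
sketch. `Node`, `snapshot` and `lower` are invented names, and
`lower` merely stands in for a semantic pass that rewrites nodes in
place; none of this is dmd's API:

```d
import std.stdio : writeln;

// Toy node with the text as written in the source.
final class Node
{
    string kind; // e.g. "ForeachStatement"
    string text;
    this(string kind, string text) { this.kind = kind; this.text = text; }
}

/// First walk: remember what each node looked like before semantic.
string[Node] snapshot(Node[] nodes)
{
    string[Node] original;
    foreach (n; nodes)
        original[n] = n.text;
    return original;
}

/// Stand-in for a semantic pass that mutates nodes in place.
void lower(Node[] nodes)
{
    foreach (n; nodes)
        if (n.kind == "ForeachStatement")
        {
            n.kind = "ForStatement";
            n.text = "<lowered>";
        }
}

void main()
{
    auto nodes = [new Node("ForeachStatement", "foreach (x; xs) {}")];
    auto before = snapshot(nodes); // walk 1, before semantic
    lower(nodes);
    // Walk 2, after semantic: the original text is only available
    // through the snapshot taken earlier.
    foreach (n; nodes)
        writeln(n.kind, ": was `", before[n], "`");
}
```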

--
/Jacob Carlborg

Jun 16 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 16 June 2020 at 11:34:43 UTC, Jacob Carlborg wrote:
 On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:

 2. Semantic analysis mutates the AST in a way that makes it 
 impossible for you to reason about what was there in the first 
 place.

 Ideally the compiler should be modified to preserve all 
 information through all phases of the compilation.

 In my tool I had to make quite a bit of extra effort to 
 preserve the information I needed. Basically walking the AST 
 twice, once before running the semantic analyzer and once after.

 --
 /Jacob Carlborg

Would it have been easier if you had the ability to override
certain portions of the semantic analysis? What we are trying to
push forward now is the ability to extend the semantic visitor and
override/extend functionality as you wish. However, since some
nodes have a lot of code that does semantic on them (CallExp has
~1000 lines of code), you would have to copy-paste a lot of code
and modify only what interests you.
The advantage is that you perform semantic only once.

Jun 16 2020

Jacob Carlborg <doob me.com> writes:

On 2020-06-17 06:58, RazvanN wrote:

 Would it have been easier if you had the ability to override certain
 portions of the semantic analysis?

Yes, definitely.

 What we are trying to push forward 
 now is the ability to extend the semantic visitor and override/extend 
 functionality as you wish, however, since some nodes have a lot of code 
 that does semantic on them (CallExp ~1000 lines of code) you would have 
 to copy paste a lot of code and modify only what interests you.
 The advantage is that you perform semantic only once.

If it's possible to design the interface so that customization
points can be added both before and after the original semantic
implementation, with access to the original implementation, it would
be a good start. That would suffice for my needs in the current tool.

Inheritance is a good example of this:

class Foo
{
     void foo() {}
}

class Bar : Foo
{
     override foo()
     {
         // new code
         super.foo(); // call original implementation
         // new code
     }
}

The API doesn't need to be inheritance, but it's a good example that
shows that it is possible to add new code both before and after the
original implementation. And you don't need to invoke the original
implementation at all if you don't want to.
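
A non-inheritance variant of the same idea could look like this
minimal sketch. `SemanticStep` and its fields are purely
hypothetical and only show the shape of such an interface:

```d
import std.stdio : writeln;

/// Hypothetical semantic step with customization points.
struct SemanticStep
{
    void delegate(string node) before;   // runs before the original
    void delegate(string node) original; // the stock implementation
    void delegate(string node) after;    // runs after the original
    bool skipOriginal;                   // opt out of the stock code

    void run(string node)
    {
        if (before !is null)
            before(node);
        if (!skipOriginal && original !is null)
            original(node);
        if (after !is null)
            after(node);
    }
}

void main()
{
    SemanticStep step;
    step.original = delegate(string n) { writeln("stock semantic: ", n); };
    step.before = delegate(string n) { writeln("tool, before: ", n); };
    step.after = delegate(string n) { writeln("tool, after: ", n); };
    step.run("CallExp");
}
```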

-- 
/Jacob Carlborg

Jun 17 2020

Jacob Carlborg <doob me.com> writes:

On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:

 Why would you ever need or want to OVERRIDE semantic?
 If you override semantics, then by definition you are no longer
 in the same language space.

It's useful to be able to do. It's not up to the compiler
developers to come up with every single use case. And just because
they cannot come up with a use case doesn't mean there aren't any
good ones. If a language or a library could only be used for what
the creator could think of, it would probably not be very useful
at all.

I have a tool that makes some minor changes to the semantic
phase to allow inferring attributes for all functions. Then it
outputs all the attributes that can be attached to all functions
of a given file.

--
/Jacob Carlborg

Jun 16 2020

WebFreak001 <d.forum webfreak.org> writes:

On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler. From that perspective, I 
 made several PRs:

 [...]

 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

 So, how do we move forward?

 Cheers,
 RazvanN

Very cool, thanks for all the work! I think, however, that
increasing the compilation time significantly is a major no-go, at
least when it affects all code, except of course if it allows us
to do faster things like incremental compilation.

For architecture ideas you might want to check out how Microsoft 
implemented their Roslyn compiler platform: 
https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

Just AST and visitors is not the most useful on its own, we have 
libdparse for this already which works fine too. Much more 
interesting is the semantic analysis.

If there is one thing I would want exposed by dmd for anything,
be it completion, dynamic linting, navigation, etc., I would
really, really want a symbol API. Much like dsymbol: one
incremental database of all defined symbols (modules, types,
aliases, parameters, template parameters, variables, etc.) with
references, definitions, types (of variables and parameters),
names and all traits information. This database would contain all
symbols in the entire compilation unit, be aware of scopes at any
given point and be able to update incrementally when files are
added, removed or changed.

The semantic analysis needs to be incremental here too though, so 
symbols would need some kind of dependency graph for things using 
mixin or templates. Also it would be difficult to extract 
information from scopes like `version (Foo)` where version is not 
Foo.

But just this symbols database and some APIs to query it per 
location, per file or per symbol name would be enough to 
implement nearly all features a user would expect from tooling 
and more.
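
To give an idea of the kind of query surface I mean, here is a
minimal sketch. `Symbol`, `SymbolDb` and all their members are
invented for illustration; nothing like this exists in dmd or
dsymbol under these names:

```d
import std.algorithm.iteration : filter;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical symbol record.
struct Symbol
{
    string name;
    string kind;  // "module", "function", "variable", ...
    string file;
    size_t start; // byte offset where the symbol's extent begins
    size_t end;   // byte offset one past its end
}

/// Hypothetical in-memory symbol database.
struct SymbolDb
{
    Symbol[] symbols;

    void add(Symbol s) { symbols ~= s; }

    /// Incremental update: drop everything that came from `file`.
    void removeFile(string file)
    {
        symbols = symbols.filter!(s => s.file != file).array;
    }

    /// Query by symbol name.
    Symbol[] byName(string name)
    {
        return symbols.filter!(s => s.name == name).array;
    }

    /// Query by file.
    Symbol[] byFile(string file)
    {
        return symbols.filter!(s => s.file == file).array;
    }

    /// Query by location: symbols whose extent encloses `offset`.
    Symbol[] at(string file, size_t offset)
    {
        return symbols.filter!(s => s.file == file
            && s.start <= offset && offset < s.end).array;
    }
}

void main()
{
    SymbolDb db;
    db.add(Symbol("app", "module", "app.d", 0, 200));
    db.add(Symbol("main", "function", "app.d", 20, 120));
    db.add(Symbol("helper", "function", "util.d", 0, 50));
    writeln(db.at("app.d", 42));
    db.removeFile("util.d"); // e.g. the file was deleted
    writeln(db.byName("helper").length);
}
```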

Otherwise raw token access and AST visitors is all you really 
need to implement the rest like formatting, highlighting, static 
linting and refactorings. It's important that you can somehow 
recover the whitespaces and comments from tokens for refactoring 
and formatting though!

For other use-cases, like a REPL, exposing APIs to the executable 
generator would also be cool.

So if we have a symbols API I'm happy and I think that will be 
the goal of any DCD replacement program too :p

Keep up the good work on this!

Jun 16 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 16 June 2020 at 10:48:13 UTC, WebFreak001 wrote:

 Very cool, thanks for all the work! I think, however, that
 increasing the compilation time significantly is a major no-go,
 at least when it affects all code, except of course if it allows
 us to do faster things like incremental compilation.

Incremental compilation is out of scope when talking about
dmd as a lib. For that we would need to start an implementation
from scratch that takes care of all the various cases.

 For architecture ideas you might want to check out how 
 Microsoft implemented their Roslyn compiler platform: 
 https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

 Just AST and visitors is not the most useful on its own, we 
 have libdparse for this already which works fine too. Much more 
 interesting is the semantic analysis.

But the whole point of dmd as a lib is to offer the ability to 
use semantic analysis by using semantic visitors. For example 
this PR
https://github.com/dlang/dmd/pull/11265 enables the ability to 
inherit semantic visitors that are used in the compiler and 
override or extend the functionality.

 If there is one thing I would want exposed by dmd for anything, 
 being completion, dynamic linting, navigation, etc., I would 
 really really want a symbol API. Much like dsymbol one 
 incremental database of all defined symbols (modules, types, 
 aliases, parameters, template parameters, variables, etc.) with 
 references, definitions, types (of variables and parameters), 
 names and all traits information. This database would contain 
 all symbols in the entire compilation unit, be aware of scopes 
 at any given point and be able to incrementally update by 
 adding/removing or changing files.

One step in that direction: 
https://github.com/dlang/dmd/pull/11092
With that, you can provide a function that pulls out all the 
symbols in a particular scope. It is not incremental, though.

 The semantic analysis needs to be incremental here too though, 
 so symbols would need some kind of dependency graph for things 
 using mixin or templates. Also it would be difficult to extract 
 information from scopes like `version (Foo)` where version is 
 not Foo.

 But just this symbols database and some APIs to query it per 
 location, per file or per symbol name would be enough to 
 implement nearly all features a user would expect from tooling 
 and more.

 Otherwise raw token access and AST visitors is all you really 
 need to implement the rest like formatting, highlighting, 
 static linting and refactorings. It's important that you can 
 somehow recover the whitespaces and comments from tokens for 
 refactoring and formatting though!

 For other use-cases, like a REPL, exposing APIs to the 
 executable generator would also be cool.

 So if we have a symbols API I'm happy and I think that will be 
 the goal of any DCD replacement program too :p

We are working on this and will soon publish that work.

 Keep up the good work on this!

Jun 16 2020

Jacob Carlborg <doob me.com> writes:

On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
 A few years ago, I worked on a project to refactor the dmd 
 codebase so that it becomes easier to be used as a library. 
 This has the advantage that tools that use dmd-as-a-lib will 
 rely on the latest working compiler. From that perspective, I 
 made several PRs:

 What remains to be done is to:
 - Refactor dmd to offer a decent interface. Sometimes people
 argue against this point by saying that some changes affect the 
 compilation time of the compiler. Honestly, it takes 5 seconds 
 to compile dmd, even if we double that time I say it is worth 
 it if we gain a decent interface for dmd-as-a-lib. On this 
 point, me and Edi Staniloiu have been working with a bachelor 
 student to see what interface is required to be able to use 
 dmd-as-a-lib in tools in the ecosystem (like DCD).
 - Add some visitors that make it easy for 3rd party tools to 
 use compiler features.

I don't think the API is the most important part, as long as you 
can do what you need to do with the library. In my opinion 
there's a lot of functionality that is missing. For example, I've 
tried to use DMD as a library to do source code transformation. 
It falls very short in this area:

* AST nodes without locations
* Locations don't contain an end point/length
* Locations don't contain the buffer offset
* Indirect files are always read from disk. There's no option to 
make a full compilation purely from memory

 Me and Edi are in the position where we can use bachelor 
 students to do the heavy-lifting on helping this project to 
 cross the finish line, however, it is critical that the Dlang 
 Foundation leadership has a clear direction/vision on this.

I agree, you need a full buy in from the leadership and the 
compiler developers. I think this will be very difficult.

 So, how do we move forward?

I think the way forward is to fork DMD to allow you to make the 
necessary changes as you see fit without having to bother with 
discussion and politics.

If you're lucky you can merge changes from upstream easily. 
Otherwise if you can't easily sync with upstream you can treat it 
as a separate compiler and evolve it on its own.

I've already done this, if you're interested in collaborating [1].

[1] https://github.com/jacob-carlborg/ddc

--
/Jacob Carlborg

Jun 16 2020

RazvanN <razvan.nitu1305 gmail.com> writes:

On Tuesday, 16 June 2020 at 12:05:11 UTC, Jacob Carlborg wrote:

 I don't think the API is the most important part, as long as 
 you can do what you need to do with the library. In my opinion 
 there's a lot of functionality that is missing. For example, 
 I've tried to use DMD as a library to do source code 
 transformation. It falls very short in this area:

 * AST nodes without locations

This typically happens when the compiler rewrites segments of the 
AST. It creates nodes, but doesn't bother with location since 
that code is not meant to be seen by any user.

 * Locations don't contain an end point/length
 * Locations don't contain the buffer offset
 * Indirect files are always read from disk. There's no option 
 to make a full compilation purely from memory

These are all valid points. I mostly thought about dmd as a lib 
as a way to analyze the AST and output relevant information (e.g. 
DCD), not as a tool to modify source code, however, I was 
expecting that the hdrgen visitor would help with that.

 I agree, you need a full buy in from the leadership and the 
 compiler developers. I think this will be very difficult.

Currently, the dmd as a library project is in a state of limbo. We 
all agree that it needs to be pushed forward, but we don't know 
exactly how. This should be a good start for discussions, I guess.

 So, how do we move forward?

 I think the way forward is to fork DMD to allow you to make the 
 necessary changes as you see fit without having to bother with 
 discussion and politics.

 If you're lucky you can merge changes from upstream easily. 
 Otherwise if you can't easily sync with upstream you can treat 
 it as a separate compiler and evolve it on its own.

 I've already done this, if you're interested in collaborating 
 [1].

I am still hoping that we can work our way with the main 
compiler, but if things don't sort out, yes, collaborating on 
your fork is definitely the best alternative.

 [1] https://github.com/jacob-carlborg/ddc

 --
 /Jacob Carlborg

Jun 16 2020

Jacob Carlborg <doob me.com> writes:

On 2020-06-17 07:08, RazvanN wrote:

 This typically happens when the compiler rewrites segments of the AST. 
 It creates nodes, but doesn't bother with location since that code is 
 not meant to be seen by any user.

I wasn't referring to those cases. There are other cases I was thinking of:

@("bar")
void foo();

In the above code, the first location will point to, IIRC, the quote 
symbol (") and not the at sign (@). This is before running any semantic 
analysis.

 These are all valid points. I mostly thought about dmd as a lib as a way 
 to analyze the AST and output relevant information (e.g. DCD), not as a 
 tool to modify source code, however,

I don't know how far you've come with this progress and I'm certainly 
not an expert in this subject and I've only glanced at the LSP 
specification. But if the compiler cannot compile all files from memory 
the LSP server needs to store the source code in temporary files, or it 
needs to read the files directly from the project directory, the latter 
is what DCD is doing. In that case all files, perhaps except the one 
you're currently editing need to be saved. That is, you cannot 
have multiple unsaved files and get the correct result, you'll get stale 
data instead. Perhaps not that common. But you can definitely end up 
with multiple unsaved files after a global search-and-replace.
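
To make the stale-data problem concrete, here is a minimal sketch of the kind of lookup a tool would want: unsaved editor buffers shadow the files on disk. `SourceOverlay` and its methods are hypothetical names made up for illustration. This is not an existing dmd API, and how the compiler would be made to call such a provider is left open.

```d
import std.file : readText;

/// Hypothetical source provider (not part of dmd): unsaved editor
/// buffers take precedence over the files on disk.
struct SourceOverlay
{
    private string[string] unsaved; // path -> in-memory contents

    /// Called when the editor reports a change to an open document.
    void update(string path, string contents)
    {
        unsaved[path] = contents;
    }

    /// Called when the document is saved or closed.
    void discard(string path)
    {
        unsaved.remove(path);
    }

    /// Returns the in-memory buffer if there is one, else reads from disk.
    string read(string path)
    {
        if (auto buffer = path in unsaved)
            return *buffer;
        return readText(path);
    }
}

unittest
{
    SourceOverlay overlay;
    // The file does not need to exist on disk while a buffer shadows it.
    overlay.update("app.d", "void main() {}");
    assert(overlay.read("app.d") == "void main() {}");
}
```

With something like this, several unsaved files could be analyzed together, as long as every file read in the compiler went through the provider instead of straight to disk.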

When it comes to start and end position and buffer offset, keep in mind 
that LSP does support modifying the source code with the "rename" 
feature [1]. If you don't know the buffer offset of a token, how would 
you know where to make the changes in the buffer? In this case you might 
get away with what the current compiler supports because this feature 
only applies to identifiers and you do know the length of an identifier. 
But don't you need to know where in the buffer the identifier starts? I 
guess you could run the lexer again and count the number of bytes. But 
seems quite inefficient for something the compiler should support out of 
the box.
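
The workaround described above can be sketched roughly as follows, assuming a location only carries a 1-based line and a 1-based column counted in bytes. `offsetOf` and `renameAt` are hypothetical helpers written for illustration, not dmd functions, and for simplicity the sketch rescans the raw text rather than re-running the lexer.

```d
import std.exception : enforce;

/// Hypothetical helper: converts a 1-based line and 1-based byte column
/// into a 0-based byte offset by rescanning the source from the start.
size_t offsetOf(const(char)[] source, size_t line, size_t column)
{
    enforce(line >= 1 && column >= 1, "line and column are 1-based");
    size_t currentLine = 1;
    size_t lineStart = 0;
    foreach (i, char c; source)
    {
        if (currentLine == line)
            break;
        if (c == '\n')
        {
            ++currentLine;
            lineStart = i + 1;
        }
    }
    enforce(currentLine == line, "line is past the end of the source");
    immutable offset = lineStart + column - 1;
    enforce(offset <= source.length, "column is past the end of the source");
    return offset;
}

/// Hypothetical helper: replaces the identifier `oldName` that starts at
/// the given line/column with `newName`.
string renameAt(string source, size_t line, size_t column,
                string oldName, string newName)
{
    immutable start = offsetOf(source, line, column);
    immutable end = start + oldName.length;
    enforce(end <= source.length && source[start .. end] == oldName,
            "identifier not found at the given location");
    return source[0 .. start] ~ newName ~ source[end .. $];
}

unittest
{
    immutable src = "int a;\nint foo;\n";
    assert(offsetOf(src, 2, 5) == 11);
    assert(renameAt(src, 2, 5, "foo", "bar") == "int a;\nint bar;\n");
}
```

Every lookup walks the buffer from the beginning, which is the inefficiency in question. If the location stored the offset directly, `offsetOf` would not be needed at all.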

Another feature which seems to depend on start and end position, and 
possibly the buffer offset as well, is the "foldingRange" feature [2].

 I was expecting that the hdrgen visitor would help with that.

I haven't looked at the implementation of hdrgen but if the lexer 
doesn't preserve the information how would hdrgen get access to it?

 I am still hoping that we can work our way with the main compiler

Yeah, I don't want to wait anymore.


[1] 
https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_rename

[2] 
https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_foldingRange

-- 
/Jacob Carlborg

Jun 17 2020
