
digitalmars.D - Adding D support to Clang format

Zachary Yedidia <zyedidia gmail.com> writes:
Clang format is a high-quality auto-formatter developed by the
LLVM project, mainly for use with C and C++, but it also supports
several other languages, including TableGen, TextProto, and
Verilog. I think it would be great to add support for D, and
hopefully it shouldn't be too difficult since D is a C-like
language. Clang format has good support for aligning tokens
(comments, assignments, etc.), which is something important that
existing D formatters (dfmt, sdfmt) don't support, and Clang
format has many configuration options. Clang format also uses an
incomplete parser, so it is relatively resilient to new language
syntax and doesn't need a full D parser (and, for what it's worth,
it can format code with syntax errors).
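What "aligning tokens" means can be sketched in a few lines of Python. This is a minimal illustration, not clang-format's implementation (which works on its token stream rather than raw text): it assumes a hypothetical helper `align_assignments` that takes one block of consecutive single-line `lhs = rhs;` statements and pads each left-hand side so the `=` signs share a column, roughly the effect of clang-format's `AlignConsecutiveAssignments` option. Compound and comparison operators (`+=`, `==`, `<=`) are not treated as assignments, and the block is returned unchanged if any line does not match.

```python
import re

# Hypothetical pattern for a simple assignment "lhs = rhs":
#   group 1: leading indentation
#   group 2: left-hand side, which must end in a word character, ']' or ')'
#            (so "x += 1" and "a <= b" do not match)
#   group 3: right-hand side, which must not start with '=' (so "a == b" does not match)
_ASSIGN = re.compile(r"^(\s*)([^=]*?[\w\])])\s*=\s*([^=].*)$")


def align_assignments(lines):
    """Align the '=' of a block of consecutive simple assignments."""
    matches = [_ASSIGN.match(line) for line in lines]
    # Sketch simplification: only align when every line is a plain assignment.
    if not lines or not all(matches):
        return list(lines)
    width = max(len(m.group(2)) for m in matches)
    return [f"{m.group(1)}{m.group(2).ljust(width)} = {m.group(3)}" for m in matches]


if __name__ == "__main__":
    block = [
        "int x = 1;",
        'string name = "d";',
        "double ratio = 0.5;",
    ]
    for line in align_assignments(block):
        print(line)
```

Splitting a file into such blocks (for example at blank lines or at lines that are not assignments) is left to the caller here; a real formatter would make that decision from its own parse.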

I have started an implementation for D here: 
https://github.com/zyedidia/llvm-project/tree/clang-format-d, 
with some information about the implementation here (how to 
download/build/test): 
https://github.com/zyedidia/llvm-project/blob/clang-format-d/clang-format-d.md.
There are still a number of things that are formatted incorrectly, so there is
still work to do (I have added tests for the things I have noticed aren't
working right). I'm not sure how much time I can dedicate to working on this,
so any help would be appreciated. If we get to a point where this fork of
clang-format has good support for D, then hopefully we could get these changes
merged upstream into the LLVM project.
Apr 28 2023
max haughton <maxhaton gmail.com> writes:
On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia wrote:
 incomplete parser
sdfmt already does that, and it is important. Worth noting that Amaury, who wrote sdfmt, is an LLVM guy.
 [...]
Go for it, although I should say that nothing listed would convince me to switch.
Apr 28 2023
Johan <j j.nl> writes:
On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia wrote:
 Clang format is a high quality auto-formatter developed by the 
 LLVM project, mainly for use with C and C++, but it also [...]
 TableGen, TextProto, and Verilog. I think it would be great to
 add support for D, and hopefully shouldn't be too difficult 
 since it is a C-like language.
clang-format is indeed a godsend of a formatter. I recently also read that it can handle languages quite different from C, and had hoped for D support in the future. It will be very nice for people (like me) who work on mixed C++/D codebases.
 I have started an implementation for D here: 
 https://github.com/zyedidia/llvm-project/tree/clang-format-d, 
 with some information about the implementation here (how to 
 download/build/test): 
 https://github.com/zyedidia/llvm-project/blob/clang-format-d/clang-format-d.md.
Great that you are picking this up and leading the effort! Hope to be able to help out. Tip: start discussing on the LLVM mailing list how to upstream the work early on, to avoid having to rework certain pieces later. Obviously adhere to the LLVM coding style (which you already do), but there may be some other concerns that are less obvious.
cheers,
Johan
Apr 29 2023
Johan <j j.nl> writes:
On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:
 [...]
 Great that you are picking this up and leading the effort! Hope to be able to help out.
I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)

Perhaps the upsides of clang-format (what is already there) are not so large. A big downside of clang-format is that you are tied to LLVM's project (release schedule, way of working, community, ...). In order to format D code, you'd have to install clang-format (and the whole LLVM dependency?). And you are programming in C++, which is a pity for a project that is purely for D.

Because sdfmt has its own 'parser' (excellent!), it is quite a small program that can easily be built using `dub`. That makes the barrier to entry for developers very small (no need to build or test `sdc` itself).

-Johan
Apr 29 2023
Guillaume Piolat <first.last spam.org> writes:
On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
 I'm enthusiastic, but after some more thinking I think it's 
 wise to talk with the sdfmt team and discuss which way to go: 
 better to have everyone work on one project, rather than two. 
 I'm very interested in working on a good formatter, but not two 
 ;-)
I liked sdfmt when I tried it. sdfmt doesn't support spaces as indentation (or even specifying the indent level), so it seems to contradict the "D style". So we have a tab vs space debate on our hands :) and even K&R vs Allman.
Apr 29 2023
max haughton <maxhaton gmail.com> writes:
On Saturday, 29 April 2023 at 14:32:02 UTC, Guillaume Piolat wrote:
 On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
 I'm enthusiastic, but after some more thinking I think it's 
 wise to talk with the sdfmt team and discuss which way to go: 
 better to have everyone work on one project, rather than two. 
 I'm very interested in working on a good formatter, but not 
 two ;-)
 I liked sdfmt when I tried it. sdfmt doesn't support spaces as indentation (or even specifying the indent level), so it seems to contradict the "D style". So we have a tab vs space debate on our hands :) and even K&R vs Allman.
You can tell sdfmt to use spaces in a config file. D style is bad IMO (or at least has some warts); sdfmt output is so much more efficient than (say) dfmt by default.
Apr 29 2023
Zachary Yedidia <zyedidia gmail.com> writes:
On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
 On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:
 On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia

 [...]
 I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-) Perhaps the upsides of clang-format (what is already there) are not so large. A big downside of clang-format is that you are tied to LLVM's project (release schedule, way of working, community, ...). In order to format D code, you'd have to install clang-format (and the whole LLVM dependency?). And you are programming in C++, which is a pity for a project that is purely for D. Because sdfmt has its own 'parser' (excellent!), it is quite a small program that can easily be built using `dub`. That makes the barrier to entry for developers very small (no need to build or test `sdc` itself). -Johan
I think you make some good points. The clang-format codebase is a bit gnarly to work with (~20,000 lines of C++ without any clear separation/organization for formatting all the various languages). I wasn't aware of sdfmt until the foundation meeting forum post yesterday, but maybe it is a better direction to go in. Having briefly tried sdfmt, I think the main thing I miss is comment/assignment alignment. I was under the impression that this might be difficult to implement in sdfmt, but if not I think that would resolve a lot for me (sdfmt already does better than dfmt for me). It would also be great to have import sorting.

Some other minor issues with sdfmt are: not many configuration options (though I found the defaults to be pretty good), no documentation on how to set the configuration options that do exist (i.e., the .sdfmt JSON file is not documented), and a lack of visibility of the project overall because it is hidden within SDC (probably why I was not aware of it until recently).

I think for now I'll just sit on my clang-format fork and see if there is interest/it's feasible to implement these things in sdfmt. If so, then I would be happy to use (and possibly contribute to) sdfmt instead.
Apr 29 2023
max haughton <maxhaton gmail.com> writes:
On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia wrote:
 On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
 On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:
 On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia

 [...]
 I think you make some good points. The clang-format codebase is a bit gnarly to work with (~20,000 lines of C++ without any clear separation/organization for formatting all the various languages). I wasn't aware of sdfmt until the foundation meeting forum post yesterday, but maybe it is a better direction to go in. Having briefly tried sdfmt, I think the main thing I miss is comment/assignment alignment. I was under the impression that this might be difficult to implement in sdfmt, but if not I think that would resolve a lot for me (sdfmt already does better than dfmt for me). It would also be great to have import sorting.
Why is import sorting useful in practice? Anything more than ~3 top-level imports is a pretty big red flag for me: D has local imports, use them. It wouldn't be hard to implement in theory, although keep in mind that sdfmt internally has no concept of an import other than in the "parser", so that might be more of an AST->AST type of thing (and such transformations are much, much easier to implement if you have a formatter for the output).
 Some other minor issues with sdfmt are: not many configuration 
 options (though I found the defaults to be pretty good), no 
 documentation on how to set the configuration options that do 
 exist (i.e., the .sdfmt JSON file is not documented), and a 
 lack of visibility of the project overall because it is hidden 
 within SDC (probably why I was not aware of it until recently).
sdfmt not being particularly configurable is sort of by design.
 I think for now I'll just sit on my clang-format fork and see 
 if there is interest/it's feasible to implement these things in 
 sdfmt. If so, then I would be happy to use (and possibly 
 contribute to) sdfmt instead.
As far as I'm aware, the sdfmt algorithm is basically a simplified take on the way clang-format works, so implementing the alignment stuff shouldn't be ridiculously hard, although I'm not sure how clang-format fits it into its decision-making/heuristics.
Apr 29 2023
Johan <j j.nl> writes:
On Saturday, 29 April 2023 at 21:07:28 UTC, max haughton wrote:
 On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia 
 wrote:
 It would also be great to have import sorting.
 Why is import sorting useful in practice? Anything more than ~3 top level imports for me is a pretty big red flag for me - D has local imports, use them.
I think these kinds of discussions should be kept to a minimum: a formatter should not force a specific formatting taste on the user, and should instead provide options so that the user can tailor it to taste.

I would also appreciate import sorting (with an option to separate stdlib imports from user libraries), including sorting symbols of specific imports.

Another wish is grouping all UDAs either before/after the function.

If I think longer, I'm sure I have other wishes that clash with someone else's taste, like you had (https://github.com/snazzy-d/sdc/issues/231) ;-) Hence options options options!

cheers,
Johan
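The import-sorting wish can be sketched as a small text transformation. A minimal Python sketch, assuming a hypothetical helper `sort_imports` that receives a block of one-line D imports, either plain (`import a.b;`) or selective (`import a.b : x, y;`); renamed, `static`, `public`, and multi-module imports are not handled. It sorts the symbols of selective imports, sorts the imports by module name, and optionally puts a standard-library group first, followed by a blank line. Treating `std.*` and `core.*` as "stdlib" is an assumption of this sketch, and a real formatter would work from parsed input rather than a regex.

```python
import re

# Matches "import std.stdio;" and "import std.algorithm : sort, map;".
# Renamed imports, "static"/"public" imports and multi-module imports are not handled.
_IMPORT = re.compile(r"^import\s+([\w.]+)\s*(?::\s*([^;]+))?;$")

# Assumption of this sketch: top-level packages that count as "stdlib".
STDLIB_ROOTS = ("std", "core")


def sort_imports(lines, separate_stdlib=True):
    """Sort a block of D import lines (hypothetical helper, not part of any formatter)."""
    entries = []
    for line in lines:
        m = _IMPORT.match(line.strip())
        if m is None:
            raise ValueError(f"unsupported import line: {line!r}")
        module, symbols = m.group(1), m.group(2)
        if symbols:
            # Sort the symbols of a selective import as well.
            names = sorted(s.strip() for s in symbols.split(","))
            text = f"import {module} : {', '.join(names)};"
        else:
            text = f"import {module};"
        entries.append((module, text))

    entries.sort()  # by module name, then by rendered text
    if not separate_stdlib:
        return [text for _, text in entries]

    def is_stdlib(module):
        return module.split(".")[0] in STDLIB_ROOTS

    std = [text for module, text in entries if is_stdlib(module)]
    user = [text for module, text in entries if not is_stdlib(module)]
    # Blank line between the stdlib group and the user group, if both exist.
    return std + [""] + user if std and user else std + user


if __name__ == "__main__":
    block = [
        "import myapp.config;",
        "import std.stdio;",
        "import std.algorithm : sort, map, filter;",
        "import core.thread;",
    ]
    print("\n".join(sort_imports(block)))
```

Raising on unsupported lines, rather than silently reordering them, is a deliberate choice in this sketch: a formatter that cannot understand an import should leave the block alone instead of guessing.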
Apr 29 2023
max haughton <maxhaton gmail.com> writes:
On Saturday, 29 April 2023 at 21:27:51 UTC, Johan wrote:
 On Saturday, 29 April 2023 at 21:07:28 UTC, max haughton wrote:
 On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia 
 wrote:
 It would also be great to have import sorting.
 Why is import sorting useful in practice? Anything more than ~3 top level imports for me is a pretty big red flag for me - D has local imports, use them.
 I think these kind of discussions should be kept to a minimum: a formatter should not force a specific formatting taste on the user, and instead provide options such that the user can tailor it to taste.
In this case I think the question only makes sense if you are writing suboptimal code, e.g., some files in dmd currently have almost 100 lines of `import`s at the top.
 I would also appreciate import sorting (with option to separate 
 stdlib imports from user libraries), including sorting symbols 
 of specific imports.
 Another wish is grouping all UDAs either before/after the 
 function.
 If I think longer, I'm sure I have other wishes that clash with 
 someone else's taste, like you had 
 (https://github.com/snazzy-d/sdc/issues/231) ;-)
 Hence options options options!
I guess, but this felt inconsistent rather than merely not to taste. For the most part I genuinely don't care how the code is actually formatted as long as it feels space-efficient (using sdfmt has made me hate Allman braces; other than that, not much to report) and isn't going to trip my eyes up.

At a scale larger than one person, formatting isn't really about aesthetics anyway; it's about uniformity (both for tools and people). As long as it isn't completely brain-damaged, I'm not that bothered about the format itself.

At the level of a team of programmers, consider the debates about how exactly code should be formatted (which we have all had, and will keep having as long as there are programmers, and a bit longer after that, as long as people remember what digital computers are). By using a relatively inflexible formatter you mostly eliminate the politics and distraction of these debates, whereas having to decide on options runs the risk of just moving the distraction around. YMMV.

Following on from the above, if someone wants to implement more options in sdfmt (though that is ultimately up to Amaury), I don't see much of a problem, but I just think people miss why (and when) formatters are a good idea.
Apr 29 2023