digitalmars.D - Why use a DFA instead of DIP1000?
- Richard (Rikki) Andrew Cattermole (43/43) Sep 12 Yesterday (or today, depending upon who you ask), Dennis asked a
- Dejan Lekic (1/1) Sep 13 Good stuff, Rikki. Thanks for working on this.
- Richard (Rikki) Andrew Cattermole (2/3) Sep 13 Thanks!
- IchorDev (13/18) Sep 13 It could run if asserts were off. Obviously asserts are never
- Richard (Rikki) Andrew Cattermole (10/29) Sep 13 Its not meant to be too realistic, what it is meant to do is have all
- kdevel (13/26) Sep 13 What about "release" mode? And what about that code:
- IchorDev (8/22) Sep 13 The program has entered an invalid state and the behaviour of the
- kdevel (19/38) Sep 13 I thought that the intent of the original code
- IchorDev (3/20) Sep 13 Then you clearly didn't read the function beforehand.
- Richard (Rikki) Andrew Cattermole (4/53) Sep 13 Half right.
- kdevel (7/35) Sep 13 Then it is dead code that should be manually removed? Isn't that
- Richard (Rikki) Andrew Cattermole (19/56) Sep 15 In practice, when this is applied people don't complain about it.
- Richard (Rikki) Andrew Cattermole (11/44) Sep 13 What assert lowers to is irrelevant as far as a frontend DFA is
- Dennis (13/16) Sep 13 Thanks for providing the example, it's a great discussion starter.
- Richard (Rikki) Andrew Cattermole (16/36) Sep 13 That isn't my interpretation.
- Walter Bright (8/8) Sep 13 Requiring the language to remove all dead code before checking for error...
- Richard (Rikki) Andrew Cattermole (48/59) Sep 13 It seems we have a miscommunication on this.
- Walter Bright (4/4) Sep 14 A DFA that only works with a forward pass will generate a lot of complai...
- Richard (Rikki) Andrew Cattermole (37/42) Sep 14 Oh I see where you're thinking.
- Adam Wilson (32/43) Sep 15 I wasn't entirely sure which post to respond to because I needed
- Dennis (38/42) Sep 16 When Walter talks about DFA, he refers to the specific algorithm
- Richard (Rikki) Andrew Cattermole (40/81) Sep 16 You haven't got my stuff entirely right.
- Dennis (8/24) Sep 16 I was referring to C#'s unused variable warnings that Adam
- Richard (Rikki) Andrew Cattermole (56/67) Sep 16 Hmm, I think I see a missing piece of the puzzle.
- Walter Bright (25/25) Sep 22 This is what I am talking about:
- Walter Bright (13/13) Sep 22 In order to do @nogc DFA, the following is a start:
- Richard (Rikki) Andrew Cattermole (41/74) Sep 23 I will assume that ``@nogc`` is lint level attribute that does not
- Walter Bright (7/18) Sep 23 I am reluctant to try to fix this issue with even more attributes.
- Richard (Rikki) Andrew Cattermole (17/38) Sep 23 For @nogc and pure, it is that simple.
- Dennis (62/65) Sep 23 I think there's a simpler way. You can model this as a graph
- Walter Bright (4/4) Sep 23 Your algorithm is interesting, but I don't think it is fundamentally dif...
- Dennis (44/48) Sep 23 I think both algorithms compute the same thing, but your
- Walter Bright (21/57) Sep 23 I don't think it is anywhere near that bad. It does more than O(V) only ...
- Dennis (33/44) Sep 23 Just step 2, constructing the bit vectors, is already O(V^2).
- Richard (Rikki) Andrew Cattermole (10/21) Sep 23 When the problem space has a known finite lifetime, like the case for
- Walter Bright (2/4) Sep 23 Is there a discussion on this?
- Dennis (5/6) Sep 23 I've briefly mentioned this at DConf in the past two years, I
- Walter Bright (3/8) Sep 23 Ok. There are other things that affect this, too, like typing of pointer...
- Dennis (5/7) Sep 23 Which is incidentally one of the many long standing problems with
- Dennis (17/26) Sep 16 So the answer to my question is: this it not motivated by
- Richard (Rikki) Andrew Cattermole (91/118) Sep 16 I am not trying to improve D based upon what other languages can
- Dennis (9/14) Sep 16 You gave 4 code snippets. 1 and 2 already compile as expected, 3
- Richard (Rikki) Andrew Cattermole (30/46) Sep 16 As expected.
- Dennis (41/64) Sep 16 Thanks a lot! I don't need many, I just need 1 good one, so you
- Richard (Rikki) Andrew Cattermole (30/89) Sep 16 I wish I could have offered more in WONTFIX, it would have been
- Dennis (17/25) Sep 16 My search gives:
- Walter Bright (4/7) Sep 13 That is correct. For the false path of a `static if`, the code must be
- Richard (Rikki) Andrew Cattermole (2/10) Sep 13 The linked code is not static if, its a regular if.
- Walter Bright (2/3) Sep 14 I know. I was trying to clarify the difference between regular if and st...
- Walter Bright (2/2) Sep 13 DIP1000 is necessary when the DFA is faced with only a function prototyp...
- Richard (Rikki) Andrew Cattermole (12/14) Sep 13 I'm curious as to why you said this.
Yesterday (or today, depending upon who you ask), Dennis asked a wonderful question in the monthly meeting. Paraphrased: why should we use a DFA engine instead of DIP1000? And you know what, this is a great question, especially since DIP1000 is a subset of a specialised DFA engine, but one with some serious ceiling limitations.

Afterwards, I came up with an example where DIP1000 will error:

```d
int* ptr;

void func(bool b, scope int* p) @safe {
    assert(!b);

    if (b) {
        ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
    }
}
```

This is clearly a false positive; that branch could never run!

One of the hallmarks of a real data flow analysis engine is dead code elimination; all optimising backends implement it. In fact, it's one of the first things that ever gets implemented, and DIP1000 can't do it!

But what if we did have the ability to model truthiness? Well then we could cull that dead branch, and not do the report. But that is slow, right? NOPE! We can actually do this fast!

Here is a modified example from my fast DFA engine, which I've been working on for around six months, covering both truthiness and nullability:

```d
void branchKill(bool b) {
    assert(!b);

    if (b) {
        int* ptr;
        int val = *ptr; // no error, branch is dead
    }
}
```

My hope, should it succeed, is for it to be on by default; this requires it to be fast and to not have false positives like the above code. I hope this is enlightening for those who don't know what data flow analysis is all about!
Sep 12
On 13/09/2025 9:42 PM, Dejan Lekic wrote:
> Good stuff, Rikki. Thanks for working on this.

Thanks!
Sep 13
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> This is clearly a false positive; that branch could never run!

It could run if asserts were off. Obviously asserts are never meant to fail when the program is in a correct state though.

> One of the hallmarks of a real data flow analysis engine is dead code elimination; all optimising backends implement it. In fact, it's one of the first things that ever gets implemented, and DIP1000 can't do it!

This example doesn't seem very realistic. A branch that never runs shouldn't really exist to begin with, right? Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

Also I'm curious what other problems DFA could be applied to? For instance, detecting what types a function throws and allowing you to catch only those types instead of always having to `catch(Exception)` to satisfy the compiler. Or doing more complex type inference.
Sep 13
On 13/09/2025 10:13 PM, IchorDev wrote:
> This example doesn't seem very realistic. A branch that never runs shouldn't really exist to begin with, right? Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

It's not meant to be too realistic; what it is meant to do is have all the primitives in place to show that the compiler has the ability to track state and what happens should X happen.

Everything from loops, assignments, if statements, to function calls mess with this, and it's not a great example. The logs from stuff like this get very big. Small font size + multi-gigabyte logs. Not fun and certainly not educational.

> Also I'm curious what other problems DFA could be applied to? For instance, detecting what types a function throws and allowing you to catch only those types instead of always having to `catch(Exception)` to satisfy the compiler. Or doing more complex type inference.

Exceptions need a set in the compiler rather than just an "is nothrow" flag. The compiler can (and should) do this without DFA.
Sep 13
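[Editor's note: the exception-set idea above can be sketched concretely. Below is a toy model of a per-function thrown-type set, the kind of summary a compiler could keep instead of a single nothrow flag. Everything here (names, fields) is invented for illustration; it is not dmd's internals.]

```d
// Illustrative sketch only: track the set of exception types a function
// may throw, instead of a single nothrow flag.
struct FuncSummary
{
    string name;
    bool[string] throws; // exception type names this function may throw

    // Union a callee's thrown set into the caller's, at each call site.
    void propagateFrom(const ref FuncSummary callee)
    {
        foreach (type, _; callee.throws)
            throws[type] = true;
    }
}

unittest
{
    auto parse = FuncSummary("parse", ["ConvException": true]);
    auto load = FuncSummary("load", ["FileException": true]);
    load.propagateFrom(parse);
    // A caller could now `catch (ConvException)` / `catch (FileException)`
    // specifically, rather than a blanket `catch (Exception)`.
    assert("ConvException" in load.throws && "FileException" in load.throws);
}
```

A caller's set is the union of what it throws directly and what its callees may throw, which is what would let `catch` lists be checked against something more precise than `Exception`.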
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> [...] Afterwards, I came up with an example where DIP1000 will error:
>
> ```d
> int* ptr;
>
> void func(bool b, scope int* p) @safe {
>     assert(!b);
>
>     if (b) {
>         ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>     }
> }
> ```
>
> This is clearly a false positive; that branch could never run!

What about "release" mode? And what about this code:

```d
void main () {
    assert (0);
    string s = "x";
    s [0] = 'y';
}
```

Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?
Sep 13
On Saturday, 13 September 2025 at 10:19:45 UTC, kdevel wrote:
> What about "release" mode? And what about this code:
>
> ```d
> void main () {
>     assert (0);
>     string s = "x";
>     s [0] = 'y';
> }
> ```

The program has entered an invalid state and the behaviour of the continuing execution of the program is undefined. https://dlang.org/spec/expression.html#AssertExpression

That said, I don't know if we should be suppressing these errors. Doesn't seem useful to me?

> Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?

It's not a 'false positive' because it's not related to DIP1000's stack escape rules at all.
Sep 13
On Saturday, 13 September 2025 at 11:08:29 UTC, IchorDev wrote:
> The program has entered an invalid state and the behaviour of the continuing execution of the program is undefined.

Please note that the program is not run but just compiled.

> It's not a 'false positive' because it's not related to DIP1000's stack escape rules at all.

I thought that the intent of the original code

```d
int* ptr;

void func(bool b, scope int* p) @safe {
    assert(!b);

    if (b) {
        ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
    }
}
```

was to show that there is some code block

```d
ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
```

which is never executed regardless of the value of `b`.
Sep 13
On Saturday, 13 September 2025 at 13:06:06 UTC, kdevel wrote:
> Please note that the program is not run but just compiled.

What does that even mean?

> I thought that the intent of the original code [...] was to show that there is some code block
>
> ```d
> ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
> ```
>
> which is never executed regardless of the value of `b`.

Then you clearly didn't read the function beforehand.
Sep 13
On 14/09/2025 1:06 AM, kdevel wrote:
> [...] I thought that the intent of the original code
>
> ```d
> int* ptr;
>
> void func(bool b, scope int* p) @safe {
>     assert(!b);
>
>     if (b) {
>         ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>     }
> }
> ```
>
> was to show that there is some code block
>
> ```d
> ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
> ```
>
> which is never executed regardless of the value of `b`.

Half right. There is a code block that will never be executed; it is because variable `b`'s state is known, which prevents that code block from executing.
Sep 13
On Saturday, 13 September 2025 at 15:43:27 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Half right. There is a code block that will never be executed; it is because variable `b`'s state is known, which prevents that code block from executing.

Then it is dead code that should be manually removed? Isn't that the problem here? The compiler correctly complains about invalid code; this is not a false positive, even if that code is never executed. If I want code not to be executed I comment it out or delete it.
Sep 13
On 14/09/2025 7:58 AM, kdevel wrote:
> Then it is dead code that should be manually removed? Isn't that the problem here? The compiler correctly complains about invalid code; this is not a false positive, even if that code is never executed. If I want code not to be executed I comment it out or delete it.

In practice, when this is applied, people don't complain about it.

https://github.com/dlang/dmd/issues/21859

Compiles with full D, but not -betterC (which does the extra check without the culling, which is a bug):

```d
void func() nothrow {
    if (0) {
        throw new Exception(""); // Error: cannot use `throw` statements with -betterC
    }
}
```

When both DFA and control flow graph (CFG) processing are working correctly, it's a well-loved compiler feature. You don't know it's there unless something has gone wrong, like in the above code.

When I learned how nothrow inference worked I was really impressed with how it was applying CFG analysis. I did not expect the frontend to be doing it.
Sep 15
On 13/09/2025 10:19 PM, kdevel wrote:
> What about "release" mode?

What assert lowers to is irrelevant as far as a frontend DFA is concerned. It runs before that is taken into account.

> And what about this code:
>
> ```d
> void main () {
>     assert (0);
>     string s = "x";
>     s [0] = 'y';
> }
> ```
>
> Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?

Yes, actually. Not all languages work like this; some in the ML line have the ability to not process anything after the assert statement. They apply DFA into the type system to varying degrees. We don't.

For halt, we may have the ability to kill it off eventually. However we can't cull non-halt asserts, as we are not tying our semantic analysis to a DFA engine. It is too late to change that.

Note: a halt assert is `assert(0)`.
Sep 13
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Paraphrased, why should we use a DFA engine instead of DIP1000? (...) Afterwards, I came up with an example where DIP1000 will error:

Thanks for providing the example, it's a great discussion starter. Like I said yesterday, it's not that I don't "know what data flow analysis is all about" - when you have a solution, you can come up with problems it solves, sure. I was mostly curious what real world code made you believe that more advanced DFA is necessary for useful scope pointer analysis.

There's many problems with DIP1000, but no issue was ever opened requesting scope checks to be disabled in dead code. As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).
Sep 13
On 13/09/2025 11:06 PM, Dennis wrote:
> There's many problems with DIP1000, but no issue was ever opened requesting scope checks to be disabled in dead code. As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).

That isn't my interpretation.

The code you linked would have been removed by a backend optimizer. It is dead code. It exists to get effects from the frontend (poorly), due to lacking language capabilities to achieve the results needed.

DFA engines, for the purposes of linting like this thread is about, should focus on what code could actually run at runtime. The fact that D is not architected to be able to stop these false positives is an intended bug in our design. Not all PLs work like this; the ML family certainly doesn't. They tie very advanced analysis engines, including DFA, into their type system.

Just because people don't understand that this is possible doesn't mean it hasn't been an issue. I see it as a case that people don't know what they don't know. So they don't complain. This is a hole that the C family relies upon having due to historical reasons, and so hasn't reached common knowledge.
Sep 13
Requiring the language to remove all dead code before checking for errors means the entire program must be subjected to DFA. The result is compilation will slow down dramatically. It means separate compilation is no longer possible. As a pragmatic choice, D is designed to not require DFA. It is a deliberate architectural choice, not a bug. That said, an optional DFA can be quite useful in detecting otherwise hidden programming errors. But I'm not convinced it's a good idea to require it in order to successfully compile code, as your opening example suggests.
Sep 13
On 14/09/2025 3:20 PM, Walter Bright wrote:
> Requiring the language to remove all dead code before checking for errors means the entire program must be subjected to DFA. The result is compilation will slow down dramatically. It means separate compilation is no longer possible. As a pragmatic choice, D is designed to not require DFA. It is a deliberate architectural choice, not a bug.

It seems we have a miscommunication on this. The requirements of D say it's a feature, the PL literature says it's a bug. It can be both! And yes, I agree the choice D went with is a good one for the C family. Very pragmatic.

Where DFA and CFG analysis are implemented (such as limited VRP and blockexit.d), they are very impressive; I enjoyed seeing it. However, these algorithms, when in use, are not applied consistently.

https://github.com/dlang/dmd/issues/21859

It passes nothrow, but not -betterC:

```d
void func() nothrow {
    if (0) {
        throw new Exception(""); // Error: cannot use `throw` statements with -betterC
    }
}
```

> That said, an optional DFA can be quite useful in detecting otherwise hidden programming errors. But I'm not convinced it's a good idea to require it in order to successfully compile code, as your opening example suggests.

DFA is a rather large body of literature; we have to split it up before going "eww no" on any of it.

There is:

- Single forward pass: fast DFA, DIP1000, limited VRP/CFG analysis in the frontend.
- Work list algorithm: live, slow DFA if I ever build it.

Also:

- In the type system: i.e. storage classes, dictates what templates instantiate.
- Lint level: does not affect mangling, only limits what code is considered valid, no mutation of the type system.

Considerations on the type system we can basically say no to, simply because dmd isn't architected to handle it. Which is a good choice, as I said elsewhere, for the C family, but not all languages are like that and they do get benefits from doing so.

The work list algorithm is great, but it has to be opt-in; not everyone is going to benefit from the slowdown. That kinda leaves us with single forward pass and lint level for anything on by default. Which unfortunately DIP1000 is not (due to it messing with storage classes; not that I'm arguing that was a wrong choice when it was implemented).

You can do an absolute load of stuff at lint level with a single forward pass engine. Which I'm finding out as I learn that my fast DFA engine is probably close to the bleeding edge, oops. Worthy of some serious pride, as I am reevaluating other language options lol.

This isn't a situation where we only have the worst set of options available to us. Each has their strengths and weaknesses. It's possible to have multiple opinions based upon how you are categorizing things for this particular subject.
Sep 13
A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails. The forward pass attribute inference D has generates complaints and bug reports when it fails, because it does not handle cycles.
Sep 14
On 15/09/2025 7:55 AM, Walter Bright wrote:
> A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails. The forward pass attribute inference D has generates complaints and bug reports when it fails, because it does not handle cycles.

Oh I see where you're thinking.

Okay, consider the following reports; does any report (or lack thereof) suggest people are going to be baffled by it?

Relevant files (start.d for the reports' source code, out.txt for the full log): https://gist.github.com/rikkimax/652f96a33934bd8ae0a68fb849821385#file-start-d

```
start.d
testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(257): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(308): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
        ^
testdfa\start.d(330): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
        ^
testdfa\start.d(613): Error: Assert can be proven to be false
    assert(val); // Error: val is 0
    ^
testdfa\start.d(619): Error: Assert can be proven to be false
    assert(stack.length == 1); // Error: stack is null
    ^
testdfa\start.d(673): Error: Dereference on null variable `ptr`
    return *ptr; // error could be null
    ^
testdfa\start.d(703): Error: Assert can be proven to be false
    assert(ptr !is null); // error
    ^
extracted.d
```

Worth looking at extracted.d and the full log.
Sep 14
I wasn't entirely sure which post to respond to because I needed Rikki's examples, but this response is for Walter.

On Monday, 15 September 2025 at 01:26:31 UTC, Richard (Rikki) Andrew Cattermole wrote:
> ```
> start.d
> testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
>     foreach (i; 0 .. 2)
>     ^
> testdfa\start.d(330): Error: Dereference on null variable `ptr`
>         int v = *ptr; // error
>         ^
> extracted.d
> ```

Walter, the twin assertions that the above errors will be confusing to programmers, and that they will slow down the compiler, are entirely false. Millions of programmers every day will know exactly what is meant. These warnings came from rebuilding a 250kLOC project in less than one second.

```
Main.xaml.cs(530,22,530,24): warning CS0168: The variable 'ex' is declared but never used
Customers.cs(26,37,26,53): warning CS0649: Field 'Customers.customersService' is never assigned to, and will always have its default value null
DataUploadController.cs(41,46,41,54): warning CS0169: The field 'DataUploadController.resolver' is never used
```

The VB.NET compiler also has DFA and is equally as fast (they share codebases):

```
WorkOrderDirectory.vb(26,13): warning BC42024: Unused local variable: 'J'.
```

The above are all real world examples from the code base that I just rebuilt. There is no rule that states that DFA must be slow, as there exist examples disproving the assertion.
Sep 15
On Monday, 15 September 2025 at 23:02:25 UTC, Adam Wilson wrote:
> (...) There is no rule that states that DFA must be slow, as there exist examples disproving the assertion.

When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. This algorithm is superlinear, so considered 'too slow' for the frontend, which aims to analyse n statements in O(n) time.

What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

The C# warnings Adam showed are, from my experimentation on https://dotnetfiddle.net, even simpler than that. They don't take control flow into account at all:

```csharp
public class Program
{
    static int y;
    public static void Main()
    {
        int x = 0;
        if (y < 0)
            if (y > 0) // this branch is dead code
                x = 1;
        // yet no warning marks it unreachable, or x as unused
    }
}
```

That's not to say it's not useful, I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.

These 3 things have all been labeled 'DFA' so far, which makes the discussion really confusing, so it's best to clarify exactly which DFA flavor you're talking about.
Sep 16
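[Editor's note: to make the first flavor concrete, here is a toy D sketch of solving data-flow equations by sweeping a control flow graph until a fixed point, the iterative scheme Dennis attributes to backend optimizers. All names are invented; this is not dmd code.]

```d
// Toy sketch: propagate facts around a CFG until nothing changes.
import std.stdio;

struct Block
{
    size_t[] preds;    // indices of predecessor blocks
    bool[string] gen;  // facts this block creates, e.g. "p may be null"
    bool[string] kill; // facts this block destroys, e.g. by assigning p
}

// Solve out[b] = (in[b] - kill[b]) + gen[b], in[b] = union of preds' outs.
// Fact sets only ever grow, so comparing set sizes detects convergence.
bool[string][] solve(Block[] cfg)
{
    bool[string][] outs;
    outs.length = cfg.length;
    for (bool changed = true; changed; )
    {
        changed = false; // superlinear: keeps sweeping until a fixed point
        foreach (i, ref b; cfg)
        {
            bool[string] inSet;
            foreach (p; b.preds)
                foreach (fact, _; outs[p])
                    inSet[fact] = true;
            foreach (fact, _; b.kill)
                inSet.remove(fact);
            foreach (fact, _; b.gen)
                inSet[fact] = true;
            if (inSet.length != outs[i].length)
            {
                outs[i] = inSet;
                changed = true;
            }
        }
    }
    return outs;
}

void main()
{
    // Block 1 sits in a loop (edge from block 2 back into block 1), so
    // the solver has to iterate until the facts stop changing.
    auto cfg = [Block(null, ["p may be null": true], null),
                Block([0, 2], null, ["p may be null": true]),
                Block([1], null, null)];
    writeln(solve(cfg));
}
```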
On 16/09/2025 9:59 PM, Dennis wrote:
> When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. [...] What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

You haven't got my stuff entirely right.

The fast DFA, which is what I'm working on right now, runs in semantic 3. It's very similar to DIP1000, in that it does a single forward pass with the statement and expression walkers.

The slow DFA, which I want to work on eventually, is indeed more like the classical DFAs of the 80's, and would run in a semantic 4. Since this is a known quantity, and it isn't the default experience which all these features hinge on having a solution for, it isn't a priority.

Like classical DFAs, the fast DFA does indeed use lattices and join and meet operators. But I have extra information available, like depth checks, write counts and scope modelling, that classical DFAs do not have.

The fast DFA does something quite interesting: it isn't modelling what is modellable, it instead models what it can't model, and what is left is what is modellable. You need to have an unknown state, with ways of putting variables into it. This allows me to cull as many false and perceived-false positives as possible.

Data flow analysis is the derivative of proof assistants. They work on properties, and sometimes, in the case of VRP, they integrate values into their calculations.

If it weren't for my zero false & perceived-false positives rule, the fast DFA engine would be almost as good as a full DFA engine. It's missing a complete interprocedural analysis, and single-function cyclicity stuff.

> The C# warnings are even simpler than that. They don't take control flow into account at all:
>
> ```csharp
> public class Program
> {
>     static int y;
>     public static void Main()
>     {
>         int x = 0;
>         if (y < 0)
>             if (y > 0) // this branch is dead code
>                 x = 1;
>         // yet no warning marks it unreachable, or x as unused
>     }
> }
> ```

It does actually. https://csharp.godbolt.org/z/nMjPqYens

```
Program:Main() (FullOpts):
       xor      edi, edi
       tail.jmp [System.Console:WriteLine(int)]
```

It silently culled the branches. Exactly what DFA should do in this situation.

> That's not to say it's not useful, I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.

And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.
Sep 16
On Tuesday, 16 September 2025 at 10:47:03 UTC, Richard (Rikki) Andrew Cattermole wrote:
> It does actually. https://csharp.godbolt.org/z/nMjPqYens
>
> ```
> Program:Main() (FullOpts):
>        xor      edi, edi
>        tail.jmp [System.Console:WriteLine(int)]
> ```
>
> It silently culled the branches. Exactly what DFA should do in this situation.

I was referring to C#'s unused variable warnings that Adam brought up, not the code generator.

> And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.

I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.
Sep 16
On 16/09/2025 11:20 PM, Dennis wrote:
>> And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.
>
> I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.

Hmm, I think I see a missing piece of the puzzle.

A type system is the types and declarations and how they get processed in a language. When a DFA is attached to a type system properly, you are not calling functions like runSemantic on AST nodes. This is handled by the worklist algorithm. This is an important distinction between what we have and what the more advanced type systems with a DFA attached to them have.

Take a look at all of the issues we have with resolving templates. These stem from not having a DFA-directed type system. It's very easy to exceed what the compiler can instantiate with cycles and other ordering issues. We've all run into them and they are rarely fixable.

And no, this isn't a new thing. This is from the 70's, just a different branch of DFA and related analysis engines. I doubt Walter would have learned this branch, and may not have been exposed to it; it's mostly academic (for a reason, up until the 90's). The papers he has sent me are all backend related, not these other branches.

I do want to say that I have been knowingly confusing DFA and other analysis engines; this is fully intended. Their interplay can be pretty heavy, and one would feed into the other here. Lines get blurred in the implementation, and their solutions can be based upon the same theory & algorithms.

Now compare us to the ML family with signatures (type classes), modules and their variation of meta-programming. They can resolve stuff we wouldn't have a dream of doing. We have to employ CTFE just to come close to what they can do, and even then there is stuff we simply will never be able to do due to ordering issues.

This would be a good time to define what lint level means to a type system. In essence it puts anything that the type system knows about into a read-only mode. You can't change it in a given lint-level engine. This makes DIP1000 in the type system, not lint level, due to it messing with storage classes.

What analysis dmd does do, like VRP and CFG, is impressive. Not because they are the most advanced thing in the world (not even close), but because our type system wasn't meant to support it. This isn't a strength of its design.

A good thing to note here is how SDC uses a worklist algorithm to process its type system. https://github.com/snazzy-d/sdc/blob/master/src/d/semantic/scheduler.d

SDC will have the ability to solve problems that dmd cannot, not because the language prevents it, but because dmd's type system implementation does not allow for it, having a much lower ceiling in its capability.

To give dmd the kind of capability where you have a DFA engine dictating analysis of the type system, or feeding a significant amount of information throughout a function, is a rewrite. Simply because the existing architecture wasn't written with that in mind. It was never meant to have templates like we have today, nor analysis like VRP.

The literature is massive on these topics, and gets quite broad. It can be argued differently than how I am arguing it. Some people who read this may conclude that dmd needs a rewrite, and to that end I know the SDC folks would love some help. However my takeaway, from a pragmatic perspective, is to recognize what we can't do, to help determine what we can do. D is a production language, not an academic or toy one. If we have to give up strong guarantees by default for DFA features then so be it; I'd rather have some than none.
Sep 16
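[Editor's note: the SDC scheduler linked above is real; the following is only a toy sketch of the general worklist idea it embodies, with invented names. Tasks that cannot see their dependencies yet get requeued rather than erroring, so declaration order stops mattering.]

```d
// Toy worklist-style scheduling sketch (not SDC's actual code).
import std.algorithm : all;
import std.stdio;

struct Task
{
    string name;
    string[] needs;       // symbols that must be resolved before this runs
    void function() run;
}

void schedule(Task[] pending)
{
    bool[string] resolved;
    while (pending.length)
    {
        Task[] next;
        foreach (t; pending)
        {
            if (t.needs.all!(n => (n in resolved) !is null))
            {
                t.run();              // dependencies met: analyse now
                resolved[t.name] = true;
            }
            else
                next ~= t;            // defer: try again on a later sweep
        }
        if (next.length == pending.length)
            throw new Exception("cycle: needs a fixed-point pass to break");
        pending = next;
    }
}

void main()
{
    // "B" is listed first but runs second, once "A" has been resolved.
    schedule([
        Task("B", ["A"], () => writeln("analyse B")),
        Task("A", null, () => writeln("analyse A")),
    ]);
}
```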
This is what I am talking about:

```d
@nogc void foo() { bar(); }

void bar() { abc(); }

void abc() { bar(); }
```

Both bar() and abc() are @nogc. But the forward-only checking cannot deduce it, and the result is:

```
test.d(3): Error: `@nogc` function `test.foo` cannot call non-@nogc function `test.bar`
```

This is the simplest version of this problem. It can be quite complex, but the result is the same - it cannot be figured out without solving recursive data flow equations.

Another version of the problem:

```d
@nogc void foo() { bar(); }

void bar() { abc(); }

void abc();
```

The body of abc() is not available to the compiler, so the compiler cannot deduce @nogc for abc().

So, in order to deduce attributes of abc(), the compiler must have the source code available for it. To do a complete job, then, requires compilation of the entire program's code base. Which will require a lot of memory and time.

Adam, Rikki, if I'm wrong, please show me how.
Sep 22
In order to do @nogc DFA, the following is a start:

1. construct an array of all the functions
2. each function gets allocated a bit vector with one bit for each function in (1)
3. each of those (2) bit vectors gets a corresponding bit set for every function that is called
4. a single bit vector gets a bit set for every function that does a gc operation
5. loop through all the bit vectors, and every function that calls a function that has a bit set in (4) also gets a bit set in (4)
6. keep looping until no more changes in (4)
7. mark all the functions that don't have a bit set as @nogc

Pruning of the number of functions that need a bit can be done by not considering any functions already marked as @nogc, and functions that only call @nogc functions and don't do any other allocations can be eagerly marked @nogc.
Sep 22
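[Editor's note: Walter's steps translate fairly directly into code. Below is an illustrative sketch using `std.bitmanip.BitArray`; it is not dmd code, and the caller is assumed to have already built the call matrix and GC-operation vector per steps 1-4.]

```d
// Illustrative sketch of the bit-vector steps above.
import std.bitmanip : BitArray;

// calls[i][j] is set when function i calls function j (steps 1-3);
// gcOps[i] is set when function i itself does a GC operation (step 4).
// Returns the final "may allocate" vector; clear bits are @nogc (step 7).
BitArray mayAllocate(BitArray[] calls, BitArray gcOps)
{
    auto mayGC = gcOps.dup;
    bool changed = true;
    while (changed) // steps 5 and 6: sweep until no more bits change
    {
        changed = false;
        foreach (i, ref callees; calls)
        {
            if (mayGC[i])
                continue;
            auto reached = callees.dup;
            reached &= mayGC; // does i call anything that may allocate?
            if (reached.count != 0)
            {
                mayGC[i] = true;
                changed = true;
            }
        }
    }
    return mayGC;
}
```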
On 23/09/2025 5:54 PM, Walter Bright wrote:
> This is what I am talking about:
>
> ```d
> @nogc void foo() { bar(); }
>
> void bar() { abc(); }
>
> void abc() { bar(); }
> ```
>
> Both bar() and abc() are @nogc. But the forward-only checking cannot deduce it [...] it cannot be figured out without solving recursive data flow equations.
>
> Another version of the problem:
>
> ```d
> @nogc void foo() { bar(); }
>
> void bar() { abc(); }
>
> void abc();
> ```
>
> The body of abc() is not available to the compiler, so the compiler cannot deduce @nogc for abc(). [...] To do a complete job, then, requires compilation of the entire program's code base. Which will require a lot of memory and time.
>
> Adam, Rikki, if I'm wrong, please show me how.

I will assume that ``@nogc`` is a lint-level attribute that does not affect mangling, as that would be a requirement for everything to link. If it does affect mangling, things get much harder to solve and a lot less flexible, hence I am not keen on such an approach for DFA attributes.

To solve this is very simple, but it does require the compiler to see the result of the compilation of all three functions. Attributes via the .di generator, or a dedicated special file containing the information, are both valid options here.

Instead of writing ``@nogc``, we write ``@nogc iff(foo, bar, abc)``. From there it's a simple dependency resolver, either during the compilation of the three functions or on usage.

This is the basis for my solution to a problem Adam has proposed for PhobosV3, which essentially is the desire to whitelist attributes. Use ``@localnogc``: if the called functions are all ``@nogc``, then the function is allowed to be ``@nogc``.

```d
void foo() @nogc => bar; // dependency resolve bar&abc's @nogc iff's
void bar() @nogc iff(abc) => abc;
void abc() @nogc iff(bar) => bar;
```

```d
void foo() @nogc => bar; // dependency resolve bar&abc's @nogc iff's
void bar() @nogc iff(abc); // body is irrelevant
void abc() @nogc iff(bar); // body is irrelevant
```

This is a very simple demonstration of whole-program graph analysis with a whole-function graph representation of equations per function. As far as I know no production language supports this; it has been left in experimental 90's languages.

I won't say any .NET language does this, but because they are an application VM, they do in fact see the whole program (minus FFI). What they can do, and what we can do without excellent serialization capabilities in the compiler, isn't the same.

Of course I'm not suggesting @nogc should get this particular treatment; in practice it's "good enough" (if we ignore whitelisting), but as an example of effect analysis it's a good one, even if it isn't a classical problem like pure.

Am I right to assume that the reason you haven't presented this solution is because you want attributes to be finalized when bar&abc complete?
Sep 23
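[Editor's note: the ``@nogc iff(...)`` attribute above is hypothetical, so the resolver below is equally hypothetical: a toy sketch that starts optimistic and strips ``@nogc`` from any function whose dependency fails, repeating until stable. This is how the bar/abc cycle can still come out ``@nogc``.]

```d
// Toy sketch of resolving hypothetical `@nogc iff(...)` sets.
import std.algorithm : any;

bool[string] resolveIff(string[][string] iff, string[] allocates)
{
    bool[string] nogc;
    foreach (name, deps; iff)
        nogc[name] = true;            // optimistic start
    foreach (f; allocates)
        nogc[f] = false;              // known GC operations lose it
    for (bool changed = true; changed; )
    {
        changed = false;
        foreach (name, deps; iff)
        {
            // unknown functions (not in the table) count as failed
            if (nogc[name] && deps.any!(d => !nogc.get(d, false)))
            {
                nogc[name] = false;
                changed = true;
            }
        }
    }
    return nogc;
}

unittest
{
    auto iff = ["foo": ["bar"], "bar": ["abc"], "abc": ["bar"]];
    assert(resolveIff(iff, []).get("foo", false));      // cycle resolves
    assert(!resolveIff(iff, ["abc"]).get("foo", true)); // abc allocates
}
```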
On 9/23/2025 2:43 AM, Richard (Rikki) Andrew Cattermole wrote:
> Instead of writing ``@nogc``, we write ``@nogc iff(foo, bar, abc)``. From there it's a simple dependency resolver, either during the compilation of the three functions or on usage.

It's not so simple when there is recursion in the flow graph.

> This is the basis for my solution to a problem Adam has proposed for PhobosV3, which essentially is the desire to whitelist attributes. Use ``@localnogc``: if the called functions are all ``@nogc``, then the function is allowed to be ``@nogc``.

I am reluctant to try to fix this issue with even more attributes.

> Am I right to assume that the reason you haven't presented this solution is because you want attributes to be finalized when bar&abc complete?

I showed the solution to illustrate the requirement for iteration and convergence to resolve cycles in the flow graph. A DFA that cannot deal with recursion is just going to leave people frustrated, as the current @nogc inference engine does.
Sep 23
On 24/09/2025 5:54 AM, Walter Bright wrote:
> It's not so simple when there is recursion in the flow graph.

For @nogc and pure, it is that simple. Other attributes can't be solved with this solution; they are too complex, or affect the calling convention of the function. I don't have a generalized solution.

> I am reluctant to try to fix this issue with even more attributes.

If you don't, you're limited to a single compilation. Which may be fine for a single module, but might not be cross-module.

> I showed the solution to illustrate the requirement for iteration and convergence to resolve cycles in the flow graph. A DFA that cannot deal with recursion is just going to leave people frustrated, as the current @nogc inference engine does.

Indeed. I'd call it an all-right example; a better example might be:

```d
void* global;

void thing(T)(scope void* u, T* v) {
    global = v;
    thing(null, u);
}
```

You can't solve for this without applying outputs into inputs. Therefore no shortcuts.
Sep 23
On Tuesday, 23 September 2025 at 05:54:00 UTC, Walter Bright wrote:
> This is the simplest version of this problem. It can be quite complex, but the result is the same - it cannot be figured out without solving recursive data flow equations.

I think there's a simpler way. You can model this as a graph reachability problem, with vertices for each function, directed edges for function calls, and a 'gc' node representing possible GC operations such as `new`, closures, extern functions without `@nogc`, etc. Then all you need is a graph search starting from the gc node, and every node you don't find is @nogc.

Below is a D implementation you can toy with. If you find a case where this fails where solving data flow equations would succeed, let me know.

N.b. the reason dmd can't do this (or the data flow approach) currently is because `@nogc`-ness needs to be determined before the call graph is complete, see https://forum.dlang.org/post/ydemnavvtzytjkmibwvz@forum.dlang.org

If we remove @nogc from a function's type/mangle in a next edition (which I'm all for) it can work.

---

```D
import std;

class Func
{
    string name;
    Func[] calledFrom;
    bool isNogc = true;
    bool isVisited = false;

    this(string name) { this.name = name; }

    void call(Func f) { f.calledFrom ~= this; }
}

void computeNogc(Func gc)
{
    Func[] stack = [gc];
    while (!stack.empty)
    {
        auto node = stack[$ - 1];
        stack.length--;
        node.isNogc = false;
        node.isVisited = true;
        foreach (f; node.calledFrom.filter!(x => !x.isVisited))
            stack ~= f;
    }
}

void main()
{
    auto foo = new Func("foo");
    auto bar = new Func("bar");
    auto abc = new Func("abc");

    // pseudo function representing `new`, closures, anything that allocates
    auto gc = new Func("gc");

    // Add your function calls here:
    foo.call(bar);
    bar.call(abc);
    abc.call(bar);
    abc.call(gc); // abc() does something that allocates

    computeNogc(gc);

    // Print results
    foreach (a; [foo, bar, abc])
        writeln(a.name, ". @nogc = ", a.isNogc);
}
```
Sep 23
Your algorithm is interesting, but I don't think it is fundamentally different from the one I sketched out. Also, it uses expensive operations like array appends, and the filter function. The bit mask route should be much faster.
Sep 23
On Tuesday, 23 September 2025 at 17:46:20 UTC, Walter Bright wrote:
> Your algorithm is interesting, but I don't think it is fundamentally different from the one I sketched out.

I think both algorithms compute the same thing, but your algorithm represents the graph with an adjacency matrix and does an inefficient search. The algorithm should be O(V + E), with V = vertices = number of function declarations, and E = edges = number of function calls. That is, compile time should scale linearly with the amount of source code being compiled. Your algorithm does a O(V^2) traversal O(V) times, running in O(V^3) time. That's a pathological worst case scenario, but I'm not convinced the constant factor is so large that it's more practical than a linear search.

> Also, it uses expensive operations like array appends, and the filter function. The bit mask route should be much faster.

I wrote the example as a simple illustration, not as optimized production code. The filter function is trivially replaced with an if statement if the optimizer doesn't already do that.

On a slight tangent, I'm not a fan of how dmd's performance culture seems to be hyper-focused on local constant-time code optimizations that worsen code quality (like bit-hacking and using global variables for reusable memory buffers) when there's a number of glaring issues related to time complexity, which are way more severe than a redundant allocation here and there could ever be. Whenever I investigate what slows down compilation time of real code, it's usually because it triggers a bad case of one of the many quadratic or even exponential algorithms in dmd's code. I'm talking:

- O(n^2) MSCoff section insertion (https://github.com/dlang/dmd/pull/16780)
- O(n^2) Checking for missing/duplicated switch case statements (TODO)
- O(2^n) Nested function sibling detection (https://github.com/dlang/dmd/pull/14886)
- O(2^n) Return type isolated from parameters check (https://github.com/dlang/dmd/pull/12885)
- O(n*m) CTFE execution because AA literals aren't memoized (https://github.com/dlang/dmd/issues/20179)
- O(2^n) Function inlining (https://github.com/dlang/dmd/issues/20315)
- O(n^2) Loop optimizations (https://github.com/dlang/dmd/issues/20535)

A fast constant factor is nice, but let's not introduce more 'compilation time landmines' into dmd's code base, and prefer linear algorithms over quadratic ones.
Sep 23
On 9/23/2025 1:01 PM, Dennis wrote:
> I think both algorithms compute the same thing, but your algorithm represents the graph with an adjacency matrix and does an inefficient search. [...] Your algorithm does a O(V^2) traversal O(V) times, running in O(V^3) time. That's a pathological worst case scenario, but I'm not convinced the constant factor is so large that it's more practical than a linear search.

I don't think it is anywhere near that bad. It does more than O(V) only when there are cycles.

> On a slight tangent, I'm not a fan of how dmd's performance culture seems to be hyper-focused on local constant-time code optimizations that worsen code quality (like bit-hacking and using global variables for reusable memory buffers)

The reusable memory buffer thing was benchmarked by myself long ago, and was a big win. I don't know what you mean by bit-hacking.

My general approach is to make it work first, not worrying too much about algorithms. I've discovered that D makes it fairly easy to change algorithms, while C/C++ code tends to be a lot more brittle. Recoding for performance later takes advantage of there being a test suite in place for it.

> when there's a number of glaring issues related to time complexity, which are way more severe than a redundant allocation here and there could ever be.

They both matter.

> Whenever I investigate what slows down compilation time of real code, it's usually because it triggers a bad case of one of the many quadratic or even exponential algorithms in dmd's code. I'm talking:

I've discovered many such algorithms in my C++ compiler, and fixed them. I've been less diligent with D's source, mainly because I have too much on my plate. It's great that you're looking into this!

> - O(n^2) MSCoff section insertion (https://github.com/dlang/dmd/pull/16780) [...]

This is all valuable stuff!

> A fast constant factor is nice, but let's not introduce more 'compilation time landmines' into dmd's code base, and prefer linear algorithms over quadratic ones.

There's mathematical time, the O(whatever) thing, and there's the detail coding practice time. When I worked on the Warp preprocessor, I spent a lot of time trying different algorithms and data structures, and benchmarking. It paid off to not be too pre-judgmental about what time it would take. Other things matter a lot, too, like cache-friendly data structures.

I appreciate your efforts here, it is very valuable work you are doing!
Sep 23
On Tuesday, 23 September 2025 at 22:10:22 UTC, Walter Bright wrote:
> It does more than O(V) only when there are cycles.

Just step 2, constructing the bit vectors, is already O(V^2). Even without cycles, you can hit the O(V^3) case if you have a deep call stack.

> The reusable memory buffer thing was benchmarked by myself long ago, and was a big win.

The problem is, a trick is found to be useful in one spot at one time, and 15 years later it's still blanketed everywhere without question. Meanwhile, it's 2025, the TypeScript compiler got a 3x speed up from leveraging multi-threading, and dmd is still stuck in `__gshared` town, population 500+. Mutable global buffers are even being used in dmd's brand new ARM disassembler, which is a) not performance critical and b) if it were, the key to making it run fast on modern hardware is not `__gshared char[5 + hw.sizeof * 3 + 1 + 1]`.

> I don't know what you mean by bit-hacking.

I meant things like using an `int flags` parameter with 2 bits or'd in instead of 2 `bool` parameters. But it's not the best example; I'm talking about the general pattern of micro-optimizations, not specific instances of the problem.

> My general approach is to make it work first, not worrying too much about algorithms. (...) It paid off to not be too pre-judgmental about what time it would take.

Which is why it puzzles me: there's a simple, classic algorithm that runs in linear time, which is pre-judged to be slow because there's array concatenation in it. Then there's a complex algorithm with a nested for-loop triple, which is pre-judged to be fast because it's going to use bit vectors, and it has extra logic to prune the search space, making it not as bad, hopefully. And you think the latter fits the philosophy "make it work first"? I mean, be my guest and implement it, I just think you're making it harder for yourself if you do. And I am going to be wary and test performance when something like that lands in dmd 😉

> I appreciate your efforts here, it is very valuable work you are doing!

Thanks!
Sep 23
On 24/09/2025 11:48 AM, Dennis wrote:
> Which is why it puzzles me: there's a simple, classic algorithm that runs in linear time, which is pre-judged to be slow because there's array concatenation in it.

When the problem space has a known finite lifetime, like dmd's AST, which is tied to its global object, there is a solution to this particular problem: inlined linked lists.

I use them heavily for any compiler-type problems that I do. They work great, even if they cost a tiny bit more RAM. They work out to be very cheap, and basically non-existent in terms of cost during iteration. At least when applied to stuff like the algorithms me and Dennis presented wrt. deferred logic.
Sep 23
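[Editor's note: for readers who haven't met the pattern, here is a small sketch of such an inlined (intrusive) linked list; all names are invented for illustration.]

```d
// The `next` link lives in the node itself, so building a list is one
// allocation per node, and appending never reallocates or copies the
// way `arr ~= edge` can.
struct CallEdge
{
    string callee;
    CallEdge* next; // intrusive link
}

struct FuncNode
{
    string name;
    CallEdge* calls; // head of this function's call list
}

// O(1) prepend; fine when nodes live as long as the compilation does.
void addCall(ref FuncNode f, string callee)
{
    f.calls = new CallEdge(callee, f.calls);
}

unittest
{
    auto f = FuncNode("foo");
    addCall(f, "bar");
    addCall(f, "abc");
    size_t count;
    for (auto e = f.calls; e !is null; e = e.next)
        count++; // iteration is plain pointer chasing, no bookkeeping
    assert(count == 2);
}
```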
On 9/23/2025 4:52 AM, Dennis wrote:
> If we remove @nogc from a function's type/mangle in a next edition (which I'm all for) it can work.

Is there a discussion on this?
Sep 23
On Tuesday, 23 September 2025 at 17:47:23 UTC, Walter Bright wrote:
> Is there a discussion on this?

I've briefly mentioned this at DConf in the past two years, I haven't written a proposal or seen anyone else propose it in detail.
Sep 23
On 9/23/2025 1:05 PM, Dennis wrote:
> I've briefly mentioned this at DConf in the past two years, I haven't written a proposal or seen anyone else propose it in detail.

Ok. There are other things that affect this, too, like the typing of pointers to functions.
Sep 23
On Tuesday, 23 September 2025 at 22:11:19 UTC, Walter Bright wrote:
> Ok. There are other things that affect this, too, like the typing of pointers to functions.

Which is incidentally one of the many long-standing problems with function attributes being part of function types, see for example https://github.com/dlang/DIPs/pull/198
Sep 23
On Saturday, 13 September 2025 at 16:12:06 UTC, Richard (Rikki) Andrew Cattermole wrote:
> The fact that D is not architected to be able to stop these false positives is an intended bug in our design. Not all PLs work like this; the ML family certainly doesn't. They tie very advanced analysis engines, including DFA, into their type system.

So the answer to my question is: this is not motivated by reported issues from D users, but by other languages doing this. If the only example for D you can give is an "unrealistic" one, can you please give a realistic example from another language then?

> Just because people don't understand that this is possible doesn't mean it hasn't been an issue. I see it as a case that people don't know what they don't know. So they don't complain. This is a hole that the C family relies upon having due to historical reasons, and so hasn't reached common knowledge.

I understand DFA is an interesting research topic, and if your claim was that it might lead to interesting discoveries or new paradigms I fully agree. But when you say it's necessary to fix DIP1000 specifically, I need some evidence to believe it. There's been over 100 DIP1000-related bug reports with simple non-DFA solutions, and I'm skeptical that people 'don't know' to report DFA-related issues. Several users reported issues about Value Range Propagation (VRP) not working across statements. Even when they don't explicitly know DFA, they still experience a problem and can intuit a solution being available.
Sep 16
On Tuesday, 16 September 2025 at 08:50:00 UTC, Dennis wrote:
> So the answer to my question is: this is not motivated by reported issues from D users, but by other languages doing this. If the only example for D you can give is an "unrealistic" one, can you please give a realistic example from another language then?

I am not trying to improve D based upon what other languages can do. They can only be an inspiration. For instance, I take note of how people view other languages' and compilers' solutions, both historically and in real time.

> I understand DFA is an interesting research topic, and if your claim was that it might lead to interesting discoveries or new paradigms I fully agree. But when you say it's necessary to fix DIP1000 specifically, I need some evidence to believe it. There's been over 100 DIP1000-related bug reports with simple non-DFA solutions, and I'm skeptical that people 'don't know' to report DFA-related issues. Several users reported issues about Value Range Propagation (VRP) not working across statements. Even when they don't explicitly know DFA, they still experience a problem and can intuit a solution being available.

In my last post, which was a giant wall of text, I explained how this isn't the case, noting that templates exhibit some of the underlying issues that DIP1000 has. https://forum.dlang.org/post/10abme4$2ekj$1@digitalmars.com

Both me and Walter seem to have failed to convey these failures in dmd's architecture.

I will be ignoring the lacking features in DIP1000's attribution capabilities; this is purely engine implementation design stuff.

Here is some code similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D. These are backed up by all the debugging and helping of other people over the years. This will give the best experience overall, I expect.

```d
extern(C) int printf(const char*, ...);

void myFunc1(scope const char* str) {
    printf("Text: %s\n", str); // ok
}
```

This works by acknowledging that variable state isn't just "set" and "not-set" for any given property; it's "set", "not-set", and "unknown" at a minimum. My fast DFA engine does this; you can see it in code that looks like:

```d
void aPrototype(ref bool);

void myFunc2() {
    bool b;
    aPrototype(b);
    assert(b); // ok
}
```

The truthiness value went from false, to unknown, with the function call not being attributed.

Escape into a global variable:

```d
int* global;

void myFunc3(scope int* ptr) {
    global = ptr; // Error: ptr going into global escapes
}
```

Might not error (depends on how it plays out):

```d
int* global;

struct Foo {
    int* field;

    void method() {
        global = field;
    }
}

void myFunc4(scope int* ptr) {
    Foo foo;
    foo.field = ptr;
    foo.method;
}
```

The point of this set of tunings isn't because I want to have opinions. The point is to get a subset of escape analysis turned on by default that the community will find value in, without hitting any of the many issues that they have perceptually had with DIP1000.

To implement this, I did review DIP1000 and concluded that it would be a rewrite. It simply isn't designed to handle these kinds of properties. At which point you might as well have a DFA for the escape analysis to live in, and up the ceiling of capabilities.

At this month's monthly meeting I asked if people would find value in the ``myFunc3`` error; hands did go up. I also raised the double duty of ``scope`` wrt. stack allocation. Fact is, people find value in a subset of errors from escape analysis, but it isn't the same subset that Atila is proposing with his "no inferred scope variable reports" approach.

If I'm not convincing for any reason, I suggest doing a survey to see how it plays out. I expect you will get four groups:

1. No protection (@system only), tiny minority
2. False positive heavy, as much protection as possible by default, tiny minority
3. False positive heavy, as much protection as possible opt-in, small minority
4. A subset of protection (@safe and the above examples), the majority

These are based upon my previous survey.

If you can do this without a full rewrite of DIP1000 I will be impressed. However, you will still need to solve borrowing so that I can get reference counting.
Sep 16
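To make the three-state idea above concrete, here is a minimal sketch of such a lattice and its merge operation in D (illustrative names only; this is not dmd's or Rikki's actual engine):

```d
/// Truth state of a tracked property: known-false, known-true, or unknown.
enum Truthiness { False, True, Unknown }

/// Joining two control-flow paths: agreement survives, disagreement
/// degrades to Unknown rather than guessing.
Truthiness merge(Truthiness a, Truthiness b)
{
    return a == b ? a : Truthiness.Unknown;
}

/// A call through an unattributed `ref` parameter degrades any known
/// state to Unknown, which is what makes the `assert(b)` above "ok".
Truthiness afterOpaqueRefCall(Truthiness)
{
    return Truthiness.Unknown;
}

unittest
{
    assert(merge(Truthiness.True, Truthiness.True) == Truthiness.True);
    assert(merge(Truthiness.True, Truthiness.False) == Truthiness.Unknown);
    assert(afterOpaqueRefCall(Truthiness.False) == Truthiness.Unknown);
}
```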
On Tuesday, 16 September 2025 at 14:21:07 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Here is some code, similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D.

You gave 4 code snippets. 1 and 2 already compile as expected, 3 already errors as expected if you add `@safe`, and 4 has no defined expected behavior. None of these have a source linking them to an existing D user / code base. To reiterate, the question is:

> Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

As long as you keep dancing around it, I'm going to assume the answer is 'no'.
Sep 16
On 17/09/2025 3:52 AM, Dennis wrote:
> On Tuesday, 16 September 2025 at 14:21:07 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Here is some code, similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D.
> You gave 4 code snippets. 1 and 2 already compile as expected, 3 already errors as expected if you add `@safe`, and 4 has no defined expected behavior. None of these have a source linking them to an existing D user / code base.

As expected. These examples show true positives. The problem is eliminating perceived false positives while keeping the intended true positives.

> To reiterate, the question is:
>> Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?
> As long as you keep dancing around it, I'm going to assume the answer is 'no'.

Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.

I can spot: https://github.com/dlang/dmd/issues/19679

You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113

Here is a ticket, with a comment from you, showing why DIP1000 needs to be lint level attribution: https://github.com/dlang/dmd/issues/20302#issuecomment-2542028404

If it isn't, you break things like the .di generator, since it runs before semantic. It also results in issues like: https://github.com/dlang/dmd/issues/19853

Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153

Pretty nasty. To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.

An example where appropriate CFG processing with a DFA would allow this to not error: https://github.com/dlang/dmd/issues/19659

Sadly my fast DFA isn't able to model it, so the statement walker marks the variable as unmodellable, which prevents the false positive. To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.
Sep 16
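As background for the CFG-processing point above: the class of false positive a flow-sensitive DFA can remove looks like the following sketch (representative of the category, not the code from issue 19659; assuming `-preview=dip1000`, where the exact diagnostic wording may vary):

```d
@safe int* pick(int* a, scope int* b)
{
    int* p = b; // p is inferred `scope` from its initializer...
    p = a;      // ...even though it is then overwritten with
                // non-scope data before any use
    return p;   // rejected today: `scope` is a property of p's
                // declaration, not of the values flowing through it.
                // A flow-sensitive DFA can see only `a` reaches here.
}
```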
On Tuesday, 16 September 2025 at 17:09:18 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.

Thanks a lot! I don't need many, I just need 1 good one, so you delivered already.

> I can spot: https://github.com/dlang/dmd/issues/19679
> You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113

The escape checking logic for assigning a parameter with scope inference to a local variable is bad, definitely. Like you say, I've been working on a fix, which remains unfinished because:

- dmd/escape.d is full of duplicated spaghetti logic (though it has improved)
- The test suite and buildkite projects rely on existing quirks
- dip1000 by default is no longer a priority issue, so I shifted work to more important stuff

Unless you consider my refactoring and WIP solution a rewrite/DFA, I wouldn't say a DFA rewrite is *required*, but I certainly was tempted to throw out dmd/escape.d and rewrite that hot mess from scratch 🙂.

> Here is a ticket, with a comment from you, showing why DIP1000 needs to be lint level attribution: https://github.com/dlang/dmd/issues/20302#issuecomment-2542028404

You haven't defined what "lint level attribution" means, but that issue would be solved if we add inferred `scope` parameters to a function type's mangle in a next edition, so I don't see how this issue necessitates a rewrite/DFA.

> If it isn't, you break things like the .di generator, since it runs before semantic.

The .di generator also needs to run semantic to emit correct inferred function attributes; that change doesn't require rewriting DIP1000 / adding DFA.

> It also results in issues like: https://github.com/dlang/dmd/issues/19853

That issue is invalid, see the closing comment.

> Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153
> To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.

I haven't studied that issue, but I suspect it has more to do with the internal opApply rewrite than a lack of DFA.

> To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.

How would you throw a `scope` Exception across function boundaries? I can't picture that being valid, but I haven't thought very hard.

---

So far, the most representative issue is definitely https://github.com/dlang/dmd/issues/19679. I agree that a rewrite using a DFA engine would be sufficient to handle that case; I just don't think it's necessary.

And that's because I'm noticing a theme in the code snippets you use to showcase your DFA engine vs. the reduced code snippets from the bug reports you just listed. Notice how, unlike the DFA examples, the DIP1000 bugs are not about asserts, if-statements, or loops. The problematic code almost always exists inside the same basic block. That's why I think considering a function's control-flow graph is currently overkill for DIP1000.
Sep 16
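To illustrate the basic-block observation above: the reduced DIP1000 bug reports tend to be straight-line code of roughly this shape (a representative sketch assuming `-preview=dip1000`, not the text of any one linked issue):

```d
// No ifs, loops, or asserts: a single basic block. The bug class
// Dennis describes concerns transferring scope-ness through the
// assignment to `q`, not control flow.
@safe int* leak(scope int* p)
{
    int* q = p; // q must inherit p's scope-ness
    return q;   // and this return must then be rejected
}
```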
On 17/09/2025 8:00 AM, Dennis wrote:
> On Tuesday, 16 September 2025 at 17:09:18 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.
> Thanks a lot! I don't need many, I just need 1 good one, so you delivered already.

I wish I could have offered more in WONTFIX; it would have been illuminating in false positive/perceived false positive territory.

>> I can spot: https://github.com/dlang/dmd/issues/19679
>> You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113
> The escape checking logic for assigning a parameter with scope inference to a local variable is bad, definitely. Like you say, I've been working on a fix, which remains unfinished because:
> - dmd/escape.d is full of duplicated spaghetti logic (though it has improved)
> - The test suite and buildkite projects rely on existing quirks
> - dip1000 by default is no longer a priority issue, so I shifted work to more important stuff
> Unless you consider my refactoring and WIP solution a rewrite/DFA, I wouldn't say a DFA rewrite is *required*, but I certainly was tempted to throw out dmd/escape.d and rewrite that hot mess from scratch 🙂.

Is Atila aware that the DIP1000 implementation has these issues requiring a rewrite, which you determined independently of me? I suspect he is operating under the assumption that getting it on by default is just a matter of tweaking.

>> It also results in issues like: https://github.com/dlang/dmd/issues/19853
> That issue is invalid, see the closing comment.

During this I found a few that needed closing, as they were out of date. Sent to Nic to close, since I don't have the rights to do so.

>> Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153
>> To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.
> I haven't studied that issue, but I suspect it has more to do with the internal opApply rewrite than a lack of DFA.

>> To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.
> How would you throw a `scope` Exception across function boundaries? I can't picture that being valid, but I haven't thought very hard.

Something akin to:

```d
// hypothetical syntax: D has no `throws` annotation today
void myThrow(return MyException e) throws(MyException)
{
    throw e;
}

void caller()
{
    scope e = new MyException;

    try
    {
        myThrow(e);
    }
    catch(Exception)
    {
    }
}
```

Why would someone do that? I don't know. However, the main thing is to map function calls to their corresponding catches, allowing the CFG analysis to do its job. It is impossible for it to work correctly with regard to catch statements otherwise. Without that, any try block containing calls to throwing functions has to have its catches marked unmodellable, which is rather limiting.

> So far, the most representative issue is definitely https://github.com/dlang/dmd/issues/19679. I agree that a rewrite using a DFA engine would be sufficient to handle that case; I just don't think it's necessary.
> And that's because I'm noticing a theme in the code snippets you use to showcase your DFA engine vs. the reduced code snippets from the bug reports you just listed. Notice how, unlike the DFA examples, the DIP1000 bugs are not about asserts, if-statements, or loops. The problematic code almost always exists inside the same basic block. That's why I think considering a function's control-flow graph is currently overkill for DIP1000.

Of course that trend exists: I'm currently working on the truthiness & nullability features! It's all about CFG analysis. But there is a reason for it: each analysis feeds into the others, giving better results overall.
Sep 16
On Tuesday, 16 September 2025 at 23:02:05 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I wish I could have offered more in WONTFIX; it would have been illuminating in false positive/perceived false positive territory.

My search gives:

- 18295 [Scope][dip1000] `scope class` check too conservative under -dip1000
- 18637 [scope][DIP1000] "copying `&i` into allocated memory escapes a reference to local variable `i`" where it's inappropriate
- 19301 [DIP1000] missing overload abilities
- 23941 [DIP1000] Overloading by scope should be allowed

There's nothing interesting there.

> Is Atila aware that the DIP1000 implementation has these issues requiring a rewrite, which you determined independently of me?

I said I was tempted to rewrite, not that it is required. You have to ask Atila whether he is aware, but I did talk about DIP1000's issues and implementation quality at DConf '23 and '24.

> I suspect he is operating under the assumption that getting it on by default is just a matter of tweaking.

It is. The thing that's blocking DIP1000 being enabled by default is that existing code relies on slicing local static arrays being freely allowed in `@safe` functions, but that could be disallowed in a next edition, unless you explicitly opt into scope semantics.
Sep 16
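For readers unfamiliar with the static-array pattern Dennis means, a minimal sketch (hypothetical function names; under `-preview=dip1000` the exact diagnostic may differ):

```d
@safe void consume(int[] arr) {}

@safe void main()
{
    int[4] buf = [1, 2, 3, 4];
    int[] s = buf[]; // slicing a local static array: freely allowed
                     // in @safe today
    consume(s);      // under -preview=dip1000, s is inferred `scope`,
                     // so passing it to a non-scope parameter errors
}
```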
On 9/13/2025 4:06 AM, Dennis wrote:
> As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).

That is correct. For the false path of a `static if`, the code must be syntactically correct but does not have to be semantically correct. `static if` would be rather useless if it required semantic correctness.
Sep 13
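A minimal sketch of the distinction Walter describes (the symbol name is hypothetical; the point is that the false branch only has to parse):

```d
enum bool useFeature = false;

void main()
{
    static if (useFeature)
    {
        // Never semantically analyzed: `undefinedSymbol` need not
        // exist anywhere, yet this compiles because the branch is
        // syntactically valid.
        undefinedSymbol(42);
    }
}
```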
On 14/09/2025 3:12 PM, Walter Bright wrote:
> On 9/13/2025 4:06 AM, Dennis wrote:
>> As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).
> That is correct. For the false path of a `static if`, the code must be syntactically correct but does not have to be semantically correct. `static if` would be rather useless if it required semantic correctness.

The linked code is not a static if, it's a regular if.
Sep 13
On 9/13/2025 8:42 PM, Richard (Rikki) Andrew Cattermole wrote:
> The linked code is not a static if, it's a regular if.

I know. I was trying to clarify the difference between a regular if and a static if.
Sep 14
DIP1000 is necessary when the DFA is faced with only a function prototype rather than having the body of the function available.
Sep 13
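A sketch of the situation Walter describes, assuming `-preview=dip1000` (function names are hypothetical): when only a prototype is visible, the `scope` annotation is the only contract the caller-side analysis can rely on.

```d
// Body not available: perhaps defined in a separately compiled module.
// Without `scope`, the caller must assume the pointer may escape.
@safe void useTemporarily(scope int* p);

@safe void caller()
{
    int x;
    useTemporarily(&x); // ok: `p` is promised not to escape
}
```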
On 14/09/2025 3:09 PM, Walter Bright wrote:
> DIP1000 is necessary when the DFA is faced with only a function prototype rather than having the body of the function available.

I'm curious as to why you said this. There are only three differences I can come up with between a classical DFA and DIP1000:

- Attributes, and the lack thereof
- Single forward pass vs. worklist algorithm
- In the type system (storage classes, ugh) vs. lint level or optimization level

Any variation of the three can be present in a _modern_ frontend DFA. The delineation between the two is basically nonexistent from my perspective. Unless you are not referring to a modern frontend DFA, and actually mean classical DFAs, which are indeed quite different.
Sep 13
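To make the "single forward pass vs. worklist" distinction above concrete, here is a minimal self-contained worklist solver for a must-style forward analysis (definite-assignment flavor) over a toy CFG. All names are illustrative, not dmd internals.

```d
struct Block
{
    int[] preds;   // predecessor block indices
    int[] succs;   // successor block indices
    bool gen;      // block definitely establishes the property
    bool kill;     // block destroys the property
    bool outState; // solver output: property holds on block exit
}

// Iterate to a fixed point: unlike a single forward pass, blocks are
// revisited whenever a predecessor's state changes, which is what
// makes loops come out right.
void solve(Block[] blocks)
{
    foreach (ref b; blocks)
        b.outState = true; // optimistic initial state

    int[] work;
    foreach (i; 0 .. cast(int) blocks.length)
        work ~= i;

    while (work.length)
    {
        immutable id = work[0];
        work = work[1 .. $];

        // Meet over predecessors: the property must hold on every
        // incoming path. Entry blocks (no predecessors) start unset.
        bool inState = blocks[id].preds.length > 0;
        foreach (p; blocks[id].preds)
            inState = inState && blocks[p].outState;

        immutable next = blocks[id].gen || (inState && !blocks[id].kill);
        if (next != blocks[id].outState)
        {
            blocks[id].outState = next;
            work ~= blocks[id].succs; // successors need recomputation
        }
    }
}

unittest
{
    // CFG: 0 -> 1 -> 2, plus a loop back edge 2 -> 1.
    // Block 0 establishes the property; block 2 destroys it.
    auto blocks = [
        Block(null,   [1], true,  false),
        Block([0, 2], [2], false, false),
        Block([1],    [1], false, true),
    ];
    solve(blocks);
    assert(blocks[0].outState);  // established
    assert(!blocks[1].outState); // lost around the loop
    assert(!blocks[2].outState);
}
```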