digitalmars.D - Why use a DFA instead of DIP1000?
- Richard (Rikki) Andrew Cattermole (43/43) Sep 12 Yesterday (or today, depending upon who you ask), Dennis asked a
- Dejan Lekic (1/1) Sep 13 Good stuff, Rikki. Thanks for working on this.
- Richard (Rikki) Andrew Cattermole (2/3) Sep 13 Thanks!
- IchorDev (13/18) Sep 13 It could run if asserts were off. Obviously asserts are never
- Richard (Rikki) Andrew Cattermole (10/29) Sep 13 Its not meant to be too realistic, what it is meant to do is have all
- kdevel (13/26) Sep 13 What about "release" mode? And what about that code:
- IchorDev (8/22) Sep 13 The program has entered an invalid state and the behaviour of the
- kdevel (19/38) Sep 13 I thought that the intent of the original code
- IchorDev (3/20) Sep 13 Then you clearly didn't read the function beforehand.
- Richard (Rikki) Andrew Cattermole (4/53) Sep 13 Half right.
- kdevel (7/35) Sep 13 Then it is dead code that should be manually removed? Isn't that
- Richard (Rikki) Andrew Cattermole (19/56) Sep 15 In practice, when this is applied people don't complain about it.
- Richard (Rikki) Andrew Cattermole (11/44) Sep 13 What assert lowers to is irrelevant as far as a frontend DFA is
- Dennis (13/16) Sep 13 Thanks for providing the example, it's a great discussion starter.
- Richard (Rikki) Andrew Cattermole (16/36) Sep 13 That isn't my interpretation.
- Walter Bright (8/8) Sep 13 Requiring the language to remove all dead code before checking for error...
- Richard (Rikki) Andrew Cattermole (48/59) Sep 13 It seems we have a miscommunication on this.
- Walter Bright (4/4) Sep 14 A DFA that only works with a forward pass will generate a lot of complai...
- Richard (Rikki) Andrew Cattermole (37/42) Sep 14 Oh I see where you're thinking.
- Adam Wilson (32/43) Sep 15 I wasn't entirely sure which post to respond to because I needed
- Dennis (38/42) Sep 16 When Walter talks about DFA, he refers to the specific algorithm
- Richard (Rikki) Andrew Cattermole (40/81) Sep 16 You haven't got my stuff entirely right.
- Dennis (8/24) Sep 16 I was referring to C#'s unused variable warnings that Adam
- Richard (Rikki) Andrew Cattermole (56/67) Sep 16 Hmm, I think I see a missing piece of the puzzle.
- Walter Bright (25/25) Sep 22 This is what I am talking about:
- Walter Bright (13/13) Sep 22 In order to do @nogc DFA, the following is a start:
- Richard (Rikki) Andrew Cattermole (41/74) Sep 23 I will assume that ``@nogc`` is lint level attribute that does not
- Walter Bright (7/18) Sep 23 I am reluctant to try to fix this issue with even more attributes.
- Richard (Rikki) Andrew Cattermole (17/38) Sep 23 For @nogc and pure, it is that simple.
- Dennis (62/65) Sep 23 I think there's a simpler way. You can model this as a graph
- Walter Bright (4/4) Sep 23 Your algorithm is interesting, but I don't think it is fundamentally dif...
- Dennis (44/48) Sep 23 I think both algorithms compute the same thing, but your
- Walter Bright (21/57) Sep 23 I don't think it is anywhere near that bad. It does more than O(V) only ...
- Dennis (33/44) Sep 23 Just step 2, constructing the bit vectors, is already O(V^2).
- Richard (Rikki) Andrew Cattermole (10/21) Sep 23 When the problem space has a known finite lifetime, like the case for
- Walter Bright (2/4) Sep 23 Is there a discussion on this?
- Dennis (5/6) Sep 23 I've briefly mentioned this at DConf in the past two years, I
- Walter Bright (3/8) Sep 23 Ok. There are other things that affect this, too, like typing of pointer...
- Dennis (5/7) Sep 23 Which is incidentally one of the many long standing problems with
- Dennis (17/26) Sep 16 So the answer to my question is: this it not motivated by
- Richard (Rikki) Andrew Cattermole (91/118) Sep 16 I am not trying to improve D based upon what other languages can
- Dennis (9/14) Sep 16 You gave 4 code snippets. 1 and 2 already compile as expected, 3
- Richard (Rikki) Andrew Cattermole (30/46) Sep 16 As expected.
- Dennis (41/64) Sep 16 Thanks a lot! I don't need many, I just need 1 good one, so you
- Richard (Rikki) Andrew Cattermole (30/89) Sep 16 I wish I could have offered more in WONTFIX, it would have been
- Dennis (17/25) Sep 16 My search gives:
- Walter Bright (4/7) Sep 13 That is correct. For the false path of a `static if`, the code must be
- Richard (Rikki) Andrew Cattermole (2/10) Sep 13 The linked code is not static if, its a regular if.
- Walter Bright (2/3) Sep 14 I know. I was trying to clarify the difference between regular if and st...
- Walter Bright (2/2) Sep 13 DIP1000 is necessary when the DFA is faced with only a function prototyp...
- Richard (Rikki) Andrew Cattermole (12/14) Sep 13 I'm curious as to why you said this.
Yesterday (or today, depending upon who you ask), Dennis asked a wonderful question in the monthly meeting. Paraphrased: why should we use a DFA engine instead of DIP1000? And you know what, this is a great question, especially since DIP1000 is a subset of a specialised DFA engine, but one with some serious ceiling limitations.

Afterwards, I came up with an example where DIP1000 will error:

```d
int* ptr;

void func(bool b, scope int* p) @safe {
    assert(!b);

    if (b) {
        ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
    }
}
```

This is clearly a false positive; that branch could never run!

One of the hallmarks of a real data flow analysis engine is dead code elimination; all optimising backends implement it. In fact, it's one of the first things that ever gets implemented, and DIP1000 can't do it!

But what if we did have the ability to model truthiness? Well then we could cull that dead branch, and not do the report. But that is slow, right? NOPE! We can actually do this fast!

Here is a modified example from my fast DFA engine, which I've been working on for around six months, covering both truthiness and nullability:

```d
void branchKill(bool b) {
    assert(!b);

    if (b) {
        int* ptr;
        int val = *ptr; // no error, branch is dead
    }
}
```

My hope, should it succeed, is for it to be on by default; this requires it to be fast and to not have false positives like the above code. I hope this is enlightening for those who don't know what data flow analysis is all about!
Sep 12
On 13/09/2025 9:42 PM, Dejan Lekic wrote:
> Good stuff, Rikki. Thanks for working on this.

Thanks!
Sep 13
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> This is clearly a false positive; that branch could never run!

It could run if asserts were off. Obviously asserts are never meant to fail when the program is in a correct state though.

> One of the hallmarks of a real data flow analysis engine is dead code elimination; all optimising backends implement it. In fact, it's one of the first things that ever gets implemented, and DIP1000 can't do it!

This example doesn't seem very realistic. A branch that never runs shouldn't really exist to begin with, right? Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

Also I'm curious what other problems DFA could be applied to? For instance, detecting what types a function throws and allowing you to catch only those types instead of always having to `catch(Exception)` to satisfy the compiler. Or doing more complex type inference.
Sep 13
On 13/09/2025 10:13 PM, IchorDev wrote:
> This example doesn't seem very realistic. A branch that never runs shouldn't really exist to begin with, right? Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

It's not meant to be too realistic; what it is meant to do is have all the primitives in place to show that the compiler has the ability to track state and what happens should X happen.

Everything from loops, assignments, if statements, to function calls mess with this, and it's not a great example. The logs from stuff like this get very big. Small font size + multi-gigabyte logs. Not fun and certainly not educational.

> Also I'm curious what other problems DFA could be applied to? For instance, detecting what types a function throws and allowing you to catch only those types instead of always having to `catch(Exception)` to satisfy the compiler. Or doing more complex type inference.

Exceptions need a set in the compiler rather than just an "is nothrow" flag. The compiler can (and should) do this without DFA.
Sep 13
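[Editor's note: the exception-set idea above can be sketched concretely. Below is a toy model of a per-function thrown-type set, the kind of summary a compiler could keep instead of a single nothrow flag. Everything here (names, fields) is invented for illustration; it is not dmd's internals.]

```d
// Illustrative sketch only: track the set of exception types a function
// may throw, instead of a single nothrow flag.
struct FuncSummary
{
    string name;
    bool[string] throws; // exception type names this function may throw

    // Union a callee's thrown set into the caller's, at each call site.
    void propagateFrom(const ref FuncSummary callee)
    {
        foreach (type, _; callee.throws)
            throws[type] = true;
    }
}

unittest
{
    auto parse = FuncSummary("parse", ["ConvException": true]);
    auto load = FuncSummary("load", ["FileException": true]);
    load.propagateFrom(parse);
    // A caller could now `catch (ConvException)` / `catch (FileException)`
    // specifically, rather than a blanket `catch (Exception)`.
    assert("ConvException" in load.throws && "FileException" in load.throws);
}
```

A caller's set is the union of what it throws directly and what its callees may throw, which is what would let `catch` lists be checked against something more precise than `Exception`.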
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> [...] Afterwards, I came up with an example where DIP1000 will error:
>
> ```d
> int* ptr;
>
> void func(bool b, scope int* p) @safe {
>     assert(!b);
>
>     if (b) {
>         ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>     }
> }
> ```
>
> This is clearly a false positive; that branch could never run!

What about "release" mode? And what about this code:

```d
void main () {
    assert (0);
    string s = "x";
    s [0] = 'y';
}
```

Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?
Sep 13
On Saturday, 13 September 2025 at 10:19:45 UTC, kdevel wrote:
> What about "release" mode? And what about this code:
>
> ```d
> void main () {
>     assert (0);
>     string s = "x";
>     s [0] = 'y';
> }
> ```

The program has entered an invalid state and the behaviour of the continuing execution of the program is undefined. https://dlang.org/spec/expression.html#AssertExpression

That said, I don't know if we should be suppressing these errors. Doesn't seem useful to me?

> Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?

It's not a 'false positive' because it's not related to DIP1000's stack escape rules at all.
Sep 13
On Saturday, 13 September 2025 at 11:08:29 UTC, IchorDev wrote:
> The program has entered an invalid state and the behaviour of the continuing execution of the program is undefined.

Please note that the program is not run but just compiled.

> It's not a 'false positive' because it's not related to DIP1000's stack escape rules at all.

I thought that the intent of the original code

```d
int* ptr;

void func(bool b, scope int* p) @safe {
    assert(!b);

    if (b) {
        ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
    }
}
```

was to show that there is some code block

```d
ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
```

which is never executed regardless of the value of `b`.
Sep 13
On Saturday, 13 September 2025 at 13:06:06 UTC, kdevel wrote:
> Please note that the program is not run but just compiled.

What does that even mean?

> I thought that the intent of the original code [...] was to show that there is some code block
>
> ```d
> ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
> ```
>
> which is never executed regardless of the value of `b`.

Then you clearly didn't read the function beforehand.
Sep 13
On 14/09/2025 1:06 AM, kdevel wrote:
> [...] I thought that the intent of the original code
>
> ```d
> int* ptr;
>
> void func(bool b, scope int* p) @safe {
>     assert(!b);
>
>     if (b) {
>         ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>     }
> }
> ```
>
> was to show that there is some code block
>
> ```d
> ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
> ```
>
> which is never executed regardless of the value of `b`.

Half right. There is a code block that will never be executed; it is because variable `b`'s state is known, which prevents that code block from executing.
Sep 13
On Saturday, 13 September 2025 at 15:43:27 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Half right. There is a code block that will never be executed; it is because variable `b`'s state is known, which prevents that code block from executing.

Then it is dead code that should be manually removed? Isn't that the problem here? The compiler correctly complains about invalid code; this is not a false positive, even if that code is never executed. If I want code not to be executed I comment it out or delete it.
Sep 13
On 14/09/2025 7:58 AM, kdevel wrote:
> Then it is dead code that should be manually removed? Isn't that the problem here? The compiler correctly complains about invalid code; this is not a false positive, even if that code is never executed. If I want code not to be executed I comment it out or delete it.

In practice, when this is applied, people don't complain about it.

https://github.com/dlang/dmd/issues/21859

Compiles with full D, but not -betterC (which does the extra check without the culling, which is a bug):

```d
void func() nothrow {
    if (0) {
        throw new Exception(""); // Error: cannot use `throw` statements with -betterC
    }
}
```

When both DFA and control flow graph (CFG) processing are working correctly, it's a well-loved compiler feature. You don't know it's there unless something has gone wrong, like in the above code.

When I learned how nothrow inference worked I was really impressed with how it was applying CFG analysis. I did not expect the frontend to be doing it.
Sep 15
On 13/09/2025 10:19 PM, kdevel wrote:
> What about "release" mode?

What assert lowers to is irrelevant as far as a frontend DFA is concerned. It runs before that is taken into account.

> And what about this code:
>
> ```d
> void main () {
>     assert (0);
>     string s = "x";
>     s [0] = 'y';
> }
> ```
>
> Is "Error: cannot modify `immutable` expression `s[0]`" a false positive, too?

Yes, actually. Not all languages work like this; some in the ML line have the ability to not process anything after the assert statement. They apply DFA into the type system to varying degrees. We don't.

For halt, we may have the ability to kill it off eventually. However we can't cull non-halt asserts, as we are not tying our semantic analysis to a DFA engine. It is too late to change that.

Note: a halt assert is `assert(0)`.
Sep 13
On Saturday, 13 September 2025 at 02:39:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Paraphrased, why should we use a DFA engine instead of DIP1000? (...) Afterwards, I came up with an example where DIP1000 will error:

Thanks for providing the example, it's a great discussion starter. Like I said yesterday, it's not that I don't "know what data flow analysis is all about" - when you have a solution, you can come up with problems it solves, sure. I was mostly curious what real world code made you believe that more advanced DFA is necessary for useful scope pointer analysis.

There's many problems with DIP1000, but no issue was ever opened requesting scope checks to be disabled in dead code. As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).
Sep 13
On 13/09/2025 11:06 PM, Dennis wrote:
> There's many problems with DIP1000, but no issue was ever opened requesting scope checks to be disabled in dead code. As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).

That isn't my interpretation.

The code you linked would have been removed by a backend optimizer. It is dead code. It exists to get effects from the frontend (poorly), due to lacking language capabilities to achieve the results needed.

DFA engines, for the purposes of linting like this thread is about, should focus on what code could actually run at runtime. The fact that D is not architected to be able to stop these false positives is an intended bug in our design. Not all PLs work like this; the ML family certainly doesn't. They tie very advanced analysis engines, including DFA, into their type system.

Just because people don't understand that this is possible doesn't mean it hasn't been an issue. I see it as a case that people don't know what they don't know. So they don't complain. This is a hole that the C family relies upon having due to historical reasons, and so hasn't reached common knowledge.
Sep 13
Requiring the language to remove all dead code before checking for errors means the entire program must be subjected to DFA. The result is compilation will slow down dramatically. It means separate compilation is no longer possible. As a pragmatic choice, D is designed to not require DFA. It is a deliberate architectural choice, not a bug. That said, an optional DFA can be quite useful in detecting otherwise hidden programming errors. But I'm not convinced it's a good idea to require it in order to successfully compile code, as your opening example suggests.
Sep 13
On 14/09/2025 3:20 PM, Walter Bright wrote:
> Requiring the language to remove all dead code before checking for errors means the entire program must be subjected to DFA. The result is compilation will slow down dramatically. It means separate compilation is no longer possible. As a pragmatic choice, D is designed to not require DFA. It is a deliberate architectural choice, not a bug.

It seems we have a miscommunication on this. The requirements of D say it's a feature, the PL literature says it's a bug. It can be both! And yes, I agree the choice D went with is a good one for the C family. Very pragmatic.

Where DFA and CFG analysis are implemented (such as limited VRP and blockexit.d), they are very impressive; I enjoyed seeing it. However, these algorithms, when in use, are not applied consistently.

https://github.com/dlang/dmd/issues/21859

It passes nothrow, but not -betterC:

```d
void func() nothrow {
    if (0) {
        throw new Exception(""); // Error: cannot use `throw` statements with -betterC
    }
}
```

> That said, an optional DFA can be quite useful in detecting otherwise hidden programming errors. But I'm not convinced it's a good idea to require it in order to successfully compile code, as your opening example suggests.

DFA is a rather large body of literature; we have to split it up before going "eww no" on any of it.

There is:

- Single forward pass: fast DFA, DIP1000, limited VRP/CFG analysis in the frontend.
- Work list algorithm: live, slow DFA if I ever build it.

Also:

- In the type system: i.e. storage classes, dictates what templates instantiate.
- Lint level: does not affect mangling, only limits what code is considered valid, no mutation of the type system.

Considerations on the type system we can basically say no to, simply because dmd isn't architected to handle it. Which is a good choice, as I said elsewhere, for the C family, but not all languages are like that and they do get benefits from doing so.

The work list algorithm is great, but it has to be opt-in; not everyone is going to benefit from the slowdown. That kinda leaves us with single forward pass and lint level for anything on by default. Which unfortunately DIP1000 is not (due to it messing with storage classes; not that I'm arguing that was a wrong choice when it was implemented).

You can do an absolute load of stuff at lint level with a single forward pass engine. Which I'm finding out as I learn that my fast DFA engine is probably close to the bleeding edge, oops. Worthy of some serious pride, as I am reevaluating other language options lol.

This isn't a situation where we only have the worst set of options available to us. Each has their strengths and weaknesses. It's possible to have multiple opinions based upon how you are categorizing things for this particular subject.
Sep 13
A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails. The forward pass attribute inference D has generates complaints and bug reports when it fails, because it does not handle cycles.
Sep 14
On 15/09/2025 7:55 AM, Walter Bright wrote:
> A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails. The forward pass attribute inference D has generates complaints and bug reports when it fails, because it does not handle cycles.

Oh I see where you're thinking.

Okay, consider the following reports; does any report (or lack thereof) suggest people are going to be baffled by it?

Relevant files (start.d for the reports' source code, out.txt for the full log): https://gist.github.com/rikkimax/652f96a33934bd8ae0a68fb849821385#file-start-d

```
start.d
testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(257): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(308): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
        ^
testdfa\start.d(330): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
        ^
testdfa\start.d(613): Error: Assert can be proven to be false
    assert(val); // Error: val is 0
    ^
testdfa\start.d(619): Error: Assert can be proven to be false
    assert(stack.length == 1); // Error: stack is null
    ^
testdfa\start.d(673): Error: Dereference on null variable `ptr`
    return *ptr; // error could be null
    ^
testdfa\start.d(703): Error: Assert can be proven to be false
    assert(ptr !is null); // error
    ^
extracted.d
```

Worth looking at extracted.d and the full log.
Sep 14
I wasn't entirely sure which post to respond to because I needed Rikki's examples, but this response is for Walter.

On Monday, 15 September 2025 at 01:26:31 UTC, Richard (Rikki) Andrew Cattermole wrote:
> ```
> start.d
> testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
>     foreach (i; 0 .. 2)
>     ^
> testdfa\start.d(330): Error: Dereference on null variable `ptr`
>         int v = *ptr; // error
>         ^
> extracted.d
> ```

Walter, the twin assertions that the above errors will be confusing to programmers, and that they will slow down the compiler, are entirely false. Millions of programmers every day will know exactly what is meant. These warnings came from rebuilding a 250kLOC project in less than one second.

```
Main.xaml.cs(530,22,530,24): warning CS0168: The variable 'ex' is declared but never used
Customers.cs(26,37,26,53): warning CS0649: Field 'Customers.customersService' is never assigned to, and will always have its default value null
DataUploadController.cs(41,46,41,54): warning CS0169: The field 'DataUploadController.resolver' is never used
```

The VB.NET compiler also has DFA and is equally as fast (they share codebases):

```
WorkOrderDirectory.vb(26,13): warning BC42024: Unused local variable: 'J'.
```

The above are all real world examples from the code base that I just rebuilt. There is no rule that states that DFA must be slow, as there exist examples disproving the assertion.
Sep 15
On Monday, 15 September 2025 at 23:02:25 UTC, Adam Wilson wrote:
> (...) There is no rule that states that DFA must be slow, as there exist examples disproving the assertion.

When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. This algorithm is superlinear, so considered 'too slow' for the frontend, which aims to analyse n statements in O(n) time.

What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

The C# warnings Adam showed are, from my experimentation on https://dotnetfiddle.net, even simpler than that. They don't take control flow into account at all:

```csharp
public class Program
{
    static int y;
    public static void Main()
    {
        int x = 0;
        if (y < 0)
            if (y > 0) // this branch is dead code
                x = 1;
        // yet no warning marks it unreachable, or x as unused
    }
}
```

That's not to say it's not useful, I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.

These 3 things have all been labeled 'DFA' so far, which makes the discussion really confusing, so it's best to clarify exactly which DFA flavor you're talking about.
Sep 16
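[Editor's note: to make the first flavor concrete, here is a toy D sketch of solving data-flow equations by sweeping a control flow graph until a fixed point, the iterative scheme Dennis attributes to backend optimizers. All names are invented; this is not dmd code.]

```d
// Toy sketch: propagate facts around a CFG until nothing changes.
import std.stdio;

struct Block
{
    size_t[] preds;    // indices of predecessor blocks
    bool[string] gen;  // facts this block creates, e.g. "p may be null"
    bool[string] kill; // facts this block destroys, e.g. by assigning p
}

// Solve out[b] = (in[b] - kill[b]) + gen[b], in[b] = union of preds' outs.
// Fact sets only ever grow, so comparing set sizes detects convergence.
bool[string][] solve(Block[] cfg)
{
    bool[string][] outs;
    outs.length = cfg.length;
    for (bool changed = true; changed; )
    {
        changed = false; // superlinear: keeps sweeping until a fixed point
        foreach (i, ref b; cfg)
        {
            bool[string] inSet;
            foreach (p; b.preds)
                foreach (fact, _; outs[p])
                    inSet[fact] = true;
            foreach (fact, _; b.kill)
                inSet.remove(fact);
            foreach (fact, _; b.gen)
                inSet[fact] = true;
            if (inSet.length != outs[i].length)
            {
                outs[i] = inSet;
                changed = true;
            }
        }
    }
    return outs;
}

void main()
{
    // Block 1 sits in a loop (edge from block 2 back into block 1), so
    // the solver has to iterate until the facts stop changing.
    auto cfg = [Block(null, ["p may be null": true], null),
                Block([0, 2], null, ["p may be null": true]),
                Block([1], null, null)];
    writeln(solve(cfg));
}
```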
On 16/09/2025 9:59 PM, Dennis wrote:
> When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. [...] What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

You haven't got my stuff entirely right.

The fast DFA, which is what I'm working on right now, runs in semantic 3. It's very similar to DIP1000, in that it does a single forward pass with the statement and expression walkers.

The slow DFA, which I want to work on eventually, is indeed more like the classical DFAs of the 80's, and would run in a semantic 4. Since this is a known quantity, and it isn't the default experience which all these features hinge on having a solution for, it isn't a priority.

Like classical DFAs, the fast DFA does indeed use lattices and join and meet operators. But I have extra information available, like depth checks, write counts and scope modelling, that classical DFAs do not have.

The fast DFA does something quite interesting: it isn't modelling what is modellable, it instead models what it can't model, and what is left is what is modellable. You need to have an unknown state, with ways of putting variables into it. This allows me to cull as many false and perceived-false positives as possible.

Data flow analysis is the derivative of proof assistants. They work on properties, and sometimes, in the case of VRP, they integrate values into their calculations.

If it weren't for my zero false & perceived-false positives rule, the fast DFA engine would be almost as good as a full DFA engine. It's missing a complete interprocedural analysis, and single-function cyclicity stuff.

> The C# warnings are even simpler than that. They don't take control flow into account at all:
>
> ```csharp
> public class Program
> {
>     static int y;
>     public static void Main()
>     {
>         int x = 0;
>         if (y < 0)
>             if (y > 0) // this branch is dead code
>                 x = 1;
>         // yet no warning marks it unreachable, or x as unused
>     }
> }
> ```

It does actually. https://csharp.godbolt.org/z/nMjPqYens

```
Program:Main() (FullOpts):
       xor      edi, edi
       tail.jmp [System.Console:WriteLine(int)]
```

It silently culled the branches. Exactly what DFA should do in this situation.

> That's not to say it's not useful, I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.

And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.
Sep 16
On Tuesday, 16 September 2025 at 10:47:03 UTC, Richard (Rikki) Andrew Cattermole wrote:
> It does actually. https://csharp.godbolt.org/z/nMjPqYens
>
> ```
> Program:Main() (FullOpts):
>        xor      edi, edi
>        tail.jmp [System.Console:WriteLine(int)]
> ```
>
> It silently culled the branches. Exactly what DFA should do in this situation.

I was referring to C#'s unused variable warnings that Adam brought up, not the code generator.

> And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.

I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.
Sep 16
On 16/09/2025 11:20 PM, Dennis wrote:
>> And that would be a rewrite, and I don't mean just of DIP1000. Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in, and I have no desire to watch that being tried.
>
> I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.

Hmm, I think I see a missing piece of the puzzle.

A type system is the types and declarations and how they get processed in a language. When a DFA is attached to a type system properly, you are not calling functions like runSemantic on AST nodes. This is handled by the worklist algorithm. This is an important distinction between what we have and what the more advanced type systems with a DFA attached to them have.

Take a look at all of the issues we have with resolving templates. These stem from not having a DFA-directed type system. It's very easy to exceed what the compiler can instantiate with cycles and other ordering issues. We've all run into them and they are rarely fixable.

And no, this isn't a new thing. This is from the 70's, just a different branch of DFA and related analysis engines. I doubt Walter would have learned this branch, and may not have been exposed to it; it's mostly academic (for a reason, up until the 90's). The papers he has sent me are all backend related, not these other branches.

I do want to say that I have been knowingly confusing DFA and other analysis engines; this is fully intended. Their interplay can be pretty heavy, and one would feed into the other here. Lines get blurred in the implementation, and their solutions can be based upon the same theory & algorithms.

Now compare us to the ML family with signatures (type classes), modules and their variation of meta-programming. They can resolve stuff we wouldn't have a dream of doing. We have to employ CTFE just to come close to what they can do, and even then there is stuff we simply will never be able to do due to ordering issues.

This would be a good time to define what lint level means to a type system. In essence it puts anything that the type system knows about into a read-only mode. You can't change it in a given lint-level engine. This makes DIP1000 in the type system, not lint level, due to it messing with storage classes.

What analysis dmd does do, like VRP and CFG, is impressive. Not because they are the most advanced thing in the world (not even close), but because our type system wasn't meant to support it. This isn't a strength of its design.

A good thing to note here is how SDC uses a worklist algorithm to process its type system. https://github.com/snazzy-d/sdc/blob/master/src/d/semantic/scheduler.d

SDC will have the ability to solve problems that dmd cannot, not because the language prevents it, but because dmd's type system implementation does not allow for it, having a much lower ceiling in its capability.

To give dmd the kind of capability where you have a DFA engine dictating analysis of the type system, or feeding a significant amount of information throughout a function, is a rewrite. Simply because the existing architecture wasn't written with that in mind. It was never meant to have templates like we have today, nor analysis like VRP.

The literature is massive on these topics, and gets quite broad. It can be argued differently than how I am arguing it. Some people who read this may conclude that dmd needs a rewrite, and to that end I know the SDC folks would love some help. However my takeaway, from a pragmatic perspective, is to recognize what we can't do, to help determine what we can do. D is a production language, not an academic or toy one. If we have to give up strong guarantees by default for DFA features then so be it; I'd rather have some than none.
Sep 16
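[Editor's note: the SDC scheduler linked above is real; the following is only a toy sketch of the general worklist idea it embodies, with invented names. Tasks that cannot see their dependencies yet get requeued rather than erroring, so declaration order stops mattering.]

```d
// Toy worklist-style scheduling sketch (not SDC's actual code).
import std.algorithm : all;
import std.stdio;

struct Task
{
    string name;
    string[] needs;       // symbols that must be resolved before this runs
    void function() run;
}

void schedule(Task[] pending)
{
    bool[string] resolved;
    while (pending.length)
    {
        Task[] next;
        foreach (t; pending)
        {
            if (t.needs.all!(n => (n in resolved) !is null))
            {
                t.run();              // dependencies met: analyse now
                resolved[t.name] = true;
            }
            else
                next ~= t;            // defer: try again on a later sweep
        }
        if (next.length == pending.length)
            throw new Exception("cycle: needs a fixed-point pass to break");
        pending = next;
    }
}

void main()
{
    // "B" is listed first but runs second, once "A" has been resolved.
    schedule([
        Task("B", ["A"], () => writeln("analyse B")),
        Task("A", null, () => writeln("analyse A")),
    ]);
}
```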
This is what I am talking about:

```d
@nogc void foo() { bar(); }

void bar() { abc(); }

void abc() { bar(); }
```

Both bar() and abc() are @nogc. But the forward-only checking cannot deduce it, and the result is:

```
test.d(3): Error: `@nogc` function `test.foo` cannot call non-@nogc function `test.bar`
```

This is the simplest version of this problem. It can be quite complex, but the result is the same - it cannot be figured out without solving recursive data flow equations.

Another version of the problem:

```d
@nogc void foo() { bar(); }

void bar() { abc(); }

void abc();
```

The body of abc() is not available to the compiler, so the compiler cannot deduce @nogc for abc().

So, in order to deduce attributes of abc(), the compiler must have the source code available for it. To do a complete job, then, requires compilation of the entire program's code base. Which will require a lot of memory and time.

Adam, Rikki, if I'm wrong, please show me how.
Sep 22
In order to do @nogc DFA, the following is a start:

1. construct an array of all the functions
2. each function gets allocated a bit vector with one bit for each function in (1)
3. each of those (2) bit vectors gets a corresponding bit set for every function that is called
4. a single bit vector gets a bit set for every function that does a gc operation
5. loop through all the bit vectors, and every function that calls a function that has a bit set in (4) also gets a bit set in (4)
6. keep looping until no more changes in (4)
7. mark all the functions that don't have a bit set as @nogc

Pruning of the number of functions that need a bit can be done by not considering any functions already marked as @nogc, and functions that only call @nogc functions and don't do any other allocations can be eagerly marked @nogc.
Sep 22
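[Editor's note: Walter's steps translate fairly directly into code. Below is an illustrative sketch using `std.bitmanip.BitArray`; it is not dmd code, and the caller is assumed to have already built the call matrix and GC-operation vector per steps 1-4.]

```d
// Illustrative sketch of the bit-vector steps above.
import std.bitmanip : BitArray;

// calls[i][j] is set when function i calls function j (steps 1-3);
// gcOps[i] is set when function i itself does a GC operation (step 4).
// Returns the final "may allocate" vector; clear bits are @nogc (step 7).
BitArray mayAllocate(BitArray[] calls, BitArray gcOps)
{
    auto mayGC = gcOps.dup;
    bool changed = true;
    while (changed) // steps 5 and 6: sweep until no more bits change
    {
        changed = false;
        foreach (i, ref callees; calls)
        {
            if (mayGC[i])
                continue;
            auto reached = callees.dup;
            reached &= mayGC; // does i call anything that may allocate?
            if (reached.count != 0)
            {
                mayGC[i] = true;
                changed = true;
            }
        }
    }
    return mayGC;
}
```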
On 23/09/2025 5:54 PM, Walter Bright wrote:
> This is what I am talking about:
>
> ```d
> @nogc void foo() { bar(); }
>
> void bar() { abc(); }
>
> void abc() { bar(); }
> ```
>
> Both bar() and abc() are @nogc. But the forward-only checking cannot deduce it [...] it cannot be figured out without solving recursive data flow equations.
>
> Another version of the problem:
>
> ```d
> @nogc void foo() { bar(); }
>
> void bar() { abc(); }
>
> void abc();
> ```
>
> The body of abc() is not available to the compiler, so the compiler cannot deduce @nogc for abc(). [...] To do a complete job, then, requires compilation of the entire program's code base. Which will require a lot of memory and time.
>
> Adam, Rikki, if I'm wrong, please show me how.

I will assume that ``@nogc`` is a lint-level attribute that does not affect mangling, as that would be a requirement for everything to link. If it does affect mangling, things get much harder to solve and a lot less flexible, hence I am not keen on such an approach for DFA attributes.

To solve this is very simple, but it does require the compiler to see the result of the compilation of all three functions. Attributes via the .di generator, or a dedicated special file containing the information, are both valid options here.

Instead of writing ``@nogc``, we write ``@nogc iff(foo, bar, abc)``. From there it's a simple dependency resolver, either during the compilation of the three functions or on usage.

This is the basis for my solution to a problem Adam has proposed for PhobosV3, which essentially is the desire to whitelist attributes. Use ``@localnogc``: if the called functions are all ``@nogc``, then the function is allowed to be ``@nogc``.

```d
void foo() @nogc => bar; // dependency resolve bar&abc's @nogc iff's
void bar() @nogc iff(abc) => abc;
void abc() @nogc iff(bar) => bar;
```

```d
void foo() @nogc => bar; // dependency resolve bar&abc's @nogc iff's
void bar() @nogc iff(abc); // body is irrelevant
void abc() @nogc iff(bar); // body is irrelevant
```

This is a very simple demonstration of whole-program graph analysis with a whole-function graph representation of equations per function. As far as I know no production language supports this; it has been left in experimental 90's languages.

I won't say any .NET language does this, but because they are an application VM, they do in fact see the whole program (minus FFI). What they can do, and what we can do without excellent serialization capabilities in the compiler, isn't the same.

Of course I'm not suggesting @nogc should get this particular treatment; in practice it's "good enough" (if we ignore whitelisting), but as an example of effect analysis it's a good one, even if it isn't a classical problem like pure.

Am I right to assume that the reason you haven't presented this solution is because you want attributes to be finalized when bar&abc complete?
Sep 23
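[Editor's note: the ``@nogc iff(...)`` attribute above is hypothetical, so the resolver below is equally hypothetical: a toy sketch that starts optimistic and strips ``@nogc`` from any function whose dependency fails, repeating until stable. This is how the bar/abc cycle can still come out ``@nogc``.]

```d
// Toy sketch of resolving hypothetical `@nogc iff(...)` sets.
import std.algorithm : any;

bool[string] resolveIff(string[][string] iff, string[] allocates)
{
    bool[string] nogc;
    foreach (name, deps; iff)
        nogc[name] = true;            // optimistic start
    foreach (f; allocates)
        nogc[f] = false;              // known GC operations lose it
    for (bool changed = true; changed; )
    {
        changed = false;
        foreach (name, deps; iff)
        {
            // unknown functions (not in the table) count as failed
            if (nogc[name] && deps.any!(d => !nogc.get(d, false)))
            {
                nogc[name] = false;
                changed = true;
            }
        }
    }
    return nogc;
}

unittest
{
    auto iff = ["foo": ["bar"], "bar": ["abc"], "abc": ["bar"]];
    assert(resolveIff(iff, []).get("foo", false));      // cycle resolves
    assert(!resolveIff(iff, ["abc"]).get("foo", true)); // abc allocates
}
```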
On 9/23/2025 2:43 AM, Richard (Rikki) Andrew Cattermole wrote:
> Instead of writing ``@nogc``, we write ``@nogc iff(foo, bar, abc)``. From there it's a simple dependency resolver, either during the compilation of the three functions or on usage.

It's not so simple when there is recursion in the flow graph.

> This is the basis for my solution to a problem Adam has proposed for PhobosV3, which essentially is the desire to whitelist attributes. Use ``@localnogc``: if the called functions are all ``@nogc``, then the function is allowed to be ``@nogc``.

I am reluctant to try to fix this issue with even more attributes.

> Am I right to assume that the reason you haven't presented this solution is because you want attributes to be finalized when bar&abc complete?

I showed the solution to illustrate the requirement for iteration and convergence to resolve cycles in the flow graph. A DFA that cannot deal with recursion is just going to leave people frustrated, as the current @nogc inference engine does.
Sep 23
On 24/09/2025 5:54 AM, Walter Bright wrote:
> It's not so simple when there is recursion in the flow graph.

For @nogc and pure, it is that simple. Other attributes can't be solved with this solution; they are too complex, or affect the calling convention of the function. I don't have a generalized solution.

> I am reluctant to try to fix this issue with even more attributes.

If you don't, you're limited to a single compilation. Which may be fine for a single module, but might not be cross-module.

> I showed the solution to illustrate the requirement for iteration and convergence to resolve cycles in the flow graph. A DFA that cannot deal with recursion is just going to leave people frustrated, as the current @nogc inference engine does.

Indeed. I'd call it an all-right example; a better example might be:

```d
void* global;

void thing(T)(scope void* u, T* v) {
    global = v;
    thing(null, u);
}
```

You can't solve for this without applying outputs into inputs. Therefore no shortcuts.
Sep 23
On Tuesday, 23 September 2025 at 05:54:00 UTC, Walter Bright wrote:
> This is the simplest version of this problem. It can be quite complex, but the result is the same - it cannot be figured out without solving recursive data flow equations.

I think there's a simpler way. You can model this as a graph reachability problem, with vertices for each function, directed edges for function calls, and a 'gc' node representing possible GC operations such as `new`, closures, extern functions without `@nogc`, etc. Then all you need is a graph search starting from the gc node, and every node you don't find is @nogc.

Below is a D implementation you can toy with. If you find a case where this fails where solving data flow equations would succeed, let me know.

N.b. the reason dmd can't do this (or the data flow approach) currently is because `@nogc`-ness needs to be determined before the call graph is complete, see https://forum.dlang.org/post/ydemnavvtzytjkmibwvz@forum.dlang.org

If we remove @nogc from a function's type/mangle in a next edition (which I'm all for) it can work.

---

```D
import std;

class Func
{
    string name;
    Func[] calledFrom;
    bool isNogc = true;
    bool isVisited = false;

    this(string name) { this.name = name; }

    void call(Func f) { f.calledFrom ~= this; }
}

void computeNogc(Func gc)
{
    Func[] stack = [gc];
    while (!stack.empty)
    {
        auto node = stack[$ - 1];
        stack.length--;
        node.isNogc = false;
        node.isVisited = true;
        foreach (f; node.calledFrom.filter!(x => !x.isVisited))
            stack ~= f;
    }
}

void main()
{
    auto foo = new Func("foo");
    auto bar = new Func("bar");
    auto abc = new Func("abc");

    // pseudo function representing `new`, closures, anything that allocates
    auto gc = new Func("gc");

    // Add your function calls here:
    foo.call(bar);
    bar.call(abc);
    abc.call(bar);
    abc.call(gc); // abc() does something that allocates

    computeNogc(gc);

    // Print results
    foreach (a; [foo, bar, abc])
        writeln(a.name, ". @nogc = ", a.isNogc);
}
```
Sep 23
Your algorithm is interesting, but I don't think it is fundamentally different from the one I sketched out. Also, it uses expensive operations like array appends, and the filter function. The bit mask route should be much faster.
Sep 23
On Tuesday, 23 September 2025 at 17:46:20 UTC, Walter Bright wrote:
> Your algorithm is interesting, but I don't think it is fundamentally different from the one I sketched out.

I think both algorithms compute the same thing, but your algorithm represents the graph with an adjacency matrix and does an inefficient search. The algorithm should be O(V + E), with V = vertices = number of function declarations, and E = edges = number of function calls. That is, compile time should scale linearly with the amount of source code being compiled. Your algorithm does a O(V^2) traversal O(V) times, running in O(V^3) time. That's a pathological worst case scenario, but I'm not convinced the constant factor is so large that it's more practical than a linear search.

> Also, it uses expensive operations like array appends, and the filter function. The bit mask route should be much faster.

I wrote the example as a simple illustration, not as optimized production code. The filter function is trivially replaced with an if statement if the optimizer doesn't already do that.

On a slight tangent, I'm not a fan of how dmd's performance culture seems to be hyper-focused on local constant-time code optimizations that worsen code quality (like bit-hacking and using global variables for reusable memory buffers) when there's a number of glaring issues related to time complexity, which are way more severe than a redundant allocation here and there could ever be. Whenever I investigate what slows down compilation time of real code, it's usually because it triggers a bad case of one of the many quadratic or even exponential algorithms in dmd's code. I'm talking:

- O(n^2) MSCoff section insertion (https://github.com/dlang/dmd/pull/16780)
- O(n^2) Checking for missing/duplicated switch case statements (TODO)
- O(2^n) Nested function sibling detection (https://github.com/dlang/dmd/pull/14886)
- O(2^n) Return type isolated from parameters check (https://github.com/dlang/dmd/pull/12885)
- O(n*m) CTFE execution because AA literals aren't memoized (https://github.com/dlang/dmd/issues/20179)
- O(2^n) Function inlining (https://github.com/dlang/dmd/issues/20315)
- O(n^2) Loop optimizations (https://github.com/dlang/dmd/issues/20535)

A fast constant factor is nice, but let's not introduce more 'compilation time landmines' into dmd's code base, and prefer linear algorithms over quadratic ones.
Sep 23
On 9/23/2025 1:01 PM, Dennis wrote:
> I think both algorithms compute the same thing, but your algorithm represents the graph with an adjacency matrix and does an inefficient search. [...] Your algorithm does a O(V^2) traversal O(V) times, running in O(V^3) time. That's a pathological worst case scenario, but I'm not convinced the constant factor is so large that it's more practical than a linear search.

I don't think it is anywhere near that bad. It does more than O(V) only when there are cycles.

> On a slight tangent, I'm not a fan of how dmd's performance culture seems to be hyper-focused on local constant-time code optimizations that worsen code quality (like bit-hacking and using global variables for reusable memory buffers)

The reusable memory buffer thing was benchmarked by myself long ago, and was a big win. I don't know what you mean by bit-hacking.

My general approach is to make it work first, not worrying too much about algorithms. I've discovered that D makes it fairly easy to change algorithms, while C/C++ code tends to be a lot more brittle. Recoding for performance later takes advantage of there being a test suite in place for it.

> when there's a number of glaring issues related to time complexity, which are way more severe than a redundant allocation here and there could ever be.

They both matter.

> Whenever I investigate what slows down compilation time of real code, it's usually because it triggers a bad case of one of the many quadratic or even exponential algorithms in dmd's code. I'm talking:

I've discovered many such algorithms in my C++ compiler, and fixed them. I've been less diligent with D's source, mainly because I have too much on my plate. It's great that you're looking into this!

> - O(n^2) MSCoff section insertion (https://github.com/dlang/dmd/pull/16780) [...]

This is all valuable stuff!

> A fast constant factor is nice, but let's not introduce more 'compilation time landmines' into dmd's code base, and prefer linear algorithms over quadratic ones.

There's mathematical time, the O(whatever) thing, and there's the detail coding practice time. When I worked on the Warp preprocessor, I spent a lot of time trying different algorithms and data structures, and benchmarking. It paid off to not be too pre-judgmental about what time it would take. Other things matter a lot, too, like cache-friendly data structures.

I appreciate your efforts here, it is very valuable work you are doing!
Sep 23
On Tuesday, 23 September 2025 at 22:10:22 UTC, Walter Bright wrote:
> It does more than O(V) only when there are cycles.

Just step 2, constructing the bit vectors, is already O(V^2). Even without cycles, you can hit the O(V^3) case if you have a deep call stack.

> The reusable memory buffer thing was benchmarked by myself long ago, and was a big win.

The problem is, a trick is found to be useful in one spot at one time, and 15 years later it's still blanketed everywhere without question. Meanwhile, it's 2025, the TypeScript compiler got a 3x speed up from leveraging multi-threading, and dmd is still stuck in `__gshared` town, population 500+. Mutable global buffers are even being used in dmd's brand new ARM disassembler, which is a) not performance critical and b) if it were, the key to making it run fast on modern hardware is not `__gshared char[5 + hw.sizeof * 3 + 1 + 1]`.

> I don't know what you mean by bit-hacking.

I meant things like using an `int flags` parameter with 2 bits or'd in instead of 2 `bool` parameters. But it's not the best example; I'm talking about the general pattern of micro-optimizations, not specific instances of the problem.

> My general approach is to make it work first, not worrying too much about algorithms. (...) It paid off to not be too pre-judgmental about what time it would take.

Which is why it puzzles me: there's a simple, classic algorithm that runs in linear time, which is pre-judged to be slow because there's array concatenation in it. Then there's a complex algorithm with a nested for-loop triple, which is pre-judged to be fast because it's going to use bit vectors, and it has extra logic to prune the search space, making it not as bad, hopefully. And you think the latter fits the philosophy "make it work first"? I mean, be my guest and implement it, I just think you're making it harder for yourself if you do. And I am going to be wary and test performance when something like that lands in dmd 😉

> I appreciate your efforts here, it is very valuable work you are doing!

Thanks!
Sep 23
On 24/09/2025 11:48 AM, Dennis wrote:
> Which is why it puzzles me: there's a simple, classic algorithm that runs in linear time, which is pre-judged to be slow because there's array concatenation in it.

When the problem space has a known finite lifetime, like dmd's AST, which is tied to its global object, there is a solution to this particular problem: inlined linked lists.

I use them heavily for any compiler-type problems that I do. They work great, even if they cost a tiny bit more RAM. They work out to be very cheap, and basically non-existent in terms of cost during iteration. At least when applied to stuff like the algorithms me and Dennis presented wrt. deferred logic.
Sep 23
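[Editor's note: for readers who haven't met the pattern, here is a small sketch of such an inlined (intrusive) linked list; all names are invented for illustration.]

```d
// The `next` link lives in the node itself, so building a list is one
// allocation per node, and appending never reallocates or copies the
// way `arr ~= edge` can.
struct CallEdge
{
    string callee;
    CallEdge* next; // intrusive link
}

struct FuncNode
{
    string name;
    CallEdge* calls; // head of this function's call list
}

// O(1) prepend; fine when nodes live as long as the compilation does.
void addCall(ref FuncNode f, string callee)
{
    f.calls = new CallEdge(callee, f.calls);
}

unittest
{
    auto f = FuncNode("foo");
    addCall(f, "bar");
    addCall(f, "abc");
    size_t count;
    for (auto e = f.calls; e !is null; e = e.next)
        count++; // iteration is plain pointer chasing, no bookkeeping
    assert(count == 2);
}
```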
On 9/23/2025 4:52 AM, Dennis wrote:
> If we remove @nogc from a function's type/mangle in a next edition (which I'm all for) it can work.

Is there a discussion on this?
Sep 23
On Tuesday, 23 September 2025 at 17:47:23 UTC, Walter Bright wrote:
> Is there a discussion on this?

I've briefly mentioned this at DConf in the past two years, I haven't written a proposal or seen anyone else propose it in detail.
Sep 23
On 9/23/2025 1:05 PM, Dennis wrote:
> I've briefly mentioned this at DConf in the past two years, I haven't written a proposal or seen anyone else propose it in detail.

Ok. There are other things that affect this, too, like the typing of pointers to functions.
Sep 23
On Tuesday, 23 September 2025 at 22:11:19 UTC, Walter Bright wrote:
> Ok. There are other things that affect this, too, like the typing of pointers to functions.

Which is incidentally one of the many long-standing problems with function attributes being part of function types, see for example https://github.com/dlang/DIPs/pull/198
Sep 23
On Saturday, 13 September 2025 at 16:12:06 UTC, Richard (Rikki) Andrew Cattermole wrote:
> The fact that D is not architected to be able to stop these false positives is an intended bug in our design. Not all PLs work like this; the ML family certainly doesn't. They tie very advanced analysis engines, including DFA, into their type system.

So the answer to my question is: this is not motivated by reported issues from D users, but by other languages doing this. If the only example for D you can give is an "unrealistic" one, can you please give a realistic example from another language then?

> Just because people don't understand that this is possible doesn't mean it hasn't been an issue. I see it as a case that people don't know what they don't know. So they don't complain. This is a hole that the C family relies upon having due to historical reasons, and so hasn't reached common knowledge.

I understand DFA is an interesting research topic, and if your claim was that it might lead to interesting discoveries or new paradigms I fully agree. But when you say it's necessary to fix DIP1000 specifically, I need some evidence to believe it. There's been over 100 DIP1000-related bug reports with simple non-DFA solutions, and I'm skeptical that people 'don't know' to report DFA-related issues. Several users reported issues about Value Range Propagation (VRP) not working across statements. Even when they don't explicitly know DFA, they still experience a problem and can intuit a solution being available.
Sep 16
On Tuesday, 16 September 2025 at 08:50:00 UTC, Dennis wrote:
> So the answer to my question is: this is not motivated by reported issues from D users, but by other languages doing this. If the only example for D you can give is an "unrealistic" one, can you please give a realistic example from another language then?

I am not trying to improve D based upon what other languages can do. They can only be an inspiration. For instance, I take note of how people view other languages' and compilers' solutions, both historically and in real time.

> I understand DFA is an interesting research topic, and if your claim was that it might lead to interesting discoveries or new paradigms I fully agree. But when you say it's necessary to fix DIP1000 specifically, I need some evidence to believe it. There's been over 100 DIP1000-related bug reports with simple non-DFA solutions, and I'm skeptical that people 'don't know' to report DFA-related issues. Several users reported issues about Value Range Propagation (VRP) not working across statements. Even when they don't explicitly know DFA, they still experience a problem and can intuit a solution being available.

In my last post, which was a giant wall of text, I explained how this isn't the case, noting that templates exhibit some of the underlying issues that DIP1000 has. https://forum.dlang.org/post/10abme4$2ekj$1@digitalmars.com

Both me and Walter seem to have failed to convey these failures in dmd's architecture.

I will be ignoring the lacking features in DIP1000's attribution capabilities; this is purely engine implementation design stuff.

Here is some code similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D. These are backed up by all the debugging and helping of other people over the years. This will give the best experience overall, I expect.

```d
extern(C) int printf(const char*, ...);

void myFunc1(scope const char* str) {
    printf("Text: %s\n", str); // ok
}
```

This works by acknowledging that variable state isn't just "set" and "not-set" for any given property; it's "set", "not-set", and "unknown" at a minimum. My fast DFA engine does this; you can see it in code that looks like:

```d
void aPrototype(ref bool);

void myFunc2() {
    bool b;
    aPrototype(b);
    assert(b); // ok
}
```

The truthiness value went from false, to unknown, with the function call not being attributed.

Escape into a global variable:

```d
int* global;

void myFunc3(scope int* ptr) {
    global = ptr; // Error: ptr going into global escapes
}
```

Might not error (depends on how it plays out):

```d
int* global;

struct Foo {
    int* field;

    void method() {
        global = field;
    }
}

void myFunc4(scope int* ptr) {
    Foo foo;
    foo.field = ptr;
    foo.method;
}
```

The point of this set of tunings isn't because I want to have opinions. The point is to get a subset of escape analysis turned on by default that the community will find value in, without hitting any of the many issues that they have perceptually had with DIP1000.

To implement this, I did review DIP1000 and concluded that it would be a rewrite. It simply isn't designed to handle these kinds of properties. At which point you might as well have a DFA for the escape analysis to live in, and up the ceiling of capabilities.

At this month's monthly meeting I asked if people would find value in the ``myFunc3`` error; hands did go up. I also raised the double duty of ``scope`` wrt. stack allocation. Fact is, people find value in a subset of errors from escape analysis, but it isn't the same subset that Atila is proposing with his "no inferred scope variable reports" approach.

If I'm not convincing for any reason, I suggest doing a survey to see how it plays out. I expect you will get four groups:

1. No protection (@system only), tiny minority
2. False positive heavy, as much protection as possible by default, tiny minority
3. False positive heavy, as much protection as possible opt-in, small minority
4. A subset of protection (@safe and the above examples), the majority

These are based upon my previous survey.

If you can do this without a full rewrite of DIP1000 I will be impressed. However, you will still need to solve borrowing so that I can get reference counting.
Sep 16
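To make the three-state idea above concrete, here is a minimal sketch of such a lattice and its merge operation in D (illustrative names only; this is not dmd's or Rikki's actual engine):

```d
/// Truth state of a tracked property: known-false, known-true, or unknown.
enum Truthiness { False, True, Unknown }

/// Joining two control-flow paths: agreement survives, disagreement
/// degrades to Unknown rather than guessing.
Truthiness merge(Truthiness a, Truthiness b)
{
    return a == b ? a : Truthiness.Unknown;
}

/// A call through an unattributed `ref` parameter degrades any known
/// state to Unknown, which is what makes the `assert(b)` above "ok".
Truthiness afterOpaqueRefCall(Truthiness)
{
    return Truthiness.Unknown;
}

unittest
{
    assert(merge(Truthiness.True, Truthiness.True) == Truthiness.True);
    assert(merge(Truthiness.True, Truthiness.False) == Truthiness.Unknown);
    assert(afterOpaqueRefCall(Truthiness.False) == Truthiness.Unknown);
}
```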
On Tuesday, 16 September 2025 at 14:21:07 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Here is some code, similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D.

You gave 4 code snippets. 1 and 2 already compile as expected, 3 already errors as expected if you add `@safe`, and 4 has no defined expected behavior. None of these have a source linking them to an existing D user / code base. To reiterate, the question is:

> Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?

As long as you keep dancing around it, I'm going to assume the answer is 'no'.
Sep 16
On 17/09/2025 3:52 AM, Dennis wrote:
> On Tuesday, 16 September 2025 at 14:21:07 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Here is some code, similar to what I've sent Walter over the past year, showing what I want to work by default for escape analysis in D.
> You gave 4 code snippets. 1 and 2 already compile as expected, 3 already errors as expected if you add `@safe`, and 4 has no defined expected behavior. None of these have a source linking them to an existing D user / code base.

As expected. These examples show true positives. The problem is eliminating perceived false positives while keeping the intended true positives.

> To reiterate, the question is:
>> Can you think of any more realistic examples where DFA allows you to infer safety where DIP1000 is currently limited?
> As long as you keep dancing around it, I'm going to assume the answer is 'no'.

Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.

I can spot: https://github.com/dlang/dmd/issues/19679

You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113

Here is a ticket, with a comment from you, showing why DIP1000 needs to be lint level attribution: https://github.com/dlang/dmd/issues/20302#issuecomment-2542028404

If it isn't, you break things like the .di generator, since it runs before semantic. It also results in issues like: https://github.com/dlang/dmd/issues/19853

Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153

Pretty nasty. To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.

An example where appropriate CFG processing with a DFA would allow this to not error: https://github.com/dlang/dmd/issues/19659

Sadly my fast DFA isn't able to model it, so the statement walker marks the variable as unmodellable, which prevents the false positive. To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.
Sep 16
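As background for the CFG-processing point above: the class of false positive a flow-sensitive DFA can remove looks like the following sketch (representative of the category, not the code from issue 19659; assuming `-preview=dip1000`, where the exact diagnostic wording may vary):

```d
@safe int* pick(int* a, scope int* b)
{
    int* p = b; // p is inferred `scope` from its initializer...
    p = a;      // ...even though it is then overwritten with
                // non-scope data before any use
    return p;   // rejected today: `scope` is a property of p's
                // declaration, not of the values flowing through it.
                // A flow-sensitive DFA can see only `a` reaches here.
}
```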
On Tuesday, 16 September 2025 at 17:09:18 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.

Thanks a lot! I don't need many, I just need 1 good one, so you delivered already.

> I can spot: https://github.com/dlang/dmd/issues/19679
> You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113

The escape checking logic for assigning a parameter with scope inference to a local variable is bad, definitely. Like you say, I've been working on a fix, which remains unfinished because:

- dmd/escape.d is full of duplicated spaghetti logic (though it has improved)
- The test suite and buildkite projects rely on existing quirks
- dip1000 by default is no longer a priority issue, so I shifted work to more important stuff

Unless you consider my refactoring and WIP solution a rewrite/DFA, I wouldn't say a DFA rewrite is *required*, but I certainly was tempted to throw out dmd/escape.d and rewrite that hot mess from scratch 🙂.

> Here is a ticket, with a comment from you, showing why DIP1000 needs to be lint level attribution: https://github.com/dlang/dmd/issues/20302#issuecomment-2542028404

You haven't defined what "lint level attribution" means, but that issue would be solved if we add inferred `scope` parameters to a function type's mangle in a next edition, so I don't see how this issue necessitates a rewrite/DFA.

> If it isn't, you break things like the .di generator, since it runs before semantic.

The .di generator also needs to run semantic to emit correct inferred function attributes; that change doesn't require rewriting DIP1000 / adding DFA.

> It also results in issues like: https://github.com/dlang/dmd/issues/19853

That issue is invalid, see the closing comment.

> Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153
> To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.

I haven't studied that issue, but I suspect it has more to do with the internal opApply rewrite than a lack of DFA.

> To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.

How would you throw a `scope` Exception across function boundaries? I can't picture that being valid, but I haven't thought very hard.

---

So far, the most representative issue is definitely https://github.com/dlang/dmd/issues/19679. I agree that a rewrite using a DFA engine would be sufficient to handle that case; I just don't think it's necessary.

And that's because I'm noticing a theme in the code snippets you use to showcase your DFA engine vs. the reduced code snippets from the bug reports you just listed. Notice how, unlike the DFA examples, the DIP1000 bugs are not about asserts, if-statements, or loops. The problematic code almost always exists inside the same basic block. That's why I think considering a function's control-flow graph is currently overkill for DIP1000.
Sep 16
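To illustrate the basic-block observation above: the reduced DIP1000 bug reports tend to be straight-line code of roughly this shape (a representative sketch assuming `-preview=dip1000`, not the text of any one linked issue):

```d
// No ifs, loops, or asserts: a single basic block. The bug class
// Dennis describes concerns transferring scope-ness through the
// assignment to `q`, not control flow.
@safe int* leak(scope int* p)
{
    int* q = p; // q must inherit p's scope-ness
    return q;   // and this return must then be rejected
}
```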
On 17/09/2025 8:00 AM, Dennis wrote:
> On Tuesday, 16 September 2025 at 17:09:18 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> Okay, fine. Bugzilla is down, so I can't look through the WONTFIXes, and I'm not going to go hunting on the N.G. learn forum.
> Thanks a lot! I don't need many, I just need 1 good one, so you delivered already.

I wish I could have offered more in WONTFIX; it would have been illuminating in false positive/perceived false positive territory.

>> I can spot: https://github.com/dlang/dmd/issues/19679
>> You tried to fix it in the past and haven't. With a DFA for the infrastructure, that couldn't have existed. Another example from you for the above: https://github.com/dlang/dmd/issues/18113
> The escape checking logic for assigning a parameter with scope inference to a local variable is bad, definitely. Like you say, I've been working on a fix, which remains unfinished because:
> - dmd/escape.d is full of duplicated spaghetti logic (though it has improved)
> - The test suite and buildkite projects rely on existing quirks
> - dip1000 by default is no longer a priority issue, so I shifted work to more important stuff
> Unless you consider my refactoring and WIP solution a rewrite/DFA, I wouldn't say a DFA rewrite is *required*, but I certainly was tempted to throw out dmd/escape.d and rewrite that hot mess from scratch 🙂.

Is Atila aware that the DIP1000 implementation has these issues requiring a rewrite, which you determined independently of me? I suspect he is operating under the assumption that getting it on by default is just a matter of tweaking.

>> It also results in issues like: https://github.com/dlang/dmd/issues/19853
> That issue is invalid, see the closing comment.

During this I found a few that needed closing, as they were out of date. Sent to Nic to close, since I don't have the rights to do so.

>> Another example where DIP1000 loses track of lifetimes: https://github.com/dlang/dmd/issues/18153
>> To fix that you would need to keep track of a stack of lifetimes, with full knowledge of the variables in a function signature. All things a DFA would be good at.
> I haven't studied that issue, but I suspect it has more to do with the internal opApply rewrite than a lack of DFA.

>> To model this across function boundaries, we need to add an escape set to the CFG of blockexit and on the function declaration.
> How would you throw a `scope` Exception across function boundaries? I can't picture that being valid, but I haven't thought very hard.

Something akin to:

```d
// hypothetical syntax: D has no `throws` annotation today
void myThrow(return MyException e) throws(MyException)
{
    throw e;
}

void caller()
{
    scope e = new MyException;

    try
    {
        myThrow(e);
    }
    catch(Exception)
    {
    }
}
```

Why would someone do that? I don't know. However, the main thing is to map function calls to their corresponding catches, allowing the CFG analysis to do its job. It is impossible for it to work correctly with regard to catch statements otherwise. Without that, any try block containing calls to throwing functions has to have its catches marked unmodellable, which is rather limiting.

> So far, the most representative issue is definitely https://github.com/dlang/dmd/issues/19679. I agree that a rewrite using a DFA engine would be sufficient to handle that case; I just don't think it's necessary.
> And that's because I'm noticing a theme in the code snippets you use to showcase your DFA engine vs. the reduced code snippets from the bug reports you just listed. Notice how, unlike the DFA examples, the DIP1000 bugs are not about asserts, if-statements, or loops. The problematic code almost always exists inside the same basic block. That's why I think considering a function's control-flow graph is currently overkill for DIP1000.

Of course that trend exists: I'm currently working on the truthiness & nullability features! It's all about CFG analysis. But there is a reason for it: each analysis feeds into the others, giving better results overall.
Sep 16
On Tuesday, 16 September 2025 at 23:02:05 UTC, Richard (Rikki) Andrew Cattermole wrote:
> I wish I could have offered more in WONTFIX; it would have been illuminating in false positive/perceived false positive territory.

My search gives:

- 18295 [Scope][dip1000] `scope class` check too conservative under -dip1000
- 18637 [scope][DIP1000] "copying `&i` into allocated memory escapes a reference to local variable `i`" where it's inappropriate
- 19301 [DIP1000] missing overload abilities
- 23941 [DIP1000] Overloading by scope should be allowed

There's nothing interesting there.

> Is Atila aware that the DIP1000 implementation has these issues requiring a rewrite, which you determined independently of me?

I said I was tempted to rewrite, not that it is required. You have to ask Atila whether he is aware, but I did talk about DIP1000's issues and implementation quality at DConf '23 and '24.

> I suspect he is operating under the assumption that getting it on by default is just a matter of tweaking.

It is. The thing that's blocking DIP1000 being enabled by default is that existing code relies on slicing local static arrays being freely allowed in `@safe` functions, but that could be disallowed in a next edition, unless you explicitly opt into scope semantics.
Sep 16
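For readers unfamiliar with the static-array pattern Dennis means, a minimal sketch (hypothetical function names; under `-preview=dip1000` the exact diagnostic may differ):

```d
@safe void consume(int[] arr) {}

@safe void main()
{
    int[4] buf = [1, 2, 3, 4];
    int[] s = buf[]; // slicing a local static array: freely allowed
                     // in @safe today
    consume(s);      // under -preview=dip1000, s is inferred `scope`,
                     // so passing it to a non-scope parameter errors
}
```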
On 9/13/2025 4:06 AM, Dennis wrote:
> As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).

That is correct. For the false path of a `static if`, the code must be syntactically correct but does not have to be semantically correct. `static if` would be rather useless if it required semantic correctness.
Sep 13
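A minimal sketch of the distinction Walter describes (the symbol name is hypothetical; the point is that the false branch only has to parse):

```d
enum bool useFeature = false;

void main()
{
    static if (useFeature)
    {
        // Never semantically analyzed: `undefinedSymbol` need not
        // exist anywhere, yet this compiles because the branch is
        // syntactically valid.
        undefinedSymbol(42);
    }
}
```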
On 14/09/2025 3:12 PM, Walter Bright wrote:
> On 9/13/2025 4:06 AM, Dennis wrote:
>> As far as I'm aware, [users expect dead code to be type checked by the frontend as usual](https://forum.dlang.org/post/hullwyrnmgcaoghaqbux@forum.dlang.org).
> That is correct. For the false path of a `static if`, the code must be syntactically correct but does not have to be semantically correct. `static if` would be rather useless if it required semantic correctness.

The linked code is not a static if, it's a regular if.
Sep 13
On 9/13/2025 8:42 PM, Richard (Rikki) Andrew Cattermole wrote:
> The linked code is not a static if, it's a regular if.

I know. I was trying to clarify the difference between a regular if and a static if.
Sep 14
DIP1000 is necessary when the DFA is faced with only a function prototype rather than having the body of the function available.
Sep 13
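A sketch of the situation Walter describes, assuming `-preview=dip1000` (function names are hypothetical): when only a prototype is visible, the `scope` annotation is the only contract the caller-side analysis can rely on.

```d
// Body not available: perhaps defined in a separately compiled module.
// Without `scope`, the caller must assume the pointer may escape.
@safe void useTemporarily(scope int* p);

@safe void caller()
{
    int x;
    useTemporarily(&x); // ok: `p` is promised not to escape
}
```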
On 14/09/2025 3:09 PM, Walter Bright wrote:
> DIP1000 is necessary when the DFA is faced with only a function prototype rather than having the body of the function available.

I'm curious as to why you said this. There are only three differences I can come up with between a classical DFA and DIP1000:

- Attributes, and the lack thereof
- Single forward pass vs. worklist algorithm
- In the type system (storage classes, ugh) vs. lint level or optimization level

Any variation of the three can be present in a _modern_ frontend DFA. The delineation between the two is basically nonexistent from my perspective. Unless you are not referring to a modern frontend DFA, and actually mean classical DFAs, which are indeed quite different.
Sep 13
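To make the "single forward pass vs. worklist" distinction above concrete, here is a minimal self-contained worklist solver for a must-style forward analysis (definite-assignment flavor) over a toy CFG. All names are illustrative, not dmd internals.

```d
struct Block
{
    int[] preds;   // predecessor block indices
    int[] succs;   // successor block indices
    bool gen;      // block definitely establishes the property
    bool kill;     // block destroys the property
    bool outState; // solver output: property holds on block exit
}

// Iterate to a fixed point: unlike a single forward pass, blocks are
// revisited whenever a predecessor's state changes, which is what
// makes loops come out right.
void solve(Block[] blocks)
{
    foreach (ref b; blocks)
        b.outState = true; // optimistic initial state

    int[] work;
    foreach (i; 0 .. cast(int) blocks.length)
        work ~= i;

    while (work.length)
    {
        immutable id = work[0];
        work = work[1 .. $];

        // Meet over predecessors: the property must hold on every
        // incoming path. Entry blocks (no predecessors) start unset.
        bool inState = blocks[id].preds.length > 0;
        foreach (p; blocks[id].preds)
            inState = inState && blocks[p].outState;

        immutable next = blocks[id].gen || (inState && !blocks[id].kill);
        if (next != blocks[id].outState)
        {
            blocks[id].outState = next;
            work ~= blocks[id].succs; // successors need recomputation
        }
    }
}

unittest
{
    // CFG: 0 -> 1 -> 2, plus a loop back edge 2 -> 1.
    // Block 0 establishes the property; block 2 destroys it.
    auto blocks = [
        Block(null,   [1], true,  false),
        Block([0, 2], [2], false, false),
        Block([1],    [1], false, true),
    ];
    solve(blocks);
    assert(blocks[0].outState);  // established
    assert(!blocks[1].outState); // lost around the loop
    assert(!blocks[2].outState);
}
```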