digitalmars.D - Clang static analysis results for dmd
- Robert Clipsham (15/15) Jul 24 2011 Hi all,
- Robert Clipsham (6/18) Jul 24 2011 Question: Is it worth making bug reports out of any of these? If so,
- Andrei Alexandrescu (6/15) Jul 24 2011 Very interesting. I selected randomly one of each category and found
- Robert Clipsham (9/27) Jul 24 2011 That's true for quite a few of the null pointer dereferences, there are
- Walter Bright (6/11) Jul 24 2011 dmd uses its own definition of assert(). It is marked as:
- Robert Clipsham (7/23) Jul 24 2011 http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_unreachable
- Walter Bright (2/5) Jul 24 2011 Done.
- bearophile (5/6) Jul 24 2011 Thank you, this will be useful, and hopefully it will have some good con...
- Walter Bright (7/14) Jul 24 2011 Looks like clang has some bugs itself.
- Robert Clipsham (7/13) Jul 26 2011 It does if value is always within the range of an unsigned short. I'm
- Walter Bright (3/18) Jul 26 2011 It's not a bug in dmc. The code is quite deliberate and necessary. The b...
- Robert Clipsham (20/22) Jul 26 2011 $ cat test.c
- Don (6/23) Jul 25 2011 It seems to have ignored the backend source files which are generated by...
- Robert Clipsham (6/11) Jul 26 2011 It has indeed ignored them. This is a consequence of the way I was
- Don (13/30) Jul 27 2011 Here's a clang bug:
- Don (9/44) Jul 27 2011 Another one:
- Trass3r (2/14) Jul 27 2011 Yeah, though that code should really use else instead of another check.
- Robert Clipsham (10/22) Jul 27 2011 I've posted updated results:
- bearophile (4/5) Jul 27 2011 I'd like all those tests done by the D compiler too (Clang doesn't perfo...
- Walter Bright (10/14) Jul 28 2011 The signal to noise ratio of this kind of flow analysis is rather poor. ...
- bearophile (119/121) Jul 28 2011 I don't fully understand your answers on this. I am a bit confused, but ...
- Walter Bright (26/34) Jul 28 2011 If they are part of the compiler, I guarantee you that people will regar...
- Walter Bright (5/5) Jul 28 2011 Take a look at all the complaints about evalu8.c. They're all false posi...
- bearophile (7/14) Jul 28 2011 Somewhere I have read that the errors found by Clang are so often true b...
- Walter Bright (8/11) Jul 28 2011 The thing is, I've tried the things Clang tries way back in the 80's. It...
- bearophile (5/7) Jul 29 2011 "Unused variables", "dead assignments", "dead last assignments" aren't b...
- Walter Bright (22/22) Jul 29 2011 Here's another one:
- Brad Roberts (17/44) Jul 29 2011 I look at it in (at last) two ways which are are contradictory..
- Walter Bright (7/15) Jul 29 2011 In general I agree with this, which is why I've made some changes to the...
- Brad Roberts (8/27) Jul 29 2011 Two real, hitable, bugs?
- Walter Bright (2/3) Jul 29 2011 Nobody has run into them yet.
- Trass3r (4/16) Jul 29 2011 Clang's static analysis isn't very mature yet.
- Walter Bright (2/18) Jul 29 2011 It's not about maturity, it's simply impossible to correctly do the abov...
- Brad Roberts (14/37) Jul 29 2011 You keep using the term impossible. It's impossible to be 100% correct ...
- Walter Bright (20/33) Jul 29 2011 It's the halting problem. How are you going to prove that a->b->c yields...
- bearophile (5/11) Jul 29 2011 Clang static analysis is a very new sub-tool. On the market there are co...
- Andrew Wiley (9/39) Jul 29 2011 Woah Woah Woah
- bearophile (7/9) Jul 29 2011 They are now trying again using F* (derived by F#, that is a ML-derived ...
- Walter Bright (8/29) Jul 29 2011 I think you misunderstand the scope of this problem. They've done data f...
- Andrei Alexandrescu (3/6) Jul 29 2011 But that's also an approximate thing :o).
- Walter Bright (3/8) Jul 29 2011 True, but it's done in a way that enables things rather than disallowing...
- dennis luehring (10/38) Jul 30 2011 but your code does not reflect your "B is true if and only if A is
- Don (8/40) Jul 31 2011 The example I posted was clearer.
- dennis luehring (5/40) Jul 31 2011 but i think clang was right with Walters code - i do not understand
- Walter Bright (8/12) Jul 31 2011 What I meant was:
- Walter Bright (2/6) Jul 27 2011 I thought I did. Will check.
Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression! -- Robert http://octarineparrot.com/
Jul 24 2011
On 24/07/2011 22:06, Robert Clipsham wrote:Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression!Question: Is it worth making bug reports out of any of these? If so, which, and should I group them? -- Robert http://octarineparrot.com/
Jul 24 2011
On 7/24/11 4:06 PM, Robert Clipsham wrote:Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there.Very interesting. I selected randomly one of each category and found only false positives though, mostly because the tool does not understand that assert(p) will subsequently guarantee p is non-null, and assert(0) terminates the program. Andrei
Jul 24 2011
On 24/07/2011 22:29, Andrei Alexandrescu wrote:On 7/24/11 4:06 PM, Robert Clipsham wrote:That's true for quite a few of the null pointer dereferences, there are some where there are no assertions though. It might be worth reporting these false positives to the folk working on it. In the other categories there are far fewer where they're false positives due to the assertion. It tends to be the ones with shorter path lengths that exhibit this problem. -- Robert http://octarineparrot.com/Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there.Very interesting. I selected randomly one of each category and found only false positives though, mostly because the tool does not understand that assert(p) will subsequently guarantee p is non-null, and assert(0) terminates the program. Andrei
Jul 24 2011
On 7/24/2011 2:43 PM, Robert Clipsham wrote:That's true for quite a few of the null pointer dereferences, there are some where there are no assertions though. It might be worth reporting these false positives to the folk working on it. In the other categories there are far fewer where they're false positives due to the assertion. It tends to be the ones with shorter path lengths that exhibit this problem.dmd uses its own definition of assert(). It is marked as: #pragma noreturn(util_assert) in tassert.h. If there is an equivalent for clang, I suggest adding that in and rerunning the analysis. Getting rid of hundreds of false positives will save a lot of time.
Jul 24 2011
On 24/07/2011 22:59, Walter Bright wrote:On 7/24/2011 2:43 PM, Robert Clipsham wrote:If you'd do the honours of adding this in I'll rerun the analysis next time I get a chance. -- Robert http://octarineparrot.com/That's true for quite a few of the null pointer dereferences, there are some where there are no assertions though. It might be worth reporting these false positives to the folk working on it. In the other categories there are far fewer where they're false positives due to the assertion. It tends to be the ones with shorter path lengths that exhibit this problem.dmd uses its own definition of assert(). It is marked as: #pragma noreturn(util_assert) in tassert.h. If there is an equivalent for clang, I suggest adding that in and rerunning the analysis. Getting rid of hundreds of false positives will save a lot of time.
Jul 24 2011
On 7/24/2011 3:11 PM, Robert Clipsham wrote:If you'd do the honours of adding this in I'll rerun the analysis next time I get a chance.Done.
Jul 24 2011
Robert Clipsham:I took the liberty of running dmd through clang's static analysis,Thank you, this will be useful, and hopefully it will have some good consequences :-) (I'm reading a random sample of the bugs.) Bye, bearophile
Jul 24 2011
On 7/24/2011 2:06 PM, Robert Clipsham wrote:I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there.Looks like clang has some bugs itself. http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath if ((unsigned short)value != value) 6 Both operands to '!=' always have the same value No, that's not true. value is a uint64_t, and casting it to unsigned short and then back to uint64_t does not yield the same value.
Jul 24 2011
On 25/07/2011 03:10, Walter Bright wrote:Looks like clang has some bugs itself. http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath if ((unsigned short)value != value) 6 Both operands to '!=' always have the same value No, that's not true. value is a uint64_t, and casting it to unsigned short and then back to uint64_t does not yield the same value.It does if value is always within the range of an unsigned short. I'm not sure whether it is in this case or not, if not we need to narrow down a test case and report a bug. -- Robert http://octarineparrot.com/
Jul 26 2011
On 7/26/2011 11:01 AM, Robert Clipsham wrote:On 25/07/2011 03:10, Walter Bright wrote:It's not a bug in dmc. The code is quite deliberate and necessary. The bug is in clang.Looks like clang has some bugs itself. http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath if ((unsigned short)value != value) 6 Both operands to '!=' always have the same value No, that's not true. value is a uint64_t, and casting it to unsigned short and then back to uint64_t does not yield the same value.It does if value is always within the range of an unsigned short. I'm not sure whether it is in this case or not, if not we need to narrow down a test case and report a bug.
Jul 26 2011
On 26/07/2011 19:08, Walter Bright wrote:It's not a bug in dmc. The code is quite deliberate and necessary. The bug is in clang.$ cat test.c #include <stdint.h> #include <stdlib.h> int main(int argc, char *argv[]) { uint64_t value = atoi(argv[0]); if ((unsigned short)value != value) { return 0; } return 1; } $ scan-build clang -g test.c Note that this leads to no errors from the analyzer - it will only flag it if it knows statically whether value fits within an unsigned short or not. We'll see if it's still flagged when I re-run it. -- Robert http://octarineparrot.com/
Jul 26 2011
Robert Clipsham wrote:Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression!It seems to have ignored the backend source files which are generated by early build steps optabgen.c, etc. Either that, or else it doesn't realize that the arrays in those files (eg, optab1[] in optab.c) are constant values. Every backend report that I looked at, was a false positive caused by this.
Jul 25 2011
On 25/07/2011 13:05, Don wrote:It seems to have ignored the backend source files which are generated by early build steps optabgen.c, etc. Either that, or else it doesn't realize that the arrays in those files (eg, optab1[] in optab.c) are constant values. Every backend report that I looked at, was a false positive caused by this.It has indeed ignored them. This is a consequence of the way I was running the build, I'll see if I can sort this out when I run it again. -- Robert http://octarineparrot.com/
Jul 26 2011
Robert Clipsham wrote:Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression!Here's a clang bug: http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath 3436 if (!exp) <9> Taking false branch 3437 fd->nrvo_can = 0; 3438 3439 if (exp) <10> Taking false branch 3440 { That is, (!exp) is false, and (exp) is also false.
Jul 27 2011
Don wrote:Robert Clipsham wrote:Another one: http://octarineparrot.com/assets/dmd/report-bksOGf.html#EndPath Quite bizarre -- it seems to think the static member array Type::sizeTy[TMAX] is a null pointer. I finally found one genuine DMD bug report: http://d.puremagic.com/issues/show_bug.cgi?id=6389 But it looks to me as though the reports show more bugs in clang, than in DMD.Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression!Here's a clang bug: http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath 3436 if (!exp) <9> Taking false branch 3437 fd->nrvo_can = 0; 3438 3439 if (exp) <10> Taking false branch 3440 { That is, (!exp) is false, and (exp) is also false.
Jul 27 2011
Am 27.07.2011, 10:22 Uhr, schrieb Don <nospam nospam.com>:Here's a clang bug: http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath 3436 if (!exp) <9> Taking false branch 3437 fd->nrvo_can = 0; 3438 3439 if (exp) <10> Taking false branch 3440 { That is, (!exp) is false, and (exp) is also false.Yeah, though that code should really use else instead of another check.
Jul 27 2011
On 24/07/2011 22:06, Robert Clipsham wrote:Hi all, I took the liberty of running dmd through clang's static analysis, and it's turned up quite a few (486) potential bugs in dmd: http://octarineparrot.com/assets/dmd/ There's bound to be a few false positives in there (please feel free to report them to the nice folk over at llvm), and the dead code won't cause any problems (and most of the "dead code" in the backend is probably in use with dmc or something), but there are still a few hundred potential crashes in there. The link above is rather slow - if anyone wants a local copy to work with let me know and I'll see about getting one to you... The compressed source to those pages is ~206MB, it's about 2.8GB without compression!I've posted updated results: http://octarineparrot.com/assets/dmd/ You'll probably need to hit refresh before the new results come up. Note that in addition to Walter's global.h changes, __attribute__((noreturn)) needed to be added to local_assert and util_assert in tassert.h too (could you add these too Walter?) -- Robert http://octarineparrot.com/
Jul 27 2011
Robert Clipsham:I've posted updated results:I'd like all those tests done by the D compiler too (Clang doesn't perform those tests on default because they take time, you have to add the --analyze compiler switch). Some of them like Dead store look better as warnings, other ones seem better as errors. Bye, bearophile
Jul 27 2011
On 7/27/2011 3:50 PM, bearophile wrote:I'd like all those tests done by the D compiler too (Clang doesn't perform those tests on default because they take time, you have to add the --analyze compiler switch). Some of them like Dead store look better as warnings, other ones seem better as errors.The signal to noise ratio of this kind of flow analysis is rather poor. While it does find some legitimate bugs, the rate of false positives is far too high to be a standard part of the language. I would agree that adding extra conditionals to the source code will both eliminate the false positives and make the code more readable, but those extra conditionals exact a performance penalty and would not be something a high performance coder would want. I originally did have stuff like this in the optimizer, but removed it because the false positive rate was untenable.
Jul 28 2011
Walter:The signal to noise ratio of this kind of flow analysis is rather poor. While it does find some legitimate bugs, the rate of false positives is far too high to be a standard part of the language.<I don't fully understand your answers on this. I am a bit confused, but it's not your fault. In Clang those tests aren't a standard part of the C or C++ languages. They are extra tests, like a lint tool built in the compiler, and they aren't a part of the normal compilation (if you use --analyze it doesn't produce a compiled binary, but an HTML of the test results). I take a look at a random sample of the first groups of the results: --------------------- Some Dead code, Idempotent operation: uRegmask3 = ASM_GET_uRegmask(popnd3->usFlags); Value stored to 'uRegmask3' is never read ty = ta->ty; Value stored to 'ty' is never read --------------------- Some Dead code, dead assignment: uSizemask3 = ASM_GET_uSizemask(popnd3->usFlags); Value stored to 'uSizemask3' is never read s = retregs & mES; Value stored to 's' is never read --------------------- Some Dead code, Dead increment: offset += vtblInterfaces->dim * (4 * PTRSIZE); Value stored to 'offset' is never read flags |= 1; // already deduced, so don't to toHeadMutable() Value stored to 'flags' is never read --------------------- Dead store Dead initialization: TY tyto = t->toBasetype()->ty; Value stored to 'tyto' during its initialization is never read int aimports_dim = aimports.dim; Value stored to 'aimports_dim' during its initialization is never read --------------------- Logic error Assigned value is garbage or undefined: Parameters *FuncDeclaration::getParameters(int *pvarargs) 2918 { Parameters *fparameters; 2919 int fvarargs; 2920 2921 if (type) 1 Taking false branch 2922 { 2923 assert(type->ty == Tfunction); 2924 TypeFunction *fdtype = (TypeFunction *)type; 2925 fparameters = fdtype->parameters; 2926 fvarargs = fdtype->varargs; 2927 } 2928 if (pvarargs) 2 Taking true branch 2929 *pvarargs = fvarargs; 3 Assigned value is garbage or undefined --------------------- Logic error Dereference of undefined pointer value: STATIC void ivfamelems(register Iv *biv,register elem **pn) 2447 { register unsigned op; 2448 register tym_t ty,c2ty; 2449 register famlist *f; 2450 register elem *n,*n1,*n2; 2451 2452 assert(pn); 2453 n = *pn; 2454 assert(biv && n); 2455 op = n->Eoper; 2456 if (OTunary(op)) 1 Taking true branch 2457 { ivfamelems(biv,&n->E1); 2458 n1 = n->E1; 2459 } 2460 else if (OTbinary(op)) 2461 { ivfamelems(biv,&n->E1); 2462 ivfamelems(biv,&n->E2); /* LTOR or RTOL order is unimportant */ 2463 n1 = n->E1; 2464 n2 = n->E2; 2465 } 2466 else /* else leaf elem */ 2467 return; /* which can't be in the family */ 2468 2469 if (op == OPmul || op == OPadd || op == OPmin || 2 Taking true branch 2470 op == OPneg || op == OPshl) 2471 { /* Note that we are wimping out and not considering */ 2472 /* LI variables as part of c1 and c2, but only constants. */ 2473 2474 ty = n->Ety; 2485 2486 /* If we have (li + var), swap the leaves. */ 2487 if (op == OPadd && isLI(n1) && n1->Eoper == OPvar && n2->Eoper == OPvar) 3 Dereference of undefined pointer value -------------------- 2819 targ_ldouble el_toldouble(elem *e) 2820 { targ_ldouble result; 2821 2822 elem_debug(e); 2823 assert(cnst(e)); 2824 #if TX86 2825 switch (tybasic(typemask(e))) 1 'Default' branch taken. Execution continues on line 2860 2860 return result; 2 Undefined or garbage value returned to caller 2861 } ----------------------- Is Clang correct there, or are those false positives? If it's correct then I'd like the D compiler to tell me 100% of those I have listed here, even if not even one of those is a real bug. In some cases you store a value in a variable even if you know you will not use it (example: last iteration of a loop, to code simpler and shoter. But I'd like to know every time I do this. I like to write tidy code. Returning values that can be undefined is less easy to catch in D, because the language initializes variables, to their initial default value is sometimes what the programmer wants.I would agree that adding extra conditionals to the source code will both eliminate the false positives and make the code more readable, but those extra conditionals exact a performance penalty and would not be something a high performance coder would want. I originally did have stuff like this in the optimizer, but removed it because the false positive rate was untenable.<I don't fully understand what you are saying, but is __assume useful here? http://msdn.microsoft.com/en-us/library/1b3fsfxw%28VS.80%29.aspx Bye, bearophile
Jul 28 2011
On 7/28/2011 3:27 PM, bearophile wrote:In Clang those tests aren't a standard part of the C or C++ languages. They are extra tests, like a lint tool built in the compiler, and they aren't a part of the normal compilation (if you use --analyze it doesn't produce a compiled binary, but an HTML of the test results).If they are part of the compiler, I guarantee you that people will regard it as a standard part of the language, and the complaints about false positives will cause problems.Some Dead code, Idempotent operation:Dead code is not a bug. It's more of a stylistic issue, and sometimes that "dead" code is needed in other code that has been #if'd out. The compiler complaining about dead code is also a nuisance when turning on and off sections of code that is a normal part of the dev process.Is Clang correct there, or are those false positives? If it's correct then I'd like the D compiler to tell me 100% of those I have listed here, even if not even one of those is a real bug.I've been slowly going through the reports, and so far all of them have been false positives. Don found one that's a real bug, but I haven't gotten to it yet. Here's an example of a false positive - clang complains the comparison is idempotent: size_t e2factor; ... if (e2factor == (int)e2factor) For a 32 bit compile, yes, it's idempotent. But not for a 64 bit compile! I *want* it to be a no-op for a 32 bit compile and to become active in a 64 bit compile. Of course, I could also use an ugly #ifdef, but I like that little idiom, it works, and it is correct. I know why clang is doing what it is doing, but that shows a weakness in its static analysis. There are other false positives for things like assigning an uninitialized value to a field in a data structure that will never be used in the cases where it is uninitialized. I could add a conditional, but that's slower than just assigning it anyway. Trying to figure these things out with static analysis is impossible - it would be solving the halting problem - hence you're stuck with false positives.
Jul 28 2011
Take a look at all the complaints about evalu8.c. They're all false positives. Consider that this code has been in use for 25 years now. It's been used by several hundred thousand developers. If it was so chock full of scores of undefined values and null pointer dereferences, I probably would have heard about it by now!
Jul 28 2011
Walter:I've been slowly going through the reports, and so far all of them have been false positives.Somewhere I have read that the errors found by Clang are so often true bugs they have a tool that submits bug reports automatically. I now presume they were wrong. But keep in mind that Clang static analysis is a *very young* sub-tool. It's something like a year or so old. There are commercial C/C++ lints that are probably about or more than 20 years old that are probably less buggy and more precise than the Clang compiler, and they find far more kinds of problems in the code. It often finds issues in my code (usually stylistic issues, not true bugs, but I usually agree with its advice and change my code).There are other false positives for things like assigning an uninitialized value to a field in a data structure that will never be used in the cases where it is uninitialized. I could add a conditional, but that's slower than just assigning it anyway. Trying to figure these things out with static analysis is impossible - it would be solving the halting problem - hence you're stuck with false positives.Probably there are ways for a programmer+language to tell such simple semantics to a compiler, but C language is not good enough for this. Thank you for your answers. It's always interesting when "theory" (of practical-purposed software tools) meets the road of the experiment reality. I hope Clang has not wasted too much of your time. Bye, bearophile
Jul 28 2011
On 7/28/2011 5:19 PM, bearophile wrote:Thank you for your answers. It's always interesting when "theory" (of practical-purposed software tools) meets the road of the experiment reality. I hope Clang has not wasted too much of your time.The thing is, I've tried the things Clang tries way back in the 80's. It's attempting the impossible. C is just not amenable to static proofs (if I were a better theoretician I might be able to show that this is partially related to C being memory unsafe). Even so, I still plan to go through them all just to be sure. Clang does put out a nice report, though, far better than I attempted. A kickass bug finding tool that really *does* work on C/C++ code is valgrind.
Jul 28 2011
Walter:The thing is, I've tried the things Clang tries way back in the 80's. It's attempting the impossible."Unused variables", "dead assignments", "dead last assignments" aren't bugs and they are sometimes acceptable while you debug some code, but you too agree they are better removed from code meant to be good and tidy (also because they sometimes associated with bugs). You have appreciated some of those "dead code" (useless variables) notes from Clang: https://github.com/D-Programming-Language/dmd/compare/c50a936...08aa57a Bye, bearophile
Jul 29 2011
Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I rewrite as: T* p = NULL; ... if (A) p = &t; ... if (B) ... *p ... but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced with some significant trial-and-error refactoring to get rid of the message. At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?
Jul 29 2011
On Fri, 29 Jul 2011, Walter Bright wrote:Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I rewrite as: T* p = NULL; ... if (A) p = &t; ... if (B) ... *p ... but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced with some significant trial-and-error refactoring to get rid of the message. At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?I look at it in (at last) two ways which are are contradictory.. 1) static analysis reported issues are of materialy less value than a bug report that comes with a repro case. reason: the analysis doesn't have a test case that can prove it stays fixed short of integrating the tool as part of the test process -- an almost pointless exercise unless there's additional tooling around it to mask out false positives. 2) if the tool has trouble analyzing the code, there's a not unreasonable chance a person also has trouble. The above case is a good one where depending on how close those two if's are in the code and how obvious it is that B is a super set of A, it's the kind of thing someone's going to have trouble with too. By and large though, this isn't the way I'd spend my time, unless you goal is to reduce test cases to feed into clang to improve it. The cost/benefit ratio just doesn't meet the bar. My 2 cents, Brad
Jul 29 2011
On 7/29/2011 1:30 PM, Brad Roberts wrote:2) if the tool has trouble analyzing the code, there's a not unreasonable chance a person also has trouble. The above case is a good one where depending on how close those two if's are in the code and how obvious it is that B is a super set of A, it's the kind of thing someone's going to have trouble with too.In general I agree with this, which is why I've made some changes to the source code to 'fix' some of the non-bugs identified by clang. I felt the changes made the code more readable.By and large though, this isn't the way I'd spend my time, unless you goal is to reduce test cases to feed into clang to improve it. The cost/benefit ratio just doesn't meet the bar.So far, two real bugs have been identified. This makes it worth one pass through the clang results, but as you say, the rate of false positives is so high it is not worth continuing to use it.
Jul 29 2011
On Fri, 29 Jul 2011, Walter Bright wrote:On 7/29/2011 1:30 PM, Brad Roberts wrote:Two real, hitable, bugs? I still look at cost/benefit.. in that same time a number of other things could be done that had at least as much direct benefit. Don't get me wrong, I really love static analysis tools, but ones that are mature and have mechanisms for managing the false positives. Later, Brad2) if the tool has trouble analyzing the code, there's a not unreasonable chance a person also has trouble. The above case is a good one where depending on how close those two if's are in the code and how obvious it is that B is a super set of A, it's the kind of thing someone's going to have trouble with too.In general I agree with this, which is why I've made some changes to the source code to 'fix' some of the non-bugs identified by clang. I felt the changes made the code more readable.By and large though, this isn't the way I'd spend my time, unless you goal is to reduce test cases to feed into clang to improve it. The cost/benefit ratio just doesn't meet the bar.So far, two real bugs have been identified. This makes it worth one pass through the clang results, but as you say, the rate of false positives is so high it is not worth continuing to use it.
Jul 29 2011
On 7/29/2011 2:49 PM, Brad Roberts wrote:Two real, hitable, bugs?Nobody has run into them yet.
Jul 29 2011
Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized"....At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?Clang's static analysis isn't very mature yet. Also Apple primarily pushes the Objective-C analysis development so they can integrate it into XCode.
Jul 29 2011
On 7/29/2011 1:35 PM, Trass3r wrote:It's not about maturity, it's simply impossible to correctly do the above case.Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized"....At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?Clang's static analysis isn't very mature yet.
Jul 29 2011
On Fri, 29 Jul 2011, Walter Bright wrote:On 7/29/2011 1:35 PM, Trass3r wrote:You keep using the term impossible. It's impossible to be 100% correct in 100% of the code bases. Sure, but that's obviously not the goal. The goal is to be correct enough on enough code to be useful. A far more achievable goal. Clang isn't there yet, but with time and effort it can improve towards that goal. That's where producing test cases on things it fails on is useful.. standard bug reporting (or feature gap reporting in this case). If B can be shown to be a super set of A, which is very doable for a reasonable set of cases, then that false positive can be eliminated. Continuing to think of these tools as either perfect or useless and useful discussions about them are kinda hard to have. Later, BradIt's not about maturity, it's simply impossible to correctly do the above case.Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized"....At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?Clang's static analysis isn't very mature yet.
Jul 29 2011
On 7/29/2011 2:53 PM, Brad Roberts wrote:You keep using the term impossible.It's the halting problem. How are you going to prove that a->b->c yields the same result after you've done pretty much any function call?It's impossible to be 100% correct in 100% of the code bases.When it is built into the compiler, I think it should be 100%. And frankly, the marketing of these false positive tools bugs me. They'll run their tool over some code base, then publish the results and make claims about all these hundreds of "bugs" they've found. Even worse, others will uncritically read those pages and assume those code bases are "buggy".Sure, but that's obviously not the goal. The goal is to be correct enough on enough code to be useful. A far more achievable goal. Clang isn't there yet, but with time and effort it can improve towards that goal. That's where producing test cases on things it fails on is useful.. standard bug reporting (or feature gap reporting in this case). If B can be shown to be a super set of A, which is very doable for a reasonable set of cases, then that false positive can be eliminated. Continuing to think of these tools as either perfect or useless and useful discussions about them are kinda hard to have.There are two things one can try with static analysis: 1. Flag the code if one cannot prove it is good. 2. Flag the code if one can prove it is bad. clang is going for (1). I think (2) is more appropriate. prover, and what a great tool that was. Reading the documentation on it, it looked great. So I tried it out in pastebin. It pretty much only works on the examples given in the marketing literature. It couldn't even handle bitwise ops. I know enough about data flow analysis to infer what clang is doing from its pattern of reports, and it hasn't even begun to solve this problem. As for breaking new ground, I really like what D is doing with value range propagation in implicit conversions. So far it's been a solid base hit.
Jul 29 2011
Walter:prover, and what a great tool that was. Reading the documentation on it, it looked great. So I tried it out in pastebin. It pretty much only works on the examples given in the marketing literature. It couldn't even handle bitwise ops.that are coming out now) is to create a small kernel (for an operating system) that is 100% proved correct (correct according to its specs, etc). If they don't use an automatic theorem prover, then they have to do 100% of the proofs by hand. Even if the tool is able to do only 30% of the proofs automatically (and another 40% with a bit of help) it's a lot of manual work saved. The papers report the percentage of the theorems done by the tool, the percentage of the theorems done partially by hand, and the remaining percentage done by hand. The source code is all available, to verify the numbers inside the papers are not made up.I know enough about data flow analysis to infer what clang is doing from its pattern of reports, and it hasn't even begun to solve this problem.Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see. Its creators have created a good C++ compiler in a very short time, and its back-end is already rivalling GCC in optimizations (it's not there yet, but the distance is not so much, auto-vectorization is one of the most important differences between the two back-ends). Bye, bearophile
Jul 29 2011
On Fri, Jul 29, 2011 at 4:51 PM, bearophile <bearophileHUGS lycos.com>wrote:Walter:Woah Woah Woah the entire CLR as well?contractprover, and what a great tool that was. Reading the documentation on it,itlooked great. So I tried it out in pastebin. It pretty much only works ontheexamples given in the marketing literature. It couldn't even handlebitwise ops. that are coming out now) is to create a small kernel (for an operating system)that is 100% proved correct (correct according to its specs, etc). If they don't use an automatic theorem prover, then they have to do 100% of the proofs by hand. Even if the tool is able to do only 30% of the proofs automatically (and another 40% with a bit of help) it's a lot of manual work saved. The papers report the percentage of the theorems done by the tool, the percentage of the theorems done partially by hand, and the remaining percentage done by hand. The source code is all available, to verify the numbers inside the papers are not made up.I've seen several times that a major feature of a lint tool was that you could disable warnings of certain types, which is necessary because there are just too many false positives. Clang will doubtlessly get better, but the fact will still remain that C and C++ are terrible languages to try to run static analyses on.I know enough about data flow analysis to infer what clang is doing fromitspattern of reports, and it hasn't even begun to solve this problem.Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see. Its creators have created a good C++ compiler in a very short time, and its back-end is already rivalling GCC in optimizations (it's not there yet, but the distance is not so much, auto-vectorization is one of the most important differences between the two back-ends).
Jul 29 2011
Andrew Wiley:functional language), that has a Coq-proved type system that seems a tour the force: http://lambda-the-ultimate.org/node/4318Have they proven the entire CLR as well?The project is unrelated to the CLR. contracts to the whole CLR. Bye, bearophile
Jul 29 2011
On 7/29/2011 4:51 PM, bearophile wrote:Walter:From the way you originally positioned it, I had higher expectations.contract prover, and what a great tool that was. Reading the documentation on it, it looked great. So I tried it out in pastebin. It pretty much only works on the examples given in the marketing literature. It couldn't even handle bitwise ops.that are coming out now) is to create a small kernel (for an operating system) that is 100% proved correct (correct according to its specs, etc). If they don't use an automatic theorem prover, then they have to do 100% of the proofs by hand. Even if the tool is able to do only 30% of the proofs automatically (and another 40% with a bit of help) it's a lot of manual work saved. The papers report the percentage of the theorems done by the tool, the percentage of the theorems done partially by hand, and the remaining percentage done by hand. The source code is all available, to verify the numbers inside the papers are not made up.I think you misunderstand the scope of this problem. They've done data flow analysis, an off-the-shelf technology for the last 30 years at least. They haven't done any of the hard stuff that I can see. It's like doing natural language translation using a dictionary. It looks like it'll work, and you can have promising early success with it, but then you hit a wall.I know enough about data flow analysis to infer what clang is doing from its pattern of reports, and it hasn't even begun to solve this problem.Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see.
Jul 29 2011
On 7/29/11 5:48 PM, Walter Bright wrote:As for breaking new ground, I really like what D is doing with value range propagation in implicit conversions. So far it's been a solid base hit.But that's also an approximate thing :o). Andrei
Jul 29 2011
On 7/29/2011 5:48 PM, Andrei Alexandrescu wrote:On 7/29/11 5:48 PM, Walter Bright wrote:True, but it's done in a way that enables things rather than disallowing them. This is much more palatable.As for breaking new ground, I really like what D is doing with value range propagation in implicit conversions. So far it's been a solid base hit.But that's also an approximate thing :o).
Jul 29 2011
Am 29.07.2011 22:02, schrieb Walter Bright:Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I rewrite as: T* p = NULL; ... if (A) p = &t; ... if (B) ... *p ... but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced with some significant trial-and-error refactoring to get rid of the message. At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?but your code does not reflect your "B is true if and only if A is true." statement T* p = NULL; if (A) { p = &t; if (B) ... *p ... }
Jul 30 2011
dennis luehring wrote:Am 29.07.2011 22:02, schrieb Walter Bright:The example I posted was clearer. Where p is just a variable: if (!p) { ... } if (p) { ... } it thinks that NEITHER of the code paths is taken. I would rate that as a show-stopper bug. Don't waste any more time on clang until that's fixed.Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I rewrite as: T* p = NULL; ... if (A) p = &t; ... if (B) ... *p ... but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced with some significant trial-and-error refactoring to get rid of the message. At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?but your code does not reflect your "B is true if and only if A is true." statement
Jul 31 2011
Am 31.07.2011 11:47, schrieb Don:dennis luehring wrote:but i think clang was right with Walters code - i do not understand why he splits up the A and B conditional block into spereated ones when B does not work if A wasn't true before... makes it sense (in exactly this case) to write the code like he does?Am 29.07.2011 22:02, schrieb Walter Bright:The example I posted was clearer.Here's another one: T* p; ... if (A) p = &t; ... if (B) ... *p ... B is true if and only if A is true. B can even be the same expression as A, such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I rewrite as: T* p = NULL; ... if (A) p = &t; ... if (B) ... *p ... but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced with some significant trial-and-error refactoring to get rid of the message. At what point does this cease to be fixing "bugs" and become "contort to fit clang's quirks"?but your code does not reflect your "B is true if and only if A is true." statement
Jul 31 2011
On 7/31/2011 3:28 AM, dennis luehring wrote:but i think clang was right with Walters code - i do not understand why he splits up the A and B conditional block into spereated ones when B does not work if A wasn't true before... makes it sense (in exactly this case) to write the code like he does?What I meant was: "A" represents an expression "B" represents a different expression than "A", but is true if and only if "A" is true. For example: "A" might be: x=0;x=a->b->c?1:0;a->b->c "B" might be: x
Jul 31 2011
On 7/27/2011 3:08 PM, Robert Clipsham wrote:Note that in addition to Walter's global.h changes, __attribute__((noreturn)) needed to be added to local_assert and util_assert in tassert.h too (could you add these too Walter?)I thought I did. Will check.
Jul 27 2011