digitalmars.D - C++ static analysis
- bearophile (39/39) May 05 2011 Through Reddit I've seen another document that shows common little bugs ...
- Don (17/48) May 06 2011 I would say that disallowing it would _always_ lead to clearer code. It
- bearophile (27/36) May 06 2011 OK. It's in Bugzilla since some time:
- Timon Gehr (46/85) May 06 2011 Actually
- Timon Gehr (4/8) May 06 2011 Oh, it is not. I thought it was because I always use loop local counter ...
- bearophile (14/28) May 06 2011 A simple equivalence is enough here: the code with normalized whitespace...
- Timon Gehr (39/69) May 06 2011 Coccinelle project too.
- bearophile (5/8) May 06 2011 You are worried about compilation time. I think the feature we're talkin...
- Timon Gehr (7/15) May 07 2011 otherwise it performs a normal amount of static test on the code. This g...
Through Reddit I've seen another document that shows common little bugs in C/C++ code found by a static analysis tool. The bugs shown seem to be real, from real software: http://www.slideshare.net/Andrey_Karpov/add2011-en From the slides I have selected seven of the bugs, here translated to D2: enum uint FLAG1 = 0b001; void main() { int x = 1000; int y = 1; uint flags; int i; for (i = 0; i < 10; i++) } Notes about the bugs: flags. it's probably a bug. it's probably a bug. operator and the * (mult) operator. Notes about D2: D2 language (we have discussed this bug in past too, but the catching code is not present yet). I think D2 may require parentheses here. know how to catch it, or if it's worth catching. But it's probably a not so uncommon bug. ones. use foreach, or for with local index variable (that D doesn't allow you to redefine). Once in a while that kind of code is correct, maybe. know if D2 wants to actually turn this correct code into a real compile-time error. But it's worth thinking more about it. think it's not a so common bug. So: another related bug, with class/struct members assignments, http://d.puremagic.com/issues/show_bug.cgi?id=3878 ). anyway. Bye, bearophile
May 05 2011
bearophile wrote:Through Reddit I've seen another document that shows common little bugs in C/C++ code found by a static analysis tool. The bugs shown seem to be real, from real software: http://www.slideshare.net/Andrey_Karpov/add2011-en From the slides I have selected seven of the bugs, here translated to D2: enum uint FLAG1 = 0b001; void main() { int x = 1000; int y = 1; uint flags; int i; for (i = 0; i < 10; i++) } Notes about the bugs:I would say that disallowing it would _always_ lead to clearer code. It should always be rewritten as: if (!y && x) // wasn't a bug, but was unclear OR if (!(y & x)) // was a bug OR in the one-in-a-million case where the existing code is correct: if ( (!y) & x ) // wasn't a bug, but even this case is clearer: x MUST be evaluated a really complicated way (allowing boolean expressions, but not integral ones other than UnaryExpressions). And that seems quite hard to justify. But I don't find the other cases convincing. They don't seem any more if (1 || y) {} is common and completely legitimate. Disallowing that would be very annoying.
May 06 2011
Don:I would say that disallowing it would _always_ lead to clearer code.OK. It's in Bugzilla since some time: http://d.puremagic.com/issues/show_bug.cgi?id=5409 (But if we want to disallow something then we have to list exactly what syntax are not allowed.)a really complicated way (allowing boolean expressions, but not integral ones other than UnaryExpressions). And that seems quite hard to justify.I think in this case introducing complex rules is not good. auto z = x * (y ? 10 : 20);But I don't find the other cases convincing. They don't seem any more if (1 || y) {} is common and completely legitimate. Disallowing that would be very annoying.I agree that sometimes it's not a bug (example: the 1 is the __ctfe pseudo-constant). Always true or always false boolean conditions (unless they are just a single "true", "false", "0" or "1" literal) are sometimes associated with bugs. Good C lints give just a warning in this case, no error. I agree that statically disallowing this in all cases is too much harsh. ------------------ probably a bug. A similar case (auto-assign): x = x; Or even (auto-assign, through alias): alias x y; x = y; Or even (repeated assign): x = 10; x = 10; (repeated assign, this happens often enough, the second variable is not x): x = Foo.barx; x = Foo.bary; On the other hand automatic code generation (in CTFE) sometimes produces some redundancy in code, this may cause some false positives. But similar redundancies that often hide bugs. So I think this class of troubles deserves more thinking. A common bug in my code is this one, I have had something like this six times or more: http://d.puremagic.com/issues/show_bug.cgi?id=3878 Bye, bearophile
May 06 2011
Through Reddit I've seen another document that shows common little bugs in C/C++code found by a static analysis tool. The bugs shown seem to be real, from real software:http://www.slideshare.net/Andrey_Karpov/add2011-en From the slides I have selected seven of the bugs, here translated to D2: enum uint FLAG1 = 0b001; void main() { int x = 1000; int y = 1; uint flags; int i; for (i = 0; i < 10; i++) } Notes about the bugs:Actuallyflags.it's probably a bug.it'sprobably a bug.operator and the * (mult) operator.Notes about D2:D2 language (we have discussed this bug in past too, but the catching code is not present yet). I think D2 may require parentheses here.know how to catch it, or if it's worth catching. But it's probably a not so uncommon bug.notThe problem is that generated/CTFE'd code might produce that kind of redundancy. I think it would be possible to just catch plain user-written literals, but then,use foreach, or for with local index variable (that D doesn't allow you to redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).know if D2 wants to actually turn this correct code into a real compile-time error. But it's worth thinking more about it. Proving that two sub-expressions are equivalent is a hard task. Has anyone ever experienced a bug that was hard to find similar to those toy examples? Well, maybe if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?think it's not a so common bug. You can write (most) code without ?:. But if a programmer wants to use it, he has to be familiar with its precedence rules (very simple, lower precedence than anything but = and ,). You cannot "fix" it, because any code relying on the precedence (about any code that uses ?: without using many excessive parens) will get broken, which includes most of my programs. It would be possible to disallow ?: outside a (), [], ',' (comma), or ?: expression by just a small change in grammar, so it would be feasible. I as an excessive user of ?: am very much against it though. A little anecdote: Apparently, ?: are not only a source of bugs in user code, but in compilers as well: int a,b,c,d,e; int*p=&(a?b?c:d:e); See http://d.puremagic.com/issues/show_bug.cgi?id=5799.So:I think requiring parentheses (providing undefined or ambiguous precedence rules) is ugly. D has done it before and it seems reasonable to do so, but still ugly. It also implies that the precedence rules would be a design mistake if not for compatibility with C. What is the great catch about having & defined on (bool, int) anyways?another related bug, with class/struct members assignments, http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).anyway.What property makes them so bug-prone? I think they are quite easy to use.Bye, bearophileTimon
May 06 2011
Timon Gehr wrote:Oh, it is not. I thought it was because I always use loop local counter variables. But again, where to make the cut? More complex nested loops may legitimately want to access the same variable.usuallyuse foreach, or for with local index variable (that D doesn't allow you to redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).
May 06 2011
Timon Gehr:The problem is that generated/CTFE'd code might produce that kind of redundancy.Right.Proving that two sub-expressions are equivalent is a hard task.A simple equivalence is enough here: the code with normalized whitespace. Those tools are probably doing something not much complex than this.Has anyone ever experienced a bug that was hard to find similar to those toy examples? Well, maybe if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?The examples I've shown are present in real code. See the results of the Coccinelle project too.But if a programmer wants to use it, he has to be familiar with its precedence rules (very simple, lower precedence than anything but = and ,).In computer languages there are plenty simple things that are bug-prone.I as an excessive user of ?: am very much against it though.It's not advisable to use the ?: operator a *lot* :-)I think requiring parentheses (providing undefined or ambiguous precedence rules) is ugly. D has done it before and it seems reasonable to do so, but still ugly.I don't see better solutions given the D Zen contraints. And in this case adding parentheses may be better than doing nothing. http://d.puremagic.com/issues/show_bug.cgi?id=5409It also implies that the precedence rules would be a design mistake if not for compatibility with C.I don't fully understand.What property makes them so bug-prone?I am not sure, but I think it encourages a too much compact code style (and you have to be a bit careful with operator precedence too). Probably there is some optimum for compactness of C-like code. The sources of the first J interprer is the point of not allowing an easy read.I think they are quite easy to use.Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile
May 06 2011
Timon Gehr:tools are probably doing something not much complex than this.The problem is that generated/CTFE'd code might produce that kind of redundancy.Right.Proving that two sub-expressions are equivalent is a hard task.A simple equivalence is enough here: the code with normalized whitespace. ThoseCoccinelle project too. Okay, I see. Lexical equivalence can catch the infamous copy-paste bugs. An alternative is to never ever copy-paste code. But unfortunately the compiler cannot enforce this, catching only the most trivial cases. I think it might be a good idea though. However, having this feature means requiring one AST compare for every boolean operator.Has anyone ever experienced a bug that was hard to find similar to those toy examples? Well, maybe if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?The examples I've shown are present in real code. See the results of theI use it whenever I would have to write either if(c) x=y;else x=z; // => x=c?y:z; or if(c) x=z;else y=z; // => c?x:y=z; This reduces code duplication, mostly when x or z are complicated expressions. I do not use it often in larger arithmetical expressions though. Unfortunately, D's conditional operator is not quite as mighty as C++'s (Eg, it cannot throw an exception). On the plus side, it is simple (0.5 p in TDPL, where C++ allocates 1.5 p in the standard to it). I also like g++ style x?:y;But if a programmer wants to use it, he has to be familiar with its precedence rules (very simple, lower precedence than anything but = and ,).In computer languages there are plenty simple things that are bug-prone.I as an excessive user of ?: am very much against it though.It's not advisable to use the ?: operator a *lot* :-)parentheses may be better than doing nothing. I agree. You usually don't want to do this anyways.I think requiring parentheses (providing undefined or ambiguous precedence rules) is ugly. D has done it before and it seems reasonable to do so, but still ugly.I don't see better solutions given the D Zen contraints. And in this case addinghttp://d.puremagic.com/issues/show_bug.cgi?id=5409compatibility with C.It also implies that the precedence rules would be a design mistake if not forI don't fully understand.The fact that that kind of manual disambiguation is necessary implies that the precedence rules are counter-intuitive to many programmers. Eg if a&b/a<<b would behave more like a*b than a&&b/wtf, I think there would be less precedence related bugs. But of course, this breaks compatibility with C. If however C never existed, the precedence rules would be suboptimal.have to be a bit careful with operator precedence too). Probably there is some optimum forWhat property makes them so bug-prone?I am not sure, but I think it encourages a too much compact code style (and youcompactness of C-like code. The sources of the first J interprer is surely tooof not allowingan easy read.Afaik the average programmer introduces a new bug every 20 lines of code or so. Therefore, you better get your work done in less than that. ;) line of horizontal space does not make this any better. Unfortunately, balancing the compactness in an optimal way is not possible, because the optimum varies between programmers. I appreciate the fact, that in D most likely "very compact" is the optimal choice most of the time.The best feature to crush bugs would still be automated model checking, let's hope good program provers will be available soon =). TimonI think they are quite easy to use.Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile
May 06 2011
Timon Gehr:I think it might be a good idea though. However, having this feature means requiring one AST compare for every boolean operator.You are worried about compilation time. I think the feature we're talking about just tests the equivalence of the then/else clauses. Clang has a --analyze switch that runs the (potentially slow) static analyser, otherwise it performs a normal amount of static test on the code. This gives you choice between a faster compilation and a slower analysis able to find some other bugs. Bye, bearophile
May 06 2011
Timon Gehr:just tests the equivalence of the then/else clauses.I think it might be a good idea though. However, having this feature means requiring one AST compare for every boolean operator.You are worried about compilation time. I think the feature we're talking aboutClang has a --analyze switch that runs the (potentially slow) static analyser,otherwise it performs a normal amount of static test on the code. This gives you choice between a faster compilation and a slower analysis able to find some other bugs.Bye, bearophileI think Walter does not like introducing new switches, because it doubles the number of compilers to test and maintain. Timon
May 07 2011