digitalmars.D - What does Coverity/clang static analysis actually do?
- Walter Bright (52/52) Oct 01 2009 I've been interested in having the D compiler take advantage of the flow...
- BCS (6/23) Oct 01 2009 vote++ on silent
- Leandro Lucarella (26/46) Oct 01 2009 Clang is still in development. It will be released with LLVM in the
- Leandro Lucarella (10/27) Oct 09 2009 I just found this:
- Walter Bright (18/36) Oct 01 2009 The problem crops up when there are two connected variables:
- Lutger (7/48) Oct 01 2009 How hard is this to implement? I ask this because I would suggest to try...
- Walter Bright (3/9) Oct 01 2009 I did implement it long ago, then disabled it because of too many false
- "Jérôme M. Berger" (14/55) Oct 02 2009 I don't know about Coverity/clang, but I have used Klocwork a...
- asd (9/32) Oct 17 2009 I've tested this exact code in clang analyzer and it's actually smart en...
- Bill Baxter (27/51) Oct 01 2009 Probably more like
- Christopher Wright (11/39) Oct 03 2009 The second one is an error; createFoo might use its argument before
- Andrei Alexandrescu (5/5) Oct 01 2009 Walter Bright wrote:
- David Gileadi (51/55) Oct 01 2009 I'm not familiar with C/C++ static checkers.
- Walter Bright (3/6) Oct 01 2009 This is the kind of thing I was asking for. Thanks! There are some good
- Lutger (4/5) Oct 01 2009 This one has helped me catch some bugs in .NET, really nice especially w...
- Walter Bright (28/62) Oct 01 2009 The difference is the maintenance programmer won't be left puzzling why
- grauzone (7/9) Oct 01 2009 I don't want to make this "discussion" even more complicated, but some
- Walter Bright (15/22) Oct 01 2009 This is a more general comment on your post (and similar ones by others,...
- BCS (8/23) Oct 02 2009 This is true of any long document if you are willing to string together ...
- Christopher Wright (3/13) Oct 02 2009 What is this Declaration of Intendance you speak of?
- Jarrett Billingsley (2/13) Oct 03 2009 Nicely played.
- BCS (8/23) Oct 03 2009 OK so I use spell check as a crutch (and I was trying to pay attention t...
- Brad Roberts (40/50) Oct 01 2009 Snipped a lot of the detail, because that's not really what makes the
- Walter Bright (19/56) Oct 01 2009 Modern compiler optimizers (including dmc and dmd) DO do
- Brad Roberts (35/103) Oct 01 2009 Exactly my point.. compilers tend to make the opposite trade off that to...
- Walter Bright (25/46) Oct 01 2009 A long time ago, the company I worked for decided to put up a huge chart
- Brad Roberts (6/56) Oct 01 2009 Existence of a bad case doesn't disprove the usefulness in general. Yes...
- BCS (7/19) Oct 01 2009 For open source and libs, yes. For proprietary code bases, I'd say it's ...
- Walter Bright (6/12) Oct 02 2009 Of course you can make third party tools work, but I think there is a
- Rainer Schuetze (33/38) Oct 02 2009 The development edition of VS2008 also has a static code analyzer. There
- Walter Bright (3/8) Oct 02 2009 Could you turn it all on, run it through the dmd source code, and send
- Rainer Deyke (46/46) Oct 03 2009 There is a simple and elegant solution to the problem of false positives
- bearophile (32/37) Oct 05 2009 In particular whole unused functions, methods, classes, modules, package...
I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving.

Here is what I've been able to glean from much time spent with google on what they detect and my knowledge of how data flow analysis works:

1. dereference of NULL pointers (all reaching definitions of a pointer are NULL)
2. possible dereference of NULL pointers (some reaching definitions of a pointer are NULL)
3. use of uninitialized variables (no reaching definition)
4. dead assignments (assignment of a value to a variable that is never subsequently used)
5. dead code (code that can never be executed)
6. array overflows
7. proper pairing of allocate/deallocate function calls
8. improper use of signed integers (who knows what this actually is)

Frankly, this is not an impressive list. These issues are discoverable using standard data flow analysis, and in fact are part of Digital Mars' optimizer. Here is the current state of it for dmd:

1. The optimizer discovers it, but ignores the information. Due to the recent thread on it, I added a report for it for D (still ignored for C). The downside is I can no longer use *cast(char*)0=0 to drop me into the debugger, but I can live with that as assert(0) will do the same thing.
2. The optimizer collects the info, but ignores it, because people are annoyed by false positives.
3. The optimizer detects and reports it. Irrelevant for D, though, because variables are always initialized. The =void case is rare enough to be irrelevant.
4. Dead assignments are automatically detected and removed. I'm not convinced this should be reported, as it can legitimately happen when generating source code. Generating false positives annoys the heck out of users.
5. Dead code is detected and silently removed by the optimizer. The dmd front end will complain about dead code.
6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but that code is currently disabled because the runtime check covers 100% of the cases.
7. Not done, because it requires the user to specify what the paired functions are. Given this info, it is rather simple to graft onto the existing data flow analysis.
8. D2 has acquired some decent checking for this.

There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?

D's dead code checking has been an encouraging success, and I think people will like the null dereference checks. More along these lines will be interesting.
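[Editor's sketch: checks 1-3 above are all answers to the same dataflow question, namely which definitions of a variable reach a given use. The toy code below illustrates that; the mini-CFG encoding and block names are invented for the example and this is nothing like dmd's actual implementation.]

```python
# Toy reaching-definitions analysis for a single pointer variable p.
# cfg maps block -> (definition-or-None, successor list); a definition
# is the value assigned to p in that block ("null", '"hello"', ...).

def reaching_defs(cfg):
    """Forward may-analysis: IN[b] = union of OUT[pred];
    OUT[b] = {b's def} if b defines p, else IN[b]. Iterate to a fixpoint."""
    ins = {b: set() for b in cfg}
    changed = True
    while changed:
        changed = False
        for b, (defn, succs) in cfg.items():
            out = {defn} if defn is not None else ins[b]
            for s in succs:
                if not out <= ins[s]:
                    ins[s] |= out
                    changed = True
    return ins

def classify_deref(defs):
    """Apply checks 1-3 to the definitions reaching a dereference of p."""
    if not defs:
        return "use of uninitialized variable"   # check 3: no reaching def
    if all(d == "null" for d in defs):
        return "null dereference"                # check 1: all defs NULL
    if any(d == "null" for d in defs):
        return "possible null dereference"       # check 2: some defs NULL
    return "ok"

# Models: char* p = null; if (flag) p = "hello"; ... bar(*p);
cfg = {
    "entry": ("null", ["then", "join"]),
    "then":  ('"hello"', ["join"]),
    "join":  (None, []),           # the dereference *p happens here
}
print(classify_deref(reaching_defs(cfg)["join"]))  # possible null dereference
```

Both definitions reach the join block, so the merged answer is the hedged check 2, which is exactly the class of report Walter says gets ignored because of false positives.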
Oct 01 2009
Hello Walter,

> Frankly, this is not an impressive list. These issues are discoverable using standard data flow analysis, and in fact are part of Digital Mars' optimizer. Here is the current state of it for dmd:
> 1. Optimizer discovers it, but ignores the information. Due to the recent thread on it, I added a report for it for D (still ignored for C). The downside is I can no longer use *cast(char*)0=0 to drop me into the debugger, but I can live with that as assert(0) will do the same thing.

nice

> 4. Dead assignments are automatically detected and removed. I'm not convinced this should be reported, as it can legitimately happen when generating source code. Generating false positives annoy the heck out of users.

vote++ on silent

> 6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.

I'd advocate for any compile time checks that never generate false positives running even if the runtime checks would get it also. I'd rather know sooner than later.
Oct 01 2009
Walter Bright wrote on October 1 at 11:21:
> I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving.

Clang is still in development. It will be released with LLVM for the first time in the upcoming 2.6 version. The C and Objective-C support is supposed to be fairly mature, though; I guess documenting the static analyzer is just not very high on their priority list (maybe this will change after the release). You can ask on the Clang mailing list; the Clang developers (and LLVM developers in general) are very receptive.

> 1. Optimizer discovers it, but ignores the information. Due to the recent thread on it, I added a report for it for D (still ignored for C). The downside is I can no longer use *cast(char*)0=0 to drop me into the debugger, but I can live with that as assert(0) will do the same thing.

There are breakpoints too, you know? =P

> There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement? D's dead code checking has been an encouraging success, and I think people will like the null dereference checks. More along these lines will be interesting.

You can take a look at sparse too. AFAIK it is used by the Linux kernel:
http://www.kernel.org/pub/software/devel/sparse/

And Cppcheck:
http://sourceforge.net/apps/mediawiki/cppcheck/index.php?title=Main_Page

There is a list of tools at Wikipedia too:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)

They love me like I was a brother
They protect me, listen to me
They dug me my very own garden
Gave me sunshine, made me happy
Nice dream, nice dream
Nice dream
Oct 01 2009
Leandro Lucarella wrote on October 1 at 16:24:
> Clang is still in development. It will be released with LLVM in the upcoming 2.6 version for the first time. The C and objective C support is supposed to be fairly mature though, but I guess documenting the static analyzer is not very high in their priority list (maybe this will change after the release). You can ask in the Clang ML, Clang developers (and LLVM in general) are very receptive.

I just found this:
http://clang-analyzer.llvm.org/available_checks.html

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)

The other day I had a plan
But then I forgot and ate a flan
Oct 09 2009
Nick Sabalausky wrote:
> "Walter Bright" <newshound1 digitalmars.com> wrote in message
>> 2. possible dereference of NULL pointers (some reaching definitions of a pointer are NULL)
>> 2. Optimizer collects the info, but ignores this, because people are annoyed by false positives.
> If you mean something like this:
>
>     Foo f;
>     if(cond)
>         f = new Foo();
>     f.bar();
>
> [...] annoyed by it, in fact I've always appreciated it. I'm not aware of any [...] mean though, then could you elaborate?

The problem crops up when there are two connected variables:

    void foo(bool flag)
    {
        char* p = null;
        if (flag)
            p = "hello";
        ...
        if (flag)
            bar(*p);
    }

The code is logically correct; there is no null pointer dereference possible. However, the data flow analysis will see the *p and see two reaching definitions for p, null and "hello", even though only one actually reaches.

Hence the false positive. To eliminate the false error report, the user would have to insert a redundant null check.

Does this happen in practice? Yes.
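[Editor's sketch: the false positive can be made concrete. This toy model (purely illustrative, not Coverity/clang internals) contrasts a path-insensitive analysis, which merges the facts from both arms of the first 'if' before reaching the dereference, with a path-sensitive one that keeps flag and p correlated, at the cost of tracking paths, which grows exponentially with the number of independent conditions.]

```python
# Models Walter's foo(): char* p = null; if (flag) p = "hello";
# ... if (flag) bar(*p);

def path_insensitive():
    # After 'if (flag) p = "hello";' the merged abstract value of p is
    # {"null", "hello"}; the guard on the dereference is not consulted.
    p_values = {"null", "hello"}
    return "null" in p_values      # True -> spurious "possible null deref"

def path_sensitive():
    # Walk each feasible path of foo(), checking the dereference site.
    warnings = 0
    for flag in (False, True):
        p = "null"
        if flag:
            p = "hello"
        if flag:                   # the dereference site: bar(*p)
            if p == "null":
                warnings += 1
    return warnings                # 0: no path dereferences a null p

print(path_insensitive())  # True
print(path_sensitive())    # 0
```

The path-sensitive version never sees a null p at the dereference, which matches the later reports in this thread that Klocwork and the clang analyzer handle this exact pattern.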
Oct 01 2009
Walter Bright wrote:
> The problem crops up when there are two connected variables:
>
>     void foo(bool flag)
>     {
>         char* p = null;
>         if (flag)
>             p = "hello";
>         ...
>         if (flag)
>             bar(*p);
>     }
>
> [...] Does this happen in practice? Yes.

How hard is this to implement? I ask this because I would suggest trying it out and seeing how much it catches vs. how annoying it is. In VB.NET I have [...] the rest of the language. VB.NET doesn't have a ternary operator, for example. In D you have less need for pointers and generally a much more expressive vocabulary at your disposal than in other C family languages.
Oct 01 2009
Lutger wrote:How hard is this to implement? I ask this because I would suggest to try it out and see how much it catches vs. how annoying it is. In VB.NET I have the rest of the language. VB.NET doesn't have a ternary operator for example. In D you have less need for pointers and generally a much more expressive vocabulary at your disposal than other C family languages.I did implement it long ago, then disabled it because of too many false positives. It was more annoying than useful.
Oct 01 2009
Walter Bright wrote:
> The problem crops up when there are two connected variables:
>
>     void foo(bool flag)
>     {
>         char* p = null;
>         if (flag)
>             p = "hello";
>         ...
>         if (flag)
>             bar(*p);
>     }
>
> The code is logically correct, there is no null pointer dereference possible. However, the data flow analysis will see the *p and see two reaching definitions for p: null and "hello", even though only one actually reaches.
>
> Hence the false positive. To eliminate the false error report, the user would have to insert a redundant null check.
>
> Does this happen in practice? Yes.

I don't know about Coverity/clang, but I have used Klocwork a couple of times and it will work properly in the example you have given. I.e. it sees that both conditions are linked and that you don't use p if you haven't initialized it.

Of course, if the condition is more complex or if the condition might be modified by another thread (even if you know it isn't), there comes a point at which it will give a warning.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Oct 02 2009
Walter Bright Wrote:
>> 2. Optimizer collects the info, but ignores this, because people are annoyed by false positives.
> The problem crops up when there are two connected variables:
>
>     void foo(bool flag)
>     {
>         char* p = null;
>         if (flag)
>             p = "hello";
>         ...
>         if (flag)
>             bar(*p);
>     }
>
> The code is logically correct, there is no null pointer dereference possible. However, the data flow analysis will see the *p and see two reaching definitions for p: null and "hello", even though only one actually reaches. Hence the false positive. To eliminate the false error report, the user would have to insert a redundant null check. Does this happen in practice? Yes.

I've tested this exact code in the clang analyzer and it's actually smart enough not to report that as an error! if (flag) bar(*p) is not reported, but if (!flag) bar(*p) is reported, so the analyzer can follow connected variables properly.

The clang analyzer tries very hard to avoid false positives. To the point that every error message has a link for sending a bug report.
Oct 17 2009
On Thu, Oct 1, 2009 at 1:38 PM, Nick Sabalausky <a a.a> wrote:
>> 2. possible dereference of NULL pointers (some reaching definitions of a pointer are NULL)
>> 2. Optimizer collects the info, but ignores this, because people are annoyed by false positives.
> If you mean something like this:
>
>     Foo f;
>     if(cond)
>         f = new Foo();
>     f.bar();
>
> [...] annoyed by it, in fact I've always appreciated it. I'm not aware of any [...] mean though, then could you elaborate?

Probably more like

    Foo f;
    createAndSetOutParam(f);
    f.bar();

or

    Foo f;
    if (a > 10) { f = new Foo(10); }
    ...
    if (a > 10) { f.bar(); }

Or some complex intertwined gotos or any number of other things that would be hard for the compiler to verify without a lot of effort. I think I'd want to try this out as an optional warning for a while to see how annoying the false positives really are in practice. Still, it seems like there's a subset of cases where it can be proved with 100% certainty that the code is wrong. Just reporting those cases would be a big win, I think.

>> 6. array overflows
>> 6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.
> I'm puzzled by why you would prefer to leave this entirely runtime when some of it can be detected at compile-time. Normally you agree that catching something at compile-time is better whenever possible. So shouldn't the array overflows that can be detected at compile-time be detected at compile-time? I would certainly prefer that.

I agree. Run-time checks are only useful if the code is on a path that actually runs. Compile-time checks are useful even if the code is on the branch not taken.

--bb
Oct 01 2009
Nick Sabalausky wrote:
> static void Main(string[] args)
> {
>     Foo f;
>     if(args.Count() > 2)
>     {
>         f = new Foo();
>     }
>     if(args.Count() > 2)
>     {
>         f.bar(); // ERROR: Use of unassigned local variable 'f'
>     }
>
>     Foo f2;
>     createFoo(ref f2); // ERROR: Use of unassigned local variable 'f2'
>     f2.bar();
> }
>
> static void createFoo(ref Foo f)
> {
>     f = new Foo();
> }
>
> The first one is rather strange coding though and makes it easy to hide errors anyway. And the second one's a tad odd too, plus I don't see any harm in solving that with "Foo f2=null": it would at least be a hell of a lot better than the compiler doing that very same "=null" automatically. I know Walter doesn't agree, but I'd much rather have a few slightly inconvenient false positives (or would it really be a false negative?) than even a mere possibility for a hidden error.

The second one is an error; createFoo might use its argument before assigning. You should have marked its argument as out instead, which would not yield an error. This is how methods like bool TryGetValue(key, out value) work:

    DateTime date;
    if (dict.TryGetValue("key", out date))
    {
        // use date
    }
Oct 03 2009
Walter Bright wrote: [snip] We need to have passable flow control to typecheck creation of immutable objects. Andrei
Oct 01 2009
Walter Bright wrote:
> There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?

I'm not familiar with C/C++ static checkers. Eclipse's Java compiler has decent support for code checks. I'm copying the list of items it can check here (except for those that seem Java-specific), in case it's of interest. For each of the below, you can choose whether it's an error, a warning, or ignored. I've included their defaults.

Code Style:
  Non-static access to static member: Warning
  Indirect access to static member: Ignored
  Unqualified access to instance field: Ignored
  Undocumented empty block: Ignored
  Method with a constructor name: Warning
  Parameter assignment: Ignored

Potential programming problems:
  Assignment has no effect (e.g. 'x = x'): Warning
  Possible accidental boolean assignment (e.g. 'if (a = b)'): Ignored
  'finally' does not complete normally: Warning
  Empty statement: Ignored
  Hidden catch block: Warning
  Inexact type match for vararg arguments: Warning
  Enum type constant not covered on 'switch': Ignored
  'switch' case fall-through: Ignored
  Null pointer access: Warning
  Potential null pointer access: Ignored
  Comparing identical values ('x == x'): Warning
  Missing synchronized modifier on inherited method: Ignored
  Class overrides 'equals()' but not 'hashCode()': Ignored
  Dead code (e.g. 'if (false)'): Warning

Name shadowing and conflicts:
  Field declaration hides another field or variable: Ignored
  Local variable declaration hides another field or variable: Ignored
    (If not ignored) Include constructor or setter method parameters
  Type parameter hides another type: Warning
  Method does not override package visible method: Warning
  Interface method conflicts with protected 'Object' method: Warning

Unnecessary code:
  Local variable is never read: Warning
  Parameter is never read: Ignored
    (If not ignored) Ignore in overriding and implementing methods
  Unused import: Warning
  Unused local or private member: Warning
  Redundant null check: Ignored
  Unnecessary 'else' statement: Ignored
  Unnecessary cast or 'instanceof' operation: Ignored
  Unused 'break' or 'continue' label: Warning
  Redundant super interface: Ignored

I understand DMD's policy on warnings, but some of the above checks are reasons why I've grown to like some warnings. Of course it helps that Eclipse's compiler is most often used with its IDE.

-Dave
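[Editor's sketch: several of the checks in this list ("assignment has no effect", "comparing identical values") need nothing more than a syntax tree, no flow analysis at all. A rough illustration, using Python's ast module as a stand-in for a Java front end; this is not Eclipse's implementation.]

```python
import ast

def lint(source):
    """Flag 'x = x' and 'x == x' patterns in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # x = x : assignment has no effect
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Name)
                and any(isinstance(t, ast.Name) and t.id == node.value.id
                        for t in node.targets)):
            findings.append(f"line {node.lineno}: assignment has no effect")
        # x == x : comparing identical values
        if (isinstance(node, ast.Compare)
                and isinstance(node.left, ast.Name)
                and len(node.comparators) == 1
                and isinstance(node.comparators[0], ast.Name)
                and node.left.id == node.comparators[0].id):
            findings.append(f"line {node.lineno}: comparing identical values")
    return findings

print(lint("x = 1\nx = x\nif x == x:\n    pass\n"))
```

Purely syntactic checks like these are cheap and have essentially no false positives, which is why they make comfortable default warnings; the flow-dependent checks (null pointer access, dead code) are the ones that force the error/warning/ignore trade-off discussed in this thread.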
Oct 01 2009
David Gileadi wrote:Eclipse's Java compiler has decent support for code checks. I'm copying the list of items it can check here (except for those that seem Java-specific), in case it's of interest.This is the kind of thing I was asking for. Thanks! There are some good ideas in there.
Oct 01 2009
David Gileadi wrote: ....Non-static access to static member: WarningThis one has helped me catch some bugs in .NET, really nice especially with overloaded functions.
Oct 01 2009
Nick Sabalausky wrote:
>> 3. use of uninitialized variables (no reaching definition)
>> 3. Optimizer detects and reports it. Irrelevant for D, though, because variables are always initialized. The =void case is rare enough to be irrelevant.
> D variable default-initialization is absolutely no different from your scenario of a programmer blindly tossing in =0 to shut up a compiler, *except* that the programmer is never even given the opportunity to do the right thing. This is *bad*. I *want* variables that haven't been manually [...]

The difference is the maintenance programmer won't be left puzzling why there is an explicit assignment to the variable that is never used. The point of default initialization is consistency in the resulting behavior. Also, the optimizer will remove nearly all of the default initializers if they are dead assignments.

Anyhow, I think this issue was beaten to death in the previous thread on null dereference. I don't wish to divert this thread into rediscussing it, but rather to stick with the question: what other kinds of bug-detecting data flow analyses are there?

>> 4. dead assignments (assignment of a value to a variable that is never subsequently used)
>> 4. Dead assignments are automatically detected and removed. I'm not convinced this should be reported, as it can legitimately happen when generating source code. Generating false positives annoy the heck out of users.
> I'll agree with you here. But it might be nice to have an option to just simply report them anyway for when the programmer wants to see if there's any of these around that he can clean up.

I congenitally dislike optional warnings, as I've pontificated at length about here before <g>. The problem is that they make for a wishy-washy definition of the language, and muddy what is legal versus illegal. D needs to advance the state of the art with clear thinking about what is legal and what isn't. Warnings and false positives are failures of language design.

>> 6. array overflows
>> 6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.
> I'm puzzled by why you would prefer to leave this entirely runtime when some of it can be detected at compile-time. Normally you agree that catching something at compile-time is better whenever possible. So shouldn't the array overflows that can be detected at compile-time be detected at compile-time? I would certainly prefer that.

Because when I implemented it, I discovered that a compile time check rarely (if ever) caught actual bugs; the real problems could only be caught at runtime. Normally one uses things like foreach, which automatically generates code that won't overflow an array.

I generally regard as evil:
1. bugs that have erratic, random, irreproducible symptoms
2. source code that doesn't have an obvious purpose (this includes code inserted solely to suppress a false positive or warning)

I regard as undesirable:
3. wishy-washy warnings
4. overzealous compiler messages that are more akin to nagging than finding actual bugs
Oct 01 2009
Walter Bright wrote:Anyhow, I think this issue was beaten to death in the previous thread on null dereference. I don't wish to divert this thread into rediscussingI don't want to make this "discussion" even more complicated, but some people (such as me) like the existing default initialization behavior, the null pointer problem either. I especially like that it's consistent with the initialization of member variables in structs/classes.
Oct 01 2009
Nick Sabalausky wrote:
> If you accept the idea of a compiler (like DMD) having rudimentary built-in optional versions of normally separate tools like profiling, unittesting, doc generation, etc., and you accept that lint tools are perfectly fine tools to use (as I think you do, or am I mistaken?), then I don't see what would make lint tools an exception to the "built-ins are ok" attitude (especially since a separate one would require a lot of redundant parsing/analysis.)

This is a more general comment on your post (and similar ones by others; it's a recurring theme):

Consider the Bible. It's long and complicated, and by careful examination of it you can find a verse here and there to justify *any* behavior.

D is complicated, and is founded on principles that are not orthogonal - they are often at odds with each other. Any attempt to take one particular aspect of D's behavior and use it as a rule to impose elsewhere is surely doomed to conflict with some other rule. The only reasonable way forward is to evaluate each idea not only in terms of all of D's principles, but also on its own merits, and throw in one's best judgment.

Nearly a decade with D has now shown that some ideas and choices were dead wrong, but others were more right than I even dreamed <g>.
Oct 01 2009
Hello Walter,

> Consider the Bible. It's long and complicated, and by careful examination of it you can find a verse here and there to justify *any* behavior.

This is true of any long document if you are willing to string together enough quotes and ignore enough of it. I'd bet you could get the Declaration of Intendance out of Playboy if you wanted to.

> D is complicated, and is founded on principles that are not orthogonal - they are often at odds with each other. Any attempt to take one particular aspect of D's behavior and use it as a rule to impose elsewhere is surely doomed to conflict with some other rule. The only reasonable way forward is to evaluate each idea not only in terms of all of D's principles, but also on its own merits, and throw in one's best judgment. Nearly a decade with D has now shown that some ideas and choices were dead wrong, but others were more right than I even dreamed <g>.

You sort of tangentially hit what I think is the key to avoiding creating meaning out of whole cloth; take the document as a whole, look for themes and trends (any document worth reading will have them), and use them to answer your questions.
Oct 02 2009
BCS wrote:
> Hello Walter,
>> Consider the Bible. It's long and complicated, and by careful examination of it you can find a verse here and there to justify *any* behavior.
> This is true of any long document if you are willing to string together enough quotes and ignore enough of it. I'd bet you could get the Declaration of Intendance out of Playboy if you wanted to.

What is this Declaration of Intendance you speak of?

As for ignoring context... well, I'm willing to discuss this, but not here.
Oct 02 2009
On Sat, Oct 3, 2009 at 4:00 AM, Nick Sabalausky <a a.a> wrote:
> "Christopher Wright" <dhasenan gmail.com> wrote in message news:ha5mtp$f3c$2 digitalmars.com...
>>> This is true of any long document if you are willing to string together enough quotes and ignore enough of it. I'd bet you could get the Declaration of Intendance out of Playboy if you wanted to.
>> What is this Declaration of Intendance you speak of?
> That's the one they meant to write.

Nicely played.
Oct 03 2009
Hello Christopher,

>> This is true of any long document if you are willing to string together enough quotes and ignore enough of it. I'd bet you could get the Declaration of Intendance out of Playboy if you wanted to.
> What is this Declaration of Intendance you speak of?

OK, so I use spell check as a crutch (and I was trying to pay attention to a lecture when I typed that).

> As for ignoring context... well, I'm willing to discuss this, but not here.

Note: I'm not claiming that some document is saying anything, but that, with enough cut and paste, any long enough document can be made to say anything you want, and that sometimes this requires that you ignore context. The implication is that a selected set of excerpts from some document can't be taken as proof that the document says or doesn't say anything.
Oct 03 2009
On Thu, 1 Oct 2009, Walter Bright wrote:I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving. Here is what I've been able to glean from much time spent with google on what they detect and my knowledge of how data flow analysis works:Snipped a lot of the detail, because that's not really what makes the tools interesting. There's a couple things that do, im my opinion -- with a little experience having used Fortify and looked at Coverity a couple times over the years (and would be using if it wasn't so much more expensive than Fortify). 1) Rich flow control. They go well beyond what's typically done by compiliers during their optimization passes. They tend to be whole-code in scope and actually DO the parts that are hard, like cross expression variable value tracking similar to a couple examples in this thread. Function boundaries are no obstacle to them. The only obstacle is where source isn't provided. 2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability. A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data. Some examples: * suppression of false positives, * graphing of issue trends * categorization of issue types 3) Rule creation. The core engine usually generates some digested dataset upon rules are evaluated. The systems come with a builtin set that do the sorts of things already talked about. In addition they come with the ability to develop new rules specific to your application and business needs. 
For example: * tracking of taint from user data * what data is acceptable to log to files (for example NOT credit-cards) 4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles. ---- I've seen these tools detect some amazing subtle bugs in c and c++ code. They're particularly handy in messy code. They can help find memory leaks where the call graphs are arbitrarily obscure. Sites where NULL pointers are passed into a function that dereferences without a null check even when the call graph has many layers. Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs. That help? Later, Brad
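A minimal C sketch of the layered null-dereference case described above (all function names are invented for illustration): lookup() may return NULL, and the dereference happens two call frames away in name_len(). A whole-program checker follows the value across all three functions; the guard in process() is what makes this version safe, and deleting it is exactly the kind of subtle bug such tools report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* May return NULL -- the "source" an interprocedural checker tracks. */
const char *lookup(int key) {
    return key == 42 ? "answer" : NULL;
}

/* Dereferences its argument unconditionally -- the "sink". */
size_t name_len(const char *s) {
    return strlen(s);
}

size_t process(int key) {
    const char *s = lookup(key);
    if (s == NULL)          /* remove this check, and a cross-function */
        return 0;           /* analysis flags the strlen() call site   */
    return name_len(s);
}
```

The point is that no single function looks wrong in isolation; only tracking the possibly-NULL value through the whole call chain reveals the hazard.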
Oct 01 2009
Brad Roberts wrote:1) Rich flow control. They go well beyond what's typically done by compilers during their optimization passes. They tend to be whole-code in scope and actually DO the parts that are hard, like cross-expression variable value tracking similar to a couple examples in this thread. Function boundaries are no obstacle to them. The only obstacle is where source isn't provided.Modern compiler optimizers (including dmc and dmd) DO do cross-expression variable tracking. They just don't often do it inter-function (called inter-procedural analysis) because of time and memory constraints, not that it is technically more difficult. C and C++ do have some issues with inter-procedural analysis because the compiler view of them, at heart, is single file. D is much more conducive to that because the compiler sees an arbitrarily large part of the source code, and can obviously see all of it if the user so desires.2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability. A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data. Some examples: * suppression of false positives,I'd rather do a better job and not have false positives.* graphing of issue trendsThat's a crock <g>.* categorization of issue typesI'm not convinced that is of critical value.3) Rule creation. The core engine usually generates some digested dataset upon which rules are evaluated. The systems come with a builtin set that do the sorts of things already talked about. In addition they come with the ability to develop new rules specific to your application and business needs. 
For example: * tracking of taint from user data * what data is acceptable to log to files (for example NOT credit-cards)There have been several proposals for user-defined attributes for types, I think that is better than having some external rule file.4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles.I agree.---- I've seen these tools detect some amazing subtle bugs in c and c++ code. They're particularly handy in messy code. They can help find memory leaks where the call graphs are arbitrarily obscure. Sites where NULL pointers are passed into a function that dereferences without a null check even when the call graph has many layers.Once you get the data flow analysis equations right, they'll detect it every time regardless of how subtle or layered the call graph is.Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs. That help?Yes, very much. In particular, I wasn't sure coverity did inter-procedural analysis.
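The "data flow analysis equations" mentioned here can be made concrete with a toy example. This is a hypothetical C sketch (the bitmask encoding and names are invented) of the "maybe-uninitialized" computation over a diamond-shaped control flow graph: one bit per variable, an assignment kills the bit, and the meet at the join point is the union over incoming paths.

```c
#include <assert.h>

/* One bit per tracked variable; here just one variable, t. */
enum { VAR_T = 1 << 0 };

/* Computes the maybe-uninitialized set at the join of an if/else,
 * given whether each branch assigns t.  This is the per-node transfer
 * function plus the union meet, solved by hand for a diamond CFG. */
unsigned maybe_uninit_at_join(int then_assigns, int else_assigns) {
    unsigned entry_out = VAR_T;                     /* t starts uninitialized */
    unsigned then_out = then_assigns
        ? (entry_out & ~VAR_T)                      /* assignment kills bit   */
        : entry_out;
    unsigned else_out = else_assigns
        ? (entry_out & ~VAR_T)
        : entry_out;
    return then_out | else_out;                     /* meet = union of paths  */
}
```

Once the equations are right, the answer falls out mechanically regardless of how deep the graph is: a variable is flagged at the join exactly when some path reaches it without passing an assignment.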
Oct 01 2009
Walter Bright wrote:Brad Roberts wrote:Exactly my point.. compilers tend to make the opposite trade off that tools like Coverity do. Not that compilers can't or don't. They just usually do a small subset of what is possible in the grander sense of 'possible'.1) Rich flow control. They go well beyond what's typically done by compilers during their optimization passes. They tend to be whole-code in scope and actually DO the parts that are hard, like cross-expression variable value tracking similar to a couple examples in this thread. Function boundaries are no obstacle to them. The only obstacle is where source isn't provided.Modern compiler optimizers (including dmc and dmd) DO do cross-expression variable tracking. They just don't often do it inter-function (called inter-procedural analysis) because of time and memory constraints, not that it is technically more difficult.C and C++ do have some issues with inter-procedural analysis because the compiler view of them, at heart, is single file. D is much more conducive to that because the compiler sees an arbitrarily large part of the source code, and can obviously see all of it if the user so desires.Neither C nor C++ as languages are _required_ to do single file compilation. That's just what most compilers do. In fact, gcc/g++ have been capable of doing whole-app compilation for a couple years now -- though not many people use it that way as far as I can tell. See also, exact same issue as above.. it's a trade off.Of course you would.. everyone would. It's a meaningless statement since no one would ever contradict it with any seriousness. At the risk of repeating myself.. same tradeoffs again.2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability. A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data. 
Some examples: * suppression of false positives,I'd rather do a better job and not have false positives.Uh, whatever. Most of the rest of us humans respond much better to pictures and trends than to raw numbers. Show me some visual indication of the quality of my code (ignoring the arguments about the validity of such graphs) and I can pretty much guarantee that I'll work to improve that measure. Nearly everyone I've ever worked with behaves similarly.. once they agree that the statistic being measured is useful. One of the best examples is percent of code covered by unit tests. The same applies to number of non-false positive issues discovered through static analysis. A specific case of this: At Informix we had a step in our build process that ran lint (yes, it's ancient, but this was a decade ago and the practice was at least a decade old before I got there). Any new warnings weren't tolerated. The build automatically reported any delta over the previous build. It was standard practice and kept the code pretty darned clean.* graphing of issue trendsThat's a crock <g>.You don't need to be. You view too many things in black/white terms.* categorization of issue typesI'm not convinced that is of critical value.Again, this is where trade offs come in. If it can be done cheaply enough to warrant being done during compilation and is accurate enough in small-scoped analysis.. yay. But sometimes you still want to do things that take more time and are more complete.3) Rule creation. The core engine usually generates some digested dataset upon which rules are evaluated. The systems come with a builtin set that do the sorts of things already talked about. In addition they come with the ability to develop new rules specific to your application and business needs. 
For example: * tracking of taint from user data * what data is acceptable to log to files (for example NOT credit-cards)There have been several proposals for user-defined attributes for types, I think that is better than having some external rule file.That's essentially all it does, if you boil away all the other interesting stuff layered on top of that core, but it does it very well and with a lot of tooling around it. Later, Brad4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles.I agree.---- I've seen these tools detect some amazing subtle bugs in c and c++ code. They're particularly handy in messy code. They can help find memory leaks where the call graphs are arbitrarily obscure. Sites where NULL pointers are passed into a function that dereferences without a null check even when the call graph has many layers.Once you get the data flow analysis equations right, they'll detect it every time regardless of how subtle or layered the call graph is.Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs. That help?Yes, very much. In particular, I wasn't sure coverity did inter-procedural analysis.
Oct 01 2009
Brad Roberts wrote:A long time ago, the company I worked for decided to put up a huge chart on the wall that everyone could see, and every day the current bug count was plotted on it. The idea was to show a downward trend. It wasn't very long (a few days) before this scheme completely backfired: 1. engineers stopped submitting new bug reports 2. the engineers and QA would argue about what was a bug and what wasn't 3. multiple bugs would get combined into one bug report so it only counted once 4. if a bug was "X is not implemented", then when X was implemented, there might be 3 or 4 bugs against X. Therefore, X did not get implemented. 5. there was a great rush to submit half-assed fixes before the daily count was made 6. people would invent bugs for which they would simultaneously submit fixes (look ma, I fixed all these bugs!) 7. arguing about it started to consume a large fraction of the engineering day, including the managers who were always called in to resolve the disputes In other words, everyone figured out they were being judged on the graph, not the quality of the product, and quickly changed their behavior to "work the graph" rather than the quality. To the chagrin of the QA staff, management finally tore down the chart. Note that nobody involved in this was a moron. They all knew exactly what was happening, it was simply irresistible.Uh, whatever. Most of the rest of us humans respond much better to pictures and trends than to raw numbers. Show me some visual indication of the quality of my code (ignoring the arguments about the validity of such graphs) and I can pretty much guarantee that I'll work to improve that measure. Nearly everyone I've ever worked with behaves similarly.. once they agree that the statistic being measured is useful. One of the best examples is percent of code covered by unit tests. 
The same applies to number of non-false positive issues discovered through static analysis.* graphing of issue trendsThat's a crock <g>.A specific case of this: At informix we had a step in our build process that ran lint (yes, it's ancient, but this was a decade ago and the practice was at least a decade old before I got there). Any new warnings weren't tolerated. The build automatically reported any delta over the previous build. It was standard practice and kept the code pretty darned clean.I think that's something different - it's not graphing or trending the data.
Oct 01 2009
Walter Bright wrote:Brad Roberts wrote:Existence of a bad case doesn't disprove the usefulness in general. Yes, I agree that number of bugs is a bad metric to measure all by itself. Water can drown a person, but that doesn't make it something to avoid. Sigh, BradA long time ago, the company I worked for decided to put up a huge chart on the wall that everyone could see, and every day the current bug count was plotted on it. The idea was to show a downward trend. It wasn't very long (a few days) before this scheme completely backfired: 1. engineers stopped submitting new bug reports 2. the engineers and QA would argue about what was a bug and what wasn't 3. multiple bugs would get combined into one bug report so it only counted once 4. if a bug was "X is not implemented", then when X was implemented, there might be 3 or 4 bugs against X. Therefore, X did not get implemented. 5. there was a great rush to submit half-assed fixes before the daily count was made 6. people would invent bugs for which they would simultaneously submit fixes (look ma, I fixed all these bugs!) 7. arguing about it started to consume a large fraction of the engineering day, including the managers who were always called in to resolve the disputes In other words, everyone figured out they were being judged on the graph, not the quality of the product, and quickly changed their behavior to "work the graph" rather than the quality. To the chagrin of the QA staff, management finally tore down the chart. Note that nobody involved in this was a moron. They all knew exactly what was happening, it was simply irresistible.Uh, whatever. Most of the rest of us humans respond much better to pictures and trends than to raw numbers. Show me some visual indication of the quality of my code (ignoring the arguments about the validity of such graphs) and I can pretty much guarantee that I'll work to improve that measure. Nearly everyone I've ever worked with behaves similarly.. 
once they agree that the statistic being measured is useful. One of the best examples is percent of code covered by unit tests. The same applies to number of non-false positive issues discovered through static analysis.* graphing of issue trendsThat's a crock <g>.
Oct 01 2009
Hello Walter,For open source and libs, yes. For proprietary code bases, I'd say it's about a wash. Having it in another file could make the language/code base easier to read and also allow a much more powerful rules language (because it doesn't have to fit in the host language). And because only you will be maintaining the code, needing another tool (that you already have) and another build step isn't much of an issue.3) Rule creation. The core engine usually generates some digested dataset upon which rules are evaluated. The systems come with a builtin set that do the sorts of things already talked about. In addition they come with the ability to develop new rules specific to your application and business needs. For example: * tracking of taint from user data * what data is acceptable to log to files (for example NOT credit-cards)There have been several proposals for user-defined attributes for types, I think that is better than having some external rule file.
Oct 01 2009
BCS wrote:For open source and libs, yes. For proprietary code bases, I'd say it's about a wash. Having it in another file could make the language/code base easier to read and also allow a much more powerful rules language (because it doesn't have to fit in the host language). And because only you will be maintaining the code, needing another tool (that you already have) and another build step isn't much of an issue.Of course you can make third party tools work, but I think there is a lot of merit and psychological advantage to building it into the base language. For example, unit testing and document creation are usually 3rd party tools, but with D they are built in, and I think them being built in has been a huge success.
Oct 02 2009
Hi, Walter Bright wrote:There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?The development edition of VS2008 also has a static code analyzer. There is a list of issued warnings for C/C++ with examples at: http://msdn.microsoft.com/en-us/library/a5b9aa09.aspx There are a lot more messages generated for managed code: http://msdn.microsoft.com/en-us/library/ee1hzekz.aspx The analyzer does inter-procedural analysis, so it (sometimes) knows what range of values function arguments can have. I've run the analysis on the dmd source code, but deactivated some warnings regarding memory leaks and variable hiding to not get flooded with messages. After that, code analysis added about 150 warnings to the 1000 already issued by the compiler (mostly "signed/unsigned mismatch" and "unreferenced local variable"). The code analyzer warnings often deal with - "Dereferencing NULL pointer" (I've added __assume statements to assert, so the compiler does not warn about null pointers if there is an "assert(pointer)" somewhere before the indirection). Most of these warnings are not so easy to verify, but some are probably correct. - "Using uninitialized memory": mostly after switch-statements without variable initialization in default. - "Invalid data: accessing 'variable', the readable size is 'X' bytes, but 'Y' bytes might be read": as far as I could see, these are often false positives, because the analyzer could not deal with variable sized arrays at the end of structs (you can add annotations to help the analyzer, though I have not tried). Even with constant size arrays, some of the warnings are plain wrong. - stdlib/windows-api calls (e.g. 
printf arguments, ignored return codes) I guess that you'll have to annotate the code a lot if you want to have the compiler always do the analyzing and get reasonable results. I've from the analyzer. Rainer
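The assert-plus-__assume pairing Rainer describes can be sketched like this (the macro name is invented; __assume is an MSVC intrinsic, so the portable branch falls back to a plain assert). On MSVC, __assume(e) tells the analyzer that e holds from that point on, which is what silences the "dereferencing NULL pointer" warnings after an assert(p).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro combining a runtime check with an analyzer hint.
 * Under MSVC, __assume(e) propagates the asserted condition into the
 * static analysis; elsewhere it degrades to an ordinary assert. */
#ifdef _MSC_VER
#  define ASSERT_ASSUME(e) do { assert(e); __assume(e); } while (0)
#else
#  define ASSERT_ASSUME(e) assert(e)
#endif

int deref(const int *p) {
    ASSERT_ASSUME(p != NULL);  /* analyzer now treats p as non-null */
    return *p;                 /* no "dereferencing NULL" warning here */
}
```

One caveat with this pattern: in release builds where assert compiles away, __assume still tells the optimizer the condition holds, so the assertion expression must be side-effect free.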
Oct 02 2009
Rainer Schuetze wrote:I've run the analysis on the dmd source code, but deactivated some warnings regarding memory leaks and variable hiding to not get flooded with messages. After that, code analysis added about 150 warnings to the 1000 already issued by the compiler (mostly "signed/unsigned mismatch" and "unreferenced local variable").Could you turn it all on, run it through the dmd source code, and send me the results? I'd like to see if it detected any real bugs.
Oct 02 2009
There is a simple and elegant solution to the problem of false positives in static analysis. Simply provide a way for the programmer to say, "I know this looks dangerous, but I know what I'm doing, so trust me on this." This could be done by adding some new syntax, either a new keyword or a reused old keyword, or by changing the semantics of void initialization.

For example, this would be an error:

    T t;
    if (i < 10) { t = new T; }
    if (i < 10) { t.f(); }

This would be legal:

    T t = unchecked;
    if (i < 10) { t = new T; }
    if (i < 10) { t.f(); }

But this would be an error:

    T t = unchecked;
    t.f();

As would this:

    T t = unchecked;
    f(t);

'unchecked' has the following properties:
- It is an error to use the value of a variable that is initialized as 'unchecked'.
- 'unchecked' tells the static analysis that the programmer knows what he's doing. The static analysis should therefore be liberal rather than conservative.
- At runtime, 'unchecked' is equivalent to 'null' for non-nullable reference types. This increases the chance that the error will be detected at runtime.

On the other hand, variables without any explicit initializer at all will have the following properties:
- It is an error to use the value of the variable.
- The static analysis is conservative by default. If it thinks the variable could be used uninitialized, an error is reported.
- Since the static analysis guarantees that the variable will not be used uninitialized, there is no need to initialize the variable at runtime.

-- Rainer Deyke - rainerd eldwood.com
Oct 03 2009
I am sorry for the delay with this small answer... Walter Bright:5. dead code (code that can never be executed)<In particular whole unused functions, methods, classes, modules, packages. Also rather important: unused constants and variables. GCC has a function attribute named "used" that tells the optimizer to keep a function in the binary even if it appears to be unused (because it may be used in strange ways from asm, etc).The =void case is rare enough to be irrelevant.<In good D code it's very uncommon, but spotting such bugs too is positive. (And eventually D variables may not be initialized anymore?)6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.<Spotting such bugs at compile time, where possible, is 200% better (if such test is reliable).8. D2 has acquired some decent checking for this.<In my opinion it's not enough, some optional runtime tests would be useful too.More along these lines will be interesting.<Sure. There's a lot that can be done still. Other possible bugs: does something like this). - safety: people that try to mess with the stack to try to break the safety of a program. Also stack canaries (now they can be used with LLVM). - errors in indentation, when an if-then-else has an indentation that's the opposite of its semantics. New GCC can spot this problem. - finding where a structure is modified, but such changes will not influence the program, because the structure is copied by value. This is tricky, but I've fallen into this trap, for example modifying a struct array: if you don't use "ref" you will not change the original array. If inside a function that takes a string as input you change the string length, such string will not be seen outside the function unless you mark the string as "ref", etc. 
- Spotting bugs in "switch", for example where a programmer may have forgotten a break. I don't know if this can be done, probably not. - There are many other possible bugs. I suggest reading online lists of common bugs. Lints like Splint and gimpel are designed for this, see below. - disallowing implicit signed=>unsigned conversions. You can also take a look at Splint: http://www.splint.org/ Splint also shows that adding a few annotations in the source code can save you from many bugs. Annotations for example that assert that a pointer is never null, etc. And gimpel: http://www.gimpel.com/ The bug of the month now is interactive, so you can use it to test how good this software is: http://www.gimpel-online.com/bugsLinkPage.html The very large list of possible bugs/problems spotted by gimpel software: http://gimpel-online.com/MsgRef.html The Cyclone language is a safe variant of C; it can give some ideas. For example warning every time a variable is supposed to be converted automatically into another type of variable with some information loss (e.g. double => float). Bye, bearophile
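The implicit signed=>unsigned conversion item above is worth a concrete C sketch (the function name is invented for illustration): in a comparison like (i < len) with int i and size_t len, the int is converted to size_t, so a negative index becomes a huge unsigned value and a naive bounds check silently "passes".

```c
#include <assert.h>
#include <stddef.h>

/* Corrected bounds check.  A lint that disallows implicit
 * signed->unsigned conversions would flag the naive form
 *     return i < len;
 * because i is converted to size_t, making i == -1 compare
 * as SIZE_MAX.  Reject negatives explicitly first. */
int in_bounds(int i, size_t len) {
    return i >= 0 && (size_t)i < len;
}
```

This is exactly the class of bug that survives testing with typical inputs and only bites when an index goes negative.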
Oct 05 2009