www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Static analysis at Mozilla

reply bearophile <bearophileHUGS lycos.com> writes:
C++ Static Analysis as done on the large Mozilla codebase:
http://blog.ezyang.com/2010/06/static-analysis-mozilla/
It shows that it's important to have a more powerful static reflection in D. It
works well with scoped user-defined attributes too.

Bye,
bearophile
Jun 09 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
bearophile Wrote:

 C++ Static Analysis as done on the large Mozilla codebase:
 http://blog.ezyang.com/2010/06/static-analysis-mozilla/
 It shows that it's important to have a more powerful static reflection in D.
It works well with scoped user-defined attributes too.
As much as I like static analysis, it still has a long way to go. For example, here's some C code that a static analysis tool recently flagged as broken: size_t fn( char** pdst, char* src, size_t srclen ) { __thread static char* dst = NULL; __thread static size_t dstcap = 0; if( dstcap < srclen ) { dstcap = srclen; dst = realloc( dst, dstcap ); } memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized write *pdst = dst; return srclen; } Basically, it wasn't smart enough to realize that dst would always be non-NULL when the memcpy occurred, let alone that it would also always be large enough. For such false positives, it's generally necessary to insert pointless code simply to silence the error, thus complicating the function and increasing the cost of maintenance. I still believe that the benefits of static analysis vastly outweigh the cost, but I'd love to see more intelligence in branch analysis if nothing else.
Jun 10 2010
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly Wrote:
         memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized write
Oops, I mistyped that comment. I don't think Purify does static analysis :-)
Jun 10 2010
prev sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
Sean Kelly wrote:
 bearophile Wrote:

 C++ Static Analysis as done on the large Mozilla codebase:
 http://blog.ezyang.com/2010/06/static-analysis-mozilla/
 It shows that it's important to have a more powerful static 
reflection in D. It works well with scoped user-defined attributes too.
 As much as I like static analysis, it still has a long way to go. 
For example, here's some C code that a static analysis tool recently flagged as broken:
     size_t fn( char** pdst, char* src, size_t srclen ) {
         __thread static char* dst      = NULL;
         __thread static size_t dstcap = 0;
         if( dstcap < srclen ) {
             dstcap = srclen;
             dst      = realloc( dst, dstcap );
         }
         memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized 
write
         *pdst = dst;
         return srclen;
     }

 Basically, it wasn't smart enough to realize that dst would
 always be non-NULL when the memcpy occurred, let alone that it
 would also always be large enough.  For such false positives,
 it's generally necessary to insert pointless code simply to
 silence the error, thus complicating the function and
 increasing the cost of maintenance.  I still believe that the
 benefits of static analysis vastly outweigh the cost, but I'd
 love to see more intelligence in branch analysis if nothing
 else.
realloc may return NULL. Perhaps they are catching that condition? Ali
Jun 10 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Ali Çehreli Wrote:

 Sean Kelly wrote:
  > bearophile Wrote:
  >
  >> C++ Static Analysis as done on the large Mozilla codebase:
  >> http://blog.ezyang.com/2010/06/static-analysis-mozilla/
  >> It shows that it's important to have a more powerful static 
 reflection in D. It works well with scoped user-defined attributes too.
  >
  > As much as I like static analysis, it still has a long way to go. 
 For example, here's some C code that a static analysis tool recently 
 flagged as broken:
  >
  >     size_t fn( char** pdst, char* src, size_t srclen ) {
  >         __thread static char* dst      = NULL;
  >         __thread static size_t dstcap = 0;
  >         if( dstcap < srclen ) {
  >             dstcap = srclen;
  >             dst      = realloc( dst, dstcap );
  >         }
  >         memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized 
 write
  >         *pdst = dst;
  >         return srclen;
  >     }
  >
  > Basically, it wasn't smart enough to realize that dst would
  > always be non-NULL when the memcpy occurred, let alone that it
  > would also always be large enough.  For such false positives,
  > it's generally necessary to insert pointless code simply to
  > silence the error, thus complicating the function and
  > increasing the cost of maintenance.  I still believe that the
  > benefits of static analysis vastly outweigh the cost, but I'd
  > love to see more intelligence in branch analysis if nothing
  > else.
 
 realloc may return NULL. Perhaps they are catching that condition?
I suppose so. Maybe I should change the if statement to a loop and see what happens.
Jun 10 2010
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 10 Jun 2010 15:55:18 -0400, Sean Kelly <sean invisibleduck.org>  
wrote:

 Ali Çehreli Wrote:

 Sean Kelly wrote:
  > bearophile Wrote:
  >
  >> C++ Static Analysis as done on the large Mozilla codebase:
  >> http://blog.ezyang.com/2010/06/static-analysis-mozilla/
  >> It shows that it's important to have a more powerful static
 reflection in D. It works well with scoped user-defined attributes too.
  >
  > As much as I like static analysis, it still has a long way to go.
 For example, here's some C code that a static analysis tool recently
 flagged as broken:
  >
  >     size_t fn( char** pdst, char* src, size_t srclen ) {
  >         __thread static char* dst      = NULL;
  >         __thread static size_t dstcap = 0;
  >         if( dstcap < srclen ) {
  >             dstcap = srclen;
  >             dst      = realloc( dst, dstcap );
  >         }
  >         memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized
 write
  >         *pdst = dst;
  >         return srclen;
  >     }
  >
  > Basically, it wasn't smart enough to realize that dst would
  > always be non-NULL when the memcpy occurred, let alone that it
  > would also always be large enough.  For such false positives,
  > it's generally necessary to insert pointless code simply to
  > silence the error, thus complicating the function and
  > increasing the cost of maintenance.  I still believe that the
  > benefits of static analysis vastly outweigh the cost, but I'd
  > love to see more intelligence in branch analysis if nothing
  > else.

 realloc may return NULL. Perhaps they are catching that condition?
I suppose so. Maybe I should change the if statement to a loop and see what happens.
What about if srclen is 0? Won't memcpy then be passed a null pointer via dst? Does the static analyzer look inside memcpy to see if it uses the pointer when the size is 0? -Steve
Jun 10 2010
next sibling parent div0 <div0 users.sourceforge.net> writes:
On 10/06/2010 20:57, Steven Schveighoffer wrote:
 On Thu, 10 Jun 2010 15:55:18 -0400, Sean Kelly <sean invisibleduck.org>
 wrote:

 Ali Çehreli Wrote:

 Sean Kelly wrote:
 bearophile Wrote:

 C++ Static Analysis as done on the large Mozilla codebase:
 http://blog.ezyang.com/2010/06/static-analysis-mozilla/
 It shows that it's important to have a more powerful static
reflection in D. It works well with scoped user-defined attributes too.
 As much as I like static analysis, it still has a long way to go.
For example, here's some C code that a static analysis tool recently flagged as broken:
 size_t fn( char** pdst, char* src, size_t srclen ) {
 __thread static char* dst = NULL;
 __thread static size_t dstcap = 0;
 if( dstcap < srclen ) {
 dstcap = srclen;
 dst = realloc( dst, dstcap );
 }
 memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized
write
 *pdst = dst;
 return srclen;
 }

 Basically, it wasn't smart enough to realize that dst would
 always be non-NULL when the memcpy occurred, let alone that it
 would also always be large enough. For such false positives,
 it's generally necessary to insert pointless code simply to
 silence the error, thus complicating the function and
 increasing the cost of maintenance. I still believe that the
 benefits of static analysis vastly outweigh the cost, but I'd
 love to see more intelligence in branch analysis if nothing
 else.
realloc may return NULL. Perhaps they are catching that condition?
I suppose so. Maybe I should change the if statement to a loop and see what happens.
What about if srclen is 0? Won't memcpy then be passed a null pointer via dst? Does the static analyzer look inside memcpy to see if it uses the pointer when the size is 0? -Steve
memcpy may get passed a null, but it's never uninitialized. The analysis is wrong and so is the code. doh! -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk
Jun 10 2010
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Steven Schveighoffer Wrote:

 On Thu, 10 Jun 2010 15:55:18 -0400, Sean Kelly <sean invisibleduck.org>  
 wrote:
 
 Ali Çehreli Wrote:

 Sean Kelly wrote:
  > bearophile Wrote:
  >
  >> C++ Static Analysis as done on the large Mozilla codebase:
  >> http://blog.ezyang.com/2010/06/static-analysis-mozilla/
  >> It shows that it's important to have a more powerful static
 reflection in D. It works well with scoped user-defined attributes too.
  >
  > As much as I like static analysis, it still has a long way to go.
 For example, here's some C code that a static analysis tool recently
 flagged as broken:
  >
  >     size_t fn( char** pdst, char* src, size_t srclen ) {
  >         __thread static char* dst      = NULL;
  >         __thread static size_t dstcap = 0;
  >         if( dstcap < srclen ) {
  >             dstcap = srclen;
  >             dst      = realloc( dst, dstcap );
  >         }
  >         memcpy( dst, src, srclen ); // Purify: ERROR - uninitialized
 write
  >         *pdst = dst;
  >         return srclen;
  >     }
  >
  > Basically, it wasn't smart enough to realize that dst would
  > always be non-NULL when the memcpy occurred, let alone that it
  > would also always be large enough.  For such false positives,
  > it's generally necessary to insert pointless code simply to
  > silence the error, thus complicating the function and
  > increasing the cost of maintenance.  I still believe that the
  > benefits of static analysis vastly outweigh the cost, but I'd
  > love to see more intelligence in branch analysis if nothing
  > else.

 realloc may return NULL. Perhaps they are catching that condition?
I suppose so. Maybe I should change the if statement to a loop and see what happens.
What about if srclen is 0? Won't memcpy then be passed a null pointer via dst? Does the static analyzer look inside memcpy to see if it uses the pointer when the size is 0?
That's an artifact of my simplified example. The real code doesn't dereference src if srclen is 0.
Jun 10 2010