digitalmars.D - More bugs found in OS code
- bearophile (146/146) Nov 02 2011 The now usual article that advertises the PVS-Studio tool that shows ple...
- Adam D. Ruppe (4/9) Nov 02 2011 That's not really an error. You might change out the format
- Andrej Mitrovic (5/5) Nov 02 2011 Personally I like using this for a sort of on/off switch:
- bearophile (5/11) Nov 02 2011 I use this, I think it's more explicit (but you have to state the variab...
- bearophile (6/16) Nov 02 2011 Right, the format string is in general a run-time value, so you can't al...
- Adam D. Ruppe (8/10) Nov 02 2011 I do a lot. The way I do it is the arguments are made
- bcs (5/15) Nov 03 2011 If you use a literal format string, don't use indexed formatting and
- bearophile (15/26) Nov 05 2011 '1'
The now usual article that advertises the PVS-Studio tool that shows plenty of (depressing) bugs found in already debugged source code of widely used C/C++ open source projects: http://software.intel.com/en-us/articles/90-errors-in-open-source-projects/ There is a Reddit discussion too, but I find it useless: http://www.reddit.com/r/programming/comments/lxjrb/examples_of_errors_detected_in_various_opensource/ In the Reddit discussion someone is free to list in D how many of the 91 bugs: - Are not applicable or are not normally done in D; - Are statically caught by D/DMD; - Are always caught at runtime by DMD in non release mode; - Are planned/discussed to be statically avoided or statically caught; - Are planned/discussed to be always caught at runtime by DMD in non release mode. Here I list and comment about some of the more interesting parts. Below I have divided the problems in 5 groups (I have not included all the bug groups shown in the article). A]======================================== Example 3. Fennec Media Project project. Complex expression. uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) { ... while (!(m_pBitArray[m_nCurrentBitIndex >> 5] & Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {} ... } The error was found through the V567 diagnostic: Undefined behavior. The 'm_nCurrentBitIndex' variable is modified while being used twice at single sequence point. MACLib unbitarrayold.cpp 78 ------------------------ Example 4. Miranda IM project. Complex expression. short ezxml_internal_dtd(ezxml_root_t root, char *s, size_t len) { ... while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') { ... } The error was found through the V567 diagnostic: Undefined behavior. The 's' variable is modified while being used twice between sequence points.msne zxml.c 371 ------------------------ Currently DMD accepts this code: x = x++; Currently D is using the C semantics, this means the result of that line of code is undefined, this means you don't know what it will do once compiled by different D compilers or even with different optimization level on the same compiler. This means writing such code in C (and currently in D) is _always_ a bug (because a program with undefined semantics is often useless. There are other sources of undefined semantics, but the less there are, the better it is). I don't understand why C compilers don't refuse such code statically with an error. In my opinion this is not acceptable. Two basic solutions for D: 1) Define exactly what that code does in D. Walter has expressed few times his desire for this solution. 2) Turn similar lines of code into compile-time errors (and maybe accept them forever if the -d switch is used, but I am not sure this is a good idea). The first solution is good because it makes it simpler to port C code to D, it allows C/C++/Java programmers to use that code still. But such C code has undefined results, so I don't see a great advantage here (it's useful still to port Java code). But lately I am slowly leaning toward the second solution, because even if D defines exactly the semantics of code like this, so it gives the same result on all D compilers: (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') I don't want to read similar code in D programs I have to debug or modify. Even if it's unambiguous for the D language, it's a bit too much hard for me to understand. Go language has chosen this second solution. B]======================================== Curretly D2 accepts the usage of & and | among booleans: void main() { bool a, b; auto r1 = a & b; static assert(is(typeof(r1) == bool)); auto r2 = a | b; static assert(is(typeof(r2) == bool)); } But I am not sure this is useful enough to justify the risks that gives. Using booleans as integers is useful when I have to count them and in few other situations. But I don't remember the last time I've had to use bitwise or/bitwise and on boolean values, I think I have never had to do this. So maybe it's better to forbid this. The advantage of forbidding this operation is that it avoids several mistakes caused by operator precedence like (!x & y). Comments on this are welcome. C]======================================== Example 5. IPP Samples project. Priorities of ?: and | operations. vm_file* vm_file_fopen(...) { ... mds[3] = FILE_ATTRIBUTE_NORMAL | (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING; ... } The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. vm vm_file_win.c 393 ------------------------ Example 6. Newton Game Dynamics project. Priorities of ?: and * operations. dgInt32 CalculateConvexShapeIntersection (...) { ... den = dgFloat32 (1.0e-24f) * (den > dgFloat32 (0.0f)) ? dgFloat32 (1.0f) : dgFloat32 (-1.0f); ... } The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '*' operator. physics dgminkowskiconv.cpp 1061 D]======================================== Currently this is not caught by D, it prints "12": import std.stdio; void main() { writefln("%d%d", 1, 2, 3); } I'd like this code to give a compile-time error. If this is not possible, then I'd like this code to give a run-time error. The third possibility is to print "123", but this is bad enough. Printing just "12" and ignoring 3, as done by DMD 2.056, is not acceptable, in my opinion. E]======================================== Example 1. Shareaza project. Value range of char type. void CRemote::Output(LPCTSTR pszName) { ... CHAR* pBytes = new CHAR[ nBytes ]; hFile.Read( pBytes, nBytes ); ... if ( nBytes > 3 && pBytes[0] == 0xEF && pBytes[1] == 0xBB && pBytes[2] == 0xBF ) { pBytes += 3; nBytes -= 3; bBOM = true; } ... } The error was found through the V547 diagnostic: Expression 'pBytes [ 0 ] == 0xEF' is always false. The value range of signed char type: [-128, 127]. Shareaza remote.cpp 350 ------------------------ Example 3. VirtualDub project. Unsigned type is always >= 0. typedef unsigned short wint_t; ... void lexungetc(wint_t c) { if (c < 0) return; g_backstack.push_back(c); } The error was found through the V547 diagnostic: Expression 'c < 0' is always false. Unsigned type value is never < 0. Ami lexer.cpp 225 The "c < 0" condition is always false because the variable of the unsigned type is always above or equal to 0. ------------------------ Example 7. QT project. Dangerous loop. bool equals( class1* val1, class2* val2 ) const{ { ... size_t size = val1->size(); ... while ( --size >= 0 ){ if ( !comp(*itr1,*itr2) ) return false; itr1++; itr2++; } ... } The error was found through the V547 diagnostic: Expression '--size >= 0' is always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154 The (--size >= 0) condition is always true, since the size variable has the unsigned type. It means that if two sequences being compared are alike, we will get an overflow that will in its turn cause Access Violation or other program failures. ------------------------ Example 8. MySQL project. Error in condition. enum enum_mysql_timestamp_type str_to_datetime(...) { ... else if (str[0] != 'a' || str[0] != 'A') continue; /* Not AM/PM */ ... } The error was found through the V547 diagnostic: Expression 'str [0] != 'a' || str [0] != 'A'' is always true. Probably the '&&' operator should be used here. clientlib my_time.c 340 The condition is always true because the character is always either not equal to 'a' or to 'A'. This is the correct check: else if (str[0] != 'a' && str[0] != 'A') Comments are welcome. Bye, bearophile
Nov 02 2011
bearophile wrote:Currently this is not caught by D, it prints "12": import std.stdio; void main() { writefln("%d%d", 1, 2, 3); }That's not really an error. You might change out the format string at runtime based on user preferences, perhaps for internationalization, or other reasons.
Nov 02 2011
Personally I like using this for a sort of on/off switch: bool val; // .. val ^= 1; Maybe that's just stupid but I'm kind of used to it, lol. :p
Nov 02 2011
Andrej Mitrovic:Personally I like using this for a sort of on/off switch: bool val; // .. val ^= 1; Maybe that's just stupid but I'm kind of used to it, lol. :pI use this, I think it's more explicit (but you have to state the variable name two times): val = !val; Bye, bearophile
Nov 02 2011
Adam D. Ruppe:bearophile wrote:Right, the format string is in general a run-time value, so you can't always catch this situation at compile-time. But how many times do you want to ignore some of the arguments listed? Even if this happens (and I don't remember needing this), I think it's not common enough to justify so permissive semantics at run-time. Recently Andrei A. has expressed some opinions on this topic, but I don't remember the bug report number. Bye, bearophileCurrently this is not caught by D, it prints "12": import std.stdio; void main() { writefln("%d%d", 1, 2, 3); }That's not really an error. You might change out the format string at runtime based on user preferences, perhaps for internationalization, or other reasons.
Nov 02 2011
bearophile:But how many times do you want to ignore some of the arguments listed?I do a lot. The way I do it is the arguments are made available to the format, but it doesn't always need them at runtime. string f = showNames ? "%1$s\t%2$d" : "%2$d"; writefln(f, name, number); Though I don't literally use writefln for most my code the same idea applies.
Nov 02 2011
On 11/02/2011 09:00 PM, Adam D. Ruppe wrote:bearophile:If you use a literal format string, don't use indexed formatting and don't use all the args, I think it would be safe to call that a bug in most any case. Fail any one of those pre-conditions and you may have a point.But how many times do you want to ignore some of the arguments listed?I do a lot. The way I do it is the arguments are made available to the format, but it doesn't always need them at runtime. string f = showNames ? "%1$s\t%2$d" : "%2$d"; writefln(f, name, number); Though I don't literally use writefln for most my code the same idea applies.
Nov 03 2011
Adam D. Ruppe:I do a lot. The way I do it is the arguments are made available to the format, but it doesn't always need them at runtime. string f = showNames ? "%1$s\t%2$d" : "%2$d"; writefln(f, name, number); Though I don't literally use writefln for most my code the same idea applies.Python is strict here:'1'"%d" % (1)Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: not all arguments converted during string formatting Even if you use that idiom, it has costs, measured in bugs. My experience tells me that a sloppy semantics, introduced or kept for a little convenience, always manages to find a way to bite your ass later. So I'd like more tidyness here. You are allowed to write: if (showNames) writeln("%1$s\t%2$d", name, number); else writeln("%2$d", name); There are also other solutions that don't compromise the already low safety of writeln, that is a dynamically typed isle in a statically typed language. Bye, bearophile"%d" % (1, 2)
Nov 05 2011