Patterns of Bugs

January 06, 2011

written by Walter Bright

I spent the first three years of my career working on flight-critical mechanical designs for the Boeing 757. Although these were gearboxes, hydraulics, cables and linkages, the methodology used to build error-free systems applies very well to software design.

For starters, there is Boeing's attitude toward failure. A failure is not considered human error, fixable by hiring better people. It is a failure of process. The best people have bad days and make mistakes, so the solution is to change the process so the mistakes cannot happen or cannot propagate.

One simple example is an assembly that is bolted onto the frame with 4 bolts. The obvious bolt pattern is a rectangle. Unfortunately, a rectangular pattern can be assembled in two different ways, one of which is wrong. The solution is to offset one of the bolt holes; then the assembly can only be bolted on in one orientation. The mechanic's potential mistake is designed out of the system.

This idea permeates Boeing designs. Parts can only be assembled one way, the correct way. Sadly, this has failed to penetrate other industries. Take a look at the back of your computer: you can reverse the keyboard and mouse connections; they'll plug in just fine, they just won't work. Ditto for the microphone and speaker connections. And the expansion memory slots. Etc. My hot rod is full of parts that can be assembled the wrong way (often with disastrous results).

But enough about hardware; let's see how this applies to software bugs. What I look for are patterns of bugs: not problems I imagine will happen, but certain kinds of bugs that appear over and over, causing expense and grief. Once a pattern is identified, look for changes in process that will permanently eliminate that bug. Eliminate enough of the bug patterns, and you should enjoy a substantial increase in productivity.

Changing the process means changing:

coding standards
the programming language
coding review checklists
testing tools
static analysis tools
testing methodology
best practices

Where can we look for patterns of bugs? Many have trod this ground before us, so examining existing improvements in those same processes will give a good start for improving our own processes. To go beyond that, the best way is to trawl through bug-fix patches looking for patterns.

Wrong Operator

For example, this patch suggests that:

(!E && !E->fld)

is a nonsense expression, and what was probably meant was:

(!E || !E->fld)

What's the process fix for this bug pattern? I think we can agree that trying to improve coding standards, code reviews, or testing probably won't help much with this one. Using a coverage analyser in combination with testing can show that the second clause, if it is ever executed, will always segfault, since it is reached only when E is null. But the best way to attack this is with either a static analysis tool or by building such a check into the programming language's compiler.
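A minimal C sketch of the corrected test, assuming a hypothetical node type `struct expr` with a pointer member `fld` (the patch's actual types are not shown here). The comment records why the original `&&` form can never do anything useful.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type standing in for whatever E points to. */
struct expr {
    struct expr *fld;
};

/* Buggy form, kept only for comparison:
 *     return !E && !E->fld;
 * If E is NULL, !E is true, so E->fld dereferences a null pointer.
 * If E is non-NULL, !E is false and && short-circuits, so E->fld is
 * never examined. Either way the second clause is useless. */

/* Corrected form: true when E is missing or has no fld.
 * || short-circuits, so E->fld is read only when E is non-NULL. */
static bool missing_field(const struct expr *E)
{
    return !E || !E->fld;
}
```

Here `missing_field(NULL)` returns true without touching memory, which is exactly the case the `&&` version turns into a crash.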

Operator Precedence

A related problem is the classic one due to the odd C operator precedence rules:

(a & b == c)

which is evaluated as:

(a & (b == c))

which is practically never intended. Again, this is best attacked with language solutions or static analysis. In the D programming language, we didn't wish to mess with the operator precedences in order to avoid behavior that would be surprising to experienced programmers. So we opted instead to change the grammar so that:

(a & b == c)

is simply illegal. It won't parse. It's the best kind of solution to a bug pattern, because it is defined out of existence, much like the offset mounting hole for an assembly.

This change immediately proved its worth by finding a couple of instances of it in the D standard library.
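To make the trap concrete, here is a small C sketch; the function names are hypothetical. The first function compiles exactly as written and silently computes `a & (b == c)`; GCC reports it only as a warning, and only when `-Wparentheses` (enabled by `-Wall`) is turned on.

```c
/* == binds tighter than &, so this is a & (b == c). */
static int as_parsed(int a, int b, int c)
{
    return a & b == c;
}

/* What the programmer almost always meant. */
static int as_intended(int a, int b, int c)
{
    return (a & b) == c;
}
```

With a = 6, b = 2, c = 2, `as_parsed` returns 0 (that is, 6 & 1) while `as_intended` returns 1 (since 6 & 2 is 2). In D, the body of `as_parsed` would simply fail to parse.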

Allow Or Deny

Programmers new to C sometimes write the following:

(a < b < c)

expecting it to mean:

(a < b && b < c)

when it actually means:

((a < b ? 1 : 0) < c)

The two ways of dealing with this are to allow it or deny it. Python takes the former track and interprets it as meaning b must be between a and c. D aims to avoid behavior that would silently surprise C programmers, and so, with a small tweak to the grammar, such expressions simply won't pass the parsing stage.
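The silent C behavior can be sketched as follows; the function names are hypothetical. Both functions compile in C, but only the second one expresses the range check the writer had in mind.

```c
#include <stdbool.h>

/* What a < b < c computes in C: (a < b) yields 0 or 1,
 * and that 0 or 1 is then compared with c. */
static bool chained_as_c_reads_it(int a, int b, int c)
{
    return a < b < c;
}

/* What the writer usually meant: b lies strictly between a and c. */
static bool strictly_between(int a, int b, int c)
{
    return a < b && b < c;
}
```

For a = 5, b = 1, c = 3, `chained_as_c_reads_it` returns true, because 5 < 1 yields 0 and 0 < 3 holds, even though 1 is not between 5 and 3; `strictly_between` correctly returns false.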

Fencepost

A common pattern is the classic fencepost bug:

int A[10];
for (int i = 0; i <= 10; i++)
    ... = A[i];

The loop runs off the end of the array because <= was used for the limit rather than <. This kind of error is often picked up by languages that do array bounds checking. But that only detects the error if the end of the array is actually run off; it doesn't find fencepost errors in the general case.

A more general solution is to minimize the use of for loops, replacing them with foreach:

foreach (v; A)
    ... = v;

Variations on foreach appear in most modern languages and are being added to existing ones. I think it's a big win for productivity. In any case, red-flagging any use of <= in a for loop condition should be on the code review checklist.
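C has no foreach, but a rough analogue of the same idea is to derive the loop limit from the array itself and always use a half-open range with <. The sketch below assumes a hypothetical `ARRAY_LEN` macro, a common C idiom rather than anything standard.

```c
#include <stddef.h>

/* Hypothetical helper: element count of a true array (not a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

static int sum_all(void)
{
    int A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int sum = 0;
    /* Half-open range [0, ARRAY_LEN(A)): the limit comes from the
     * array's declaration and the test is <, so A[10] is never read. */
    for (size_t i = 0; i < ARRAY_LEN(A); i++)
        sum += A[i];
    return sum;
}
```

`sum_all()` returns 45. One caveat: `ARRAY_LEN` gives a wrong count, without a compile error, if applied to a pointer (for example, an array parameter that has decayed), which is exactly the kind of trap a real foreach designs out.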

Off The Deep End

Naturally, there's always a Daahhk Side to changing the process to prevent bugs. Preventing one pattern can cause a worse pattern to emerge. The classic example is Java's checked exceptions. In a nutshell, a method must list all the checked exceptions that could be thrown by it, or transitively by any method it calls. Sounds good! But it took years for the problem with it to be recognized. The trouble was that adding those annotations was annoying. Not only did one have to add them to the throwing method, but also to the callers of that method, and their callers, and so on, up to where the exception was caught. Programmers being people, they naturally applied the quick & dirty fix instead:

try {
...
} catch (TheException e) { }

That is, the exception was simply caught and ignored. The intent was to “fix it later”, but of course this never happened. See Bruce Eckel's excellent analysis of the situation.

It's difficult to find the line between eliminating a bug pattern and annoying the programmer so much that he makes things worse. The best solutions are ones where doing things the right way is easy, and doing them the wrong way requires hammers, grinders, and liberal amounts of “persuasion”.

Conclusion

I could go on and on listing common bug patterns. It's too bad there doesn't seem to be an online repository of them; they would make great research material for programming language designers. Every one that can be designed out of existence incrementally improves programmer productivity. Modern programming languages are definitely trending toward the elimination of common bug patterns, but there is still a lot of low-hanging fruit.

I've noticed in my decades of writing programs that I just don't make the kinds of mistakes I used to. Apparently I've unconsciously evolved coping strategies to avoid them. Identifying and building such strategies into the process means everyone can benefit from that experience.

Acknowledgements

Thanks to Brad Roberts, David Held, Eric Niebler, Andrei Alexandrescu, and Bartosz Milewski for their helpful comments.
