
digitalmars.D - Thoughts about unittest run order

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
In theory, the order in which unittests are run ought to be irrelevant.
In practice, however, the order can either make debugging code changes
quite easy, or very frustrating.

I came from a C/C++ background, and so out of pure habit write things
"backwards", i.e., main() is at the bottom of the file and the stuff
that main() calls come just before it, and the stuff *they* call come
before them, etc., and at the top are type declarations and low-level
functions that later stuff in the module depends on.  After reading one
of Walter's articles recently about improving the way you write code, I
decided on a whim to write a helper utility in one of my projects "right
side up", since D doesn't actually require declarations before usage
like C/C++ do.  That is, main() goes at the very top, then the stuff
that main() calls, and so on, with the low-level stuff all the way at
the bottom of the file.

It was all going well, until I began to rewrite some of the low-level
code in the process of adding new features. D's unittests have been
immensely helpful when I refactor code, since they catch any obvious
bugs and regressions early on so I don't have to worry too much about
making large changes.  So I set about rewriting some low-level stuff
that required extensive changes, relying on the unittests to catch
mistakes.

But then I ran into a problem: because D's unittests are currently
defined to run in lexical order, that means the unittests for
higher-level functions will run first, followed by the lower-level
unittests, because of the order I put the code in.  So when I
accidentally introduced a bug in lower-level code, it was a high-level
unittest that failed first -- which is too high up to figure out where
exactly the real bug was. I had to gradually narrow it down from the
high-level call through the middle-level calls and work my way to the
low-level function where the bug was introduced.
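To make the scenario concrete, here is a minimal sketch (hypothetical function names) of a "right side up" module where the high-level unittest comes first lexically, so under the default runner it fails before the low-level test that would actually pinpoint a bug in the helper:

```d
// order_demo.d -- compile with: dmd -unittest -main -run order_demo.d
module order_demo;

// High-level function first ("right side up" layout)
int doubledWordCount(string s)
{
    return wordCount(s) * 2;
}

unittest
{
    // Runs FIRST under the default runner (lexical order),
    // even when the actual bug lives in wordCount() below.
    assert(doubledWordCount("one two") == 4);
}

// Low-level helper at the bottom of the file
int wordCount(string s)
{
    import std.array : split;
    return cast(int) s.split(" ").length;
}

unittest
{
    // This test pinpoints a bug in wordCount(), but it only runs
    // second -- a failure above aborts the module's test run first.
    assert(wordCount("one two") == 2);
}
```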

This is quite the opposite from my usual experience with "upside-down
order" code: since the low-level code and unittests would appear first
in the module, any bugs in the low-level code would trigger failure in
the low-level unittests first, right where the problem was. Once I fix
the code to pass those tests, then the higher-level unittests would run
to ensure the low-level changes didn't break any behaviour the
higher-level functions were depending on.  This made development faster
as less time was spent narrowing down why a high-level unittest was
failing.

So now I'm tempted to switch back to "upside-down" coding order.

What do you guys think about this?


T

-- 
You have to expect the unexpected. -- RL
May 06
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2019-05-06 20:13, H. S. Teoh wrote:
 In theory, the order in which unittests are run ought to be irrelevant.
 In practice, however, the order can either make debugging code changes
 quite easy, or very frustrating.
 
 [...]
 
 So now I'm tempted to switch back to "upside-down" coding order.
 
 What do you guys think about this?
There are different schools of thought on how to order code in a file. Some say to put all the public functions first and then the private ones. Some say code should read like a newspaper article: first an overview, and the deeper you read, the more detail you get. Others say to put related code next to each other, regardless of whether the symbols are public or private. I usually put the public symbols first and then the private ones.

When it comes to the order of unit tests, I think they should run in random order. If a test fails, it should print a seed value; running the tests again with that seed reproduces the same order as before. This helps with debugging when tests accidentally depend on each other.

The problem you're facing, I'm guessing, is that you run with the default unit test runner? If a single test fails, it will stop and run no other tests in that module (tests in other modules will still run). If you pick an existing unit test framework, or write your own test runner, you can have the unit tests continue after a failure. Then you would see that the lower-level tests are failing as well.

Writing a custom unit test runner with the help of the "getUnitTests" trait [1] would let you do additional things, like looking for UDAs that set the order of the unit tests or group them in various ways. You could group them into high-level and low-level groups and have the runner run the low-level tests first.

For your particular problem, it seems it would be enough to keep running the other tests in the same module after one has failed. I think "silly" [2] looks really interesting. I haven't had time to try it out yet, so I don't know whether it continues after a failed test.

[1] https://dlang.org/spec/traits.html#getUnitTests
[2] https://code.dlang.org/packages/silly

-- 
/Jacob Carlborg
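As a rough illustration of the getUnitTests idea, here is a minimal sketch of a runner that uses a UDA to run tagged "low-level" tests before the rest. The `lowLevel` UDA and module name are hypothetical, and a real runner would also override `Runtime.moduleUnitTester` so the default runner doesn't run the tests a second time before `main`:

```d
// udatest.d -- compile with: dmd -unittest -run udatest.d
module udatest;
import std.traits : hasUDA;

// Hypothetical UDA marking low-level tests that should run first
struct lowLevel {}

@lowLevel unittest { assert(1 + 1 == 2); }  // low-level: pass 0
unittest { assert("abc".length == 3); }     // high-level: pass 1

void main()
{
    // Two passes over this module's unittest blocks: tagged tests
    // first, untagged tests second.  -unittest is required for the
    // trait to see the tests at all.
    static foreach (pass; 0 .. 2)
    {
        static foreach (test; __traits(getUnitTests, mixin(__MODULE__)))
        {
            static if ((pass == 0) == hasUDA!(test, lowLevel))
                test();
        }
    }
}
```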
May 06
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/6/2019 11:13 AM, H. S. Teoh wrote:
 What do you guys think about this?
That thought never occurred to me, thanks for bringing it up. It suggests perhaps the order of unittests should be determined by a dependency graph, and should start with the leaves.
May 06
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, May 06, 2019 at 10:30:01PM -0700, Walter Bright via Digitalmars-d wrote:
 On 5/6/2019 11:13 AM, H. S. Teoh wrote:
 What do you guys think about this?
That thought never occurred to me, thanks for bringing it up. It suggests perhaps the order of unittests should be determined by a dependency graph, and should start with the leaves.
That was also my first thought, but how would you construct such a graph? In my case, almost all of the unittests are at module level, and call various module-level functions. It's not obvious how the compiler would divine which ones should come first just by looking at the unittest body. You'd have to construct the full function call dependency graph of the entire module to get that information.

T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
May 07
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 6 May 2019 at 18:13:37 UTC, H. S. Teoh wrote:
 In theory, the order in which unittests are run ought to be 
 irrelevant. In practice, however, the order can either make 
 debugging code changes quite easy, or very frustrating.

 [...]
Use a test runner that runs all the tests regardless of previous errors? (And does them in multiple threads, hooray!)

https://github.com/atilaneves/unit-threaded

Then you'll at least get to know everything that failed, instead of just whatever happened to be lexically first.

I agree that some ordering system might improve the time-to-narrow-down-bug-location a bit, but the above might be acceptable nonetheless.
May 07
next sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 7 May 2019 at 08:49:15 UTC, John Colvin wrote:
 On Monday, 6 May 2019 at 18:13:37 UTC, H. S. Teoh wrote:
 In theory, the order in which unittests are run ought to be 
 irrelevant. In practice, however, the order can either make 
 debugging code changes quite easy, or very frustrating.

 [...]
Use a test runner that runs all the tests regardless of previous errors? (and does them in multiple threads, hooray!) https://github.com/atilaneves/unit-threaded
unit-threaded can also run the tests in random order and reuse a seed, as Jacob mentioned above.

If the order tests run in is important, the tests are coupled... friends don't let friends couple their tests.
May 07
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 07, 2019 at 09:40:27AM +0000, Atila Neves via Digitalmars-d wrote:
[...]
 If the order tests run in is important, the tests are coupled...
 friends don't let friends couple their tests.
How do you decouple the tests of two functions F and G in which F calls G? If a code change broke the behaviour of G, then both tests should fail, and then we run into this problem with the default test runner.

To make F's tests independent of G would require that they pass *regardless* of the behaviour of G, which seems like an unattainable goal unless you also decouple F from G, which implies that every tested function must be a leaf function. That seems unrealistic.

T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and over.
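The F-calls-G situation can be sketched in a few lines (hypothetical functions, lowercase here to be valid D):

```d
// coupling_demo.d -- compile with: dmd -unittest -main -run coupling_demo.d
module coupling_demo;

int g(int x) { return x + 1; }      // low-level leaf function

int f(int x) { return g(x) * 2; }   // high-level, calls g

unittest { assert(g(1) == 2); }     // exercises g directly

unittest { assert(f(1) == 4); }     // passes only while g behaves as expected

// A bug introduced into g fails BOTH tests: f's test cannot be made
// to pass regardless of g's behaviour unless g is mocked away,
// i.e. unless f is decoupled from g for testing purposes.
```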
May 07
parent reply Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 7 May 2019 at 11:29:43 UTC, H. S. Teoh wrote:
 On Tue, May 07, 2019 at 09:40:27AM +0000, Atila Neves via 
 Digitalmars-d wrote: [...]
 If the order tests run in is important, the tests are 
 coupled... friends don't let friends couple their tests.
How do you decouple the tests of two functions F and G in which F calls G?
It depends. If F and G are both public functions that are part of the API, then one can't. Otherwise I'd just test F, since G is an implementation detail.

I consider keeping tests around for implementation details an anti-pattern. Sometimes it's useful to write the tests when doing TDD or debugging, but afterwards I delete them.
May 07
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 07, 2019 at 04:50:23PM +0000, Atila Neves via Digitalmars-d wrote:
 On Tuesday, 7 May 2019 at 11:29:43 UTC, H. S. Teoh wrote:
 On Tue, May 07, 2019 at 09:40:27AM +0000, Atila Neves via Digitalmars-d
 wrote: [...]
 If the order tests run in is important, the tests are coupled...
 friends don't let friends couple their tests.
How do you decouple the tests of two functions F and G in which F calls G?
It depends. If F and G are both public functions that are part of the API, then one can't. Otherwise I'd just test F since G is an implementation detail. I consider keeping tests around for implementation details an anti-pattern. Sometimes it's useful to write the tests if doing TDD or debugging, but afterwards I delete them.
I almost never delete unittests. IME, they usually wind up catching a regression that would've been missed otherwise.

T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
May 07
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 07, 2019 at 08:49:15AM +0000, John Colvin via Digitalmars-d wrote:
 On Monday, 6 May 2019 at 18:13:37 UTC, H. S. Teoh wrote:
 In theory, the order in which unittests are run ought to be
 irrelevant.  In practice, however, the order can either make
 debugging code changes quite easy, or very frustrating.
 [...]
Use a test runner that runs all the tests regardless of previous errors? (and does them in multiple threads, hooray!)
That's certainly one way to go about it. But perhaps what I'm really looking for is a way to invoke a *specific* unittest (probably identified by starting line number, just like what dmd does to mangle unittests), so that I can iterate on a specific problem case until it's fixed before running through all the tests again.
 https://github.com/atilaneves/unit-threaded
 
 Then you'll at least get to know everything that failed instead of
 just whatever happened to be lexically first.
 
 I agree that some ordering system might improve the
 time-to-narrow-down-bug-location a bit, but the above might be
 acceptable nonetheless.
Yeah, not aborting immediately upon test failure would help a lot in this respect.

T

-- 
"If you're arguing, you're losing." -- Mike Thomas
May 07
parent reply "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 5/7/19 1:22 PM, H. S. Teoh wrote:
 
 But perhaps what I'm really looking for is a way to invoke a *specific*
 unittest
unit-threaded. Seriously, it's awesome. Use it. You'll be happy :)

But as far as the default test runner and order of code layout go, those are some really interesting points. With a low-level-to-high-level ordering, in many cases you wouldn't even need to point at a particular test you want to focus on.
May 07
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, May 07, 2019 at 05:30:03PM -0400, Nick Sabalausky (Abscissa) via
Digitalmars-d wrote:
 On 5/7/19 1:22 PM, H. S. Teoh wrote:
 
 But perhaps what I'm really looking for is a way to invoke a
 *specific* unittest
unit-threaded. Seriously, it's awesome. Use it. You'll be happy :)
OK, point taken. I'll go check it out. :-P
 But as far as the default test runner and order of code layout, those
 are some really interesting points. With a low-level-to-high-level
 ordering, then in many cases you wouldn't even need to point at a
 particular test you want to focus on.
Well yes, and being the lazy coder that I am, this least-effort path is particularly appealing to me.

T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan
May 07