30K application lines + 110K testing lines: Evidence of...?

I recently wrote an encomium to ResolverOne, the IronPython-based spreadsheet:

[T]heir use of pair programming and test-driven development has delivered high productivity; of the 140,000 lines of code, 110,000 are tests....ResolverOne has been in development for roughly two years, is written in a language without explicit type declarations, and is on an implementation that itself is in active development. It's been brought to beta in a credible (if not downright impressive) amount of time despite being developed by pairs of programmers writing far more lines of test than application. Yet no one can credibly dismiss the complexity of 30,000 lines of application logic or spreadsheet functionality, much less the truly innovative spreadsheet-program features.
ResolverOne is easily the most compelling data point I've heard for the practices of Extreme Programming. [Extreme Program, SD Times]

Allen Holub sees the glass as half-empty, writing:

I want to take exception to the notion that Python is adequate for a real programming project. The fact that 30K lines of code took 110K lines of tests is a real indictment of the language. My guess is that a significant portion of those tests are addressing potential errors that the compiler would have found in C# or Java. Moreover, all of those unnecessary tests take a lot of time to write, time that could have been spent working on the application.

I was taken aback by this, perhaps because it's been a good while since I've heard someone characterize tests as evidence of trouble as opposed to evidence of quality.

There are (at least) two ways of looking at tests:

  1. Tools for discovering errors, or
  2. Quality gates (they're one-way -- are they quality diodes?)

There's no doubt that the software development tradition has favored the former view (once you've typed a line, everything you do next is "debugging"). However, the past decade has seen a ... wait for it ... paradigm shift.

The Agile Paradigm views change over time as a central issue; if it were still the 90s, I would undoubtedly refer to it as Change-Oriented Programming (COP). Tests are the measure of change -- not lines of code, not cyclomatic complexity, not object hierarchies, not even deployments.

(Perhaps "user stories" or scenarios are the "yard-stick" of change, tests are the "inch-stick" of change, and deployments are the "milestone" of change.)

So from within the Agile Paradigm / COP, a new test is written that fails, some new code is written, the test passes -- a one-way gate has been passed through, progress has been made, and credit accrues. From outside the paradigm, a test is seen as indicative of a problem that ought not to exist in the first place. The passing of the test is not the salient point; the "need" for (i.e., the existence of) the test is seen as evidence of low quality.

In true test-driven development, every test fails at least once, because the tests are written before the code. What is perhaps not appreciated by those outside the Agile Paradigm, however, is that tests are written that one expects to pass from the moment the relevant code is created. For instance, if one had fields for sub-total, taxes, and total, one would certainly write a test confirming that total = sub-total + taxes. One would also certainly expect that test to pass as soon as the code had been written.
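To make that concrete, here is a minimal sketch in Python of the kind of test I mean (the Invoice class and its field names are hypothetical, invented for illustration -- this is not ResolverOne's code):

    import unittest

    class Invoice:
        """Hypothetical invoice with sub-total, taxes, and total fields."""
        def __init__(self, subtotal, taxes):
            self.subtotal = subtotal
            self.taxes = taxes

        @property
        def total(self):
            # The behavior under test: total = sub-total + taxes
            return self.subtotal + self.taxes

    class InvoiceTest(unittest.TestCase):
        # Written before Invoice.total exists, this test fails once
        # ("red"); as soon as the property above is written, it passes
        # ("green") and stays in the suite as a one-way quality gate.
        def test_total_is_subtotal_plus_taxes(self):
            invoice = Invoice(subtotal=100.0, taxes=8.25)
            self.assertEqual(invoice.total, 108.25)

    if __name__ == "__main__":
        unittest.main()

Trivial as it looks, this test is not probing for the kind of bug a compiler would catch; from within the COP view, it records that a behavior was specified and then delivered.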

As is often the case with paradigms, just realizing that there are different mental models / worldviews in play is crucial to communication.

Update: This relates to Martin Fowler's recent post on Schools of Software Development.