These are my notes on the book from the London school of thought on TDD, Growing Object-Oriented Software, Guided by Tests. Both Martin Fowler in his essay and Kent Beck in the foreword say they practice TDD differently. I was therefore expecting what I thought of as mockist-style testing, with too-small unit tests coupled too tightly to the implementation, based on what I had seen in some companies in Prague (Czechia). I was pleasantly surprised by the quality of the advice the authors provide.
I don't like the first part of the title, the farming metaphor for programming ("growing" software), because if overextended it suggests we are not in control of how the software grows. I do like the subtitle, "guided by tests". The authors make a point of "listening to the tests" and responding to the hints the tests give about the design of the production code.
We know building software is a learning process and we know there will be changes. What we need is a process that helps us handle uncertainty as our experience with the project grows - to anticipate unanticipated changes. The best way to learn is to get empirical feedback, as quickly as possible. To support future changes, we need constant testing to catch regression errors and to keep the code as simple as possible. We use TDD to get feedback on the quality of the implementation and the design. We never write new functionality without a failing test.
There is an outer feedback loop of TDD: the end-to-end or acceptance tests. The acceptance tests should exercise the system without directly calling its internals. Without end-to-end tests, we risk horror stories like the one presented in the book, where everything was unit-tested, except that the application couldn't even start. There are additional feedback loops, like demos, which would have caught this problem as well.
The core interest of the book is unit tests; other kinds of tests are out of scope of the book.
We unit-test objects in isolation. We mock their collaborators, which in practice often don't even exist yet when we first write the test. We call this interface discovery.
There are two core principles to our approach: continuous incremental development and expressive code. Attempting to deploy the system from the start helps the team identify the external parties it will need to work with and start building those relationships early.
It is tempting to start by testing the domain objects and their interactions, but integration problems could bite us later. Therefore we start by considering the events coming into the system from the outside and treat the system as a black box.
We name unit tests based on tested behavior, not method names. When writing unit and integration tests we stay alert for difficult-to-test code. When we find such code, we don't just ask how to test it, but also why it is so hard to test. Maybe the design needs improving.
There is a balance between extensively testing all execution paths and integration testing. When choosing how fine-grained our tests should be, we need to make sure the confidence we place in them is justified. So we regularly adapt our testing strategy based on how well TDD is working for us.
We value code that is easy to maintain over code that is easy to write. We use two main principles: separation of concerns and higher levels of abstraction.
The book's worked example is an Auction Sniper: an application that watches an online auction and bids on the user's behalf. The auction protocol has these messages:
- Join - a bidder joins an auction.
- Bid - bidder sends a bidding price.
- Price - auction reports currently accepted price.
- Close - auction announces it is closed.
A bidder's state can be represented by a simple state machine with the states: joining, lost, bidding, winning, won.
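A sketch of how these states might be captured as a Java enum (the book's version grows transition logic over later chapters; this is just the shape):

```java
// The bidder's possible states, as listed above.
public enum SniperState {
    JOINING,
    BIDDING,
    WINNING,
    LOST,
    WON
}
```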
Even a small application like this cannot be implemented in one go. We have to figure out roughly which steps we need to take. A critical technique of incremental development is learning how to slice up the application so that it can be built a little at a time. The first, smallest feature we can build is called the "walking skeleton"; here it will be a minimal path through Swing, XMPP, and the application logic. The sequence of steps will roughly be:
- Single item: lost without bidding.
- Single item: bid and lose.
- Single item: join, bid, and win.
- Show price details.
- Multiple items.
- Add items through the UI.
- Stop bidding at the stop price.
That is the initial plan.
Chapter 10 - The Walking Skeleton
For most projects, developing the walking skeleton takes more time than expected. First, just deciding which feature to implement first raises many questions about the application's place in the world. Second, automating the build, test, and deployment pipeline flushes out all sorts of technical and organizational questions.
The walking skeleton must cover all significant aspects of the application; in this case Swing, XMPP, and the application logic. We start by writing a test as if the production code already existed, i.e. programming by wishful thinking. We test the simplest feature we can think of, the first item from our to-do list. We use JUnit. We need a mechanism to control the UI and a fake auction server. We write the test as if all the code it needs already exists, and then we fill in the blanks. We keep the language of the test in domain terms, not technical terms. While implementing the test, we settled on Openfire and Smack for the fake auction server and WindowLicker for driving the UI.
The usual trouble with end-to-end testing is that the application runs in parallel with the test, so we cannot be sure when it has started and become responsive. The usual solution is to poll for the expected effects and fail if they don't happen within a time limit. This can make end-to-end tests slower and more brittle. Some teams fight randomly failing end-to-end tests by retrying them a couple of times; this technique is not acceptable for unit tests, which must pass every time. Note that the first test doesn't include the real auction service, so it is not truly end-to-end, but we decided to set the boundary there and accept it as a known risk.
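A minimal sketch of the poll-until-timeout idea (Poller and assertEventually are my names, not WindowLicker's API):

```java
// Hypothetical polling helper: repeatedly samples a condition and fails
// if it does not become true within the timeout.
public final class Poller {
    private final long timeoutMillis;
    private final long pollDelayMillis;

    public Poller(long timeoutMillis, long pollDelayMillis) {
        this.timeoutMillis = timeoutMillis;
        this.pollDelayMillis = pollDelayMillis;
    }

    public void assertEventually(String failureMessage,
                                 java.util.function.BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return; // succeed fast, as soon as the effect is visible
            }
            Thread.sleep(pollDelayMillis);
        }
        throw new AssertionError(failureMessage);
    }
}
```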
Chapter 11 - Passing the First Test
We first write the test infrastructure for our non-existent application, so we can watch the test fail. Then we slowly add to the application until we see the test pass.
Each of our end-to-end tests starts an Openfire server, creates accounts for the Sniper application and the auction, and then runs the test. We create the ApplicationRunner, which manipulates the application through its UI. We start the application through its main() function to make sure everything is assembled correctly. FakeAuctionServer is the test implementation of the auction server; it is minimal, just enough to support the tests. The last component of the test rig is the XMPP message broker, which doesn't involve any coding. We now have everything we need to watch the first end-to-end test fail, so we write an Ant script to run the tests.
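The first end-to-end test looks roughly like this (reconstructed from memory, so treat the item id and helper method names as approximations of the ApplicationRunner and FakeAuctionServer described above):

```java
import org.junit.After;
import org.junit.Test;

public class AuctionSniperEndToEndTest {
    private final FakeAuctionServer auction = new FakeAuctionServer("item-54321");
    private final ApplicationRunner application = new ApplicationRunner();

    @Test
    public void sniperJoinsAuctionUntilAuctionCloses() throws Exception {
        auction.startSellingItem();                 // start the fake auction
        application.startBiddingIn(auction);        // launch the app through main()
        auction.hasReceivedJoinRequestFromSniper(); // the Sniper connects and joins
        auction.announceClosed();                   // the auction closes
        application.showsSniperHasLostAuction();    // the UI shows "Lost"
    }

    @After
    public void stopAuction() { auction.stop(); }

    @After
    public void stopApplication() { application.stop(); }
}
```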
The test fails because it cannot find the UI component with the name "Auction Sniper Main". So we implement our application with just a top-level window with the correct name.
Now the test fails because it cannot find the display of the current state. So we hard-code the "Joining" status.
Now the UI is working, but the test fails because the auction server does not receive any call from the application. We write a small amount of ugly code, e.g. calling the auction server from the UI thread, just to move forward. But we keep a note not to leave the code ugly.
Now the test fails because the application does not display the "Lost" status. So we update the status when notified from the auction server.
Note the focus required to put together the first walking skeleton. We didn't even put any payload in the messages to the auction server, as that would be a diversion from the basic proof that the two sides can communicate. We also didn't sweat too hard over the overall code design. But the first item on the to-do list is done.
Chapter 12 - Getting Ready to Bid
We will continue by implementing the next simple use-case from the to-do list. From now on we can use acceptance tests to show progress. Each new acceptance test should contain just enough new requirements to force a manageable increase in functionality. We are using integers to represent money, but in a real application, we would define a domain type to represent monetary values.
We use the same constant in both the production code and the test. This removes duplication; on the other hand, we might get both wrong in the same way. The critical question is: what do we think we are testing? Either choice can be correct depending on the scenario.
We run the second test and get a surprising failure. We expected an assertion error about a missing label, but we got some kind of conflict instead: the two tests were reusing the same Sniper account. This is actually a production bug; we should be logging out in the production code.
Our approach to TDD is outside-in development. We start with an outside event and look for observable behaviors of the system. We cannot catch all the defects using this method, but it is our responsibility to test as much of the application as possible.
At the edge of the application, we notice the potential for a class that accepts the auction server messages. We will call it AuctionMessageTranslator. The current implementation just updates the state of the UI, but we don't want AuctionMessageTranslator to depend on the UI, so instead we make it depend on an AuctionEventListener interface. We will unit-test this new class.
We put tests in a different package than the production code, because we don't want to leave open the package-private backdoor; we test classes only through their public API. We also use null when an argument doesn't matter, naming the null with a meaningful constant.
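A sketch of the first Translator test using jMock, as in the book (reconstructed from memory; the UNUSED_CHAT constant shows the named-null convention, and the message format is approximate):

```java
import org.jivesoftware.smack.Chat;
import org.jivesoftware.smack.packet.Message;
import org.jmock.Expectations;
import org.jmock.Mockery;
import org.jmock.integration.junit4.JMock;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(JMock.class)
public class AuctionMessageTranslatorTest {
    public static final Chat UNUSED_CHAT = null; // the Chat argument doesn't matter here

    private final Mockery context = new Mockery();
    private final AuctionEventListener listener = context.mock(AuctionEventListener.class);
    private final AuctionMessageTranslator translator = new AuctionMessageTranslator(listener);

    @Test
    public void notifiesAuctionClosedWhenCloseMessageReceived() {
        context.checking(new Expectations() {{
            oneOf(listener).auctionClosed(); // the relationship we want to bring out
        }});

        Message message = new Message();
        message.setBody("SOLVersion: 1.1; Event: CLOSE;");

        translator.processMessage(UNUSED_CHAT, message);
    }
}
```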
In the production code we make AuctionMessageTranslator depend on AuctionEventListener and have Main implement AuctionEventListener to keep the application working. We never drift too far from a working application. In this baby step we introduced a new component which has a name and can be unit-tested.
We introduce a second kind of message in the second test. This forces us to generalize the production code: we have to parse the message. We implement this the simplest way possible, by parsing the message into key-value pairs in a map. We add a note about the missing error handling to the end of the to-do list; we don't want to slow down right now.
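A sketch of that simplest-possible parsing (the exact message format is approximate; the book later extracts this into an AuctionEvent class, as noted below):

```java
import java.util.HashMap;
import java.util.Map;

final class AuctionMessageParsing {
    // Parse "SOLVersion: 1.1; Event: PRICE; CurrentPrice: 192;" into key-value pairs.
    static Map<String, String> unpackEventFrom(String messageBody) {
        Map<String, String> event = new HashMap<>();
        for (String element : messageBody.split(";")) {
            String[] pair = element.split(":");
            if (pair.length == 2) {
                event.put(pair[0].trim(), pair[1].trim());
            }
        }
        return event;
    }
}
```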
Chapter 13 - The Sniper Makes a Bid
It feels like a good time to introduce an AuctionSniper class, the central component of the application. It does not make sense for AuctionSniper to know about user interface details, such as the Swing thread. AuctionSniper should be concerned with bidding policy and should notify status updates only in its own terms.
We add a test that the Sniper should report it has lost when it receives a close event from the auction. We plug the AuctionSniper in by having it implement AuctionEventListener; Main implements SniperListener, i.e. it handles the events coming from the Sniper. Once again, we've noticed complexity in a class and used it to tease out a new concept from our initial skeleton implementation.
The next step is to have the Sniper make a bid in the auction. So who should the Sniper talk to? Extending SniperListener seems wrong, as that is a notification for the UI, not a dependency. After the usual discussion, we introduce a new collaborator, Auction; the Auction is a dependency. The Auction needs a Chat to send a bid message; to create a Chat we need a Translator, and the Translator needs a Sniper, so we have a dependency loop we need to break. We can cross off another item from the to-do list, but our error handling so far consists of just catching and printing the XMPPException. To distinguish the implementation from the interface, we call it XMPPAuction. With the new collaborator we start to see the domain more clearly. We still need to pull the XMPP-related detail out of Main, but we're not ready to do that yet; another point to keep in mind. Another job Main is doing is showing the UI; the best name for a new component for that would be SniperStateDisplayer.
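A sketch of the new collaborator and its XMPP-backed implementation (details and message format are approximate, from memory; in the book the interface starts with little more than bid()):

```java
import org.jivesoftware.smack.Chat;
import org.jivesoftware.smack.XMPPException;

// (Auction.java) The dependency the Sniper talks to, in domain terms.
interface Auction {
    void bid(int amount);
}

// (XMPPAuction.java) The XMPP-specific implementation behind the interface.
class XMPPAuction implements Auction {
    private final Chat chat;

    XMPPAuction(Chat chat) {
        this.chat = chat;
    }

    @Override
    public void bid(int amount) {
        try {
            chat.sendMessage(String.format("SOLVersion: 1.1; Command: BID; Price: %d;", amount));
        } catch (XMPPException e) {
            // The working compromise noted above: just catch and print for now.
            e.printStackTrace();
        }
    }
}
```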
We tidy up the Translator. We extract an inner class, AuctionEvent, from it. We have it covered by tests already, so the refactoring is safe.
Side note: we have developed a habit of packaging up common types, such as collections, in our own classes. One rule of thumb is that we try to limit passing around types with generics; particularly when applied to collections, we view that as a form of duplication.
What we hope is becoming clear from this chapter is how we're growing a design from what looks like an unpromising start. We alternate, more or less, between adding features and reflecting on and cleaning up the resulting code. We noticed that we are building a protection layer against the external dependencies, and we are doing it incrementally.
Chapter 14 - The Sniper Wins the Auction
We will add the concept of state to the Sniper to support the new Winning state.
When adding PriceSource, we notice it is a value type: an enum of two values. We prefer it to a boolean, which we would have to interpret every time we saw it. Determining whether a price is ours belongs to the Translator's role. We are glad we refactored the Translator slightly in the previous chapter.
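The whole value type is tiny; a sketch:

```java
// A two-valued type instead of a boolean flag; every reader sees the meaning,
// not a bare true/false to interpret.
public enum PriceSource {
    FromSniper,
    FromOtherBidder
}
```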
The failure of the end-to-end test tells us we should make the UI show when the Sniper is winning. The Sniper will have to maintain some kind of state, which it hasn't needed so far. We added it and made steady progress.
Chapter 15 - Towards a Real User Interface
We grow the UI from a label to a table. We achieve this by adding one feature at a time, instead of taking the risk of replacing the whole thing in one go.
The client wants to see something that looks like a table. We need to show more price details from the auction and handle multiple items. The Swing pattern for using a JTable is to associate it with a TableModel. The question is how to get there from here.
We want to do it with a minimum of change. The smallest step we can think of is to replace the existing JLabel with a one-cell JTable and grow it from there. We change the test harness to expect a table instead of a label. Soon we have a Sniper with a single-cell table.
As usual, we work "outside-in", from the event that triggers the behavior. In this case it is the update from Southabee's On-Line.
We introduce a value type to carry the sniper state. We developed a habit of using public final fields in value types, at least while we are still sorting out what the value type should do. Our ambition, which we might not achieve, is to replace all field access with meaningful action methods on the type.
We notice that passing events through the Main window is not adding much value, so we make a note to deal with that later. We also don't like the switch statement, as it's not object-oriented, so we will keep an eye on that too.
We currently have one kind of Sniper event, Bidding, that we are handling all the way through the application. Now we would have to do the same thing for Winning, Lost, and Won. Frankly, that's just dull; there's too much repetitive work, so something is wrong with the design. We realize we could collapse our events into one notification that includes the prices and the Sniper status. We also decide that a better name for such a class is SniperSnapshot rather than SniperState.
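A sketch of what that collapsed notification might look like, using the public-final-field habit mentioned above and the SniperState enum sketched earlier (fields and methods are approximate):

```java
// One notification carrying the prices plus the Sniper's status.
public class SniperSnapshot {
    public final String itemId;
    public final int lastPrice;
    public final int lastBid;
    public final SniperState state;

    public SniperSnapshot(String itemId, int lastPrice, int lastBid, SniperState state) {
        this.itemId = itemId;
        this.lastPrice = lastPrice;
        this.lastBid = lastBid;
        this.state = state;
    }

    // Meaningful "action methods" that return the next snapshot, so clients
    // don't have to poke at the fields themselves.
    public SniperSnapshot bidding(int newLastPrice, int newLastBid) {
        return new SniperSnapshot(itemId, newLastPrice, newLastBid, SniperState.BIDDING);
    }

    public SniperSnapshot closed() {
        return new SniperSnapshot(itemId, lastPrice, lastBid,
                state == SniperState.WINNING ? SniperState.WON : SniperState.LOST);
    }
}
```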
We write a unit test of Column, which might seem unnecessary now, but it will protect us from our future selves.
SniperTableModel has one responsibility - to represent the state of our bidding in the user interface. It follows the heuristic we described in No Ands, Ors, or Buts.
In this chapter we added little slices of behavior, getting each of them working before moving on to the next. There's a reason why surgeons prefer keyhole surgery to opening up a patient: it's less invasive and cheaper.
We also changed our minds about decisions from previous chapters quite a lot. That's a good thing. Deciding when to change a design that smells requires technical skill and experience.
Chapter 16 - Sniping for Multiple Items
As always, we start with a test. We want the new test to show that the application can bid for and win two different items. Looking at our current tests, the single-item test implicitly assumes there is only one item. While writing the new test we noticed that a failing assertion produced a hard-to-understand message, so we fixed that.
The test has issues with the asynchronicity of the auctions. We solved them; we just needed to identify the problem first.
We can make development progress whilst the design is being sorted out. We keep our code (and attitude) flexible to respond to design ideas as they come up.
We needed a test which used WindowLicker but was not an end-to-end test. We call this an integration test, and we wrote it. It might seem unnecessary to add such a test here, but it covers us against a kind of bug that is too easy to miss otherwise.
We've been careful to keep class responsibilities focused, except in Main, where we've put all our working compromises. It took us a couple of tries to get the design right, because we kept assigning behaviors to the wrong classes. So should we ship it now? No. "Working" is not the same as "finished". We've left quite a mess in Main, and we notice we are not getting any unit-test feedback about its internal quality.
Chapter 17 - Teasing Apart Main
We slice up our application, shuffling behavior around to isolate the XMPP and UI code from the sniping logic.
In today's applications, Main, the entry point, plays a matchmaker role; the wiring might live in an XML configuration file instead, but the role is the same. In our current application, Main is also implementing some of the components, so it has too many responsibilities. One clue is to look at its imports. Java tolerates package loops, but they are nothing to be pleased about.
We extract the XMPP logic into XMPPAuction. It makes sense for XMPPAuction to encapsulate a Chat, as it now hides everything to do with communication between a request listener and an auction service.
The next thing to remove from Main is direct references to the XMPPConnection. We are moving in the right direction since we are narrowing the scope of where the XMPP constants are used.
Finally we need to do something about the direct reference to the SniperTableModel, the related SwingThreadSniperListener, and the awful notToBeGCd. The first step is to turn the anonymous implementation of UserRequestListener into a proper class so we can understand its dependencies.
SniperTableModel is implicitly responsible for both maintaining a record of our sniping and displaying the record. We want clearer separation of concerns, so we extract a SniperPortfolio to maintain our Snipers.
Our architecture happened incrementally, but it follows "ports and adapters". We reached this almost automatically by just following the code and taking care to keep it clean.
Sometimes when analysing the application, we need a dynamic diagram of what is happening: an interaction diagram (similar to a sequence diagram).
Chapter 18 - Filling In the Details
We realize we should have created an Item type much sooner. We want to implement losing a bid. We start with a failing test, which implies that we need a new input field in the UI for a stop price. We want to make the structure explicit, so we create a new class, Item. This is an example of budding off.
We are building the UI incrementally, so we can respond to changing needs, with tests.
When doing TDD and unsure what to do next, it sometimes helps to step back to index cards to regain a sense of direction.
Looking back, we wish we had created the Item type earlier. It is often better to define domain types to wrap not only Strings but other built-in types too, including collections.
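A sketch of such a wrapper type, close to the book's Item but with details from memory:

```java
// Domain type instead of passing a bare item-identifier String around.
public class Item {
    public final String identifier;
    public final int stopPrice;

    public Item(String identifier, int stopPrice) {
        this.identifier = identifier;
        this.stopPrice = stopPrice;
    }

    // The behavior that motivated the type: may the Sniper bid this much?
    public boolean allowsBid(int bid) {
        return bid <= stopPrice;
    }
}
```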
Chapter 19 - Handling Failure
We add a new auction event that reports failure. As usual we work in small slices: write a new test, make it pass, then restructure the code. We use small methods to express intent. We try to make each level of abstraction as readable as possible until, at the bottom, we're writing plain Java constructs. We're prioritizing expressiveness over minimizing the number of source lines.
Logging infrastructure is better isolated, rather than scattered throughout the code.
Chapter 20 - Listening to the Tests
Sometimes we find it difficult to write a test. When this happens, we first check whether it's an opportunity to improve our code, before working around the design by making the test more complicated. We've found that the qualities that make an object easy to test also make our code responsive to change. TDD is not only about testing code; it is also about feedback on the code's internal qualities. Now when we find a feature that is difficult to test, we don't just ask ourselves how to test it, but also why it is difficult to test.
For example, depending on the current time. We make it obvious that the Receiver is dependent on time; we especially want to know about this dependency when the service is rolled out across the world. Unit-testing tools that let programmers bypass poor dependency management in the design waste a valuable source of feedback.
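A minimal sketch of making such a dependency explicit (Clock and the isExpired() method are illustrative names, not the book's code):

```java
// Hypothetical sketch: the receiver asks a collaborator for the time instead of
// calling System.currentTimeMillis() directly, so tests can control the clock.
interface Clock {
    long now();
}

class Receiver {
    private final Clock clock;

    Receiver(Clock clock) {
        this.clock = clock;
    }

    boolean isExpired(long deadlineMillis) {
        return clock.now() > deadlineMillis;
    }
}
```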
Logging is a feature, and actually two separate features: support logging (errors and info) and diagnostic logging (debug and trace). We can use notifications instead of logging, which is an implementation detail. Then we write the code in terms of intent (helping support people) instead of implementation details (logging).
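A sketch of what coding in terms of intent might look like: a support-notification interface with logging hidden behind it (the names here are illustrative, not the book's):

```java
// Hypothetical support notification: the domain code expresses intent...
interface MessageFailureReporter {
    void cannotTranslateMessage(String auctionId, String failedMessage, Exception cause);
}

// ...and logging is an implementation detail hidden behind it.
class LoggingFailureReporter implements MessageFailureReporter {
    private static final java.util.logging.Logger LOGGER =
            java.util.logging.Logger.getLogger("auction-sniper");

    @Override
    public void cannotTranslateMessage(String auctionId, String failedMessage, Exception cause) {
        LOGGER.severe("<" + auctionId + "> could not translate message \""
                + failedMessage + "\" because " + cause);
    }
}
```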
Our intention in TDD is to use mock objects to bring out relationships between objects. When we extract an interface as part of our TDD process, we have to think up a name for it. Once something has a name, we can talk about it.
Never mock value objects. There is no need; just create them.
Being sensitive to complexity in tests can help us clarify our designs.
When we have a bloated constructor, there are three possible diagnoses:
- There might be a missing concept that needs to be created. It might be hard to find a name for it.
- The object might be too large itself with too many responsibilities. The tests will look confused too.
- It might have too many dependencies. Dependencies should be passed to constructor, but notifications and adjustments don't have to be - they could be set to defaults and reconfigured later.
We avoid having too many assertions in a test because they blur what's important in the test.
Benefits of listening to the tests include:
- Testable components keep their knowledge local.
- We mock interfaces because we have to create a name for them.
- We also like to name implementations based on how objects communicate rather than what they are.
- We apply Tell, Don't Ask, so we pass behavior, not data.
A unit test should not be 1,000 lines long. It should focus on at most a few classes and should not need to create a large fixture or perform lots of preparation.
Chapter 21 - Test Readability
Tests must be readable. We take as much care to keep our test code clean as we do our production code, but with a different style, as each serves a different purpose. Tests should describe what the production code does through examples, so they should use concrete values. Production code, on the other hand, should be abstract about values but concrete about how it gets the job done. We want our test code to read like a declarative description of what is being tested.
Each test should exercise a single feature, have a simple structure, and be short enough that the reader can easily see its point.
Test names should clearly describe the point of each test. Each test name reads like a sentence, with the target class as the implicit subject.
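For example, with AuctionSniper as the implicit subject (illustrative names):

```java
import org.junit.Test;

// Each name reads as a sentence about the class under test:
// "AuctionSniper reports lost when auction closes immediately", and so on.
public class AuctionSniperTest {
    @Test
    public void reportsLostWhenAuctionClosesImmediately() { /* ... */ }

    @Test
    public void bidsHigherAndReportsBiddingWhenNewPriceArrives() { /* ... */ }
}
```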
We write our tests in a standard form: setup, execute, verify, teardown. We often write a test backwards, starting with its assertions.
We extract common features into methods that can be shared between tests. But we are careful not to make a test so abstract that we can no longer see what it does.
We write test data builders to build up complex data structures. We specify just the values that are relevant, so that the reader can understand the intent; everything else is defaulted.
Literals do not describe their intent. Use local variables and constants for this purpose.
Chapter 22 - Constructing Complex Test Data
An object mother is a class that contains a number of factory methods, e.g. ExampleOrders.newDeerstalkerAndCapeOrder(). It can be reused between tests, but the pattern does not cope well with data variation: each variation requires a new factory method.
Another solution is to use the builder pattern. Tests can specify just the values where they differ from the default object, e.g. using withers from Lombok.
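A sketch of the builder style with a hypothetical Order type (the book writes its own builders by hand rather than using Lombok):

```java
// Hypothetical Order and its test data builder: tests state only what matters,
// everything else falls back to safe defaults.
class Order {
    final String customer;
    final String product;
    final int quantity;

    Order(String customer, String product, int quantity) {
        this.customer = customer;
        this.product = product;
        this.quantity = quantity;
    }
}

class OrderBuilder {
    private String customer = "a customer";
    private String product = "a product";
    private int quantity = 1;

    static OrderBuilder anOrder() {
        return new OrderBuilder();
    }

    OrderBuilder withCustomer(String customer) {
        this.customer = customer;
        return this;
    }

    OrderBuilder withQuantity(int quantity) {
        this.quantity = quantity;
        return this;
    }

    Order build() {
        return new Order(customer, product, quantity);
    }
}

// In a test: Order order = OrderBuilder.anOrder().withQuantity(6).build();
```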
Even someone non-technical should understand our tests. We use test data builders to make our tests more expressive.
Chapter 23 - Test Diagnostics
Avoid the situation where we cannot diagnose a test failure that has happened. Fail informatively: clearly explain what has failed. Write small, focused, well-named tests and use self-describing values in them. Diagnostics are a first-class feature of tests, and we should explicitly work on improving them.
Chapter 24 - Test Flexibility
Maintaining the tests can become a burden if they haven't been written carefully. Make sure each test fails only when its relevant code is broken; otherwise we have brittle tests that slow development and inhibit refactoring. Common causes:
- Tests too tightly coupled to unrelated parts of the system.
- Tests over-specifying the expected behavior, constraining implementation more than necessary.
- Duplication when multiple tests exercise the same production behavior.
Specify exactly what should happen and no more. Only enforce invocation order when it matters.
Chapter 25 - Testing Persistence
Ensure that persistence tests are isolated from one another. Delete rows from the database tables before the test starts. It is better to do it at the start, not the end.
We usually extract transaction management into a subordinate object, called a transactor, that runs a unit of work within a transaction. In the tests the transactor uses the same transaction manager as the application.
Yes, persistence tests are slower than mocked unit tests, but they still have their place.
Chapter 26 - Unit Testing and Threads
Concurrency complicates matters. It's worth thinking about the system's concurrency architecture ahead of time. We wanted to separate the logic that splits a request into multiple tasks from the technical details of how those tasks are executed concurrently. In our unit tests we'll give the AuctionSearch a fake task runner that calls tasks directly. In the real system, we'll give it a task runner that creates threads for tasks.
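A sketch of the idea using java.util.concurrent.Executor as the task-runner abstraction (the book's own abstraction may differ):

```java
import java.util.concurrent.Executor;

// In unit tests: run each task immediately on the calling thread, so the
// functional behavior of AuctionSearch can be tested without real concurrency.
final class SynchronousExecutor implements Executor {
    @Override
    public void execute(Runnable task) {
        task.run();
    }
}

// In the real system, something like Executors.newCachedThreadPool()
// (a task runner that creates threads) would be passed in instead.
```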
Separating the functional and synchronization concerns has let us test-drive the functional behavior of our AuctionSearch in isolation. Now it's time to test-drive the synchronization. We will do this by writing stress tests that run multiple threads through the AuctionSearch implementation to provoke synchronization errors. Specify one of the object's observable invariants with respect to concurrency. Write a stress test for the invariant that exercises the object multiple times from multiple threads. Watch the test fail, and tune the stress test until it reliably fails on every test run. Make the test pass by adding synchronization. We often write both functional and stress tests before the production code. Stress tests do not provide any guarantees; they offer just a degree of reassurance.
Chapter 27 - Testing Asynchronous Code
Some tests must cope with asynchronous behavior: control returns to the test before the tested activity is complete. An asynchronous test must wait for success and use timeouts to detect failure.
There are two ways a test can observe the system: by sampling its observable state or by listening for events that it sends out. Of these, sampling is often the only option because many systems don't send any monitoring events. Both observation strategies use a timeout to detect that the system has failed.
A test can fail intermittently if its timeout is too close to the time the tested behavior normally takes to run. Flickering tests can mask real defects. We need to make sure we understand what the real problem is before we ignore flickering tests. We should be paying attention to why the tests are flickering.
Succeed fast. Make asynchronous tests detect success as soon as possible so that they provide rapid feedback. Listening for events is the quickest; sampling means repeatedly polling the system.
Synchronization and assertion is just the sort of behavior that's suitable for factoring out into subordinate objects because it usually turns into a bad case of duplication if we don't. It's also just the sort of tricky code we want to get right once and not have to change again.
Beware of asynchronous tests that return the system to the same state. We have to check that the intermediate state was reached as well.