Friday, August 2, 2019

Notes about Implementation Patterns

These are my notes about the Implementation Patterns book from 2008 by Kent Beck.

Chapter 3 - A Theory of Programming

Kent Beck values (in this particular order):
  1. Communication through the code to the reader. Communication has an economic background - the code is read more frequently than it is written, so we should optimize for readability.
  2. Simplicity, i.e. removing complexity from the code. With complexity we lose our audience.
  3. Flexibility, i.e. responding to changes well (e.g. configurability).
He uses the following principles while programming:
  • Local Consequences - i.e. keep the changes as local as possible.
  • Minimize Repetition - minimize situations when you have to do the same change in many different places in the code. It can be achieved by splitting the code into small parts.
  • Logic and Data Together - logic and the data should be as close as possible. Ideally in one file, but a common or near package could suffice. The point again is to keep changes local.
  • Symmetry - I think a better name would be Consistency. The same idea should be expressed in the same way throughout the code. Another kind of application is keeping the method parts at the same level of abstraction.
  • Declarative Expression - if you need to make a complex concept easier to read, a declarative expression sometimes works better than an algorithm. E.g. JUnit4 annotations instead of JUnit3 methods.
  • Rate of Change - or temporal symmetry. If some things change at the same time during runtime, they should be put together.

Chapter 4 - Motivation

There is an old formula for the cost of software: it is the sum of the cost to develop it and the cost to maintain it. It surprised many people that the maintenance cost is much higher than the development cost. The maintenance cost is the cost to understand + cost to change + cost to test + cost to deploy.

There were attempts to make code general enough for future improvements, but these efforts usually fail. They cannot foresee future business requirements. A better strategy is making smaller improvements to code quality, with short-term benefits, to smooth out the future development.

Kent Beck's strategy is to address the cost of understanding.

Chapter 5 - Class

Classes are data bundled with the logic which changes them. Class hierarchies are a form of compression. As with all compression techniques, they decrease understanding, so we have to be careful about when to use inheritance. From a design point of view, classes are relatively expensive things. We should minimize their count as long as they stay simple enough.

Superclasses of a class hierarchy should have names as simple as possible. An example is a Figure in a drawing application. Subclasses can have longer names for the benefit of being expressive. We should be very careful when using class hierarchies. They are hard to untangle and bring little benefit over class composition.

We should program to the minimal interfaces we need. Kent again warns us against future-proofing, for the same reasons as in Chapter 4. He has no problem with prefixing interface names with I if that enables us to name the concrete classes better, e.g. IFile implemented by File.

We can use abstract classes if we want to protect ourselves from the changes in the interface. They have the disadvantage that the concrete classes can have only one superclass, but many interfaces.

If you need to add operations to an existing interface or otherwise change it, sometimes you need another interface which extends the former one. This is an ugly solution to an ugly problem. If you need many such interfaces, maybe it's time to rethink the whole design of the system.

Kent Beck suggests using value objects in micro-parts of the codebase. Normal objects will have their state changed. They can protect a micro-hierarchy of classes which are mathematical values. This is an example of mixing object-oriented and functional programming. According to Beck, there is much more to be written about how to blend object, functional and procedural styles of programming.
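A minimal sketch of a value object, assuming a hypothetical Money class (the name and fields are my own, not from the book). All fields are final and operations return a new instance instead of changing the existing one:

```java
// Hypothetical value object for illustration; Money is an assumed name.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    // Returns a new value instead of changing this one.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Currency mismatch");
        }
        return new Money(cents + other.cents, currency);
    }

    long cents() { return cents; }
    String currency() { return currency; }
}
```

Because a Money instance never changes, it can be shared freely between the stateful objects around it.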

Beck is an object-oriented programmer. He suggests we use polymorphic messages, which enable variation and provide clarity. He warns against the procedural style of programming.

Beck suggests using inner classes when appropriate. 

Conditionals are fine as long as they are local and not widely used. As soon as their usage becomes wider, a class hierarchy or delegation seems more appropriate.
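One way this could look, as a sketch with hypothetical shape classes (none of the names come from the book). Instead of repeating a switch on a type code in many places, each class answers the message itself:

```java
// Hypothetical shapes for illustration.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area() { return Math.PI * radius * radius; }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

final class Areas {
    // No conditional on the kind of shape: the receiver decides.
    static double total(Shape... shapes) {
        double sum = 0;
        for (Shape shape : shapes) {
            sum += shape.area();
        }
        return sum;
    }
}
```

Adding a new shape then means adding a class, not editing every conditional.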

Beck suggests we should try to convert utility classes into regular objects if possible.

Chapter 6 - State

Our brains are used to an environment with state. State changes, and that causes problems for programmers. It also causes problems for parallel computations. There are some languages which got rid of state, but none of them became very popular. That is because we are used to state. OOP helps us to encapsulate state, so we can limit the methods which could change it. The key to managing state is to keep together the bits of state that are used together and die together.

Every object is a little computer with its own storage. We shouldn't expose fields of objects as public, because that loses flexibility.

Direct access is when we access the variable directly. It has no flexibility, so in order to refactor, we will have to change the code in all places. It also doesn't communicate well. Beck usually doesn't think about his programs in terms of storage. Methods or objects communicate their intent better.

Indirect access is when we use a getter and a setter to access fields. Beck uses direct access inside the class and indirect access for class clients. This gives both flexibility and readability. Most access to storage should be from inside the class. Otherwise we have bigger design issues.
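A small sketch of the two kinds of access, using a hypothetical Span class of my own. Clients go through methods, while the class itself touches its fields directly:

```java
// Hypothetical class for illustration.
final class Span {
    private int left;
    private int right;

    Span(int left, int right) {
        this.left = left;
        this.right = right;
    }

    // Indirect access for clients: the storage (left/right) could change
    // later without touching the callers of getWidth().
    int getWidth() {
        return right - left;
    }

    int getLeft() {
        return left;
    }

    // Direct access inside the class: fields are read and written as they are.
    void moveBy(int dx) {
        left += dx;
        right += dx;
    }
}
```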

If you have common state for multiple objects, maybe those should be fields, instead of method parameters. On the other hand, if a couple of fields are used only in some methods of the class, they might be better off as method parameters than fields.

If the class has variable state, we can use maps to store it. This is a more flexible, but also less readable solution. If there is common state stored in maps, it should perhaps be refactored to fields.

In rare cases, objects need special-purpose state, which is needed only for a computation and is otherwise not as useful as the rest of the fields. Storing it in a field would go against the rule of symmetry. In these cases we can hold this information in a nearby single-purpose object instead of in the object itself. This makes it really hard to replicate and read the state, but it can still be useful on rare occasions.

When using variables in Java, Beck recommends using only local variables and private fields, rarely static. This way, you don't have to encode the field scope in its name. The reader either sees its declaration or doesn't, so they know whether it is a local variable or a field. The lifetime of a variable should be as close to its scope as possible.

Local variables have some common roles. E.g. a variable named result (or results) holds the method result. Count is a special case of result. Explanatory variables help to explain the algorithm. They are one step towards helper methods. Another role is reuse - when you need to reuse something, you can store it in a local variable. The final common role is to store an element of a collection.

Fields have some common roles as well. Helper fields can be used instead of passing the same parameter to all of the object's methods. Flag fields change the behavior of the object in a couple of places. If the count of such places becomes too high, the field should be changed to a strategy. Fields can obviously hold state. Fields can also hold components of the containing object.

Besides fields, parameters are the only other option to pass state. They also have several use cases. E.g. collecting parameters collect the results of a method. Parameters can be optional with default values. A varargs parameter can also help in some cases. If a couple of parameters are passed together to a lot of methods, consider creating a parameter object from them.
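The collecting parameter is perhaps the least obvious of these, so here is a sketch with a hypothetical TreeNode class (my own example). A list is created once and passed down the recursion, where each call adds its part of the result:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node for illustration.
final class TreeNode {
    private final String name;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }

    // The public entry point creates the collection once...
    List<String> leafNames() {
        List<String> result = new ArrayList<>();
        collectLeafNames(result);
        return result;
    }

    // ...and the collecting parameter gathers results across the recursion.
    private void collectLeafNames(List<String> result) {
        if (children.isEmpty()) {
            result.add(name);
            return;
        }
        for (TreeNode child : children) {
            child.collectLeafNames(result);
        }
    }
}
```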

Constants help the reader to understand magic numbers. E.g. Color.WHITE reads better than 0xFFFFFF.

If Beck struggles with a name of a variable, it is often a sign that the design should be updated.

Types should be declared as generally as possible. But sometimes consistency can win, e.g. declaring a variable as List instead of Collection for consistency reasons.

Variables can be initialized eagerly or lazily. Lazy initialization should be used only if performance is an important factor during the initialization of a variable.
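A sketch contrasting the two styles in a hypothetical Report class (my own example; the "expensive" rendering is just a string join here). Note that this simple form of lazy initialization is not thread-safe:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration.
final class Report {
    // Eager: created together with the object, always ready.
    private final List<String> lines = new ArrayList<>();

    // Lazy: computed on first use, assuming (for this sketch) it is costly.
    private String rendered;
    private int renderCount = 0;

    void addLine(String line) {
        lines.add(line);
        rendered = null; // invalidate the cached value
    }

    String rendered() {
        if (rendered == null) {
            rendered = String.join("\n", lines);
            renderCount++;
        }
        return rendered;
    }

    // Exposed only so the sketch can show how often rendering happened.
    int renderCount() {
        return renderCount;
    }
}
```

The extra null check and the invalidation in addLine are the price of laziness, which is why the eager form is the default.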

Chapter 7 - Behavior

There are several patterns for writing behavior in Java.

Control flow tells you to express computations as sequences of steps.

Main flow means that you should have the main flow written as clearly as possible. Use exceptions for exceptional cases.

Message allows you to express control flow with messages.

Choosing message allows us to create an extension point of the object. It can be used instead of switches.

Double dispatch is a choosing message with two dimensions.
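A sketch of double dispatch with hypothetical names (two kinds of figures, two kinds of output media; none of them are from the book). The first message selects the figure, the second one selects the medium:

```java
// Hypothetical names for illustration.
interface Medium {
    String drawCircle();
    String drawLine();
}

interface Drawable {
    String drawOn(Medium medium);
}

final class CircleFigure implements Drawable {
    // The first dispatch chose CircleFigure; this call dispatches on the medium.
    @Override public String drawOn(Medium medium) { return medium.drawCircle(); }
}

final class LineFigure implements Drawable {
    @Override public String drawOn(Medium medium) { return medium.drawLine(); }
}

final class Screen implements Medium {
    @Override public String drawCircle() { return "circle on screen"; }
    @Override public String drawLine() { return "line on screen"; }
}

final class Paper implements Medium {
    @Override public String drawCircle() { return "circle on paper"; }
    @Override public String drawLine() { return "line on paper"; }
}
```

The cost is that adding a new figure means touching every Medium, so the two dimensions are not equally easy to extend.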

Decomposing message is a way to split a complicated computation into smaller cohesive steps using methods.

Reversing message is extracting a method, e.g. from a one-liner, just to put all commands on one level of abstraction, therefore achieving symmetry. Such methods often have parameters.

Inviting message is like an abstract method. If you want to create an extension point for the future, it can be useful.

Explaining message is a method created for explanatory reasons. They are typically one-liners revealing intentions.

Exceptional flow tells us to express the exceptional cases as clearly as possible without interfering with the main flow. The clarity of the main flow is more important.

Guard clause means we should return early from a method in an exceptional case. It is much easier to read than a method with one big if. A variant of this pattern for loops is the continue statement.
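Both variants in one sketch, with hypothetical methods and made-up discount values of my own:

```java
// Hypothetical methods for illustration; the discount values are made up.
final class Discounts {
    static int discountPercent(String customerType, int orderTotal) {
        // Guard clauses: leave early for the exceptional cases...
        if (customerType == null) {
            return 0;
        }
        if (orderTotal <= 0) {
            return 0;
        }
        // ...so the main flow reads without nesting.
        return customerType.equals("premium") ? 10 : 5;
    }

    static int sumOfPositive(int[] values) {
        int sum = 0;
        for (int value : values) {
            if (value <= 0) {
                continue; // the loop variant of a guard clause
            }
            sum += value;
        }
        return sum;
    }
}
```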

Exceptions should be used only if the exceptional case is non-local. E.g. a disk error is caught on a completely different application layer than where it happened.

Checked exceptions should be used when you want to make sure the client of the code handles it.

Propagate exceptions tells us to wrap low-level exceptions in higher-level exceptions which are more meaningful to the reader. As an extreme example, consider a client of the application reading a NullPointerException.
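A sketch of such wrapping, assuming a hypothetical SettingsException and a parsePort helper (both names are my own). The low-level exception is kept as the cause, so nothing is lost for debugging:

```java
// Hypothetical higher-level exception; the name is an assumption of this sketch.
final class SettingsException extends RuntimeException {
    SettingsException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Settings {
    // Translates a low-level parsing failure into one that speaks
    // the caller's language, keeping the original as the cause.
    static int parsePort(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new SettingsException("Invalid port setting: '" + text + "'", e);
        }
    }
}
```

I used an unchecked exception here for brevity. Following the note on checked exceptions above, a checked one would force the caller to handle it.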

Chapter 8 - Methods

Dividing a program into methods has two main benefits. It helps readability and it provides reuse.

There are three main problems of methods - size, purpose and naming. If you create too many methods that are too small, readers will have a hard time connecting the bits. This contradicts the clean code advice to some extent.

Composed method is a method composed of calls to other methods, which have to be on the same level of abstraction. The question is what the correct size of a method is. Beck recommends relatively small methods, which can be easily comprehended. He divides logic into methods only after he has a functioning program. Sometimes he inlines all the methods and then restructures the code into methods again.

Methods should have intention-revealing names. E.g. Customer.find is better than Customer.linearCustomerSearch. Implementation details should not leak into a method name.

Method visibility should be as restrictive as possible at the beginning. Then you can slowly open up the methods that need to be more visible. Beck doesn't use final methods. Static methods should be used rarely, mostly for construction.

Method object is one of Beck's favorite refactorings, although he uses it rarely. You have a complex method with many parameters, local variables and so on. It is hard to split into more methods, as they would have many parameters. So instead you create a class named after the method. You change all the parameters and locals into fields. Now it is easy to refactor the complex method into many neatly named methods. Sometimes you need to inline ugly split methods into one complex method before starting this refactoring.
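A sketch of the result of this refactoring, assuming a hypothetical long method invoiceTotal(unitPrice, quantity, discountPercent, taxPercent) that I made up for illustration. The class is named after the method, and the helpers need no parameters because everything lives in fields:

```java
// Hypothetical method object for illustration.
final class InvoiceTotal {
    // Former parameters become fields...
    private final int unitPrice;
    private final int quantity;
    private final int discountPercent;
    private final int taxPercent;
    // ...and so do former local variables.
    private int subtotal;
    private int discounted;

    InvoiceTotal(int unitPrice, int quantity, int discountPercent, int taxPercent) {
        this.unitPrice = unitPrice;
        this.quantity = quantity;
        this.discountPercent = discountPercent;
        this.taxPercent = taxPercent;
    }

    // The former method body, now a composed method.
    int compute() {
        computeSubtotal();
        applyDiscount();
        return addTax();
    }

    private void computeSubtotal() {
        subtotal = unitPrice * quantity;
    }

    private void applyDiscount() {
        discounted = subtotal - subtotal * discountPercent / 100;
    }

    private int addTax() {
        return discounted + discounted * taxPercent / 100;
    }
}
```

The original method can then shrink to a single line: return new InvoiceTotal(unitPrice, quantity, discountPercent, taxPercent).compute();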

We should override methods to communicate specialization.

We should overload methods to provide an alternative API for the same computation.

We should have the most generic method return types possible.

Comments are a costly way to communicate the intention of methods. They are hard to keep consistent with the code and there is no feedback if they are no longer consistent with it. Only if method naming, simplicity and unit tests are not enough to document a method should you write a good comment to help readers understand it.

We need helper methods to create composed methods. But helper methods can be too small. Sometimes inlining a helper method can improve readability. You should play with method inlining and extracting when composing the code.

The toString() method should be written for programmers debugging a hard problem.

For a few simple conversions we should provide a conversion method on the source object which returns a converted instance. E.g. asString. For anything other than a simple conversion, we should create a conversion constructor on the target class which takes the source class as the input.
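Both options side by side, in a sketch with hypothetical temperature classes of my own:

```java
// Hypothetical classes for illustration.
final class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    double celsius() {
        return celsius;
    }

    // Conversion method: lives on the source object, fine for a simple case.
    String asString() {
        return celsius + " C";
    }
}

final class FahrenheitReading {
    private final double degrees;

    // Conversion constructor: the target class knows how to build itself
    // from the source, so the source does not need to know about the target.
    FahrenheitReading(Temperature source) {
        this.degrees = source.celsius() * 9 / 5 + 32;
    }

    double degrees() {
        return degrees;
    }
}
```

A familiar conversion constructor from the JDK is new ArrayList<>(someCollection).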

We should express object creation clearly. We should always provide complete constructors, which create complete usable objects, without a further need to call setters or anything. If an object is really hard to create, we can provide intention-revealing static factory methods. We can use helper methods to create parts of an object, i.e. internal factories.
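A sketch of a complete constructor with two intention-revealing factories, using a hypothetical Endpoint class (the name and the default ports are assumptions of this example):

```java
// Hypothetical class for illustration.
final class Endpoint {
    private final String host;
    private final int port;
    private final boolean secure;

    // Complete constructor: the object is usable right after creation,
    // no setters need to be called afterwards.
    Endpoint(String host, int port, boolean secure) {
        this.host = host;
        this.port = port;
        this.secure = secure;
    }

    // Static factories name the common cases.
    static Endpoint secureTo(String host) {
        return new Endpoint(host, 443, true);
    }

    static Endpoint plainTo(String host) {
        return new Endpoint(host, 80, false);
    }

    String url() {
        return (secure ? "https" : "http") + "://" + host + ":" + port;
    }
}
```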

Collections should be accessed with specialized methods, such as addBook, bookCount, etc.
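A sketch of what that could look like, with a hypothetical Library class. addBook and bookCount are the names from the notes above, hasBook is my own addition. The list itself is never handed out:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration.
final class Library {
    private final List<String> books = new ArrayList<>();

    void addBook(String title) {
        books.add(title);
    }

    int bookCount() {
        return books.size();
    }

    // Assumed extra accessor, not mentioned in the notes.
    boolean hasBook(String title) {
        return books.contains(title);
    }
}
```

Clients speak in terms of books, and the choice of List stays an internal detail.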

Boolean setting methods can sometimes be replaced with two methods, one for true and one for false, if that helps readability.

Boolean query methods should have good names, such as isXXX, hasXXX, etc. But needing them is usually a signal to move the logic into the object on which they are invoked.

Equals and hashCode should be defined together.
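A minimal sketch with a hypothetical Isbn class of my own. Both methods use the same field, so equal objects always have equal hash codes and hash-based collections work as expected:

```java
import java.util.Objects;

// Hypothetical class for illustration.
final class Isbn {
    private final String code;

    Isbn(String code) {
        this.code = code;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Isbn)) {
            return false;
        }
        return code.equals(((Isbn) other).code);
    }

    // Based on the same field as equals.
    @Override
    public int hashCode() {
        return Objects.hash(code);
    }
}
```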

We should use getters only occasionally. We should use setters rarely. Instead of using getters and setters, we should write more communicative OO code.

It is sometimes beneficial to create safe copies of getter output or setter input. But it shouldn't be overused. It is a symptom of a design problem of not using OOP correctly.
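A sketch of safe copies in a hypothetical Playlist class (my own example). The list is copied on the way in and on the way out, so no outside code holds a reference to the internal list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration.
final class Playlist {
    private final List<String> tracks;

    Playlist(List<String> tracks) {
        // Safe copy of the input: later changes by the caller do not leak in.
        this.tracks = new ArrayList<>(tracks);
    }

    List<String> getTracks() {
        // Safe copy of the output: changes by the caller do not leak back.
        return new ArrayList<>(tracks);
    }
}
```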

Chapter 9 - Collections

It is important to know collections well. They have a direct impact on readability, maintainability and performance of the code. Beck recommends optimizing for performance as little as possible and localizing performance-related changes, because they can degrade code quality or flexibility.

Beck introduces the basic interfaces of Java collections. Arrays are the most primitive fixed-size collections. They are the only non-library collection type in Java. They should be considered only for performance reasons. Iterables can be iterated. Collections have a size, can be added to, removed from, and can be tested for containing an element. List is an ordered collection of items with indexes. Set is a set of unique items without order. SortedSet is an ordered Set. Map is a collection of elements retrieved by a key.
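The differences are easy to see when the same items go into different collections. A small sketch using the standard java.util interfaces and implementations (the CollectionKinds helper class is my own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration.
final class CollectionKinds {
    // List keeps insertion order and duplicates.
    static List<String> asList(String... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // Set drops duplicates and promises no order.
    static Set<String> asSet(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // SortedSet drops duplicates and keeps the items sorted.
    static SortedSet<String> asSortedSet(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }
}
```

For the input "b", "a", "b" the list has three elements, both sets have two, and the sorted set starts with "a".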

Implementation-wise Beck introduces ArrayList, HashMap, HashSet, LinkedHashSet, and others. He introduces the Collections utility class with its singleton, emptyList, and sort methods. We should almost never extend collections. If we need to offer a collection-like interface, we should rather hold a collection and delegate to it.

Chapter 10 - Evolving Frameworks

Frameworks violate the overarching rule of the previous implementation patterns. The cost of change is no longer lower than the cost of reading the code. We have to make changes that are backwards and ideally also forwards compatible. Backwards compatibility might add complexity to the code, but is often the only choice. If you must break compatibility, you can do it gradually, with incremental releases.

Appendix A - Performance Measurement

This appendix provides examples of the patterns by presenting a performance measurement framework.