My notes about the Clean Code book
These are my personal notes about the Robert C. Martin's book Clean Code.
A lot of chapter deals with number of function arguments. Ideal number being zero, than 1, 2, and it should never be 3 or more. Why? Because arguments require a lot of conceptual power. They are also hard to test. Output arguments are even more hard to understand. How we can fight multi-parameter functions? Well, there are e.g. argument objects, which wrap arguments. We can extract a wrapper class for the argument which we would pass to every function. Or we can move an argument to a field instead of passing it between methods. Or we can even extract a class which takes the argument as input in constructor and delegate work to this class.
Chapter 2 - Meaningful names
The book starts with an advice that we should use intention-revealing names. E.g. elapsedTimeInDays vs d. Of course we should avoid disinformation and non-informative names. Uncle Bob then explains what he means by using meaningful distinctions. E.g. variables message and theMessage next to each other do not make sense. We should obviously use pronounceable names (generationTimestamp vs genymhdhms) and searchable names (e.g. MAX_CLASSES_PER_STUDENT is more searchable than 7). We shouldn't be cute (e.g. whack method). We should be consistent with names per concept. E.g. decide between fetch, retrieve and get and stick to it.
We should avoid encodings, as the modern programming languages have strong typing, making them unnecessary. E.g. prefix "I" before interfaces names is wrong. Better is no prefix for interfaces and some suffix for implementations, e.g. Impl. The client of our API is usually not interested in the implementations, but he is interested in the interfaces.
Class names should be nouns. Words like Manager, Data or Info should be avoided. Class names should not be verbs. Methods names should start with a verb.
A new advice for me was that we should prefer solution domain names in preference to problem domain names. E.g. accountVisitor (if visitor as a design pattern is used) is preferred to problem domain names.
Then the chapter talks about adding meaningful context to the names. E.g. add "address" prefix to variables if it makes code more readable. Even better is extract to them to Address class. On the other hand, we shouldn't add unnecessary context, such as "segment" prefix to all variables, if we are in the context of "Segmentation" class.
Chapter 3 - Functions
The first rule of functions is that they should be small. Ideally less than 4 lines long. This also means that if a function contains a loop or an if, the inner body of that should be probably only 1 line long and it would therefore probably be another function call, which has also documentatory value.
The 2nd rule is that functions should do one thing only. How to validate this? There are at least 2 ways. Either try to describe the function purpose with the TO paragraph. "TO getCustomerName we ask the cache of the customer names to retrieve the name". If the function lines are an abstraction level below given TO paragraph, we're doing something wrong. Another nice heuristic is that if we can extract another meaningful method from the method body, it does more than one thing. If a function has comments about its "sections", it does more than one thing.
Third rule of functions is to have one level of abstraction only. This means e.g. not to mix file system details with domain logic in one method.
Another interesting approach is a step-down rule. Every function should be followed by functions which it uses (one level of abstraction below). We want to read the list of functions as if it was TO paragraphs narrative.
The chapter follows by more concrete examples or tricks. Switch statements are evil, because they cannot do one thing only and they tend to repeat themselves across the code base. Therefore the only place where they make sense, is a factory method for a polymorphism mechanism, which replaces all their usages. One nice heuristic for evaluating a function quality is command query separation. The function should either do something, or return an information. It shouldn't do both. We should prefer exceptions to error codes, as they introduce bad client code. That said, we should extract try/catch blocks into own methods, since error handling is one thing.
Another interesting approach is a step-down rule. Every function should be followed by functions which it uses (one level of abstraction below). We want to read the list of functions as if it was TO paragraphs narrative.
The chapter follows by more concrete examples or tricks. Switch statements are evil, because they cannot do one thing only and they tend to repeat themselves across the code base. Therefore the only place where they make sense, is a factory method for a polymorphism mechanism, which replaces all their usages. One nice heuristic for evaluating a function quality is command query separation. The function should either do something, or return an information. It shouldn't do both. We should prefer exceptions to error codes, as they introduce bad client code. That said, we should extract try/catch blocks into own methods, since error handling is one thing.
A lot of chapter deals with number of function arguments. Ideal number being zero, than 1, 2, and it should never be 3 or more. Why? Because arguments require a lot of conceptual power. They are also hard to test. Output arguments are even more hard to understand. How we can fight multi-parameter functions? Well, there are e.g. argument objects, which wrap arguments. We can extract a wrapper class for the argument which we would pass to every function. Or we can move an argument to a field instead of passing it between methods. Or we can even extract a class which takes the argument as input in constructor and delegate work to this class.
Chapter 4 - Comments
The chapter starts with a surprising statement for me. Most of the comments are bad comments. They only make up for bad code. It is much better to rewrite the code to make comments unnecessary. We should really think twice before writing a comment. Could we improve the design to make the comment unneeded?
Then the chapter continues with comments which are not bad (but that's a minority):
- Legal comments.
- Informative comments, like a comment describing a long regular expression.
- Explanations of intent.
- Clarification comments, usually about the code you cannot change.
- Warning of consequences.
- TODO comments.
The most funny part of the chapter are examples of bad comments:
- Mumbling comments, where you have to examine the code to even understand what the author meant.
- Redundant comments, like "configuration settings" on the configurationSettings field of a class.
- Mandated comments, such as all methods must have JavaDoc, lead to the typical JavaDoc abominations.
- Journal comments when we have versioning systems.
- Noise comments, like "default constructor".
- Comments instead of variables or methods. Many comments can be replaced by a simple variable or a method.
- Obviously: Position makers, Closing bracket comments, Attributions and Bylines, Commented-out code.
- HTML comments.
- Too much information, e.g. historical or other non-relevant information.
- Inobvious comments (which themselves needs explanation), e.g. "start with an array that is big enough to hold all the pixels (plus filter bytes), and an extra 200 bytes for header info" from apache commons.
- Function headers, JavaDoc for non-public stuff.
Chapter 5 - Formatting
I liked the funny explanation in the importance of formatting. Formatting is important. By now (meaning "in chapter 5") we should know that functionality of the code is not as important as its readability. While functionality will change a lot, code readability will hunt us for a long time.
He starts with vertical formatting. The good heuristic is that the files should be around 200 lines of code long, and almost never over 500 lines of code. We should think of a class as of a newspaper article. Most important stuff should be at the top and less important at the bottom. We should immediately know if we are at the right place by just reading the class name. We should use whitespace to separate concepts visually. We should avoid whitespace to imply close association. Methods that correlate should be kept vertically together. Local variables should be declared as close to place they are first used as possible. Instance variables should be at the class beginning. Methods which are called from other methods should be below them, to be read naturally, as a newspaper article.
Then the chapter continues with horizontal formatting. Uncle Bob likes to set his personal line length limit to 120 characters. I didn't like the rest of this part, as it seemed obvious. Indentation helps reading what is more nested, and so on. I found it funny they had a team meeting, where they decided all the formatting rules in 10 minutes. Our team is certainly not so fast :)
Chapter 6 - Objects and Data Structures
Uncle Bob first explains the real purpose of encapsulation. Hiding the fields is about abstractions. A class should not publish its properties via getters and setters. It should rather design an interface to manipulate with the essence of the data without disclosing the implementation.
The chapter then compares data structures (including POJOs) with objects. Procedural code (i.e. code using data structures) makes it easy to add functions without changing the data structures. OOP makes it easy to add new objects without changing the functions. Both has disadvantage where the other has advantage. Then there are hybrids, when a programmer tries to add logic to a data structure. These have all the disadvantages, and should be avoided.
A lot of space is for Law of Demeter. In short, we should only talk to friends, not strangers. E.g. code context.getOptions().getScratchDir().getAbsolutePath() violates Law of Demeter, but only if the objects returned are not all data structures, which expose their internal structure by design. It is usually good to split up the calls to 3 calls of a getter each on a separate line. If the options or scratchDir were not data structures, but OO objects, we should refactor the code. The first idea, to create a complex getter on context, is not very good, because it would be soon too big. We should think more and pick a correct design that simplifies the usage and hides the implementation details.
The chapter then compares data structures (including POJOs) with objects. Procedural code (i.e. code using data structures) makes it easy to add functions without changing the data structures. OOP makes it easy to add new objects without changing the functions. Both has disadvantage where the other has advantage. Then there are hybrids, when a programmer tries to add logic to a data structure. These have all the disadvantages, and should be avoided.
A lot of space is for Law of Demeter. In short, we should only talk to friends, not strangers. E.g. code context.getOptions().getScratchDir().getAbsolutePath() violates Law of Demeter, but only if the objects returned are not all data structures, which expose their internal structure by design. It is usually good to split up the calls to 3 calls of a getter each on a separate line. If the options or scratchDir were not data structures, but OO objects, we should refactor the code. The first idea, to create a complex getter on context, is not very good, because it would be soon too big. We should think more and pick a correct design that simplifies the usage and hides the implementation details.
Chapter 7 - Error handling
The chapter starts with obvious - we should use exceptions in favor of error codes. Another obvious note is about checked exceptions - we basically shouldnt use them (unless we re developing special (e.g. critical) software). The first surprising thing for me was a recommendation to write unit test for exceptional cases first. This way it will force us to prepare the transactionality of the function right from the start.
Another good advice is to create just one class for exceptions incoming from one part of the system. We can distinguish between errors by context provided in the exception. We should only create multiple types of exceptions only if we really intent to have places where we catch just one, but not another kinds of exceptions.
Last good advice, is not to check for nulls, or pass nulls, but use a special case refactoring instead.
Another good advice is to create just one class for exceptions incoming from one part of the system. We can distinguish between errors by context provided in the exception. We should only create multiple types of exceptions only if we really intent to have places where we catch just one, but not another kinds of exceptions.
Last good advice, is not to check for nulls, or pass nulls, but use a special case refactoring instead.
Chapter 8 - Boundaries
The chapter starts with a naturally different motivations between providers and consumers of a 3rd party code. Providers want to offer the library to biggest possible audience, while consumers would like the solution to be customized for their needs. Therefore 3rd party code is a problem worth noticing.
We can start exploring 3rd party code by automated tests, which will test it using our intended usage. This way we will not only learn about it, but also have a suite of tests for future upgrades of the library.
The way to manage 3rd party code is to limit the dependent code to minimum. We can use wrappers around the library. Or we can create an ideal interface for us and an adapter for the 3rd party library. The adapter is very good for yet-unknown dependencies.
Chapter 9 - Unit Tests
The chapter about unit tests is just a basic introduction to them. Author explains the three laws of TDD:- Write test code first.
- Write just so much test code that it fails.
- Write just so much production code that the failing tests pass.
The big part of the chapter is about a premise that the tests should be clean. They should not be a second-class citizen when it comes to code quality. Dirty tests are even more expensive than no tests.
A nice rule to follow when writing unit tests is one assert per unit test. Also one concept per test.
Chapter 10 - Classes
This chapter is mostly about two things - high cohesion and low coupling. It starts with Single Responsibility Principle. The class should have exactly one responsibility, i.e. one reason to change. We should builds systems with many small classes which communicate with small amount of other classes.
A class should have small amount of instance variables. Each method should manipulate at least one of the instance variables. The more variables the method uses, the more cohesive it is to the class. When a class looses cohesion, we should split it into two classes.
We should wire classes to interfaces, not concrete implementation. This way, we not only gain low coupling between classes, but we also enable dependency injection pattern.
Chapter 11 - Systems
This chapter was one of the most boring ones. Maybe because the fight between Spring and EJBs seems to be over and Spring has won. So he chapter only states the obvious:
- Architecture should be non-invasive.
- We should write POJOs only.
- If we have cross-cutting concerns, there are aspects for that.
Whole chapter fells like just an introduction to the Spring Framework.
Chapter 12 - Emergence
One of the shortest chapters. It only presents 4 rules of simple design by Kent Beck. In order of importance a design is simple if it:
- Runs all tests (i.e. is covered by automated tests).
- Has no duplication.
- Expresses the intent.
- Minimizes the number of classes and methods.
Then the chapter goes on explaining each of the rules. I would highlight two advises. The most important way to by expressive is to try. Most people don't. High number of classes and methods are sometimes result of pointless dogmatic thinking, such as making an interface for every class.
Chapter 13 - Concurrency
Writing clean concurrent programs is hard. Concurrency helps us decouple what gets done from when it gets done. This can dramatically improve throughput and sometimes even structure of a system. While throughput improvement is an obvious motivation, structural improvement is not so obvious. The system will start to look like many small computers talking to each other, which is supposed to help separate concerns. My experience is that the main (and almost only) motivation for concurrent code is the throughput.
Then there are tips.
- We should keep concurrency-related code separate from other code (i.e. concurrency is one responsibility).
- We should limit access to shared data.
- Threads should be as independent as possible. Ideally working with local variables only.
- We should know our concurrency libraries.
- We should understand our algorithm models, such as produced-consumer, readers-writers and dining philosophers.
- Beware dependencies between synchronized methods.
- Keep the synchronized sections as small as possible.
- Think about the shut-down process early, if it is important for you to shut down correctly. It is hard to do correctly.
- Don't ignore one-time system failures, as they can be the concurrency-related issues.
- Make your threading code pluggable/configurable. E.g. count of threads.
- Test your code with more threads than processors and on different platforms.
- Use automated tool to instrument your code with jiggles, which try to mess up the concurrency to make bugs more visible.
Chapter 14 - Successive Refinement
This chapter is one of the longest ones so far because of lots of code in it. Although the useful information is scarce here, I liked to read the example and how Uncle Bob thinks. The most helpful thing I learned here was how he does refactoring. He never breaks tests. When he wants to introduce more generic parsing of a boolean or string arguments, he doesn't immediately replace the old variables with new types. Instead he places new implementation next to the old one, just to keep tests passing all the time. Only after all the implementation was changed to new way, the old instances got removed.
Chapter 15 - JUnit Internals
For me, this chapter was one of the best chapters of the book. It was an example of code improvement. The source code was of JUnit by Kent Beck, which means the code was already better than most of the code we deal with each day. It had 100% code coverage, although probably line coverage only. Uncle Bob claims to have found a useless IF there, which means that branch coverage couldn't not have been 100%.
Chapter 16 - Refactoring SerialDate
In this chapter, Uncle Bob took a random open-source project and refactored it to be cleaner. Same as in chapter 14, all changes that he made were made while unit tests were passing. There were many small refactorings done in this chapter. It should have been good, but the form was not good for me. I always had to jump back and forth to the code in the appendix, so it was too long for me. It did not read well.
Chapter 17 - Smells and Heuristics
This chapter is basically a reference of code smells and heuristics. It is partially redundant with the rest of the book, but it is still very useful.
Comments
- Inappropriate information should not be in the comments. This includes historical information and so on (mostly metadata info).
- Obsolete comments should be deleted.
- Redundant comments like exact repeating of what code does are useless.
- Poorly-written comments. We should always think how to write a comment, if we must write one.
- Commented-out code should be removed. We have source control systems for keeping history.
Environment
- Build requires more than one step is a smell. It should be possible to do with one command, or equivalent in IDE.
- Tests require more than one step. It should be possible to run them all in one step.
Functions
- Too many arguments are hard to understand.
- Output arguments are even harder to understand.
- Flag arguments usually mean that function does not do one thing.
- Dead function is useless and should be removed.
General
- Multiple languages in one source file, like HTML and Java, Javadoc, English, and so on. We should limit the count as much as possible.
- Obvious behavior is not implemented, like not implementing stringToDay(dayName) for "Monday". Not following the principle of least surprise.
- Incorrect behavior at the boundaries means that the implementation at boundaries is not done, or not tested.
- Overriden safeties like turned off warnings is also a bad sign.
- Duplication is what half of this book, or half of GOF patterns are about.
- Code at a wrong level of abstraction is quite common and quite hard to get rid of.
- Base classes depending on their derivatives is obviously wrong (although it has exceptions).
- Too much information exposed, e.g. bloated interface for no reason.
- Dead code, i.e. code that can never be executed, should be removed. There are tools for finding it.
- Vertical separation means that e.g. related functions are not vertically not close enough.
- Inconsistency is also about principle of least surprise. Do similar things in a similar manner.
- Clutter like default constructor with no body, should be removed.
- Artificial coupling, like general enums being inner classes of non-related classes, so other classes have to know about those classes too.
- Feature envy is kind of similar to Law of Demeter, but rephrased in a way, that if methods of a class heavily manipulate data of the other class, the original class envies features of the other class and the methods belong there. Sometimes there is no other way.
- Selector arguments is a redundant smell to flag arguments with a note that this applies to not only booleans, but also enums, integers and so on.
- Obscured intent by e.g. Hungarian notation is not necessary.
- Misplaced responsibility like when function names do not tell what functions really do.
- Inappropriate static methods should sometimes be instance methods. We should only create static functions if we are sure we don't need polymorphism for them.
- Heuristic: Use explanatory variables is the first heuristic (in contrast with smells so far). We should use them.
- Heuristic: Function names should say what they do, duh.
- Heuristic: Understand the algorithm we should. We should not skip understanding the other people's code.
- Heuristic: Make logical dependencies physical, e.g. don't duplicate a constant, but create a getter for it to get it in another class.
- Heuristic: Prefer polyorphism to if/else or switch/case statements.
- Heuristic: Follow standard conventions of your platform/language/etc.
- Heuristic: Replace magic numbers with named constants.
- Heuristic: Be precise, e.g. don't use floats for money.
- Heuristic: Structure over convention means we should prefer ways to force structure e.g. by polymorphism over relying on other programmers copying switch/case statements the same way as we designed them.
- Heuristic: Encapsulate conditionals so they are better understood.
- Heuristic: Avoid negative conditionals as they are harder to understand.
- Heuristic: Functions should do one thing.
- Hidden temporal couplings should be made explicit by e.g. returning a type from a method which second method accepts.
- Heuristic: Don't be arbitrary means we should have a reason for how we design our code.
- Heuristic: Encapsulate boundary conditions means that we should even encapsulate +1s if they are spread over our code.
- Heuristic: Functions should have statements on only 1 level of abstraction.
- Heuristic: Keep configurable data at high levels means that e.g. constants should be in classes high in the namespaces hierarchy.
- Heuristic: Avoid transitive navigation is the law of Demeter. We should not call to deep levels of the classes we use.
Java
- Use wildcards when importing to save space.
- Don't inherit constants from an interface, use a static import for that.
- Use enums instead of some constants.
Names
- Use descriptive names, e.g. variable name "a" is bad.
- Choose names at appropriate level of abstraction, e.g. a Modem interface should have a connect method instead of a dial method.
- Use standard nomenclature where possible, e.g. a "Decorator" in a class name which is a decorator.
- Unambiguous names, e.g. no rename, doRename and renameInternal.
- Use long names for long scopes, e.g. "i" is OK for short scope of a for-cycle, but longer the scope of a variable/method is, the longer its name should be.
- Avoid encodings, like Hungarian encoding. They are obsolete these days.
- Names should describe side-effects, e.g. createOrReturnObjectOutputStream.
Tests
- Insufficient tests means the test coverage is too low.
- Use a coverage tool, otherwise you never know.
- Don't skip trivial tests, they have documentary value.
- Ignored tests can be used to question requirements.
- Test boundary conditions, obviously.
- Exhaustively test near bugs, as they tend to congregate.
- Patterns of failure are revealing means that sometimes failures in tests lead you to a bug fix.
- Test coverage patters can be revealing means that the code which is not executed by passing tests might contain the reason why the failing tests are failing.
- Tests should be fast, otherwise you will unlikely run them often.
Appendix A - Concurrency II
This chapter was supposed to go deeper into concurrency problems, but I still think it just scratched the surface. He restated that concurrency could improve throughput. He also repeated that concurrency-related code should be kept separate from other code, because of the Single Responsibility Principle. Then there was a nice example using bytecode of the code that is not atomic (e.g. ++i). We should generally be aware which code is shared and which parts of it are atomic and which are not. Then he introduced Java libraries for concurrency, such as AtomicInteger or ConcurrentHashMap or Executor. He continued with example of why Iterator is inherently not thread safe (because temporal dependencies between its methods). We must create an adapter for it if we want to share it between threads. Then he explained a deadlock and 4 possible ways to prevent it:
- Avoid mutual exclusion, e.g. by increasing the number of resources so there are more of them than threads.
- Break Lock and Wait, so don't wait. Release all locks if you need a resource which is locked. This could lead to starvation or a livelock.
- Break Preemption is similar to the previous solution, but it allows some waiting. It would contain a strategy for requests for resources between threads.
- Breaking Circular Wait is the most common approach, which requires the resources to be locked in agreed order by each thread. This also has disadvantages, because this could lock a not-yet-needed resource too early, or it could be plain impossible if the 2nd resource to be locked depends on a result from the 1st locked resource.
The last part of the chapter repeated that there are test libraries for instrumenting production code so that it fails on concurrency problem much more frequently.
Comments