Tuesday, October 31, 2017

Notes about the Clean Architecture book

The first look into the contents of the book was a bit scary. Some of these things were already explained in Robert Martin's previous books. Then I found out they are explained from a different (software architect's) perspective. Overall, I liked this book better than Clean Coder, but it was less useful for me than the Clean Code book by the same author.

Part I - Introduction

The foreword contained two useful antipatterns:
  • Too authoritative and rigid architecture.
  • Speculative generality in software architecture.
The main point of the preface is that not much has changed in software architecture in the last 50 years. The rules of software architecture are the same regardless of any variable (time, type of project, ...).

When you get the architecture of the software right, you magically don't need a horde of programmers to maintain it.

Chapter 1 - What is Design and Architecture?

There is no difference between software design and architecture. There is a continuum of decisions from highest to lowest levels. They both are part of it.

The goal of architecture is to minimize the number of people needed to develop and maintain a software system.

A typical project where architecture doesn't matter starts alright. The first couple of releases are fine and everybody is productive. But as requirements add up, developers try to fit new requirements into the current project, which is shaped differently. The jigsaw starts to rot and the productivity falls. It is common that the changes done at the end of the project cost 40 times more than changes in the first 2-3 releases.

What went wrong? Slow and steady wins the race. Developers today believe 3 big lies:
  1. Myth: We just need to get to market first and we clean it up later. (it is not going to happen)
  2. Myth: Messy code makes us faster today and slows us down later. Myth: We can switch modes from making messes to cleaning messes. (making messes is always slower, even in the very short term, as shown in the example of a simple kata)
  3. Myth: Starting from scratch is a solution. (the same mess will be the result)

Chapter 2 - A Tale of two Values

The first value of software is its behavior. Many developers think it's the only value. The second value is the architecture. It has far greater value, while being overlooked by both programmers and managers.

The behavior is urgent but not important. The architecture is important but not urgent. The dilemma of software developers is that business managers are not equipped to value architecture. That's what software engineers were hired for.

Software architects focus more on structure of the project than on particular features/functions. They make those new features easier to write, modify, extend and to maintain. They should fight for keeping the project clean to prevent it being impossible to change.

Part II - Programming Paradigms

Chapter 3 - Paradigm Overview

There are three main programming paradigms:
  • Structured programming taught us to separate code into functions and it is the core of our algorithms. It does so by preventing the use of goto.
  • Object-oriented programming taught us to manage dependencies between modules of code with the use of polymorphism. It prevents us from using function pointers.
  • Functional programming taught us discipline in how we access data (immutability). It does so by preventing variable assignment.
All these paradigms were invented during the 10 years between 1958 and 1968 (BTW in reverse order) and no new paradigm has appeared since then.

Chapter 4 - Structured Programming

Dijkstra found out that certain uses of goto statements prevent decomposition of algorithms into smaller problems. I.e. goto prevents the divide-and-conquer approach to proving algorithm correctness.

Other uses of goto didn't have this problem. Dijkstra found that the non-problematic uses of goto correspond to selection and iteration statements (if and while).

Bohm and Jacopini proved that all programs can be programmed with just selection, iteration and sequence. This was a remarkable coincidence. All programs can be programmed with the same tools which enable us to prove their correctness. 

Dijkstra wrote "Go To Statement Considered Harmful", his famous paper from 1968, and so structured programming was born. Today, we are all structured programmers, although not necessarily by choice. We have no other option (of unconstrained goto jumps).

We still use the divide-and-conquer approach when decomposing complex problems into simple methods by using what is called functional decomposition.

Dijkstra's dream of provable programs never came true. Instead, the field leaned towards the scientific method. The programs are not provable, but they are falsifiable, i.e. we can only prove they are wrong. We do this using tests. We decompose programs into simple testable parts using functional decomposition, and then we try to prove those parts wrong using tests.

Chapter 5 - Object-Oriented Programming

Plugin architecture was invented to protect software from coupling to IO devices. Even though the idea is old, the programmers didn't extend it to their own programs, because using function pointers was dangerous. OOP allows plugin architecture to be used anywhere, for anything.

By using dependency inversion, we can have source code dependencies point in the opposite direction to the flow of control. This can be implemented by using interfaces. The high-level code calls an interface that it owns, and the low-level code implements that interface. Even though control flows from the high-level code into the low-level code, the source code dependency points from the low-level code to the high-level interface. This has a profound effect on software design. Any dependency, wherever it is, can be inverted. We can shape the software dependencies however we want. E.g. the database and the UI can depend on (pure) business rules.

So the most important trait of OOP is polymorphism. It allows us to make low-level details (like database or UI) depend on high-level policies, like business rules. Low-level details can be then developed independently.
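To make the direction of the arrows concrete, here is a minimal Python sketch. It is my own illustration, not code from the book, and all the names (OrderRepository, DiscountPolicy, the 1000 threshold) are made up. The business rule calls the storage at runtime, yet the storage class is the one that depends on the interface owned by the business rule.

```python
from typing import Protocol


class OrderRepository(Protocol):
    """Interface owned by the business rules (the high-level policy)."""

    def total_for(self, customer: str) -> int: ...


class DiscountPolicy:
    """Business rule: knows only the interface, never a concrete database."""

    def __init__(self, orders: OrderRepository) -> None:
        self._orders = orders

    def discount_percent(self, customer: str) -> int:
        # Control flows from the policy into the storage detail here.
        return 10 if self._orders.total_for(customer) >= 1000 else 0


class InMemoryOrderRepository:
    """Low-level detail: it conforms to the interface, not the other way round."""

    def __init__(self, totals: dict[str, int]) -> None:
        self._totals = totals

    def total_for(self, customer: str) -> int:
        return self._totals.get(customer, 0)


policy = DiscountPolicy(InMemoryOrderRepository({"alice": 1200, "bob": 300}))
print(policy.discount_percent("alice"))  # 10
print(policy.discount_percent("bob"))    # 0
```

Swapping InMemoryOrderRepository for a real database adapter would not touch DiscountPolicy at all, which is the "plugin" property the chapter talks about.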

Chapter 6 - Functional Programming

Variables in functional programming languages do not vary. Race conditions, deadlocks and concurrent update problems, i.e. the problems we face in concurrent programming, cannot happen if no variable is ever updated. And yes, immutability is practicable, even if we make a couple of compromises:

Segregation of mutability tells us to put as much processing logic as possible to immutable components, to drive code out of the components that must allow mutation. 

Event sourcing is a strategy where we store e.g. all the bank transactions but not the account balance. When state is required, we re-apply all transactions and re-compute the current account balance. As a consequence, we implement just the CR part of CRUD.
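A small Python sketch of the bank example, under my own assumptions (the Transaction type, amounts in cents and the account ids are invented for illustration). The log is only ever appended to and read, and the balance is derived on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    account: str
    amount: int  # cents; positive = deposit, negative = withdrawal


def balance(log: tuple[Transaction, ...], account: str) -> int:
    """Recompute the balance by replaying every stored transaction."""
    return sum(tx.amount for tx in log if tx.account == account)


# Append-only log: entries are created and read, never updated or deleted.
log = (
    Transaction("acc-1", 10_000),
    Transaction("acc-1", -2_500),
    Transaction("acc-2", 700),
)
log = log + (Transaction("acc-1", 500),)  # "append" builds a new tuple

print(balance(log, "acc-1"))  # 8000
```

Replaying everything on each read is the naive version. A real system would probably cache snapshots, but that is outside these notes.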

Part III - Design Principles

This is not an introduction to the SOLID principles, but rather an architectural review on them.

Chapter 7 - Single Responsibility Principle

A module should be responsible to one and only one actor. Symptoms of violating this rule are:
  1. Accidental duplication - when Employee class has 3 methods, which are responsible to COO, CFO and CTO of the company respectively. The solution to this problem is to separate code e.g. inside calculator classes responsible to their respective manager. We can even keep the most reasonable computation in the Employee if that makes sense.
  2. Merges - when 2 teams have merge conflicts, it usually means they are changing one component for different reasons.
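The split into calculator classes could look roughly like the following Python sketch. The class names, the 40-hour threshold and the 1.5 overtime factor are my own assumptions for illustration. The point is only that the pay rules and the hour report no longer share any code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeData:
    """Plain data, shared by all actors; it contains no behavior."""
    name: str
    hours_worked: float
    hourly_rate: float


class PayCalculator:
    """Changes only when accounting (the CFO's side) changes the pay rules."""

    OVERTIME_THRESHOLD = 40.0  # assumed value

    def calculate_pay(self, e: EmployeeData) -> float:
        regular = min(e.hours_worked, self.OVERTIME_THRESHOLD)
        overtime = max(e.hours_worked - self.OVERTIME_THRESHOLD, 0.0)
        return regular * e.hourly_rate + overtime * e.hourly_rate * 1.5


class HourReporter:
    """Changes only when HR (the COO's side) changes the report format."""

    def report_hours(self, e: EmployeeData) -> str:
        return f"{e.name}: {e.hours_worked:.1f} h"


alice = EmployeeData("Alice", hours_worked=45.0, hourly_rate=20.0)
print(PayCalculator().calculate_pay(alice))  # 950.0
print(HourReporter().report_hours(alice))    # Alice: 45.0 h
```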

Chapter 8 - Open-Closed Principle

If component A should be protected from changes in component B, then component B must depend on component A. This is how OCP works at the architectural level. Architects organize components into a topology, where higher level components are protected from the changes in lower level components.

Chapter 9 - Liskov Substitution Principle

LSP violation example - Square is not a good subclass of a Rectangle, because its sides don't change independently. The LSP should be applied at the architectural level, otherwise the system becomes polluted with extra mechanisms.
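The Square/Rectangle problem is easy to reproduce. This is a minimal Python sketch of my own (the stretch function is a made-up client), showing a caller that is correct for Rectangle and silently wrong for Square.

```python
class Rectangle:
    def __init__(self, width: int, height: int) -> None:
        self._width = width
        self._height = height

    def set_width(self, width: int) -> None:
        self._width = width

    def set_height(self, height: int) -> None:
        self._height = height

    def area(self) -> int:
        return self._width * self._height


class Square(Rectangle):
    """Keeps both sides equal, which changes the inherited contract."""

    def __init__(self, side: int) -> None:
        super().__init__(side, side)

    def set_width(self, width: int) -> None:
        self._width = self._height = width

    def set_height(self, height: int) -> None:
        self._width = self._height = height


def stretch(r: Rectangle) -> int:
    """Client written against Rectangle: assumes the sides are independent."""
    r.set_width(5)
    r.set_height(2)
    return r.area()


print(stretch(Rectangle(1, 1)))  # 10
print(stretch(Square(1)))        # 4, not the 10 the client expects
```

The "extra mechanisms" mentioned above are what you get when the client starts checking whether it was handed a Square.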

Chapter 10 - Interface Segregation Principle

The ISP on the architectural level means - don't depend on modules which contain something you don't need, or you will face unexpected issues.

Chapter 11 - Dependency Inversion Principle

  • Don't refer to volatile concrete classes. Refer to abstract interfaces instead. Also don't create volatile objects by yourself, but delegate to abstract factories instead.
  • Don't derive from volatile concrete classes. Inheritance is the strongest bond and should be used with greatest care.
  • Don't override concrete functions. 
  • In fact, don't ever mention anything that is volatile and concrete.
The DIP is used in all the following chapters. It is the main mechanism for separating the modules.
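A sketch of the first rule (refer to abstractions, create volatile objects through an abstract factory), written in Python with invented names (Notifier, NotifierFactory, Application). It is one possible shape, not the book's code.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    @abstractmethod
    def send(self, message: str) -> str: ...


class NotifierFactory(ABC):
    @abstractmethod
    def make_notifier(self) -> Notifier: ...


class Application:
    """High-level code: mentions only the two abstractions above."""

    def __init__(self, factory: NotifierFactory) -> None:
        self._factory = factory

    def greet(self, user: str) -> str:
        notifier = self._factory.make_notifier()  # no concrete class named here
        return notifier.send(f"Welcome, {user}")


# Volatile, concrete side of the boundary.
class EmailNotifier(Notifier):
    def send(self, message: str) -> str:
        return f"email: {message}"


class EmailNotifierFactory(NotifierFactory):
    def make_notifier(self) -> Notifier:
        return EmailNotifier()


app = Application(EmailNotifierFactory())
print(app.greet("Bob"))  # email: Welcome, Bob
```

Only the last two lines know which concrete classes exist, and that kind of wiring is what the book later assigns to the Main component.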

Part IV - Component Principles

Chapter 12 - Components

This chapter is just a history lesson on how we got to dynamically linked files, and how they enable plugin architecture.

Chapter 13 - Component Cohesion

The chapter discusses 3 principles of component cohesion.

REP - Reuse/Release Equivalence principle

The granule of reuse is the granule of release. The classes and modules which belong to a component must form a cohesive group. There must be an overarching theme that these classes share. The parts of a component should be releasable together. This is rather weak advice - the component should "make sense". However, it is still important, because violations are quite easy to detect. They don't make sense.

CCP - Common Closure Principle

Those classes which change for the same reason at the same times should be in one component. Those which change for different reasons at different times should be split into separate components.

CRP - Common Reuse Principle

Don't force users of a component to depend on things they don't need. We want to make sure we put only those classes to a component, which are inseparable from each other.

These 3 principles form a triangle, similar to scope, time and money triangle. You can be really good at 2 of those principles, but then you will lack the third. When projects start, they tend to sacrifice the REP. Developability wins over reusability. Then, as projects mature, they will lean slowly towards the REP. The component cohesion changes with time, from developability towards reusability.

Chapter 14 - Component Coupling

The next 3 principles are about the tension between developability and good design.

ADP - Acyclic Dependency Principle

Allow no cycles in the component dependency graph. If you have a cycle between, say, 3 components, these 3 components have effectively become one large component. There are at least 2 ways of breaking the cycle:
  1. Apply the dependency inversion principle, i.e. introduce interfaces to invert the dependency.
  2. Create a new component which both problematic components will depend on.
Component dependency diagrams have very little to do with the function of the application. Instead, they are mapped to the buildability and maintainability of the system. Component dependency structure grows and evolves with the logical design of the application.
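Since the dependency graph is about buildability, a cycle shows up as "there is no valid build order". Here is a small Python sketch using the standard library's graphlib module. The component names and the build_order helper are my own, and the graph maps each component to the components it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a build order (dependencies first) or raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # graphlib puts the detected cycle into the second exception argument.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"dependency cycle: {cycle}") from err


cyclic = {
    "entities": {"authorizer"},
    "authorizer": {"interactors"},
    "interactors": {"entities"},
}

# Fix 1 from the list above: entities now owns an interface that authorizer
# implements, so that edge is inverted and the cycle is gone.
acyclic = {
    "entities": set(),
    "authorizer": {"interactors", "entities"},
    "interactors": {"entities"},
}

print(build_order(acyclic))  # ['entities', 'interactors', 'authorizer']
```

Calling build_order(cyclic) raises ValueError, which is the kind of check that could run in a build to keep cycles from creeping back in.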

SDP - Stable Dependency Principle

Depend in the direction of stability. Instability of a component can be measured. There are fan-in and fan-out metrics for a component, which count the incoming and outgoing dependent classes. Then instability (I) of a component is fan-out / (fan-in + fan-out), so I = 0 is maximally stable and I = 1 is maximally unstable. SDP says that the I metric of a component should be larger than the I metric of the components it depends on. We can fix the violations of SDP the same 2 ways as we fix violations of ADP (DIP or a new component). If we choose to create a new component, it will likely contain interfaces only. This is quite common in Java or C#.
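The metric is simple enough to compute by hand, but a sketch helps to see the rule in action. Note one simplification that is mine and not the book's: I count dependencies between whole components instead of between classes. The component names are invented.

```python
def instability(fan_in: int, fan_out: int) -> float:
    """I = fan-out / (fan-in + fan-out); 0 = maximally stable, 1 = maximally unstable."""
    total = fan_in + fan_out
    return fan_out / total if total else 0.0


def instability_by_component(deps: dict[str, set[str]]) -> dict[str, float]:
    """deps maps each component to the components it depends on (all must be keys)."""
    fan_in = {name: 0 for name in deps}
    for targets in deps.values():
        for target in targets:
            fan_in[target] += 1
    return {name: instability(fan_in[name], len(deps[name])) for name in deps}


def sdp_violations(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Edges where a component depends on something less stable than itself."""
    i = instability_by_component(deps)
    return [
        (source, target)
        for source, targets in deps.items()
        for target in sorted(targets)
        if i[source] < i[target]
    ]


deps = {
    "ui": {"use_cases"},
    "database": {"use_cases"},
    "use_cases": {"entities"},
    "entities": set(),
}
print(instability_by_component(deps))
print(sdp_violations(deps))  # []
```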

SAP - Stable Abstraction Principle

A component should be as abstract as it is stable. Again, some metrics. Abstractness of a component (A) is # abstract classes and interfaces / # all classes in the component. We generally want the A metric to be the opposite of the I metric of a component, i.e. A + I should be close to 1. There are 2 extreme violations of this principle:
  1. Zone of Pain, where you have stable component with concrete classes. Such components are too rigid. Example would be a database schema. It is harmful because databases are volatile. A harmless example of component in Zone of Pain, is a String, because it is not likely to be changed.
  2. Zone of Uselessness, where you have abstract components with no dependents. Such components are simply unused.
The place where you want your components to sit is called the Main Sequence. You can measure how far you are from it: D = |A + I - 1|. Therefore a statistical analysis of design is possible. You can plot a graph of your components and focus on those which are too far from the Main Sequence.
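The two zones fall out of the formula directly. In this Python sketch I take A as the number of abstract classes and interfaces divided by the number of all classes in the component, which is how the book defines it. The example numbers are made up.

```python
def abstractness(abstract_classes: int, total_classes: int) -> float:
    """A = abstract classes and interfaces / all classes in the component."""
    return abstract_classes / total_classes if total_classes else 0.0


def distance_from_main_sequence(a: float, i: float) -> float:
    """D = |A + I - 1|; 0 is on the Main Sequence, 1 is as far away as possible."""
    return abs(a + i - 1)


# Concrete and stable (e.g. a database schema): Zone of Pain.
print(distance_from_main_sequence(abstractness(0, 12), 0.0))  # 1.0
# Fully abstract with no dependents: Zone of Uselessness.
print(distance_from_main_sequence(abstractness(8, 8), 1.0))   # 1.0
# Half abstract, half stable: right on the Main Sequence.
print(distance_from_main_sequence(abstractness(5, 10), 0.5))  # 0.0
```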

Part V - Architecture

Chapter 15 - What is Architecture?

First of all, a software architect is a programmer. He continues to take programming tasks while guiding the rest of the team toward a design that maximizes productivity. His design strategy is to leave as many options open as possible. Good architecture makes the application easy to develop, maintain and deploy. The ultimate goal is to minimize the total cost of the system and maximize programmer productivity.

If 5 teams are developing a system and no other factors are involved, the system will likely be split into 5 components - one for each team.

Architects should always consider deployment issues early on. E.g. it might not be wise to start with micro-services.

The impact of architecture on operations is less dramatic than on development, deployability and maintenance. Almost any issue can be solved by throwing more hardware at it.

Of all aspects of a software system, maintenance is the most costly. The primary costs are spelunking and risk:
  • Spelunking is the digging process, trying to find the best place and strategy to introduce even the simplest feature or fix a bug.
  • While making those changes, the likelihood of creating additional defects is always there, therefore adding the cost of risk.
What are the options architects need to keep open? They are details that don't matter, e.g. database, web server, REST, SOA, microservices, or dependency injection framework. But what if your company has already made a commitment to a certain database? A good architect pretends that no such decision has been made. He maximizes the number of decisions not made.

Chapter 16 - Independence

A shopping cart application with a good architecture will look like a shopping cart application. More on this will be in chapter 21.

Decision of communication protocol between the components is a decision that a good architect leaves open. An architecture that maintains the proper isolation of its components and doesn't assume any communication means between them, will be much easier to transition to a different communication stack, as the operational needs change for the system.

Conway's Law: Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure.
To let the teams work independently, the system should therefore be composed of well-isolated, independently developable components.

A good architecture enables the system to be deployed immediately after it is built, ideally with one button, or automatically.

We can decouple components 2 ways:

  • Decoupling by horizontal layers leaves us e.g. with components for the UI, the business rules and the database.
  • Decoupling by use cases cuts the software into vertical slices, so that the use cases can be developed separately.
The pattern here is to decouple things which change for different reasons. The decoupling mode which we choose might also help operations. But to take advantage of operational benefit, we must split the components to very separate services / microservices. A good architect leaves options open and the decoupling mode is one of those options.

As long as the layers and use cases are decoupled, the software will support teams being separated in any organizational way, e.g. feature teams, database teams, ...

Architects often fall into a trap of fearing duplication too much. When we find a truly duplicate code, we are honor-bound to remove it. But there are 2 kinds of duplication. A true duplication is when every change in one instance requires a change in the other instance. But if 2 sections of code are similar, but change for different reasons at different times - then they are not true duplicates.

Uncle Bob's preference is to decouple components enough so that micro-services could be physically separated, but still keep them in one address space as long as possible. The problem with starting with microservices is that they probably won't be fine-grained enough.

A good architecture will allow a system to be born as a monolith, then be split to multiple modules and then eventually be split into different services. The process could be even reversed in later stages of a project.

The point is that decoupling mode will probably change in project's lifetime, so the architecture's role is to be prepared for such change.

Chapter 17 - Boundaries: Drawing Lines

The point here is to draw a boundary around the low-level details of the system, such as a database. Example of FitNesse project shows that this allowed Uncle Bob to ditch the database completely and have everything in plain-text files instead.

If we put the database behind an interface (applying DIP), we can draw a boundary line just below the interface, so that the database is replaceable. The database (low-level detail) should indeed depend on business rules (higher-level abstractions). Both GUI and DB should be "plugins" to the business rules. We should recognize this as an application of DIP to allow Stable Abstractions Principle.

Chapter 18 - Boundary Anatomy

There are 4 ways of implementing a boundary:
  1. Monolith - everything is statically linked into a single executable, but maintaining good modularization (boundaries) is still very beneficial here. Modules can be developed independently. We use DIP to make the correct dependency flow (low-level details depend on high-level abstractions). Communications between the components are very fast (they're just method calls) and they tend to be chatty. 
  2. DLLs - the same thing, but the monolith is split into multiple components / libraries (e.g. JARs).
  3. Processes - these are system processes, which is much stronger separation than the previous two. Communication is more costly, so it is kept to a minimum.
  4. Services - communication is even more costly because it all happens over network. Lower level services are plugins to higher level services.
Most systems except monoliths use more than one strategy. E.g. each service could be a set of JARs inside one WAR. This means that a system is typically divided into boundaries which are chatty and boundaries which are more concerned with latency.

Chapter 19 - Policy and Level

A strict definition of a level is the distance from the inputs and outputs. This chapter is another restatement that high-level policies should not depend on low-level details. We can flip the dependency flow using DIP.

Chapter 20 - Business Rules

This chapter sets some of the naming concepts for the Clean Architecture (chapter 22). 

Critical Business Rules are the business rules which would exist even if there was no computer system to automate things. They usually require some data, these would be the Critical Business Data. They are embodied in a software system in Entities.

Use Cases are software-application-specific business rules, which describe how the system is used. They should not include how the system appears to the user. This is too much detail for the use cases. Note that the Entities don't know that Use Cases exist, but Use Cases actively use Entities. The dependency flow goes from Use Cases to Entities.

We might be tempted to use Entities as our request/response objects, because they share so much data. We should avoid this temptation. They will change at different times for different reasons.

The Use Cases should be independent of the UI and the database. They are the core value of our system. Use cases should be the most reusable component in the system.
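Here is how I picture the relationship in code. It is a Python sketch with invented names (Loan, EstimateInterest and the request/response classes), not an example from the book. The entity knows nothing about the use case, and the request and response are separate plain data objects even though they carry similar fields.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    """Entity: a critical business rule that exists with or without software."""
    principal: int
    rate_percent: int

    def yearly_interest(self) -> float:
        return self.principal * self.rate_percent / 100


@dataclass(frozen=True)
class InterestRequest:
    """Plain input data; deliberately not the entity."""
    principal: int
    rate_percent: int


@dataclass(frozen=True)
class InterestResponse:
    """Plain output data; free to change with the application, not the business."""
    yearly_interest: float


class EstimateInterest:
    """Use case: application-specific rule that drives the entity."""

    def execute(self, request: InterestRequest) -> InterestResponse:
        loan = Loan(request.principal, request.rate_percent)
        return InterestResponse(loan.yearly_interest())


response = EstimateInterest().execute(InterestRequest(1000, 5))
print(response.yearly_interest)  # 50.0
```

Nothing here mentions a screen, a web framework or a database, so the use case can be called from a test just as easily as from a controller.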

Chapter 21 - Screaming Architecture

When new programmers come to see your health-care system source code repository, they should immediately see it and say: "Oh, this is a health-care system." The architecture should "scream" health-care at them.

The fact that the system is delivered via the web is a detail and should be treated as such. The same applies to frameworks. They should not dominate the system. Look at the frameworks with a jaded eye. Are they worth the cost of marrying them? Can you protect yourself from them?

Chapter 22 - The Clean Architecture

The best description of the Clean Architecture is from Uncle Bob himself and is available for free on his former employer's website. He found that all modern architectures share these traits:
  • Independent of frameworks. Frameworks are just a detail, a plugin to the core of the system.
  • Testable. If the core of the system is independent, it can be fully covered by tests.
  • Independent of UI. UI is just a detail, a plugin to the core of the system.
  • Independent of DB. Database is just a detail, a plugin to the core of the system.
  • Independent of any external agency.
So he presents layers (which he calls circles), listed here from the innermost to the outermost:
  1. Entities are enterprise-wide business rules. They are very unlikely to change.
  2. Use cases are application-specific business rules. They might change because of e.g. change requests to the application.
  3. Interface adapters convert formats from one to another. They do it in the way that is beneficial for the lower layers.
  4. Drivers and Frameworks are the details. There is usually not much code in this layer. E.g. just a glue code.

Chapter 23 - Presenters and Humble Objects

The Humble Object pattern is useful for testability of a system. We separate hard-to-test code to a separate object, which contains nothing else. That's why it is called humble. The rest of the system is testable.

We want our Views to be humble. So everything that is displayed on the screen, is represented as a string or a boolean, or something similarly simple. We don't want to format dates in the views. We don't want any logic in our views.

The use of humble objects at architectural boundaries greatly increases the testability of the system.
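A Python sketch of the split between a Presenter and a humble View. The invoice example, the field names and the EUR formatting are my own assumptions. All decisions and formatting live in the presenter, and the view only copies ready-made values to the screen.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvoiceViewModel:
    """Only strings and booleans: everything the view needs, already formatted."""
    due_date: str
    amount: str
    show_overdue_banner: bool


class InvoicePresenter:
    """Testable half: all formatting and decisions happen here."""

    def present(self, due: date, amount_cents: int, today: date) -> InvoiceViewModel:
        return InvoiceViewModel(
            due_date=due.isoformat(),
            amount=f"{amount_cents / 100:.2f} EUR",
            show_overdue_banner=today > due,
        )


class ConsoleInvoiceView:
    """Humble half: hard to test, so it does as little as possible."""

    def render(self, vm: InvoiceViewModel) -> None:
        print(vm.due_date, vm.amount)
        if vm.show_overdue_banner:
            print("OVERDUE")


vm = InvoicePresenter().present(date(2017, 10, 31), 12345, today=date(2017, 11, 2))
ConsoleInvoiceView().render(vm)
```

A test only needs to compare the returned InvoiceViewModel with an expected one. No screen is involved.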

Chapter 24 - Partial Boundaries

This chapter presents 3 strategies for implementing simpler boundaries. Each of them has less bureaucracy than the full-fledged solution. There are of course many other strategies for this. Architects decide whether a real boundary is needed or such partial solution would suffice.
  1. Skip the last step. We will carefully prepare the boundaries so that the code could be split into different components. But then we will let it live in a single component.
  2. Even lighter is one-directional boundary. We will separate dependencies using interfaces. Nothing prevents bypassing interfaces and using dependencies directly.
  3. Even lighter is a facade pattern. Client code will use a facade, which has direct dependencies on implementation classes. Therefore client code will transitively depend on implementation classes.

Chapter 25 - Layers and Boundaries

This chapter is similar to the previous one. Full-fledged boundaries come at a cost. This cost has to be weighed. Architects don't want to over-engineer, as it is often much worse than under-engineering. Sometimes, a boundary must be created. Sometimes, a partial boundary will suffice. Sometimes the need for a boundary should be ignored (as he shows in this chapter on a game of 200 lines of code). However, this is not a one-time decision. It takes a watchful eye and a repeated weighing of pros and cons.

Chapter 26 - The Main Component

The main component is the lowest-level component. The ultimate detail. Think of main as a configuration plugin to the application. It contains the basic setup and configuration. It is a plugin, so we can have one for production environment, and other for development environment. Or different mains for different countries to deploy to.
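"Main as a plugin" can be as small as this Python sketch. The Storage interface, the two implementations and the file path are hypothetical and only there for illustration. The application class never learns which main function wired it together.

```python
from typing import Protocol


class Storage(Protocol):
    def describe(self) -> str: ...


class InMemoryStorage:
    def describe(self) -> str:
        return "in-memory storage"


class FileStorage:
    def __init__(self, path: str) -> None:
        self._path = path

    def describe(self) -> str:
        return f"file storage at {self._path}"


class App:
    """The application proper: depends only on the Storage interface."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def status(self) -> str:
        return f"running with {self._storage.describe()}"


def main_development() -> App:
    """One Main per environment; each is the only place that names concrete classes."""
    return App(InMemoryStorage())


def main_production() -> App:
    return App(FileStorage("/var/lib/app/data"))  # hypothetical path


print(main_development().status())  # running with in-memory storage
print(main_production().status())   # running with file storage at /var/lib/app/data
```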

Chapter 27 - Services: Great and Small

Services (or microservices) which physically separate business functions of an application are not much more than expensive function calls. They are also not necessarily architecturally important.

Is strong decoupling really an advantage of microservices? Services share no variables, that's for sure. But they are coupled by the data they share - e.g. requests and responses. If one request needs to add a field to it, it might cause change in multiple services. So this benefit is an illusion.

Is independent developability really an advantage of microservices? First, history has shown that big scalable systems can be developed either as a monolith, or as a component-based system, or as services. Second, if strong decoupling is a myth, then a simple change can cause all of the services to change, so there is no real independence of the teams.

Services are a nice mechanism for scalability and developability. However, they are not architecturally significant. They are just a communication mechanism between system boundaries. The architecture is drawn by these boundaries. In many cases, a client and a service are so coupled that there is no architectural significance whatsoever.

Chapter 28 - The Test Boundary

Tests are part of the system.

If tests depend on very volatile things, like GUIs, they tend to be fragile. Generally, Fragile Tests Problem is when a simple change can cause hundreds or thousands of tests to fail. Such tests make the system rigid. The developers are afraid to make necessary design changes, if any change breaks so many tests.

A well-designed test suite has its own API. This API contains all the necessary code to bypass e.g. security rules, or whatever prevents testing. It will often be a superset of interactors and interface adapters used by the UI. The purpose of the API is to decouple the structure of the tests from the structure of the system.

Imagine a system, where every production class or even method has a corresponding test. Such system also becomes rigid. Each refactoring breaks many tests. The testing API should hide the system structure from the tests.

Chapter 29 - Clean Embedded Architecture

We need more software and less firmware. If the only way to test your software is on the target hardware, then the target hardware will become the test bottleneck. There is nothing that keeps us from polluting all the code with hardware-specific code. Software and firmware shouldn't intermingle. It is an antipattern and such code will resist changes. Hardware is a detail. That's why we have a HAL (hardware abstraction layer) which software can use and which abstracts the underlying hardware. Your software has to be agnostic of the OS details as well. OSAL (operating system abstraction layer) is a thing.

The app-titude test:
  1. First make it work.
  2. Then make it right.
  3. Then make it fast.

Part VI - Details

Chapter 30 - Database is a Detail

Why do we have disks and databases? How would we programmers store our data structures if there were no disks and we would have to store them in RAM? We wouldn't organize them to database tables and access them through SQL. We would store them as objects, because that's what we do. That's why from the architectural viewpoint, we shouldn't care about the storage details of our objects. It is just a detail.

Don't underestimate vendors' marketing. In the late 1980s, every company had to have an RDBMS. Today, words like "enterprise" or "SOA" are more marketing than reality.

To sum it up, the data model is important. The database is just a technology, a mechanism, of storing it to disks. The database is a detail.

Chapter 31 - The Web is a Detail

Abstracting the GUI as a plugin is not easy. It will likely take several iterations to get it right. However, it is often worth it. The way to do it is to keep the use cases independent of the GUI.

Chapter 32 - Frameworks are Details

Framework authors created frameworks to solve their problems. Not yours. And still, they try to persuade you to couple your system to their framework as tightly as possible. It is one-directional marriage. You take on all the risks and burden and the framework takes on nothing at all:
  • Architecture of the framework might not be very clean. And once it is in, the framework is not going out.
  • Framework might help you boost up the start of the project. But as time passes, you might outgrow it and you start fighting the framework more than it is worth.
  • The framework might evolve in a direction you might not like.
  • There could be a better framework, which you would like to use instead, but since you married this one, you can't.
The solution? Don't marry the framework. You can still use it, just don't couple to it. E.g. you shouldn't have @Autowired annotations all over your business classes. A better place for Spring IOC is the Main component, because the wiring of classes is the lowest level detail.

Of course, you must marry some frameworks. E.g. standard library. But it still should be a decision. It is not a commitment to be entered lightly. 

Chapter 33 - Case Study: Video Sales

Uncle Bob presents the architecture of his cleancoders website for selling videos. First, he does a use case analysis. Four actors come out of it. Then he creates an architecture: For every actor, there will be corresponding Views, Presenters, Interactors (Use Cases) and Controllers. These 4 layers will be separated by architectural boundaries. There are also Data Gateways and a Revenue Gateway and a Database in a separate architectural boundary, Utilities. The point of separation by actors is the "different reasons" part of the SRP. The point of architectural boundaries is the "different rates" of the SRP. Once you separated the code this way, you can mix it into components or even deployables. 

I remember being disappointed with this chapter. It is too short for a case study. And this architecture doesn't "scream" video sales to me, because the first thing I see are the technical boundaries.

Chapter 34 - The Missing Chapter

This chapter starts with different ways to package our software:
  1. Package by (a technical) layer. Similar to the case study, we package together things which are on the same technical layer. Martin Fowler says this is a good way to get started.
  2. Package by (a business) feature. We put everything which has something to do e.g. with orders, into orders super-package. After a simple "move" refactoring, we have package by feature. Now the top-level package architecture really screams something about the application.
  3. Ports and adapters. All the variations of hexagonal architecture, clean architecture, etc. fall into this category. You have one package of everything testable, the domain, like services and their dependencies via interfaces. Everything else, the infrastructure, is separated and dependent on the domain. This idea comes from the DDD.
  4. Package by component. This is Simon Brown's, this chapter author's, recommended way. It's the same as ports and adapters, but the "backend" infrastructure is packaged together with the domain. This can be considered as a preparation for splitting to microservices, if necessary.
After this presentation comes the point. All 4 ways of packaging are the same dependency-flow-wise. But the visibility of the components could be different. If some of them are in the same package, then we can use the default visibility in Java. Mr. Brown suggests that we use compiler (and not just discipline) to enforce the dependency rule. He is also enthusiastic about the new Java 9 module system, which gives us more power for encapsulation. Alternative way is to use different source code trees, using maven, gradle or other build tool.

Part VII - Appendix

Appendix A - Architecture Archaeology

My only note from this part is the experience that if you want to build a reusable framework, you have to have at least 2 reusers for it.
