Notes about Modern Software Engineering

Notes to my future self about David Farley's book Modern Software Engineering. IMHO the book was quite boring, and presented just a few new ideas.

Chapter 1 - Introduction

We need to become experts in learning and experts in managing complexity. The scientific method we learned at school:
  • Characterize - make an observation about the current state.
  • Hypothesize - create a theory explaining the observation.
  • Predict - make a prediction based on the hypothesis.
  • Experiment - test the prediction.
When we run many small experiments this way, we are less likely to jump to wrong conclusions and we do a better job.

Software engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems in software.

To become experts in learning, we need the following techniques: Iteration, Feedback, Incrementalism, Experimentation, Empiricism.

To become experts in managing complexity we need: Modularity, Cohesion, Separation of Concerns, Abstraction, Loose Coupling.

The book then describes a set of ideas that act as practical tools to drive an effective strategy, e.g.: Testability, Deployability, Speed, Controlling the variables, Continuous Delivery.

Fred Brooks is often cited in this book. He compared the speed of hardware and software development like this: "There is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement within a decade in productivity, in reliability, in simplicity."

Farley believes he provides in this book a paradigm shift for software development comparable with Galileo challenging the conventional wisdom of his time.

Chapter 2 - What is Engineering

Our engineering is not like any production engineering (e.g. bridge building). We do design engineering instead. A bridge builder might create a computer simulation of the proposed design, an approximate model of the bridge. We don't need to worry about the precision of our models. Our models are the reality of our system. Software development is all about discovery, learning and design. We should, even more than spaceship designers, apply the techniques of exploration in our work.

Engineering means stuff that works. Engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems. Engineering is not just code, it is everything around as well.

There were several significant steps that affected the productivity of programmers, but only one came close to Fred Brooks's 10x improvement. It was the step from machine code to high-level languages. The level of change in our industry is impressive, but none of it is very significant. E.g. Serverless is important, but principally only because it encourages a more modular approach with a better separation of concerns with respect to data.

Software development is not just craft. Craft-based production is low-quality. A human being, however talented, is not as accurate as a machine. Precision and scalability are what distinguish software development from craft. The art of programming is the art of organizing complexity. Engineering is the more scalable, more effective offspring of craft. That said, it is important not to dismiss the value of craft, if by craft we mean creativity. But craft is not enough. Engineering is the evolution of craft to the height of human creativity and ingenuity.

We should make decisions based on evidence. Understanding the trade-offs is vital for decision-making. One of the key trade-offs is coupling.

Modern teams fight with schedule pressure, quality and maintainability of their designs. We should adopt a practical, rational, lightweight, scientific approach.

Chapter 3 - Fundamentals of an Engineering Approach

Engineering disciplines are firmly grounded in scientific rationalism and take a pragmatic, empirical approach to making progress.

We get excited about new technologies. E.g. with Hibernate there was actually more code to write than for the equivalent behavior written in SQL, and the SQL was easier to understand. The only 10x steps for programmers were the step from Assembler to C and the step from procedural to OO programming. There may not be many 10x gains, but there are certainly 10x losses.

One of the reasons that we find it difficult to discard bad ideas is that we don't really measure our performance in software development very effectively. The only valid measures we need are stability and throughput. #interesting Stability is tracked by:
  • Change Failure Rate - the rate at which a change introduces a defect at a particular point in the process.
  • Recovery Failure Time - how long to recover from a failure at a particular point in the process.
Throughput is tracked by:
  • Lead Time - how long for a single-line change to go from idea to working software.
  • Frequency - how often are changes deployed into production.
There is a correlation between a development approach and the commercial outcome for the company. Speed and quality are actually correlated. #interesting Surprisingly, change approval boards don't improve stability.
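
A minimal sketch of how the four measures could be computed from deployment records. The Deployment record and its field names are assumptions made for illustration, not something the book defines, and commit time is used here only as a rough stand-in for "idea" when measuring lead time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record; the field names are illustrative assumptions.
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did the change introduce a defect?
    recovered_at: Optional[datetime] = None  # when service was restored


def change_failure_rate(deployments):
    """Stability: fraction of deployments that introduced a defect."""
    return sum(d.failed for d in deployments) / len(deployments)


def mean_recovery_time(deployments):
    """Stability: average time from a failed deployment to recovery."""
    durations = [
        (d.recovered_at - d.deployed_at).total_seconds()
        for d in deployments
        if d.failed and d.recovered_at is not None
    ]
    return timedelta(seconds=mean(durations)) if durations else timedelta(0)


def mean_lead_time(deployments):
    """Throughput: average time from commit to production."""
    seconds = mean(
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    )
    return timedelta(seconds=seconds)


def deployments_per_day(deployments):
    """Throughput: deployment frequency over the observed period (at least one day)."""
    times = [d.deployed_at for d in deployments]
    days = max((max(times) - min(times)).total_seconds() / 86400, 1.0)
    return len(deployments) / days
```

Tracking these four numbers over time, rather than looking at a single snapshot, is what would show whether a change in approach made things better or worse.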

Chapter 4 - Working Iteratively

As long as we have some way of telling whether we are closer to, or further from, our goal, we could even iterate randomly and still achieve our goal. Working iteratively automatically narrows our focus and encourages us to think in smaller batches. #important
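
A small sketch of the "iterate randomly" claim, not taken from the book: the steps are random, and the only intelligence is the feedback that tells us whether a step moved us closer. The function name, the step size and the goal value 42 are arbitrary assumptions.

```python
import random


def iterate_randomly(distance_to_goal, start, steps=10_000, seed=0):
    """Take random small steps, keeping only those that move closer to the goal."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    current = start
    for _ in range(steps):
        candidate = current + rng.uniform(-1.0, 1.0)  # a random small step
        # Feedback: keep the step only if it brings us closer.
        if distance_to_goal(candidate) < distance_to_goal(current):
            current = candidate
    return current


# Hypothetical goal: reach the value 42 starting from 0.
result = iterate_randomly(lambda x: abs(x - 42.0), start=0.0)
```

Without the comparison inside the loop the walk would just wander. With it, even undirected steps converge on the goal.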

Waterfall was better suited to production-line problems, not to a learning exercise, which is what software development is. The agile approach is an infinite approach, where we iteratively refine our thinking, identify the next small step, and then take that step. We embrace change.

We work in smaller batches. We make small steps. At a different scale you can think of CI and TDD as being inherently iterative processes. In CI we are committing often. In TDD we work test by test.

Chapter 5 - Feedback

Feedback is the transmission of evaluative or corrective information about an action, event, or process to the original, or controlling, source. Without it, there is no learning. Despite this, many companies pay no attention to it. In most organizations, guesswork, hierarchy, and tradition are the much more widely used arbiters for decision-making. Feedback allows us to establish a source of evidence for our decisions.

As Farley begins writing a test, he wants feedback on whether the test itself is correct. So he writes the test and runs it to see it fail. By using TDD, we can achieve all the hallmarks of software quality, like modularity, separation of concerns, high cohesion, etc. We are defining how external users of our code will interact with it.

Continuous Integration and its big brother Continuous Delivery demand that we make changes in small steps and have something ready for use after every small step.

At the software architecture level, we need to take the testability and deployability of our systems seriously. The adoption of Continuous Delivery, in both microservices and monoliths, promotes modular, better abstracted, more loosely coupled designs, because only then can you deploy and test them efficiently.

We should strive to "fail fast". Continuous Delivery and DevOps sometimes call this principle shift-left. #interesting

Optimizing for fast feedback enables organizations to learn faster, even in product design.

Feedback is also useful for organization and culture. If your team has an idea to improve its approach to something, take inspiration from a scientific approach and be clear about where you are now and where you want to be. Then periodically check to see if you are closer to, or further from, the target.

For us, stability and throughput are important as feedback because they are the best measures that we currently understand, not because they are perfect. Both CI and CD are ideas that optimize our development process to maximize the quality and the speed of the feedback that we collect.

Chapter 6 - Incrementalism

Incremental design is directly related to any modular design application, in which components can be freely substituted if improved to ensure better performance. Modularity is an important idea. Divide the problem into pieces aimed at solving a single part of a problem. A modular approach frees the teams to work more independently. High-performing teams make progress and are able to change their mind without approval from someone external to the team.

The tools we need to work incrementally are: modularity (we want to limit the impact we are making), refactoring (make safe changes), testing (have a safety net), version control (have a safe space to return to). We should take slightly more care when designing our integration points, because it causes slightly more pain to change those.

We can begin to work before we have all the answers. Many people struggle to start if they don't have the detailed design of what they are going to build upfront. But that is agile. Farley thinks of it as defensive design, but a better name is incremental design. The current design of the code should concern only current system needs, not everything we could think of (that would be over-engineering). People called Farley a 10x engineer. It is because he is an incremental designer. #interesting He does what he writes in this book. He worries about over-engineering. He never adds code that is not needed now. #interesting That said, he does always try to separate the concerns in his design, break out different parts of the system, design interfaces that abstract the ideas in the code that they represent, and hide the detail of what happens on the other side of the interface. He strives for simple, obvious solutions in his code, but also has some kind of internal warning system that sounds off when his code starts to feel too complex, too coupled, or just insufficiently modular. His aim is not simple code, but rather code that he can change when he learns new things.

Chapter 7 - Empiricism

Empiricism means emphasizing evidence, especially as discovered in experiments. Even for the best companies, only a fraction of their ideas produce the effects that they predicted. Science is a problem-solving technique and it works. Make a hypothesis. Figure out how to prove or disprove it. Carry out the experiment. Observe the results and see whether they match your hypothesis. Repeat.

Farley then presents an example of self-deception. Measurement showed that one of the most common causes of cache misses in most systems was concurrency. Counter-intuitively, a single-threaded solution to the problem turned out to be much faster than the multithreaded ones. Assume that what you know and what you think is probably wrong, and figure out how to prove it. Parallelism hurts if you need to join the results back together. We must continually be skeptical about our guesses.

Chapter 8 - Being Experimental

Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. Four characteristics define "being experimental" as an approach:
  • Feedback - we need to understand how we will collect results that will provide us with a clear signal.
  • Hypothesis - we need to have an idea we want to evaluate. First we guess it.
  • Measurement - we need a clear evaluation method.
  • Control the variables - we need to eliminate as many variables as we can so that the signal of the experiment to us is without noise.
The clearest form of such an experiment is software development guided by tests, or test-driven development. High-performing teams that employ TDD, CI and CD spend 44% more time on useful work. There is no downside. You can have your cake and eat it.

Experiments come in all sizes. E.g. we can predict the exact error message that a test will fail with.

Automated testing and continuous delivery techniques, like infrastructure as code, make our experiments more reliable and repeatable.

Chapter 9 - Modularity

To cope with complexity, we must divide the systems that we build into smaller, more understandable pieces. This is the real skill of software development. Farley has a commit-stage check that no method is longer than 20-30 lines of code. He also rejects methods with more than 5-6 parameters. #interesting
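
The notes don't say which tool Farley uses for this check, so the following is only a sketch of what such a commit-stage check could look like for Python code, using the standard ast module. The thresholds take the upper bounds mentioned above, and check_source is a hypothetical name. Note that it counts self as a parameter.

```python
import ast

MAX_LINES = 30   # assumed threshold: upper bound of "20-30 lines"
MAX_PARAMS = 6   # assumed threshold: upper bound of "5-6 parameters"


def check_source(source):
    """Return violations for functions that are too long or take too many parameters."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Length from the "def" line to the last line of the body.
            length = node.end_lineno - node.lineno + 1
            args = node.args
            params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if length > MAX_LINES:
                violations.append(f"{node.name}: {length} lines (max {MAX_LINES})")
            if params > MAX_PARAMS:
                violations.append(f"{node.name}: {params} parameters (max {MAX_PARAMS})")
    return violations
```

A commit stage could run this over every changed file and reject the commit when the returned list is not empty.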

Testability is very important. If our tests are difficult to write, it means that our design is poor. TDD is a talent amplifier that is even more important as we move from craft to engineering.

Beware of testing too large a scope. If we test everything together, we risk not being able to interpret the test results. As the test scope grows, the precision reduces. The real root cause of a lack of determinism in computer systems is concurrency. #interesting

From a purely practical perspective, we can think of a service as code that delivers some "service" to other code and hides the detail of how it delivers that "service". This is just the idea of "information hiding" and is extremely important if we want to manage the complexity of our systems as they grow. A service can certainly, sensibly, be thought of as a module of our system. The seams or boundaries should be treated with more care. They should be translation and validation points for information. A system is not modular if the internal workings of adjacent modules are exposed. Communication between modules (and services) should be a little more guarded than communication within them.

The most scalable approach to software development is to distribute it. Amazon had its famous two-pizza teams, which allowed it to grow at an unprecedented rate. Part of the definition of a microservice is that it is "independently deployable". However, this comes at the sometimes very significant cost of a more complex, more distributed architecture in our systems. We are forced, now, to take modularity very seriously indeed. We probably need to consider and apply ideas like runtime version management for APIs and so on.

Modularity is important at every scale. Deployability is a useful tool when thinking of system-level modules, but this, alone, is not enough to create high-quality code. There is a modern pre-occupation with services.

You can't make a baby in a month with 9 women. The teams of 5 took only one week longer than the teams of 20 people over a 9-month period. We need modular organizations as well as modular software. So if we want our organization to be able to scale up, the secret is to build teams and systems that need to coordinate to the minimum possible degree. #interesting

Chapter 10 - Cohesion

Pull the things that are unrelated further apart, and put the things that are related closer together. #interesting The trouble is that this is open to overly simplistic interpretations. A naive view of cohesion is that everything is together and so easy to see. Is this easier to understand? There is a common desire among programmers to reduce the amount of typing that they do. Don't measure simplicity in terms of the fewest characters typed. The primary goal of code is to communicate ideas to humans. Optimize to reduce thinking rather than to reduce typing.

Separation of concerns guides us to a rule: One class, one thing, one method, one thing.

The route to fast code is to write simple, easy-to-understand code. If you are interested in performance of your code, measure it.

We should separate accidental complexity from essential complexity. Farley then presents an example where a method is split using listeners, i.e. indirection. This code is more or less cohesive depending on context. If the listeners are considered to be a part of the problem, then the code is less cohesive. If they are just side effects, then the code is more cohesive.
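
The book's listing is not reproduced in these notes. As a rough illustration only, a method split via listeners could look like the sketch below. The Cart and InMemoryStore classes and all method names are hypothetical.

```python
class Cart:
    """Hypothetical shopping cart; names are illustrative, not the book's listing."""

    def __init__(self):
        self.items = []
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def add_item(self, item, price):
        # Essential complexity: the cart's own behavior.
        self.items.append((item, price))
        # Indirection: anything else (e.g. storage) reacts via a listener.
        for listener in self.listeners:
            listener.on_item_added(item, price)

    def total(self):
        return sum(price for _, price in self.items)


class InMemoryStore:
    """Stand-in for a database; a real store would persist the item."""

    def __init__(self):
        self.saved = []

    def on_item_added(self, item, price):
        self.saved.append((item, price))
```

Reading add_item alone no longer tells us that items get stored, which is the loss of locality. In return, the cart knows nothing about storage.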

If the code confuses different responsibilities, it lacks clarity and readability. If our responsibilities are more widely spread, it may be more difficult to see what is happening. By keeping related ideas close together, we maximize the readability. So the code example with listeners indirection, while more flexible, lacks clarity.

Cohesion is probably the slipperiest of the ideas in the list of ideas for managing complexity. Having all the code in one place is at least cohesive, but this is too simplistic.

Chapter 11 - Separation of Concerns

Separation of concerns is a design principle for separating a computer program into distinct sections such that each section addresses a separate concern. It is the most powerful design principle in Farley's own work. He applies it everywhere.

For the code example from Chapter 10, Farley strongly prefers the version with listeners. #interesting He very much likes that he removed the concept of storage from his core domain. The version without listeners confuses concerns. Storage belongs to the realm of accidental complexity. We should separate accidental and essential complexity. It is our job to minimize accidental complexity.

A useful tool for achieving separation of concerns is dependency injection. We can use the testability of the systems that we create to drive quality into them in a way that little else, beyond talent and experience, can do.
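
A minimal sketch of constructor-based dependency injection and the testability it buys. All class names here are made up for illustration.

```python
class SmtpMailer:
    """Hypothetical production dependency; real sending is not implemented here."""

    def send(self, to, body):
        raise RuntimeError("would talk to a real mail server")


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class Signup:
    def __init__(self, mailer):
        # The dependency is injected, not created inside the class.
        self.mailer = mailer

    def register(self, email):
        normalized = email.strip().lower()
        self.mailer.send(normalized, "Welcome!")
        return normalized
```

Because Signup never constructs its own mailer, a test can pass in FakeMailer and check the outcome without any mail server.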

When we have a ports and adapters architecture, we should always translate information that flows between services. Farley recommends adding ports and adapters whenever we communicate with something that is in a different repository or deployment pipeline.
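
A sketch of what "always translate" can mean in practice, under the assumption of a remote pricing service that returns a dict with "value" and "cur" keys. The payload shape and all names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Domain type used inside our own service."""
    amount_cents: int
    currency: str


class PricingPort:
    """Port: what our domain needs, stated in our own terms."""

    def price_of(self, sku):
        raise NotImplementedError


class RemotePricingAdapter(PricingPort):
    """Adapter: translates and validates the remote service's payload."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable returning the remote payload as a dict

    def price_of(self, sku):
        payload = self._fetch(sku)
        # Validate at the seam so malformed data never reaches the domain.
        if "value" not in payload or "cur" not in payload:
            raise ValueError(f"malformed pricing payload for {sku!r}")
        return Price(
            amount_cents=round(float(payload["value"]) * 100),
            currency=payload["cur"].upper(),
        )
```

If the remote service renames a field, only the adapter changes. The rest of the code keeps working with Price.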

Chapter 12 - Information Hiding and Abstraction

Information hiding is about drawing lines, or seams, in our code so that when we look at those lines from the "outside", we don't care about what is behind them. For Farley information hiding = abstraction.

When we developers say that our manager doesn't allow us to "refactor", "test", "design better" or even "fix that bug", it does not make sense. We don't need permission to do a good job. There is no tradeoff between speed and quality. It is a good thing, a sensible thing, to change the existing code. We just need to work in small steps that are easy to undo.

We should always be thinking of the simplest route to success, not the coolest, not the one with most tech that we can add to our CVs or resumes. In Farley's experience if we take this idea of "striving for simplicity" seriously, we are more likely to end up with something cool and enhance our CVs. Future-proofing is a sign of design and engineering immaturity. YAGNI - you ain't gonna need it. #interesting

The real solutions to the problem of being afraid to change our code are abstraction and testing. If we abstract our code, we are, by definition, hiding the complexity in one part of the system from another.

We should beware of leaky abstractions - the ones which leak details that they are supposed to hide.

The more targeted the abstractions are to the problem that we are trying to solve, the better the design.

We should isolate third-party systems and code.

Chapter 13 - Managing Coupling

Coupling is the degree of interdependence between software modules; a measure of how closely connected two routines or modules are; the strength of relationships between modules. Coupling is not something that we can, or should, aim to always wholly eliminate. The real reason why attributes like modularity and cohesion and techniques like abstraction and separation of concerns matter is that they help us to reduce coupling. This reduction has a direct impact on the speed and efficiency with which we can make progress, and on the scalability and reliability of both our software and our organizations. We should prefer looser coupling over tighter coupling. Coupling is a monster at the heart of software development.

Perhaps the biggest commercial impact of coupling is our ability to scale up development. If your team and my team are developmentally coupled, we could maybe work to coordinate our releases. The best way to do this is through CI. Most organizations are unable or unwilling to invest enough in the changes necessary to make this work. There are only 2 strategies that make sense - either a coordinated approach or a distributed approach. Each comes with costs and benefits. 

Microservices are: small, focused on one task, aligned with a bounded context, autonomous, independently deployable and loosely coupled. This definition aligns with what Farley calls a good design. The trickiest idea here is "independently deployable". Microservices is an organizational scaling pattern. The cost is that we give up coordination. The DevOps report says that without coordination the teams are more likely to deploy high-quality work more often.

The Nygard model of coupling recognizes 5 types of coupling:
  • Operational - a consumer can't run without a provider.
  • Developmental - changes in producers and consumers must be coordinated.
  • Semantic - change together because of shared concepts.
  • Functional - change together because of shared responsibility.
  • Incidental - change together for no good reason (e.g. breaking API changes).

DRY - don't repeat yourself - is excellent advice within the context of a single function, service or module. Farley would extend the scope of this advice to the repository and deployment pipeline. But we shouldn't share code between microservices.

Farley prefers asynchronous communication between microservices.

Chapter 14 - The Tools of an Engineering Discipline

People are too slow, too variable in what they do, and too expensive to rival an automated approach to gathering the feedback that we need. Designing to improve the testability of our code makes us design higher quality code. While TDD encourages the design of testable code, unit testing does not. Unit testing after the code is written encourages us to cut corners, break encapsulation, and tightly couple our tests to the code that we already wrote. #interesting

We should work iteratively, adding a test for the piece of work in front of us. We predict how the test will fail. Then we create code that makes the test pass, review our design, and make small, safe, behavior-preserving changes to optimize the quality of our code and our tests.
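
The cycle can be illustrated with a small sketch using the standard unittest module. The leap-year example is my own choice, not the book's, and the comments describe the order in which the pieces would have been written.

```python
import unittest


def leap_year(year):
    # Green: the simplest code that passes the tests written so far.
    # Refactor: small, behavior-preserving cleanups happen only while tests pass.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    # Red: each test was written before the code it exercises. The predicted
    # failure for the first one, before leap_year existed, was a NameError.
    def test_year_divisible_by_four_is_leap(self):
        self.assertTrue(leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(leap_year(1900))

    def test_every_fourth_century_is_leap(self):
        self.assertTrue(leap_year(2000))
```

The tests can be run with python -m unittest. Seeing each new test fail in the predicted way, before making it pass, is the feedback that the test itself is correct.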

We should differentiate between releasability, which implies some feature completeness and utility to users, and deployability, which means that the software is safe to release into production, even if some features are not yet ready for use and are hidden in some way. If the deployment pipeline says that the change is good, there is no more testing to be done, no more sign-offs, and no further integration testing with other parts of the system before we deploy the change into production.
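
One common way to hide an unfinished feature is a feature flag. The notes don't say which mechanism the book has in mind, so treat this as one possible sketch. The FLAGS dictionary and the flag name are hypothetical.

```python
FLAGS = {"new_checkout": False}  # hypothetical flag store: deployed, but not released


def checkout(cart_total, flags=FLAGS):
    if flags.get("new_checkout", False):
        # New code path: already in production, hidden until the flag is on.
        return f"new checkout: {cart_total:.2f}"
    return f"checkout: {cart_total:.2f}"
```

The new path ships with every deployment and goes through the pipeline like any other code. Releasing it later is a configuration change, not a deployment.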

When Farley consults with teams to help them adopt CD, he advises them to focus on reducing the time it takes to gain feedback. Get to a releasable outcome in less than one hour. Speed is a tool that we can use to guide us toward higher-quality, more efficient outcomes.

We want the tests that we create to give precisely the same results every time that we run them for the same version of the software under test. Given the same inputs, computers will generate the same outputs every time. The only limit to this truth is concurrency. Reliably testable code is not multithreaded within the scope of a test, except for some very particular kinds of test.
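
"Same inputs, same outputs" only holds if everything the code depends on really is an input. A small sketch of that idea, with the current time as the example: the function is_expired and its parameters are assumptions for illustration, not something from the book.

```python
from datetime import datetime, timedelta


def is_expired(issued_at, ttl, now):
    """Time is passed in as an input instead of being read from the system clock."""
    return now - issued_at >= ttl


# The test controls every variable, so it gives the same result on every run.
issued = datetime(2024, 1, 1, 12, 0)
just_before = is_expired(issued, timedelta(minutes=30), now=issued + timedelta(minutes=29))
exactly_at = is_expired(issued, timedelta(minutes=30), now=issued + timedelta(minutes=30))
```

Had the function called datetime.now() itself, the result would depend on when the test happened to run.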

CD is a highly effective strategy around which to build a strong engineering discipline for software development.

Chapter 15 - The Modern Software Engineer

Engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems.

There are many companies these days which challenge the old companies in their field, like Tesla challenges automotive, etc. One of the defining characteristics of organizations like these is that they are nearly always engineering-led. Software development is not a cost center or a support function; it is the "business". Even a company like Tesla, whose product is a physical device, has shaped its organization around software ideas. #interesting

So a more sensible model is to treat the structure of our organizations as a tool (as opposed to treating IT as a tool). We identify business vision and goals, decide how we could achieve that technically (architecture), figure out how we could build something like that (process), and then pick an organization structure that will support the necessary activities. 

CD says: work so that your software is always in a releasable state, and optimize for fast feedback. Our aim is to have the most efficient feedback from idea to valuable software in the hands of our users.
