Notes about Code Complete 2nd edition
Software Construction
The claim that construction errors cost less to fix is true but misleading because the cost of not fixing them can be incredibly high. Construction is 30-70% of a software project. Productivity of individual programmers varies by a factor of 10-20 during construction. Construction is the only activity that's guaranteed to be done.
Software Metaphors
The advantage of models is that they are easy to grasp as conceptual wholes. The disadvantage is that they can be misleading when overextended. They are heuristics, not algorithms.
The writing metaphor suggests that programming is like writing a casual letter. Truth is programmers often talk about 'code readability'. The writing metaphor implies a development process that's too simple and rigid to be healthy. One example of poor advice coming from this metaphor is Fred Brooks' "Plan to throw one away, you will anyway."
Another metaphor is software farming - growing a system. You design a piece, program it, test it and add it to the system. The incremental technique is valuable, but the metaphor itself is terrible. It implies you don't have control over how the system grows.
The final and arguably the best metaphor is software construction - like a building. It is better than the writing or growing metaphors. Building a tower 100 times the size doesn't require merely 100 times more resources. It requires a completely different approach. When constructing a building, the main expense is the labor. You have to design upfront to avoid fixing mistakes later on. It generally doesn't make sense to build things you can buy ready-made. Sometimes it makes sense to use lightweight agile approaches, but sometimes you need rigid heavyweight ones. Making structural changes in a program costs more than adding or deleting peripherals.
Development Prerequisites
The carpenter's saying "Measure twice, cut once" is highly relevant in software development. The overarching goal of preparation is risk reduction. The highest risks should be taken care of as early as possible. The top risks are poor requirements and poor planning. Part of a developer's work is to educate non-technical staff about the development process. The bigger the project, the more planning it needs. It pays off to do things right the first time. One rule of thumb is to specify 80% of requirements upfront. The alternative is to specify 20% upfront and the rest in increments. For most applications the iterative process is much more useful than the sequential one.
The first prerequisite is a problem definition. It shouldn't be the solution definition, but the problem one. Official requirements ensure that users drive the requirements, not the programmers. Everybody should know the cost of changing requirements.
Without good architecture, it might be impossible to have successful construction. The architecture should describe the system in broad terms. Think of explaining the solution to a six-year-old. The architecture should prove that all alternative solutions are worse than the chosen one. It should define the purpose of each building block. It should describe communication between the building blocks, major classes and major data tables, provide for a modular user interface that can be changed later on, include a plan for scarce resources such as database connections and threads, and include a security plan and performance estimates. It should also address error handling, scalability, interoperability, fault tolerance and feasibility. It should indicate whether developers should do the minimum required work or tend to overengineer. In-house development should be justified if it is preferred over ready-made solutions. The architecture should describe a strategy for handling changes and for delaying commitment. It should describe the motivation for all major decisions.
If requirements are unstable, they should be treated as a project on their own.
Key Construction Decisions
Programmers are more productive using a familiar language than an unfamiliar one. The higher the level of the programming language, the higher the productivity and the quality of the programmers' work. According to this book, Python is a much higher-level language than Java. Most programmers avoid assembler unless they are focused heavily on performance. C was the standard in the 1980s and 1990s. Cobol is COmmon Business-Oriented Language. Java is used mostly for web applications. SQL is the de facto standard for managing relational databases.
Programming conventions should be spelled out before construction begins. It can be useful to know where you are on the technology wave. Mature technology environments benefit from a rich software development infrastructure. Early-wave environments offer few programming languages to choose from, and those tend to be buggy and poorly documented. This might make working with early-wave technologies seem discouraging, but that is not the intent. Programmers who program into the language first decide what thoughts they would like to express and only then express them using the tools provided to them. Most programming principles are language-agnostic. Every programming language has its strengths and weaknesses. Be aware of those of your programming language.
Design in Construction
Both big and small projects benefit from design upfront. The more explicit it is the better. Design is mostly heuristics. Brooks argues that there are two complexities - essential (to the problem solved) and accidental. History tends to solve accidental complexity for us. We should still keep accidental complexity at bay.
Managing complexity is the most important technical topic in software development. The goal is to minimize the amount of the program you have to think about at any one time. Good design should have minimal complexity. We should avoid 'clever' solutions. Design should be done with the maintenance programmer in mind. Good design has loose coupling, exchangeability of modules, reusable components, a high number of classes that use a given class (high fan-in), low class dependencies and good portability. Good design means the program should have no unused extra parts. Good design uses standard techniques. The component graph should be acyclic.
We should start by identifying real-world entities as objects and their attributes. Then we can determine what can be done to each object. Then determine what each object can do to other objects. Determine what parts of each object are visible to others. We should define each object's interface.
Encapsulation helps by hiding inner complexities. Inheritance is one of the most powerful tools, which is often misused and causes harm. Well-designed classes hide most of their complexity in hidden parts. Circular dependencies are bad design.
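One way to check that a component graph stays acyclic is to attempt a topological sort of it. The following is a minimal sketch using Python's standard graphlib module (Python 3.9+); the component names and both dependency maps are made up for illustration and do not come from the book.

```python
from graphlib import CycleError, TopologicalSorter


def find_build_order(dependencies):
    """Return components in dependency order, or None if there is a cycle.

    `dependencies` maps each component to the set of components it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Hypothetical component graphs, for illustration only.
layered = {"ui": {"services"}, "services": {"storage"}, "storage": set()}
tangled = {"ui": {"services"}, "services": {"storage"}, "storage": {"ui"}}

print(find_build_order(layered))  # a valid order exists
print(find_build_order(tangled))  # None: storage depends back on ui
```

A check like this could run as part of the build, so that a new circular dependency is noticed when it is introduced rather than much later.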
Good design identifies and separates things that are likely to change. A few areas that are likely to change are business rules, hardware dependencies, input and output, and difficult design and construction areas. It is good to identify the core functionality of the system. This part is unlikely to change.
Design patterns can move communication to a higher level of abstraction. They can also reduce complexity and errors, because they are ready-made solutions to common problems. There are two pitfalls - forcing a design pattern onto code where it is not the best fit, and using a design pattern just to learn it.
Good design is about strong cohesion - how focused the class is. Good design is design for test. There should be One Right Place for all maintenance changes. Brute force often gets the job done and is good enough. Diagrams are sometimes worth 1000 words.
Divide and conquer works here. We can do it top-down, until it would be easier to program the diagram than to draw it, or bottom-up, if we need something more tangible to work with.
We should make use of prototypes for the main risks. The book overall votes for upfront design. Essential design decisions should be documented in the code itself, and then in a wiki, summary e-mails, photos, CRC (class, responsibility, collaborator) cards or UML diagrams. Designing is an iterative process.
Working Classes
Class interfaces should provide a consistent abstraction. They should have one responsibility. A class should have a couple of cohesive APIs. The goal here is to be able to forget about everything else while working on a class.
A class interface should hide something - a system interface, a design decision or an implementation detail. The benefits are many - changes don't affect the whole program, it's easier to improve performance, you can make the interface more informative than the ugly implementation details, the program becomes more self-documenting, you can work at a higher level of abstraction, etc. Bad class interfaces usually have a miscellaneous set of functions or inconsistent levels of abstraction. Class cohesion and abstraction are closely related.
Containment is usually preferable to inheritance unless you are modelling an 'is a' relationship. We should avoid inheritance as much as possible, but polymorphism is still preferred to extensive type checking.
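As a small illustration of both points, the sketch below uses containment for a "has a" relationship and polymorphism instead of type checks. All class names here (Shape, Circle, Rectangle, Drawing) are hypothetical and chosen only for this example.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Common interface; Circle and Rectangle each model an 'is a' relationship."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Drawing:
    """A drawing 'has' shapes, so it contains a list instead of inheriting from list."""

    def __init__(self):
        self._shapes = []

    def add(self, shape: Shape) -> None:
        self._shapes.append(shape)

    def total_area(self) -> float:
        # No isinstance() checks: each shape knows how to compute its own area.
        return sum(shape.area() for shape in self._shapes)


drawing = Drawing()
drawing.add(Rectangle(2, 3))
drawing.add(Rectangle(1, 1))
print(drawing.total_area())
```

Because Drawing only contains a list, callers see add and total_area rather than every list method, which keeps the class interface at one level of abstraction.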
Classes are your primary tool for managing complexity. Give their design as much attention as needed to accomplish that objective. Minimize class dependencies.
High-quality Routines
The most important reason to create a routine is to improve the program readability. Aside from the computer itself, a routine is the second greatest invention in computer science.
Sometimes the routines should be simple. Good routine names describe everything the routine does. A good routine has at most 50-150 lines of code. Routines shouldn't have more than 7 parameters.
Return values should be used only if the main purpose of the function is to return that value.
Defensive Programming
Production code should do better than 'garbage in, garbage out'. The idea of defensive programming comes from defensive driving. You take the responsibility for protecting yourself even when the bug is not your fault. Defensive programming makes errors easier to find, easier to fix and less damaging.
Assertions can help detect errors early, especially in large systems, high-reliability systems and fast-changing code bases. Exceptions are a great tool for implementing assertions. Java assertions are not necessary.
Deciding how to handle bad inputs is a key high-level design decision.
You can have a development version of the program which adds more assertions than the production code.
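A minimal sketch of this split in Python: bad input is handled with an exception that stays active in production, while an assert documents a condition that should never be false if the code is correct. Python removes assert statements when run with the -O flag, which is one way to get a leaner production version. The function apply_discount and its rules are invented for this example.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, rounded down to whole cents."""
    # Bad input is an expected condition: handle it in every build.
    if price_cents < 0:
        raise ValueError("price_cents must not be negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    discounted = price_cents * (100 - percent) // 100

    # An assertion checks for a bug in this routine, not for bad input.
    # It is skipped when Python runs with -O.
    assert 0 <= discounted <= price_cents, "discount produced an impossible price"
    return discounted


print(apply_discount(1000, 25))
```

The ValueError checks are part of the routine's contract with its callers; the assert is a note to the developer and costs nothing once stripped.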
The Pseudocode Programming Process
An alternative to test-first development, the Pseudocode Programming Process is about writing the pseudocode first, at the right level of abstraction, and then implementing it. The key is to write what the code does instead of how it does it.
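A small sketch of what the result can look like: the pseudocode was written first as plain-language comments describing intent, and each comment was then filled in with the code that implements it. The routine find_longest_word is a made-up example, not one from the book.

```python
def find_longest_word(text: str) -> str:
    """Return the longest word in text, or an empty string if there is none."""
    # Split the text into words, ignoring surrounding punctuation.
    words = [word.strip(".,;:!?\"'()") for word in text.split()]

    # Discard anything that was only punctuation.
    words = [word for word in words if word]

    # If there are no words, report that with an empty result.
    if not words:
        return ""

    # Otherwise return the longest word; the first one wins a tie.
    return max(words, key=len)


print(find_longest_word("Measure twice, cut once."))
```

The comments say what each step achieves, so they remain useful as documentation once the code is in place.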
The Power of Variable Names
The most important consideration in naming a variable is that the name fully and accurately describes the entity the variable represents. Any naming convention is better than no convention.
Fundamental Data Types
Avoid magic numbers and strings. Read your compiler warnings. Document conditionals with helper variables. Use named constants for data declarations and loop limits.
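For illustration, here is a short sketch that replaces magic numbers with named constants and documents a conditional with helper variables. The login policy, its values and all names are assumptions made up for this example.

```python
# Hypothetical policy values; named once instead of scattered as 3 and 900.
MAX_LOGIN_ATTEMPTS = 3
LOCKOUT_SECONDS = 15 * 60


def may_attempt_login(failed_attempts: int, seconds_since_last_failure: int) -> bool:
    # Helper variables give each part of the condition a name.
    too_many_failures = failed_attempts >= MAX_LOGIN_ATTEMPTS
    lockout_expired = seconds_since_last_failure >= LOCKOUT_SECONDS
    return not too_many_failures or lockout_expired


print(may_attempt_login(failed_attempts=3, seconds_since_last_failure=10))
```

Compare this with `return not (a >= 3) or s >= 900`, which computes the same thing but forces the reader to guess what 3 and 900 mean.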
Organizing Straight-Line Code
When statements must be called in a specific order, take measures to make that order clear. The code should read well from top to bottom.
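One way to make an order dependency visible is to pass the result of each step as a parameter to the next, so the steps cannot be called out of order without the code looking wrong. The routines and the data below are hypothetical and serve only as a sketch.

```python
def load_orders(raw_lines):
    """Parse 'name,quantity,price' lines into lists of fields."""
    return [line.split(",") for line in raw_lines]


def compute_totals(orders):
    """Needs parsed orders, so it can only run after load_orders."""
    return {name: int(quantity) * int(price) for name, quantity, price in orders}


def format_report(totals):
    """Needs totals, so it can only run after compute_totals."""
    return [f"{name}: {total}" for name, total in sorted(totals.items())]


# Each step consumes the previous step's result, so the order is explicit.
raw_lines = ["widget,2,50", "gadget,1,120"]
orders = load_orders(raw_lines)
totals = compute_totals(orders)
report = format_report(totals)
print(report)
```

If the three routines instead communicated through shared module-level state, nothing in the calling code would reveal that the order matters.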
Controlling Loops
Loops are complicated. Help your readers by keeping them simple. Use a while loop instead of a for loop where it is the more appropriate construct. Loops should be short enough to see at once. The loop termination condition should be obvious.
Unusual Control Structures
If an early return statement enhances readability, use it to skip over trivial cases. Otherwise minimize returns in each routine.
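A minimal sketch of early returns used as guard clauses: the trivial cases leave the routine first, and the main logic stays unnested. The shipping rules and fee values are invented for this example.

```python
def shipping_cost(weight_kg: float, is_member: bool) -> float:
    # Trivial cases return early so the main calculation is not nested.
    if weight_kg <= 0:
        return 0.0
    if is_member:
        return 0.0

    # Main case, with hypothetical fees.
    base_fee = 5.0
    per_kg_fee = 1.5
    return base_fee + per_kg_fee * weight_kg


print(shipping_cost(2, is_member=False))
```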
Recursion should be limited to one routine.
Table-Driven Methods
A table-driven method is a scheme in which you look up the result in a table instead of working it out with complicated logic. It should be considered as an alternative to complicated logic or inheritance models.
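The sketch below shows two common variants in Python: a direct-access table indexed by month number, and a stair-step table searched with the standard bisect module. The grading scale is a made-up example.

```python
import bisect

# Direct-access table: days per month in a non-leap year, index 0 is January.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month(month: int, is_leap_year: bool = False) -> int:
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month == 2 and is_leap_year:
        return 29
    return DAYS_IN_MONTH[month - 1]


# Stair-step table: GRADE_LIMITS[i] is the lowest score that earns GRADES[i + 1].
GRADE_LIMITS = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]


def grade_for(score: float) -> str:
    # bisect_right counts how many limits the score has reached.
    return GRADES[bisect.bisect_right(GRADE_LIMITS, score)]


print(days_in_month(2), grade_for(85))
```

Changing the grading scale now means editing the two lists rather than rewriting a chain of if/elif branches.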
General Control Issues
Putting a conditional in a function can enhance readability even if the function is used only once. Boolean values should be compared to true implicitly, but numbers explicitly (in languages where 0 is false). Boolean tests should be stated positively.
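These three guidelines can be sketched together as follows. The refund rules, the constant and the function names are assumptions made up for the example.

```python
RETURN_WINDOW_DAYS = 30  # hypothetical policy value


def is_refundable(days_since_purchase: int, is_unopened: bool) -> bool:
    """The conditional lives in a named function even though it is used once."""
    return is_unopened and days_since_purchase <= RETURN_WINDOW_DAYS


def refund_message(days_since_purchase: int, is_unopened: bool, item_count: int) -> str:
    # Boolean tested implicitly and stated positively: no "== True", no "not not".
    if is_refundable(days_since_purchase, is_unopened):
        # Number compared explicitly, rather than relying on 0 being false.
        if item_count != 0:
            return f"Refund approved for {item_count} item(s)"
        return "Nothing to refund"
    return "Refund not available"


print(refund_message(10, True, 2))
```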
If you cannot make your code simple, it means you don't understand it well enough. The core thesis of structured programming is that you can program any control flow using sequence, selection and iteration.
The Software-Quality Landscape
These are the external characteristics of software quality: correctness, usability, efficiency, reliability, integrity, adaptability, accuracy and robustness. Moreover, programmers care about the internal quality: maintainability, flexibility, portability, reusability, readability, testability and understandability.
There are many techniques for improving software quality: explicit quality-assurance activity, testing strategy, software-engineering guidelines, formal technical reviews and external audits.
To improve quality, we should state a measurable objective. The surprising fact is that people will actually do what is asked of them.
No single defect-detection technique has results better than 60% of defects found. The most successful companies therefore use multiple techniques. Among the best techniques are: formal inspections of all requirements, architecture and design for critical parts of the system, modelling and prototyping, code reading or inspections and execution testing.
The general principle of software quality is that improving quality reduces development costs. It means the quality assurance is free in the end, but it requires reallocation of the resources so the defects are fixed when they are cheap to fix. Quality assurance is usually process-oriented.
Collaborative Construction
A code reviewer will find different defects than a tester, and usually more defects are found during code reviews than during testing. When using pair programming instead, agree on coding conventions so that trivial but long discussions can be avoided.
Checklists are useful when using formal inspections.
Developer Testing
Developers usually perform white-box testing. 80% of defects will be found in 20% of the classes. Such classes should be found, redesigned and rewritten.
Debugging
Debugging is a last resort. We should always understand the problem before fixing it. We should fix the problem not the symptom. The compiler warning settings should be set to maximum warnings and treated as errors. We should always add a test for every defect that needed to be debugged and find similar defects throughout the codebase.
Refactoring
Even on well-managed projects, requirements change at a rate of 4% per month. If we use 'duct tape' to implement changing requirements, the quality degrades. If we see them as opportunities to improve the overall design, quality improves.
There are several reasons to refactor the code: code is duplicated, a routine is too long, a loop is too long or deeply nested, a class has poor cohesion, a class interface doesn't have a consistent level of abstraction, a method has too many parameters, changes require modifications in multiple classes, inheritance hierarchies need to be modified together, a class doesn't do very much, comments are used to explain difficult code, global variables are used, code is unused. The best way to prepare for the future is not speculative code, but code refactored to be as straightforward as possible so that future changes are easy to make.
Some interesting refactorings: convert a data primitive to a class, introduce null objects, pass a whole object instead of specific fields, replace inheritance with delegation. Refactorings should be kept small. We should do one refactoring at a time. We should target error-prone modules. Sometimes it is too late to refactor and the whole code should be tossed away and rewritten from scratch. If we are in maintenance mode, we should improve the parts we touch.
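As an illustration of one of these refactorings, here is a minimal sketch of a null object: instead of returning None for a missing customer and checking for it everywhere, the lookup returns an object with safe default behavior. The classes, the in-memory customer table and the greeting text are all hypothetical.

```python
class Customer:
    def __init__(self, name: str, discount_percent: int):
        self.name = name
        self.discount_percent = discount_percent


class UnknownCustomer(Customer):
    """Null object: stands in for a missing customer with safe defaults."""

    def __init__(self):
        super().__init__(name="guest", discount_percent=0)


# Hypothetical in-memory data, standing in for a real data source.
CUSTOMERS = {"c1": Customer("Ada", 10)}


def find_customer(customer_id: str) -> Customer:
    # Never returns None, so callers need no special case.
    return CUSTOMERS.get(customer_id, UnknownCustomer())


def greeting(customer_id: str) -> str:
    customer = find_customer(customer_id)
    return f"Hello, {customer.name}! Your discount is {customer.discount_percent}%."


print(greeting("c1"))
print(greeting("missing"))
```

The trade-off is that a missing customer no longer fails loudly, so this fits best where a sensible default really exists.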
Code-Tuning Strategies
Performance is only one aspect of overall software quality, and it's usually not the most important.
Quantitative measurement is a key to maximizing performance. Most programs spend most of their time in a small fraction of their code.
We should consider improving performance by changing program requirements and design. Avoid OS interactions and I/O. Try compiler optimizations. Upgrade your hardware. Consider code tuning as a last resort.
Code-Tuning Techniques
Substitute complicated logic with lookup tables. Translate key routines to a lower-level language. Use lazy-evaluation.
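A small sketch of lazy evaluation in Python, using functools.cached_property (Python 3.8+) so that an expensive step runs only if it is needed and only once. The Report class and its counter are invented for this example; whether such a change pays off should still be decided by measurement.

```python
from functools import cached_property


class Report:
    def __init__(self, values):
        self._values = list(values)
        self.computations = 0  # counts how often the expensive step runs

    @cached_property
    def sorted_values(self):
        # Runs on first access only; the result is then stored on the instance.
        self.computations += 1
        return sorted(self._values)

    def median(self):
        values = self.sorted_values
        if not values:
            raise ValueError("median of an empty report is undefined")
        middle = len(values) // 2
        if len(values) % 2:
            return values[middle]
        return (values[middle - 1] + values[middle]) / 2


report = Report([5, 1, 3])
print(report.median(), report.median(), report.computations)
```

Sorting is deferred until median is first called and is not repeated on the second call.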
How Program Size affects Construction
As software size grows, so does the need to support communication. The whole point of methodologies is to reduce communication problems. A methodology should live or die on its merits as a communication facilitator. Scaling up a lightweight methodology seems to work better than scaling down a heavyweight one.
All other things being equal, a larger project will have lower productivity and more errors per line of code, and will require much more planning.
Managing Construction
Software projects operate as much on an "expertise hierarchy" as on an "authority hierarchy". Assign two people to every part of the project. Review every line of code. Route good code examples for review - a coding standards manual can consist mainly of a set of "best code listings". Emphasize that code listings are public assets. Reward good code. If the manager has a programming background, he should be able to understand all the code.
Estimate cost, schedule and quality impact of each proposed change. View major changes as a warning that requirements development isn't complete yet.
Integration
Phased integration is called "big bang" integration for a reason. In top-down integration, you add classes at the top first and at the bottom last. As an alternative to strict top-down integration, you can integrate from the top down in vertical slices. In bottom-up integration, you integrate classes at the bottom first and at the top last. In sandwich integration, you integrate the middle classes last. In risk-oriented integration, you integrate the riskiest classes first. In feature-oriented integration, you integrate classes grouped into features. In T-shaped integration, you build and integrate a deep slice of the system to verify architectural assumptions and then build and integrate the breadth of the system to provide a framework for developing the rest. Incremental integration comes in many flavors and all of them are better than big bang integration.
The project should ideally be integrated daily. Programmers should commit their changes at least daily as well.
Layout and Style
The specific convention you follow is less important than the fact that you follow one. If you desire a strong organizing principle, all routines can be in alphabetical order.
Self-documenting Code
The class interface should present a consistent abstraction. The code should be straightforward and never clever. Good comments don't repeat the code. They clarify its intent. Comments should explain at a higher level what you are trying to do. But your documentation efforts should be focused on the code itself. When someone says "this is tricky code", we should consider it bad code. Surprises should be documented. The purpose of each file should be described. Still, the question of whether to comment is a legitimate one.
Personal Character
The characteristics that matter the most are humility, curiosity, intellectual honesty, creativity and discipline, and enlightened laziness. The characteristics of a superior programmer have almost nothing to do with talent and everything to do with a commitment to personal development. Surprisingly, raw intelligence, experience, persistence, and guts hurt as much as they help.
Programming in terms of the problem rather than the solution helps to manage complexity.
Where to find more Information
Pragmatic Programmer, Programming Pearls, Writing Solid Code, Programmers at Work, Dr. Dobb's Journal.