The Coder's Proverbs is a series where I distill lessons and principles I've learned over my career into memorable, simple sayings of wisdom.
It is a piece of von Clausewitz's wisdom that to win a war you must weaken your enemy. The prime way of doing so is by isolating them: cutting their supply lines, so they cannot receive ammo, food, clothing and other assets; or cutting their communication channels, so they cannot receive instructions or situational updates. Isolating your enemy is key to winning a war.
We can apply this principle to Software Engineering to win the war that (sometimes) is writing maintainable and robust code. But who is our enemy?
I can say with full confidence that the number one enemy of Software Engineering is change. Not the project people, not the product people, not the less experienced members of your team, not your language of choice, none of that. The vast majority of your challenges come from change.
Think about it. You would not be working on that feature if a client had not requested it. You would not be struggling with that integration if your users had not demanded it. You would not be fixing that bug if it had not been reported. You would not be refactoring the module you wrote last week if the requirements had not been refined. The sad reality for us (and also the blessing) is that software is never static: it evolves and is constantly changing. If software were not changing, there would be no one writing code, and thus no suffering developer. Change is an unavoidable reality in software and the number one cause of rot. Entropy, I think, is the correct term for this reality.
So, to win in the Software Development war -- Oh dear, I sound like one of those LinkedIn influencers! -- you must isolate your enemy. In other words, you must isolate the things that can, might or will change in your code.
Take a look at the following piece of PHP code.
<?php

function importCountries()
{
    $url = 'https://restcountries.com/v3.1/all';
    $contents = file_get_contents($url);
    $data = json_decode($contents, true);

    foreach ($data as $country) {
        // Real code will do something more meaningful here
        echo $country['name'].PHP_EOL;
    }
}
Now, there is a bunch of stuff that could be wrong with that code, depending on how likely it is to change. Remember, code is not the enemy: change is! If this is code for some automated routine in your toy scraping project, then it is fine. If this is part of a data-importing application pipeline, then it is severely wrong.
Let's enumerate the things that might change in this context.
What happens if I want to make the data source dynamic and not just get country data from Rest Countries?
What happens if the format changes? Can I consume YAML, CSV, or XML?
What happens if the action I want to perform on the data changes according to the data?
What happens if I want to take note of the records whose action has failed?
What happens if I want to apply transformations and actions that are repeatable?
What happens if I need to import 500,000 records? Will the system run out of memory?
It is clear that this code is far from ready to handle all those possible scenarios or requests. It needs to isolate the things that might change so it can support that change. And the number one way of isolating something in programming is by creating an abstraction for it. This is how it would look:
<?php

interface Step
{
    public function process(array $record): void;
}

class Result
{
    public readonly array $record;

    private ?Throwable $error = null;

    public function __construct(array $record)
    {
        $this->record = $record;
    }

    public function withError(Throwable $e): void
    {
        $this->error = $e;
    }

    public function isSuccess(): bool
    {
        return $this->error === null;
    }

    public function getError(): Throwable
    {
        if ($this->error === null) {
            throw new RuntimeException('No error');
        }

        return $this->error;
    }
}
class Pipeline implements Step
{
    private Step $step;

    public function __construct(Step $step)
    {
        $this->step = $step;
    }

    /**
     * @param Traversable<int,array> $source
     * @return Generator<int,Result>
     */
    public function run(Traversable $source): Generator
    {
        foreach ($source as $record) {
            $result = new Result($record);

            try {
                $this->process($record);
            } catch (Throwable $e) {
                $result->withError($e);
            }

            yield $result;
        }
    }

    public function process(array $record): void
    {
        $this->step->process($record);
    }
}
So, let's enumerate what we have done here:
By accepting any Traversable that returns an array on every iteration, we have abstracted away the source. Now, we don't care if the source is JSON, XML, or any other format: we are going to iterate over a collection of arrays. It is up to the source class to fetch the corresponding records.
By creating an interface called Step, we can now abstract away the action we want to perform on every record, and we can create implementations that will do common tasks.
By creating the Result object, we can report whether each record was successfully processed or not, and what the error was.
By using a Generator, we can process a data set of any size without worrying about memory consumption.
The sketch after this list shows how these pieces fit together.
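Here is a minimal usage sketch of all of the above. JsonSource and PrintNameStep are hypothetical names I'm using for illustration; they are not part of the design itself, just one plausible way of wiring it together.
<?php

// A lazy source: getIterator() returns a Generator (which is a Traversable),
// so the Pipeline can pull records one by one instead of loading the whole
// data set into memory.
class JsonSource implements IteratorAggregate
{
    public function __construct(private string $url)
    {
    }

    public function getIterator(): Generator
    {
        $contents = file_get_contents($this->url);

        foreach (json_decode($contents, true) as $record) {
            yield $record;
        }
    }
}

// The same action the original script performed, now isolated behind Step.
class PrintNameStep implements Step
{
    public function process(array $record): void
    {
        echo $record['name'].PHP_EOL;
    }
}

$pipeline = new Pipeline(new PrintNameStep());
$source = new JsonSource('https://restcountries.com/v3.1/all');

foreach ($pipeline->run($source) as $result) {
    if (!$result->isSuccess()) {
        // A failed record is reported instead of aborting the whole import.
        error_log($result->getError()->getMessage());
    }
}
Notice that nothing here touches the Pipeline class: every moving part lives behind an abstraction.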
If you are thinking "Great, this is five times more lines than the previous approach and it does not even contain the code for reading JSON", you are looking at the problem from the wrong angle. The problem was never that the original code was hard to grasp; anyone could understand what it did. The problem was that it was not resistant to change. Making code more robust, code that isolates the things that change, by definition takes more lines. The price you pay is more code, but the benefit you get is immensely superior. I would say that is a pretty good tradeoff.
Moreover, it is not always true that abstraction requires more code. Designing abstractions is similar, in a way, to search algorithms. If your list is small, you can use an O(n) approach and not even worry. However, as your list grows, an O(log n) approach will make a difference. It is the same with abstractions: you don't feel their benefit while your codebase is small, but as your use cases grow and the abstraction proves itself, it takes far less code to do more things with it.
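Remember the question about consuming CSV? With the abstractions in place, the answer is a handful of lines. CsvSource is a hypothetical class, a sketch that assumes the first row of the file holds the column names, not production code:
<?php

class CsvSource implements IteratorAggregate
{
    public function __construct(private string $path)
    {
    }

    public function getIterator(): Generator
    {
        $handle = fopen($this->path, 'r');

        // Assumes the first row holds the column names.
        $headers = fgetcsv($handle);

        // Yield each row as an associative array, one record at a time.
        while (($row = fgetcsv($handle)) !== false) {
            yield array_combine($headers, $row);
        }

        fclose($handle);
    }
}

// Pipeline, the steps and the Result reporting are all reused as-is:
// $pipeline->run(new CsvSource('countries.csv'));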
Conclusion
What I've given you here is really the Single Responsibility Principle. Many people believe the principle is "a class should have only one responsibility" or "a module should do only one thing". That's technically incorrect. The original wording of the principle, given by Uncle Bob, was that "a class or module should have only one reason to change". By isolating the things that change, our pipeline has just one reason to change: a change in requirements. The Pipeline class does not change its logic if the data source changes, or if the steps change: it will always remain the same.
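To drive the point home, here is one more hedged sketch. Suppose the requirement changes from printing countries to saving them in a database. SaveCountryStep, the PDO connection and the countries table are all assumptions of mine, but notice that Pipeline stays exactly as it was:
<?php

class SaveCountryStep implements Step
{
    public function __construct(private PDO $db)
    {
    }

    public function process(array $record): void
    {
        $this->db
            ->prepare('INSERT INTO countries (name) VALUES (?)')
            ->execute([$record['name']]);
    }
}

// SQLite is used here only to keep the sketch self-contained.
$db = new PDO('sqlite:countries.db');
$pipeline = new Pipeline(new SaveCountryStep($db));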
Isolate the things that change behind abstractions, so those things can change freely and leave the rest of your code untouched. That will make your code more robust and resistant to change.