Considerations before jumping into Microservices

Disclaimer: A few years ago I wrote this piece. It came out very passionately then due to frustrations experienced in a microservices project. I have matured and grown since then, and in many ways, this article is a more nuanced refinement of those ideas and an attempt to correct my misconceptions at the time.

Knowing the Monolith

I think before I say anything about this topic, a proper definition of the concept of a monolith is needed. A monolith is usually defined as a piece of software (usually in a single codebase) that has grown so complex and has accumulated so much technical debt over the years that now is extremely hard to change, hard to understand and therefore easy to break. Developers dread the day they have to make a change or implement a new feature, and when they do, unexpected things happen all over the place. The system is so big, so complex and so glued together, that no one knows where to look to fix the issues. This in turn leads to poor development times, a lot of “I’m sorry that can’t be done” and tons of burned-out engineers due to the pressure for delivering and the frustrations and fear of unexpected issues. In terms of the business, the client’s trust is harmed because of the many issues and poor response times.

There are mainly three major things characteristic of monoliths, and their causes are usually related to violations of best practices in OOP.

Monoliths are rigid, which means, they are hard to change. The reason why they are hard to change is because of coupling. Usually, this manifests itself in not abstracting away the details that change (a violation of the Dependency Inversion principle), sticking all the logic in a single place (a violation of the Single Responsibility principle), or because almost every modification requires changing existing code instead of adding a new one (a violation of the Open-Closed principle).

You can get a measure of the rigidness of the system using automated tools that perform code analysis, like cyclomatic complexity, code duplication, test coverage, number of lines per class or module, file churning, ABC, etc. But these metrics, even when you have them, need proper interpretation by a human, preferably a consultant that can analyze the codebase for things that cannot be picked up by code analysis tools, like SOLID violations or lack of use/misuse of design patterns. Some of these are also measures of our next point.

Monoliths are complex, which means, they are hard to understand. It might be due to their size, or maybe because information about the system has been lost (lack of documentation), or maybe because the code is written in a very dirty way. Sometimes, in the life of a system, so-called shortcuts are done to overcome a design limitation, and the people that implemented them are long gone, and their knowledge of the system shortcut is gone with them. Maybe the system is complex because it does things unconventionally. For instance, I have seen many systems myself that, when you change something (say, add a field to a payload) you need to do something else somewhere else on the system so that the change can behave properly. The problem is you need to know that.

There are many, many reasons for complexity. One of the best ways to measure complexity is how long would take a new engineer to get the system up and running in their development machine, get a grasp of it, understand where the critical parts are and be able to make meaningful changes without much aid from anyone else. Of course, that is also a measure impacted by the engineer’s experience. There is also, as I mentioned before, automated tool analysis.

Monoliths are fragile, which means, they are easy to break. Due to all the reasons mentioned above, the system breaks pretty easily, and for the weirdest of reasons. Even skilled engineers fail into their traps and introduce unexpected and accidental side-effects or bugs. Again, this is probably due to the inherent coupling of the components of the system, plus its complexity of it.

You can measure fragility by getting a sense of how afraid are developers, even experienced ones, of making any change to the system. Also, by how many regressions the release of a new functionality causes. By using static analysis tools you can pick potential sources of breakage due to typing too.

Monoliths are a sad tragedy to be in. We wanted a flexible, simple and robust system, but somehow we ended up with a rigid, complex and fragile one.

The Principal Architect™

So, we have enumerated the issues with monoliths. They are rigid, complex and fragile. Our organisation is struggling to deliver and customer confidence in our capabilities of delivering a quality product is decreasing dramatically.

But, fear not! We have identified our problems to solve! So the Principal Architect at the company says:

Let’s come up with a strategy to incrementally refactor our monolith to make it more flexible, simple and robust.

~ The Wise Principal Architect 🦫

A good approach. Not so fast and requires skilled engineers in the art of refactoring plus the ones who know the system well. But it can be done for sure. Or maybe it can’t; maybe you are past the point of refactoring (in terms of cost) or you don’t have “surgeon engineers” at your disposal, so maybe they say:

Let’s take all the knowledge we have so far and build a second version of the system that is more flexible, simple and robust.

~ The Bold Principal Architect 🦅

A bolder approach, but faster than the former. You can do this with mid-skilled engineers under the supervision of the ones who know the system well. You need to put extra effort into not repeating the mistakes of the first system though, otherwise, your effort is pure waste.

However, you know they don’t say that, don’t you? That’s why you are reading this blog post! What they say is:

It is time for us to split our monolith into microservices like Netflix or others are doing!

~ The Foolish Principal Architect 🐥

If you are there when that happens...

On a more serious note, I think this comes to pass because we have a misconception of what microservices are and therefore, we are unable to see the real problem.

Misconceptions on Microservices

There are a lot of misconceptions about microservices. They are usually hard to spot because there are some elements of truth in them (hence the name, misconceptions).

One of the most famous ones is that they are a way of “splitting the monolith”. This is quite misleading because suggests the main problem with a monolith is its size. If you add that to the fact of how the pattern is named (micro-services) then you understand the cause of confusion. People bring the reasonings of SOA and apply them to this problem. Thus, the following way of thinking becomes the common argument for implementing this pattern:

If we split this big thing into smaller chunks, then it would be easier to manage.

~ An Engineer. Famous Last Words. Circa 2018.

This looks reasonable, and it is indeed (in the appropriate contexts). This is the principle of modularity: separating things into smaller components that then can be combined to form a larger whole. The idea is that at any time you can pull out one component and replace it with another without breaking the whole system. And it is also easier to make sense of smaller pieces than a big ones, right? So far, so good.

In pre-microservices SOA, this separation usually fell into the runtime space. This means these modules interacted between them in the safe realm of random access memory. Changes would be made in a single place (the application code) and then shipped in a single executable without major hassle. Adapting to changes in the public API of one module usually involved correcting the consuming services in the same PR. This is one of the victories of object-oriented programming.

However, as we will see further, there is a whole range of complexity introduced when we apply this principle to services running as separate applications communicating over TCP networks. I’ll get there in the next section. I want to stress something else first.

Our long introduction on what a monolith is had a very special purpose that will come to light now. If you look again at the reasoning justifying the splitting of the monolith into smaller services, hopefully, you will realise that fundamental problems somehow escaped the reasoning backing the proposed solution. This is extremely important.

True, size might be a factor in the complexity of a system, but it is not the most important factor, and certainly not the only one. It is not just complexity due to size, but rather any complexity, anything that causes rigidness, and anything that makes things fragile. If you have complexity, fragility and rigidness, it does not matter if you have one big large service or many small ones. You will suffer anyways.

“Estimate the Cost. Sit down and consider.”

Actually, no, I lied. It does matter. Is better to have a monolith than a distributed mess. Here is the big takeaway of this article: if you implement microservices for the wrong reason and if you are not ready to tackle the challenges that come with them, then you will be in a worse place than when you started.

As usual, the scriptures bring untimely wisdom in this matter (and in any other matter regarding life really). Jesus is trying to explain to people that following him is very demanding and that they should consider whether they can do it or not before blindly jumping on the bandwagon (let’s remember these are the popular times of Jesus’ ministry). He uses two wonderful metaphors. Here is what he says:

Suppose one of you wants to build a tower. Won’t you first sit down and estimate the cost to see if you have enough money to complete it? For if you lay the foundation and are not able to finish it, everyone who sees it will ridicule you, saying, ‘This person began to build and wasn’t able to finish.’ Or suppose a king is about to go to war against another king. Won’t he first sit down and consider whether he is able with ten thousand men to oppose the one coming against him with twenty thousand? If he is not able, he will send a delegation while the other is still a long way off and will ask for terms of peace. In the same way, those of you who do not give up everything you have cannot be my disciples.

~ The Holy Bible. Luke 14:28-33 (emphasis mine)

We are not builders nor kings, and maybe you do not follow Jesus, but surely we both can relate to this. Prudence and consideration is a scarce gift in a world that encourages you to take any opportunity you have in front of you.

The first thing you should think is, do you have the resources to maintain all those microservices? Is hard enough to maintain one service, let alone fifty. You need to have the workforce in place for that effort. Otherwise, don’t do it.

Of course, maintaining involves many things. For instance, each part of the development lifecycle needs to be properly automated. You cannot test all of your microservices manually. You need to rely on automated testing and quality checking (Continuous Integration) and try to also automate the release process (Continuous Deployment). If you don’t do that, better not to jump in the effort. You will be crushed under the weight of manual deployment and testing work. You can’t treat your services like pets anymore, giving them all the care and attention they need individually: now they should be treated as cattle.

Some companies develop their own semi-automated release process. It is automated up until the last mile after which there is some manual work to be done for release. This is another challenge: making sure you can come up with a process that is simple to learn and properly documented, so it can be quickly grasped by any engineer in the company. If you fail to do that, there will be trouble.

If keeping one application up to date with the latest framework, language version, or standard is hard enough, imagine doing that for thirty, or forty. It is not fun. True, there are things like dependable or language-specific tooling to upgrade from one version to another, but yet is another automated process you have to implement, maintain, document and ensure that runs smoothly.

Things get tricky if you implement custom, in-house tooling, frameworks or libraries to accelerate the development effort. The reason why you have to accelerate it though is because you brought that extra work when you decided to bring microservices in. Every new project implies a setup cost that you don’t have to pay if the project is already there and the tooling is in place. No mention, you have to keep that tooling nice, sharp and well documented and make sure every project is using an updated version. And don’t dare to break BC!

Dealing with the data dichotomy is also something a lot of teams don’t consider. When you had the monolith, it was all there in the same database, so it was easy to share data between services. But now (hopefully) every service has its own database and a very narrow REST interface. But they still have to share data. How do we do that? If you just create a network of interconnected REST calls, you will end up with a very complex system, and changes in one place could and will break stuff in other places. Services will be coupled together and you will end up with a distributed monolith. Sure, you can implement observability systems to detect when things break, but that is half of the problem. The problem is that now the other affected service is in another project, in another repo (maybe in another language!), that you need to clone, set up and understand before you can make a fix there. In the monolith, at least, it was in the same source code.

Maybe you did your homework and decided you will communicate your microservices using an event-driven approach, but that has also challenges and pitfalls of its own. Tracing where an event goes and what services reacted to it is hard, and it requires a particular set of skills and engineers with the right experience to develop event-driven systems. They are not easy to find, and if you do find them, they are not cheap to hire.

The ability now of going polyglot is another factor to consider. Now you can have 2, 3 or even 4 languages with a presence in your company. There is a constant tooling and language war over which one should we use. Tooling has to be built in 2, 3 or 4 different languages. More sooner than later you will end up feeling the huge maintenance burden they cause.

If the technical challenges are big, the people's challenges are monumental. Suddenly your company has hired twice the people to be able to cope with the extra amount of work you have brought into yourself. New people, with new experiences and new ideas, come to the table. Sometimes they could bring change just for the sake of changing something. For instance, someone had a bad experience with GraphQL on their previous job, or someone likes Postgres over MySQL or Golang over PHP (imagine that! 😜). More people means really, more diversity of experiences and opinions. And while that is a good thing, you still need to be able to filter what makes sense from what is just pure noise.

To keep this under control, you implement a process heavy-culture that focuses a lot on uniformity and consistency, so your company does not become the wild west. Although you gain in that area, process overloading brings problems of its own: new engineers have to learn your processes, and they might be not the best choice sometimes or fall outdated pretty quickly. Also, processes need to be properly documented and simple. Otherwise, they will not be followed. More extra work!

Maybe your monolith was maintained by one or two teams, but now you have 7 teams working on your suite of microservices. Communication is suddenly more important than ever, and it is even more if your services are coupled with direct request-response calls. One team’s work can break the other’s, or one team can be blocked waiting for the implementation of another feature by another team. Sitting time is great for engineers: they can read Reddit, or Twitter, or maybe write long boring articles like this one. It is not great for your company though: you are losing money.

Honestly, I could go on and on with more and more things you need to consider before you make this move, and I still will have the stuff to say. There is abundant literature and media out there that deals with the pitfalls of microservices, so make sure to take a look at it! I leave you with my personal favourite here:

https://www.youtube.com/watch?v=r8mtXJh3hzM

Final words

Maybe you read all this and say: “but all those things are solvable!” And yes, you are correct, they are! Every single one of those problems has a solution and can be addressed effectively. But if you are asking that I think you might be missing the point. The question is not whether they are solvable. The question is, do you need to solve them? Do you need to bring all that complexity in to have a successful business? Do you need to add another suite of technical problems to solve on top of your business problems? Do you really need microservices, or do you only want them?

Don’t get me wrong. The benefits of microservices are great. But you only reap the benefits if they are correctly implemented and you have what it takes to do so. If not, again, you could end up worse than before you started.

The bottom line is to estimate the cost, sit down and consider, and formulate a plan for your transition (if you really need to transition!). If you fail to plan, you plan to fail.