Microservices Lessons Learned

What I learned moving to a microservice architecture in an enterprise environment.

Microservices are all the rage these days. Everyone seems to be building them, and I have noticed that everyone is struggling with the same issues. And yet if you start looking for information about experiences and best practices, you usually end up with Netflix, Spring Cloud (which is inspired by Netflix) or Spotify. And of course there is Sam Newman of ThoughtWorks, who has written a great book on the subject and has given many interesting talks.

I have been working with microservices for about two years now, and early last year I took over the architecture of a rather complex content management and publishing platform built with microservices. Note that the company I work for is not a startup, but a more traditional enterprise environment, with a huge existing infrastructure and code base. We have learned a couple of lessons, most of them the hard way, and I want to share those in this blog post.

Don’t do microservices on a greenfield project

This is the most important lesson we’ve learned. When you start a new project, you shouldn’t immediately build it with microservices. I think you should always go monolith-first. Why? Although microservices themselves should be fairly simple, and with frameworks like Spring Boot and Spring Cloud they are easy to build, you are building a distributed system, and that adds a lot of complexity. You need to have resilience, you need to start doing distributed tracing and logging, you have to write clients, you have JSON serialization and deserialization, and so on. And perhaps the most difficult part to get right in the beginning is knowing what part of your system goes into what microservice. Domain-driven design surely helps, but your understanding of the domain will evolve as the project grows, and once you have divided everything into autonomous services, merging them again is much harder than splitting up a monolith later. When you start monolith-first, you can still use domain-driven design, but as your domain evolves it will be much easier to refactor your code. DDD is almost a requirement for microservices, but not the other way around.

Another advantage of starting monolith-first is focus. Because you don’t have to deal with all the complexities of a distributed system, you can actually focus on delivering value. And isn’t that the most important part of any greenfield project? Actually proving that the system you are building has value for the company should be your absolute priority. If it does not, then you can fail fast and not lose too much money in the process. If it does prove valuable, and the number of users grows to the point where you need to scale up, then you can start dividing your system into autonomous parts. But that will probably be true only for a small percentage of products.

Let’s talk about that other kind of greenfield project: the kind that aims to replace an existing system. I’ve always believed this kind of project to be a mistake, and usually a sign of hubris. You basically throw away all the knowledge and all the experience that is present in the old system, and you have to start building all of that again from scratch. No matter how crappy you perceive the legacy system to be, it will usually take a very long time to surpass it with your greenfield project. So basically, don’t replace legacy with a greenfield project. In fact, brownfield projects are the perfect opportunity to build microservices, thanks to the strangler pattern. This pattern allows you to gradually replace functionality in your legacy application with a microservice that initially just functions as a façade and, when all consumers of this functionality have switched to the façade, takes over the underlying logic and datastore from the old system. So your legacy system shrinks while your microservice architecture grows. And since your domain is much clearer in a legacy system, it’s also much easier to decide what goes where.
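To make the two phases of the strangler pattern concrete, here is a minimal sketch in plain Java. It is not code from our platform: ArticleLookup, LegacyArticleLookup, NewArticleLookup and StranglerFacade are hypothetical names, and the legacy call is faked with a string. In a real system the façade would be the microservice's API and the legacy call would go over the network.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** The contract consumers call. Hypothetical name, for illustration only. */
interface ArticleLookup {
    String findTitle(String articleId);
}

/** Stand-in for a call into the legacy system (assumed behaviour). */
class LegacyArticleLookup implements ArticleLookup {
    public String findTitle(String articleId) {
        return "legacy:" + articleId;
    }
}

/** Stand-in for the microservice's own logic and datastore. */
class NewArticleLookup implements ArticleLookup {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    void save(String articleId, String title) {
        store.put(articleId, title);
    }

    public String findTitle(String articleId) {
        return store.get(articleId);
    }
}

/**
 * The façade. Phase 1: consumers switch to it while it still delegates
 * to the legacy system. Phase 2: once every consumer has switched,
 * cutOver() moves the logic and data behind the same contract.
 */
class StranglerFacade implements ArticleLookup {
    private final ArticleLookup legacy;
    private final ArticleLookup replacement;
    private volatile boolean useReplacement = false;

    StranglerFacade(ArticleLookup legacy, ArticleLookup replacement) {
        this.legacy = legacy;
        this.replacement = replacement;
    }

    void cutOver() {
        useReplacement = true;
    }

    public String findTitle(String articleId) {
        return useReplacement
                ? replacement.findTitle(articleId)
                : legacy.findTitle(articleId);
    }
}
```

The point of the sketch is that consumers only ever see ArticleLookup, so the cut-over is invisible to them. In practice you would probably migrate one operation at a time rather than flip a single switch.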

Change your organisation first

An enterprise is a totally different beast than a startup. There’s already a well-established culture, which is rarely agile in nature, and there is usually a very extensive infrastructure in place. With that infrastructure usually comes a set of strict rules, aimed at maintaining stability. This rigidity might have been useful in the past, when dealing with big software projects and when doing waterfall, but it is a disaster when trying to build a system using a microservice architecture. Once you get skilled at this type of architecture, and your number of teams starts to scale up, things will move very quickly, and your infrastructure will not be able to follow. We have definitely had this problem at our company, and I’ve heard many people in similar situations telling stories of brilliant systems which work great on virtual machines on a local desktop, but have nowhere to be deployed.

So what should you do about this? DevOps. As long as your company does not have a serious DevOps movement in place, you should stay away from microservices. The ideal environment for microservices is the cloud. You can have self-service infrastructure there, and you have scalable storage, message buses, event streams and horizontal scaling of your application containers. It’s a match made in heaven. Setting up a private cloud is hard and time-consuming, so if at all possible you should go to the public cloud, at least initially. At a certain scale, having your own private cloud is no doubt cheaper (for now), but at least you can start building stuff in the public cloud while waiting for it.

Even more important than the cloud is automation. You should have full automation of everything, or at least be moving towards it, before starting with microservices. Infrastructure as code is a nice starting point: make sure all your current systems are managed with something like Puppet. You can already go quite far with tools like that, even on classic infrastructure. On the development side, you want to be doing at least continuous integration on your legacy projects, perhaps even continuous delivery. This is possible with legacy projects, and with monoliths. To give an example, before I took over our new microservices architecture, I was responsible for a big legacy monolithic system, running on 50 or so servers. When I arrived on the project, there had not been a deployment for nine months. The deployment manual was an Excel sheet containing hundreds of manual steps, so nobody actually dared to deploy it anymore. This was one of the main reasons the new greenfield project had been started. In a couple of months’ time, though, I was able to fully automate the build and deploy of this legacy system, using Gradle and Jenkins. From a frequency of once every nine months, we moved to once a week on production, and daily on development, test and acceptance. The greenfield project had not worked on this kind of automation yet, so it took much longer for them to build and deploy the shiny new software.

What you ultimately need to do is stop thinking like an enterprise and start thinking like a group of startups. The company itself, or rather the infrastructure department, just provides the platform that the development teams build their software on.

1 service is built and run by 1 team

This cannot be stressed enough: there should only be one team responsible for a microservice. A team can be responsible for many services of course, but a service should never be built or maintained by more than one team. The whole point of using microservices is autonomy. You want each service to be able to evolve without dependencies on other services or teams. When multiple teams work on the same service, you lose that advantage, and you will be stuck in no time. And SAFe doesn’t help; we’ve tried that.

If you want to maximize the autonomy of a team, you should include the UI of a microservice in their responsibility. You want your team to become fully responsible for a business domain. The user interface is always part of that. Which takes us to the next thing to remember:

Don’t build monolithic single page applications

Along with the microservices hype came another one: the Single Page Application. The advancement of browsers and their JavaScript engines, and emerging frameworks like AngularJS, suddenly made it possible to build big applications in the browser. And then we all started building fat clients again, only this time in the browser and without Flash. But just because we can do that doesn’t mean we have to.

Whether SPAs are evil or the future of the web isn’t the point here, but one mistake you shouldn’t make is building a gigantic monolithic UI on top of your nicely separated, autonomous microservices. In my experience, doing that will make your project grind to a halt. Every feature you build must pass through this fat client, so your autonomy basically disappears.

When we realised this, and started breaking apart the user interface of our system into small applications (call them micro-apps if you will), we saw a dramatic increase in the velocity of our teams. We broke the UI apart by following the example Spotify set: by using iframes. All our microservices now have their own user interface, in most cases not built with fancy JavaScript frameworks, but using Thymeleaf and progressive enhancement, and we are able to integrate most of them at the UI level, thanks to things like HTML5 drag-and-drop, websockets and message buses.

Don’t start with too many people

Ah, the mythical man-month. We have all known for decades that adding more people to a late project will make it later. And yet, every company keeps making the same mistake. Microservices are very useful if you have to scale, but that advantage doesn’t come immediately. Your architecture has to be quite mature before you can scale out your teams. So, just like you should start with a monolith, you should also start with a small team. As you create value and your team matures, you can scale, but not earlier.

Don’t overengineer

This rule of course applies to everything, but seems to be more easily forgotten in a microservice environment than anywhere else. Especially in a greenfield microservice project (which you shouldn’t do, remember?). The thinking goes a bit like this:

  • This time we will do it The Right Way

  • We won’t make the same mistakes as the previous developers, who built a piece of crap and were obviously idiots

  • This microservice will be used by lots of other consumers, so we must make it capable of fulfilling every future use case we can think of

  • We do microservices now, so we should do DDD, so we definitely should do CQRS, which means we can’t possibly build anything without the Axon Framework! And while we’re at it, let’s use every design pattern we ever learned, then it will surely be 'future-proof'.

Sorry for the cynicism, but this is something I have witnessed my entire career, so I tend to get worked up about it. Just keep it simple. The best way to do this is to focus on value. Once we asked our teams to deliver value in production after each sprint, and we had metrics in place as evidence for the value they delivered, we saw much simpler designs, which could handle change much better than the old ones that tried to be future-proof.

Resilience is easily forgotten

A distributed system is not more resilient to failure than a monolithic one; quite the opposite. There are many more points of failure you have to take into account. Don’t count on everything working correctly. When doing microservices you have to be a good citizen yourself and expect others not to be. In other words, you should build for failure. Companies like Netflix enforce this by having their Chaos Monkey continually test resilience in production. I think you should start using these tools as early as possible, so building for failure becomes a natural reflex for your developers.

When thinking about resilience, the first thing developers do is add Hystrix to the service clients, as a circuit breaker implementation. This is of course very good, but don’t just stick to the default settings. Think about what fallbacks you want when an underlying service fails, and make sure you involve the product owners as well. Also, circuit breakers are not enough. They make a consumer resilient to the failure of a producer, but they don’t protect a producer from a badly functioning consumer. Add this kind of protection by building in throttling and bulkheads, or by working event-driven. You can control situations like this more easily in a reactive system than when all your communication is done synchronously, with REST APIs. Avoid building a big ball of mud that just happens to consist of microservices.
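To show the difference between the two kinds of protection, here is a hand-rolled sketch in plain Java. It is deliberately not the Hystrix API: SimpleCircuitBreaker and Bulkhead are hypothetical classes, and the threshold and timing values are assumptions you would tune per service. The circuit breaker lives in the consumer and decides when to stop calling a failing producer and return a fallback instead. The bulkhead lives in the producer and caps how much concurrent work it accepts.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Consumer side: stop calling a failing producer and use a fallback instead. */
class SimpleCircuitBreaker<T> {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before retrying
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold
                && System.currentTimeMillis() - openedAt < openMillis;
    }

    T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // fail fast, don't touch the producer
        }
        try {
            T result = remoteCall.get(); // also serves as the trial call after openMillis
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis();
        }
    }
}

/** Producer side: cap concurrent work so one greedy consumer can't exhaust the service. */
class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T execute(Supplier<T> work, Supplier<T> rejected) {
        if (!permits.tryAcquire()) {
            return rejected.get(); // e.g. answer "too busy" instead of queueing
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }
}
```

What goes into the fallback and the rejected suppliers is exactly the conversation to have with your product owner: a cached value, an empty result or an explicit error are all valid choices, but they are business decisions rather than technical ones.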

Don’t forget the testing pyramid

The testing pyramid, developed by Mike Cohn, states that you should have many more low-level unit tests than high-level end-to-end tests. End-to-end tests take much longer to run, and are usually more brittle.

We have repeated this concept time and time again to all our developers, but somehow, end-to-end tests keep taking the upper hand. Is it a lack of confidence in the many components of a flow? I don’t know, but it’s clear that good testing practices are doubly important in a microservice architecture.
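As an illustration of what belongs at the bottom of the pyramid, here is a small sketch in plain Java, without a test framework so it stays self-contained. TagClient, ArticleLabeler and ArticleLabelerCheck are hypothetical names, not classes from our system. The idea is that the business logic depends on an interface, so a hand-written stub can replace the other microservice, and the check needs no network, no container and no deployed environment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical client for another microservice. */
interface TagClient {
    List<String> tagsFor(String articleId);
}

/** Business logic under test: depends on the interface, not on HTTP. */
class ArticleLabeler {
    private final TagClient tags;

    ArticleLabeler(TagClient tags) {
        this.tags = tags;
    }

    /** Joins the tags into one label: trimmed, lower-cased, no blanks, no duplicates. */
    String label(String articleId) {
        List<String> seen = new ArrayList<>();
        for (String tag : tags.tagsFor(articleId)) {
            String normalized = tag.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty() && !seen.contains(normalized)) {
                seen.add(normalized);
            }
        }
        return String.join(", ", seen);
    }
}

/** A unit-level check: the stub stands in for the remote service. */
class ArticleLabelerCheck {
    static void run() {
        TagClient stub = articleId -> Arrays.asList("News", " news ", "", "Sport");
        String label = new ArticleLabeler(stub).label("42");
        if (!label.equals("news, sport")) {
            throw new AssertionError("unexpected label: " + label);
        }
    }
}
```

A check like this runs in milliseconds and only breaks when the labelling logic changes, which is the property end-to-end tests lack. You would still keep a few end-to-end tests for the most important flows, just far fewer of them.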

Conclusion

I have tried to describe some pitfalls you might encounter building microservices, especially when you work in an enterprise environment. If you’re working in a startup and in the public cloud, your experiences might be completely different from mine. Nevertheless, I hope this blog post can help you be productive faster. Although it may seem otherwise, I still strongly believe in microservices, but like everything, they are not a silver bullet.