Event Sourcing Microservices with Kafka

Like many software engineers, I come from the world of enterprise software development. In this world, design patterns and software architecture are king. I've spent countless hours studying the principles of SOA, Domain-Driven Design, CQRS and all the wonderful enterprise architecture patterns laid out for us by Martin Fowler. One pattern that always struck me as both fascinating and fundamentally hard was Event Sourcing:

Event Sourcing ensures that all changes to application state are stored as a sequence of events. Not just can we query these events, we can also use the event log to reconstruct past states, and as a foundation to automatically adjust the state to cope with retroactive changes.

That sounds great and all, but how do we actually use this data? How do we query it? Where does it get stored? How do I manage the sequence of events on disk? Do I have one giant "events" table, or one table per event type? Is it centralized, or does each service store its own events? And real-world examples of event-sourced systems that could be studied were few and far between.

Meanwhile, over in Silicon Valley, the "Internet Companies" (Google, Netflix, Twitter, etc.) were busy trying to figure out how to make their applications scale to then-unprecedented levels of data and users. Sure, posting or consuming a 140-character Tweet might sound easy, but try making it work for a billion people, all at the same time. With realtime analytics. And fast responses. And no fail-whale. On top of that, the act of developing software itself needed to scale as companies rushed to build up massive teams of engineers. From this side of the world we started to see the concept of Microservices, where a "monolithic" application is broken out into individual services that can be developed, hosted and scaled independently. The interesting thing about Microservices as an architectural pattern is that they actually start to look a whole lot like SOA and Domain-Driven Design from the Enterprise world.

During this time, engineers at LinkedIn created and open-sourced Kafka. Kafka is a high-performance, distributed, immutable event log with a publish/subscribe-style interface. And, as it turns out, this is exactly the missing link for Event Sourcing.

In the rest of this post, I'm going to describe a microservice application that uses Kafka as its primary data store. Please also head over to the accompanying GitHub repository for the demo application code, along with a guide for setting it up on your own Kontena Platform to try it out. Getting Kafka up and running might be easier than you think.

The Scenario

Let's imagine an e-commerce application built as a collection of microservices plus a front-end application.


We have services for managing Customers, Products and Purchases. Classic Domain-Driven Design says that each of these is an Aggregate Root that should manage its own data, storing only references (IDs, basically) to other aggregates. Of course, microservices are supposed to own their own data, and this is why Domain-Driven Design really comes into its own in a microservice architecture: each microservice is just an Aggregate Root, hosted and deployed on its own.

So, we are keeping our aggregates separate, Eric Evans is finally happy with us, and the world is now our oyster. That is, until it's time for some complicated queries. Imagine an internal admin dashboard with a page that shows all products purchased by customers, with some fancy search. We would need to make three different REST calls to three different services and join that data in memory. That might not sound too bad, but imagine what happens when we include just a few more services in that query. It gets ugly, fast.

Materialized Views to the Rescue

Hopefully you are already familiar with the concept of a materialized view in a relational database. In short, it is a query whose results are stored on disk and/or in memory as if they were a table of their own. And if you think about it hard enough, under the hood even a database table is just a materialized view of events written to the database's transaction log, which stores all incoming changes as an append-only log. That transaction log sounds an awful lot like how Kafka works. If that sounds interesting to you, I highly recommend reading this fantastic article by Martin Kleppmann, which explains it far better than I ever could.

So, if we start thinking about Kafka topics as the transaction log of our microservices, what are the tables? We have a lot of flexibility here. Maybe we store our current state in an RDBMS. Or maybe in some denormalized form in a document database. Or, to take it to the extreme, how about in plain memory? Gasp, you say! What happens when we start a new instance of our microservice?!

The trick is in the details of how Kafka works. Unlike more traditional pub/sub systems (think RabbitMQ or Redis), Kafka stores all of the messages it has received in a topic/partition with a sequential index, and it is up to the consumer to store this index and decide where it would like to start reading from (don't worry, most of the Kafka client libraries make this easy to deal with). So, if you want to use Kafka like RabbitMQ, just always read from the end, getting new messages pushed to you as they come. But how about instead we always read from the beginning? That means that when our service starts up, we simply replay every single event that has happened, building up our state until we are done. Once the messages have been replayed, we have an in-memory representation of our aggregates, ready to be queried. And what if, after processing these events and storing our aggregates, we then publish another event, this time containing the current state of the entity after business rules and other logic have been applied?
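The replay idea can be sketched in a few lines. This is a minimal, illustrative example: a plain Python list stands in for a Kafka topic partition, and the event types (`CustomerCreated`, `CustomerRenamed`) are hypothetical names, not part of any real API.

```python
# Sketch: rebuilding in-memory state by replaying a topic from the beginning.
# The list of event dicts stands in for a Kafka topic partition read from
# offset 0; all event names here are illustrative.

def replay(topic_events):
    """Fold every event into a dict of customer aggregates, oldest first."""
    customers = {}
    for event in topic_events:
        etype, payload = event["type"], event["payload"]
        if etype == "CustomerCreated":
            customers[payload["id"]] = {"id": payload["id"],
                                        "name": payload["name"]}
        elif etype == "CustomerRenamed":
            customers[payload["id"]]["name"] = payload["name"]
    return customers

events = [
    {"type": "CustomerCreated", "payload": {"id": 1, "name": "Alice"}},
    {"type": "CustomerCreated", "payload": {"id": 2, "name": "Bob"}},
    {"type": "CustomerRenamed", "payload": {"id": 1, "name": "Alice B."}},
]

state = replay(events)
# state now holds the current version of every customer aggregate
```

Because the fold is deterministic, a freshly started instance that replays the same topic arrives at exactly the same in-memory state.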


This second event contains the stream of entities our service exposes. Other services can subscribe to this topic and store their own version of this data as they see fit. In our scenario above, the frontend's data store is just another materialized view of the real business data. We get nice, fast local data access, and we can treat this data like a cache, ready to be rebuilt at any time from our event store.


The thing to remember is to differentiate between the two types of events:

1) Events that are the direct result of some user interaction or command. These events are your event source, and the related topics should be owned by only one service.

2) Events that are the result of a command, used to publish the current state of the service and its entities. These topics can be subscribed to by all interested parties, including possibly the microservice itself so that it can repopulate its own state quickly at startup.
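The two topic types can be sketched together: a service consumes its own command topic, applies a business rule, and publishes the resulting aggregate state to a second topic. Again, plain Python lists stand in for Kafka topics, and the topic and command names are illustrative.

```python
# Sketch: a service owns the command topic (type 1 events), applies business
# rules, and publishes resulting aggregate state to a state topic (type 2
# events) for other services (and itself, at startup) to consume.

command_topic = [
    {"type": "CreateCustomer", "id": 1, "name": "alice smith"},
    {"type": "RenameCustomer", "id": 1, "name": "Alice S."},
]
state_topic = []   # consumed by other services to build materialized views

customers = {}     # this service's own in-memory aggregates

def handle(cmd):
    if cmd["type"] == "CreateCustomer":
        # hypothetical business rule: normalize the customer's name
        customers[cmd["id"]] = {"id": cmd["id"], "name": cmd["name"].title()}
    elif cmd["type"] == "RenameCustomer":
        customers[cmd["id"]]["name"] = cmd["name"]
    # publish the post-business-rules state of the aggregate, keyed by ID
    state_topic.append({"key": cmd["id"], "value": dict(customers[cmd["id"]])})

for cmd in command_topic:
    handle(cmd)
```

Note that the state topic carries the aggregate *after* business rules ran, so consumers never need to know or re-run those rules themselves.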

An interesting variation of this architecture is to use your favorite Actor framework (such as Akka for the JVM or .NET, Celluloid for Ruby, etc.), where each Actor represents a single instance of your aggregate, possibly sharded across a cluster of service instances. As messages come into your service, they are sent to the appropriate actor based on the event's key.


Of course I’m painting a much simpler and rosier picture than the reality of a production system. There are a lot of valid questions that will come up when trying to build a system like this. Let’s try and tackle a few.

Won’t I Run Out of Disk Space?

Storing each and every event since the dawn of time can lead to some serious resource usage, especially disk space. Luckily, Kafka topics offer fairly advanced data retention policies. For instance, we can cap a topic at a certain number of bytes, at some length of time, or not cap it at all. In addition, Kafka offers a particularly nice feature called log compaction. In a nutshell, log compaction keeps only the latest value in the topic for any given key. That way we save disk space while still retaining the most recent version of each record. Of course, this isn't exactly perfect for event sourcing, which wants us to store all of the events. A nice tradeoff might look like this:

1) Any topic that stores incoming events to a service, generally end-user commands, should be stored forever if possible. This is your event source.

2) As your microservice processes these events, applies business rules and updates the state of your aggregates, you publish out the current state of that aggregate to another Kafka topic. This topic can have a data retention policy using log compaction, and this is the topic that other services will consume to generate materialized views.
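Conceptually, compaction on the state topic looks like this. The function below is only a model of what Kafka's log cleaner does in the background, not real client code.

```python
# Sketch of what Kafka's log compaction does conceptually: given a keyed
# topic, retain only the most recent record for each key, so a consumer
# replaying the compacted topic still ends up with the current state.

def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)          # later offsets win
    # keep surviving records in their original (offset) order
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda kv: kv[1][0])]

state_topic = [
    ("customer-1", {"name": "Alice"}),
    ("customer-2", {"name": "Bob"}),
    ("customer-1", {"name": "Alice B."}),   # supersedes the first record
]
compacted = compact(state_topic)
# only the latest record per key survives, so replaying `compacted`
# yields the same final state as replaying the full topic
```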

What About Consistency?

The key here is eventual consistency. In a distributed system, this is just a fact of life. As your services asynchronously interact with one another via Kafka topics (or some other message-passing transport), there will be moments when one service's view of the data lags behind another's, so a service may have to validate the state it depends on itself. If issues occur, the service may need to publish a failure event that other services react to. In other words, your services may need to start asking for forgiveness rather than asking for permission.

What About HTTP?

Even without event sourcing, the question of HTTP and request/response often comes up when discussing asynchronous/reactive systems. If everything in my system is just asynchronously passing and responding to messages, how does my HTTP-based frontend handle this? My preferred option is to use websockets. You can still use a “normal” HTTP-based REST API for issuing commands to the server (say POST /shopping_cart), but that request will just trigger some asynchronous events and return some sort of transaction_id. The frontend can then show the user that something is happening, maybe a spinner or maybe just some notice somewhere on the page. In the back end, various status events could be published for this transaction as it is processed by your backend services. These event topics are subscribed to by your websocket server, which can associate the event with an active user and push the event down to the client.


The important concept to remember: don't rely on your REST API's response as an indication that a transaction is complete, only that it has begun.

Wrapping Up

I hope that you found this write up of Event Sourcing and Kafka interesting. Remember to go to the Github repository to learn more about the demo application, and follow the steps so you can run it yourself and see all of the concepts in action. Until next time!

About Kontena

Want to learn about real life use cases of Kontena, case studies, best practices, tips & tricks? Need some help with your project? Want to contribute to a project or help other people? Join the Kontena Forum to discuss more about the Kontena Platform and Kontena Cloud, chat with other happy developers on our Slack discussion channel or meet people in person at one of our Meetup groups located all around the world.