Automatic etcd cluster member replacement

The Kontena container platform has used etcd pretty much from the beginning. It is one of the key technologies behind Kontena's service discovery and dynamic load balancing for services.

As most people familiar with etcd know, it's not the easiest or most intuitive system to set up or configure properly. In Kontena, etcd is considered an internal technology, and developers shouldn't ever need to touch it. It mostly operates in the background, providing the essential distributed key/value storage used by the platform. While etcd works as promised, one issue has burdened us for a very long time: member replacement.

In the cloud, your servers come and go, usually without any notice. When some of your nodes terminate (for any reason), you typically want to replace them with zero downtime and 100% automation. The problem with etcd is that it does not support automatic member replacement out of the box. But for us at Kontena, full automation and developer happiness are things we truly value. So, fully automated etcd member replacement: challenge accepted! :)


Why replacement

In etcd there's a concept called the initial cluster. The initial cluster defines the initial set of member nodes; any additional nodes that join start in proxy mode. The initial cluster typically has 3, 5, 7 or even 9 nodes. If this initial cluster shrinks because your nodes disappear into the abyss of the cloud, you could end up in a situation where you lose the majority of your cluster. And when you lose majority in etcd, or in any Raft-based system, things go south pretty nastily. Better to replace the nodes.
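To see why shrinking is dangerous, the majority (quorum) arithmetic can be sketched in a couple of lines:

```python
def quorum(n):
    """Minimum number of members that must agree for a Raft cluster to make progress."""
    return n // 2 + 1

def failures_tolerated(n):
    """How many members can be lost before quorum (and the cluster) is gone."""
    return n - quorum(n)

# A 3-node cluster tolerates 1 lost member; a 5-node cluster tolerates 2.
# Note that growing from 3 to 4 nodes raises quorum to 3 without adding
# any fault tolerance, which is why odd cluster sizes are recommended.
```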

In Kontena, when you remove a node (or the cloud removes it on your behalf without asking for permission), the platform will automatically try to "recycle" roles in the cluster. So whenever you spin up a replacement node, we can now actually replace the lost member in the etcd cluster fully automatically.

The process

According to the etcd docs, the process itself is pretty straightforward:

  1. Remove the membership information of the lost node
  2. Add new member to the cluster
  3. Start the new member

The documentation states that this is a manual process, and it has been designed that way: the steps are deliberately separated so that you first inform the cluster about the new configuration, and only then start the new etcd node. But whenever we see an API, it means we can automate it. :) And the membership API is pretty easy to use. The hardest part was figuring out how to determine the correct parameters for the replacement node.
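The three steps above can be sketched against etcd's v2 members HTTP API. This is a minimal illustration, not Kontena's actual implementation; the endpoint address, member names and URLs are hypothetical:

```python
import json
import urllib.request

ETCD = "http://10.0.0.1:2379"  # hypothetical client URL of a surviving member

def remove_member(member_id):
    """Step 1: remove the membership information of the lost node."""
    req = urllib.request.Request(f"{ETCD}/v2/members/{member_id}", method="DELETE")
    urllib.request.urlopen(req)

def add_member(peer_url):
    """Step 2: announce the replacement node's peer URL to the cluster."""
    body = json.dumps({"peerURLs": [peer_url]}).encode()
    req = urllib.request.Request(
        f"{ETCD}/v2/members", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req)

def initial_cluster_flags(members, new_name, new_peer_url):
    """Step 3: build the flags the replacement member must be started with."""
    pairs = [f"{m['name']}={m['peerURLs'][0]}" for m in members]
    pairs.append(f"{new_name}={new_peer_url}")
    return {
        "--initial-cluster": ",".join(sorted(pairs)),
        # "existing" tells etcd to join the running cluster instead of
        # bootstrapping a brand-new one.
        "--initial-cluster-state": "existing",
    }
```

The crucial detail is `--initial-cluster-state existing`: if the replacement node is started without it, etcd tries to bootstrap a fresh cluster instead of joining the surviving members.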

The implementation

As explained, Kontena tries to recycle cluster roles automatically. So whenever you spin up a new node in the grid, we try to use it to replace any lost node. This is also, and especially, true for etcd membership. When the Kontena agent spins up, the first thing it does is call home to the Master. The Master responds with details for the new node, and part of this information is the expected etcd membership status. Before actually bootstrapping the local etcd (in a container, naturally), the agent tries to contact the existing etcd nodes in the cluster. If it can connect and finds membership information, the agent can determine whether we are replacing an existing member of the cluster, bootstrapping an "extra" node, or bootstrapping one of the first nodes in the initial cluster.
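The agent's decision boils down to a three-way branch. A simplified sketch of that logic (the field names and mode labels are hypothetical, not Kontena's real code):

```python
def bootstrap_mode(expected_name, members):
    """Decide how a fresh agent should start its local etcd.

    expected_name: the etcd member name the Master assigned to this node.
    members: membership info fetched from an existing etcd node,
             or None if no existing node could be reached.
    """
    if members is None:
        # Nobody to talk to: we are bootstrapping (part of) the initial cluster.
        return "initial"
    if any(m["name"] == expected_name for m in members):
        # The cluster still lists our slot, but the node behind it is gone:
        # replace the lost member.
        return "replace"
    # The initial cluster is already full: run as an "extra" node in proxy mode.
    return "proxy"
```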

The results

Now we are able to replace lost Kontena grid members without any manual intervention. One more bullet for our long list of developer happiness features. This feature was first introduced in release 0.14.0.

But we're not stopping here. We are still investigating whether and how we could automate dynamic resizing of the etcd cluster, and whether we could also automatically recover from an etcd majority failure.

About Kontena

Kontena is a new open source Docker platform including orchestration, service discovery, overlay networking and all the tools required to run your containerized workloads. Kontena is built to maximize developer happiness. It works on any cloud, is easy to set up and super simple to use. Give it a try! If you like it, please star it on GitHub and follow us on Twitter. We hope to see you again!

Image Credits: chaos by Jeff Laitila

Jussi Nummelin