Running Docker in Production: three use cases and the good, the bad and the ugly

At the first Docker Helsinki Meetup of 2017, on January 17th, we heard three interesting use cases covering the ins and outs of how Docker is used in production at Solita, Zalando and Pipedrive. We also heard about the good, the bad and the ugly of running Docker in production.

Solita's Use Case

Firstly, Heikki Simperi from Solita explained how his company uses Docker to handle the various apps and systems associated with their management of the Finnish National Railway Service (VR), as well as the problems they've run into when using Docker.

Solita runs a variety of their own apps on Docker, including a navigator for train drivers, a construction notification system and a traffic controller app. Heikki stressed that downtime has to be minimal when the apps in question are responsible for railway management: "anything over a 3-5 min downtime causes delays for trains, but nobody dies".

Most of the issues they experienced with Docker were related to building an image, having a private repository, removing or starting up a container, or a bug inside an app. Overall, running on the Docker platform has been stable, with "zero downtime".

The problems have lessened with each new Docker version, and they are experiencing fewer and fewer issues as time goes on. They are looking forward to the release of Docker 1.12 on a stable channel from Red Hat.

Zalando's Use Case

Secondly, Rami Rantala delved into how Docker is used at Zalando by all their different teams and projects to deploy production systems.

The Zalando Helsinki office has more than 100 people working across several teams on projects related to search, same-day delivery, partners and more. They run several different production systems and take the Radical Agility approach to running their platforms.

Each autonomous team has its own AWS account and goes against the advice that you shouldn't run Docker in production; according to Rami, "anything that runs in AWS runs with Docker", and Docker is the only allowed method for deploying production systems at Zalando. They also have their own Docker repository (Pierone) and their own Docker base images, run one container per instance and always deploy a full stack.

Some areas with room for improvement: deployments are slow, some of Docker's benefits are lost in the container communication layers, and running one container per instance is expensive, as it means there are thousands of instances. Another issue identified is that Docker sometimes breaks Pierone.

They are currently looking into Kubernetes, as it looks likely to reduce costs, require fewer AWS accounts and less infrastructure, make container communication more efficient and make deployments faster and simpler. They piloted the changeover during Hackweek and expect the production roll-out to start during Q2.

Pipedrive's Use Case

Thirdly, Renno Reiurm discussed the pains and gains of running Docker live at Pipedrive, which helps small businesses control the complex selling process. Founded in 2010, Pipedrive now has more than 30,000 paying customers worldwide, employs more than 200 people and has offices in Tallinn, Tartu and New York City.

Pipedrive turned to Docker after experiencing growing pains with Chef: recipes are written only rarely, which makes it easy to forget how it's done, and writing them often entails learning a new language and new tools, a proven entry barrier.

Their early Docker platform evaluation started with running Docker inside a Vagrant box; they then switched to a custom-built docker-machine and have lately moved to Docker4Mac.

The first Docker builds used the Codeship Docker CI beta, with Tutum (Docker Cloud) as the orchestration service.

This trial had a number of drawbacks: the CI process with Codeship was slow, the Docker build itself took 15 minutes, and deployment to the Tutum cluster took another 10 minutes. Sometimes it was so slow they wondered whether it was actually working at all. They also ran into stability issues, "data loss" and "service downtime".

Out of the need to speed up the CI process and improve the reliability of the Docker infrastructure, Docker infrastructure v2.0 was born.

Some drawbacks Pipedrive experiences when running Docker include: the length of time it takes to build, test and deploy Docker containers; making sure consumers connect only to healthy services; the everyday maintenance of Jenkins jobs; and making containers handle 10,000 connections and constant high load.
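One common way to make sure consumers connect only to healthy services (shown here as a general illustration, not Pipedrive's actual setup) is to expose a health endpoint that the load balancer or orchestrator polls. The sketch below uses only the Python standard library, and the dependency check is a placeholder.

```python
# health.py - minimal /health endpoint sketch so a load balancer or orchestrator
# can route traffic only to healthy instances (dependency check is a placeholder).
from http.server import BaseHTTPRequestHandler, HTTPServer

def dependencies_ok():
    # Placeholder: check your database, queue, downstream APIs, etc.
    return True

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health" and dependencies_ok():
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(503)  # unhealthy instances get taken out of rotation
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Health).serve_forever()
```

A Dockerfile HEALTHCHECK instruction or the platform's own health checks can then poll this endpoint and keep unhealthy containers out of rotation.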

Some benefits of using Docker include: applications have evolved to be generic enough to run in multiple regions and environments; the time from idea to live can be reduced from two weeks to one day; and servers and services are managed asynchronously from each other.

Pipedrive's container use continues to grow, with 70 in-house-built Docker services, 90 Docker images, 500 containers running and 3,200 container deploys since October. Each day there are around 30 container deployments, and one new container is born at Pipedrive.

Renno's recommendations for going live with Docker: take care of the OS, read GitHub issues, read from the source, keep it up to date and (performance) test it.

The Good, the Bad and the Ugly

Finally, Jari Kolehmainen's thought-provoking presentation about Docker, The Good, The Bad and The Ugly, raised the question of how to avoid "the bad and the ugly" when running Docker in production. One such way is to use the Kontena Container and Microservices Platform, as it works on any cloud, is easy to set up and simple to use.

Jari mentioned how the first steps in using Docker in production can be "like a wild rollercoaster, you have ups and downs", but that the benefits outweigh this and that in the end "you will have great success".

For those of you who aren't yet running Docker in production (perhaps you're running it in a test environment), the first step is to choose the right path, which is critical for moving toward production. Often the path is predetermined by project constraints (you have to use a certain data center or cloud service, for example), but if you're free to choose, you have three main options: DIY, renting an actual cloud service, or using a pre-made platform such as Kontena.

In the DIY model, you take an engine, such as the Docker engine, which you try to tune and tweak, and build the actual "car" around it with your team or by yourself.

Generally this "sounds like fun stuff to do and you can control everything and it just works". But after spending some time on engine tuning and going to production with everything in place, you might end up with a car that has a great engine, lots of duct tape and climate control that mostly works, yet you never know when it will break down.

Jari's advice is "don't do it". If you're new to running Docker in production, the DIY option can be a useful learning exercise, but it's better not to build everything yourself, as you'll likely only want to work through the DIY process once.

The cloud rental option means getting everything from a provider such as AWS, Azure or a hosted Kubernetes service, all of which are fine options. As when taking a taxi, you pay a set amount and get the whole system ready to use for the purchase price, without the need to maintain anything. This may suit some use cases better than others.

But if your project needs to be run inside a data center or some place that doesn't have these options available, then the third option is likely the correct one for you.

Recommended platforms include Docker Swarm (the new version), Kubernetes, Kontena and DC/OS, all of which are good options when you don't want to DIY. You can pick one of these or another similar option, but DIY isn't recommended.

Platforms such as Kontena act like a "pre-made, ready-to-drive car", letting you focus on your applications rather than on what's under the hood. They have most of the features you need for your daily tasks built in, and they have different focuses: Kontena, for example, focuses on ease of use and ease of maintenance, while others may focus on scalability.

For you as a DevOps person, this means less maintenance is required, as the platforms come with "all batteries included" and are "battle tested", which really matters in production.

Docker engine

There are some things to take into account even if you're using one of the previously mentioned platforms, since you still install the actual Linux distribution, then the Docker engine, and then the platform on top of that.

When you install the Docker engine for the first time on Red Hat, Ubuntu or any of these distributions, the default settings for the engine are most likely not right for actual production usage. You can check Google for solutions and tweak the settings so that the engine can handle the load you need to have in production. The Docker engine just runs the containers; it doesn't do any clean-up, so you need to have a configuration in place to manage the clean-up of logs, containers, volumes and so on.
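As a rough illustration of such housekeeping (a sketch under assumptions, not something prescribed in the talk), a scheduled job could prune leftovers with the docker-py client; the prune filters are assumptions to tune for your own hosts, and log rotation is normally configured separately via the engine's log options.

```python
# clean_docker.py - minimal housekeeping sketch (assumes the docker-py package
# and access to the local Docker socket; run it from cron or a systemd timer).
import docker

def cleanup():
    client = docker.from_env()
    # Remove containers that have been stopped for more than 24 hours
    # (the "until" filter value is an assumption).
    client.containers.prune(filters={"until": "24h"})
    # Remove dangling (untagged) image layers left behind by builds.
    client.images.prune(filters={"dangling": True})
    # Remove volumes no longer referenced by any container.
    client.volumes.prune()

if __name__ == "__main__":
    cleanup()
```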

If you don't want to tweak the default settings very much, then Jari highly recommends using a distribution that is made for containers. One good option is CoreOS's Container Linux; Red Hat has its Atomic platform, and SUSE also has its own custom-made option for containers. These types of distributions usually have better defaults that might actually work in production.

One critical thing that you must check is the graph (storage) driver. If you're using a recent kernel version you can use overlay2; it's the "hot thing of today", though this may change with the next kernel release. It's the fastest option and you shouldn't run into many problems using it, yet most distributions still ship with the older overlay driver.

The original overlay driver has some downsides by default, but you can use it in production. If you're using Ubuntu, you can likely switch to AUFS by installing the extras package for your kernel; it works about as well as overlay2.
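As a small, hedged illustration (not from the talk), it's worth verifying which graph driver a host is actually using before relying on it. The sketch below uses the docker-py client, and the daemon.json path in the comment is the usual default location.

```python
# check_driver.py - report the storage (graph) driver in use on a host (sketch;
# assumes the docker-py package and access to the local Docker socket).
import docker

client = docker.from_env()
driver = client.info().get("Driver")
print("Storage driver in use:", driver)

if driver not in ("overlay2", "aufs"):
    # Switching usually means setting {"storage-driver": "overlay2"} in
    # /etc/docker/daemon.json and restarting the engine. Note that changing
    # the driver hides existing images and containers until they are migrated.
    print("Consider overlay2 (recent kernels) or AUFS (Ubuntu with kernel extras).")
```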

The "ugliest" part has been the device mapper, which Jari thinks still comes as the default with Red Hat and generally causes pain. You can use it, but only if you know how to configure its internals.

The Docker engine has a cool feature called "plugins". Plugins can be used to provide overlay networks and volume storage for the engine, but there is one downside: although they say you can run these plugins in a container, in practice you can't, so try to avoid it.

Starting from Docker 1.13, a v2 plugin architecture is included, which should solve most of these problems.

A good rule for production is to keep everything up to date. This includes the Docker engine as well as the kernel; not just for bug fixes, but for security as well.

CI/CD Pipeline

When you are running containers or microservices in production, you usually have a bunch of services to tackle, and if you don't have a well-built pipeline in place "you will basically go mad". There are so many steps and so many things you must take into account when you are moving these containers through the different stages to production; if you don't automate the process, it will be difficult.

Basically there are three stages: the build phase, test phase and deployment phase.

In the build phase you shouldn't build or run these images on your own machines. Instead you should have a CI system that builds, tests and finally deploys them to the correct environment.

Some useful advice: you should script everything (this is not container-specific), from build to test to deployment. Additionally, every configuration and script that you have should be version controlled. If you don't have this in place, you'll run into trouble when you start running things in production. Lastly, do not put secrets in the configuration files; this simply isn't good practice.
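As a rough sketch of what "script everything" can look like (an illustration under assumptions, not from the talk), a versioned pipeline script might wrap the build, test and push steps like this; the registry, image name, test command and deploy step are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""pipeline.py - minimal build/test/push sketch. The registry, image name,
test command and deploy step are hypothetical placeholders; keep this script
in version control alongside the application."""
import subprocess
import sys

IMAGE = "registry.example.com/myapp"   # hypothetical internal registry/image
TAG = sys.argv[1] if len(sys.argv) > 1 else "latest"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)    # stop the pipeline on the first failure

def main():
    image = f"{IMAGE}:{TAG}"
    run(["docker", "build", "-t", image, "."])             # build phase
    run(["docker", "run", "--rm", image, "make", "test"])  # test phase (placeholder command)
    run(["docker", "push", image])                         # publish to the registry
    # Deploy phase: hand the image over to your platform or orchestrator here,
    # e.g. via its CLI or API -- intentionally left as a placeholder.

if __name__ == "__main__":
    main()
```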

Instead, you can use a platform that provides support for storing your secrets, as it gives you a way to handle them across different environments and deployments.
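As a hedged illustration of keeping secrets out of configuration files (not from the talk), the application can read them from the environment at start-up and let the platform inject the values per environment; the variable names below are made-up examples.

```python
# config.py - read secrets from the environment instead of baking them into
# configuration files or images (variable names are hypothetical examples).
import os

class ConfigError(RuntimeError):
    pass

def require_env(name):
    value = os.environ.get(name)
    if not value:
        raise ConfigError(f"missing required environment variable: {name}")
    return value

DATABASE_URL = require_env("DATABASE_URL")  # injected by the platform per environment
API_TOKEN = require_env("API_TOKEN")        # never committed to version control
```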

Some good pipeline tools

Jari's personal favourite is Drone: it's very good as it forces you to think of everything as a container.

The whole pipeline inside Drone runs as containers. If you have been using something like Travis, it's very similar, but you can run it inside your own data center. Jenkins is another well-known option; it's a bit more complex and isn't designed for containers, but it can be used. The last option is GitLab CI: while Jari doesn't have personal experience with this tool, he has heard many great things about it.

In this pipeline example, señor developer uses the pipeline in a similar way to how we do at Kontena when building our own cloud services.

The developer is working on the next big thing, and when he pushes the changes to GitHub, Drone (which is integrated there) receives the webhook, triggers the build and runs the tests inside containers in the pipeline. If everything is fine, it pushes the resulting Docker image to the internal registry. Finally, it triggers the deploy through the Kontena Master to the right environment, depending on whether it's a release and so on. It usually takes only a couple of minutes from git push to actual deployment in production. The Kontena Platform acts as an intermediary here: it takes care of the rolling deployments and handles the load balancer configuration, so you don't need to fit every piece together manually.
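For a feel of the push-to-deploy trigger, here is a deliberately tiny sketch of a webhook listener that kicks off the hypothetical pipeline.py from the earlier example. It is a toy illustration only, not how Drone or Kontena implement this, and the branch check and port are assumptions.

```python
# webhook_listener.py - toy push-webhook receiver (illustration only; a real
# CI system such as Drone handles this for you).
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class PushHook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Only react to pushes on the master branch (branch name is an assumption).
        if payload.get("ref", "").endswith("/master"):
            commit = payload.get("after", "latest")
            subprocess.Popen(["python3", "pipeline.py", commit])  # build/test/push/deploy
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), PushHook).serve_forever()
```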

Security

The first few times you use containers in production, you're likely just happy to have something running, and you may not really think about security and how it should be handled. However, it should be taken into account right from the testing and staging environments.

The Audit trail

An audit trail (e.g. as found in Kontena) allows you to track what changes were made to the system and who made them.

Security patches

Container-native operating systems are the recommended ones to use; CoreOS, for example, will automatically update the hosts and reboot when the updates are done. You can also use configuration management tools such as Chef, Ansible or Puppet to handle security patches. It's also worth investing some time in image scanning services that help identify security issues in your Docker images; Docker Hub and Quay.io by CoreOS, for example, provide this kind of functionality.

Some platforms provide network security as a feature (Kontena does, for example). They might encrypt all the traffic between hosts or between data centers. Additional security measures that some platforms include are creating network segments and defining policies; as a last resort, you can always configure firewalls yourself.

Prepare for chaos

Chaos is a fact of life; hosts fail, engines fail, containers fail and your app can crash. Being properly prepared for "controlled chaos" means that when it happens, it's not such a big deal.

When planning for production, it is highly recommended to design things so that you can "kill" any host or node at any time: you should be able to force down at least one host by sheer brute force and everything will still be fine. Another piece of advice: if you're using one of the previously mentioned platforms, trust the scheduler, use clustered databases and outsource state where possible, to make running Docker in production as smooth as possible.
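As a hedged sketch of rehearsing this kind of "controlled chaos" (an illustration, not something prescribed in the talk), you can periodically kill a random container in a staging environment and check that the scheduler brings the service back; the label filter and interval below are assumptions.

```python
# chaos.py - kill a random running container to rehearse failure handling
# (aim this at a staging environment; assumes the docker-py package).
import random
import time
import docker

client = docker.from_env()

while True:
    # Only consider containers explicitly opted in via a label (an assumption).
    candidates = client.containers.list(filters={"label": "chaos=allowed"})
    if candidates:
        victim = random.choice(candidates)
        print("killing", victim.name)
        victim.kill()        # the scheduler should reschedule the service
    time.sleep(600)          # wait 10 minutes between rounds
```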

You can view the slides from Jari's presentation here.

About Kontena

Kontena, Inc. is the creator of Kontena, an open source, developer-friendly container and microservices platform. Kontena is built to maximize developer happiness by simplifying running containerized applications on any infrastructure: on-premises, cloud or hybrid. It provides a complete solution for organizations of any size. Founded in March 2015, Kontena was recognized as one of the best new open source projects in the 8th annual Black Duck Open Source Rookies of the Year Awards. For more information, visit: www.kontena.io.

Image credits: Fish Boxes by GormGrymme.