Skip to main content

To run or not to run a database on Kubernetes: What to consider

Today, more and more applications are being deployed in containers on Kubernetes—so much so that we’ve heard Kubernetes called the Linux of the cloud. Despite all that growth on the application layer, the data layer hasn’t gotten as much traction with containerization. That’s not surprising, since containerized workloads inherently have to be resilient to restarts, scale-out, virtualization, and other constraints. So handling things like state (the database), availability to other layers of the application, and redundancy for a database can have very specific requirements. That makes it challenging to run a database in a distributed environment. 
However, the data layer is getting more attention, since many developers want to treat data infrastructure the same as application stacks. Operators want to use the same tools for databases and applications, and get the same benefits as the application layer in the data layer: rapid spin-up and repeatability across environments. In this blog, we’ll explore when and what types of databases can be effectively run on Kubernetes.
Before we dive into the considerations for running a database on Kubernetes, let’s briefly review our options for running databases on Google Cloud Platform (GCP) and what they’re best used for.

  • Fully managed databases. This includes Cloud SpannerCloud Bigtable and Cloud SQL, among others. This is the low-ops choice, since Google Cloud handles many of the maintenance tasks, like backups, patching and scaling. As a developer or operator, you don’t need to mess with them. You just create a database, build your app, and let Google Cloud scale it for you. This also means you might not have access to the exact version of a database, extension, or the exact flavor of database that you want.
  • Do-it-yourself on a VM. This might best be described as the full-ops option, where you take full responsibility for building your database, scaling it, managing reliability, setting up backups, and more. All of that can be a lot of work, but you have all the features and database flavors at your disposal.
  • Run it on Kubernetes. Running a database on Kubernetes is closer to the full-ops option, but you do get some benefits in terms of the automation Kubernetes provides to keep the database application running. That said, it is important to remember that pods (the database application containers) are transient, so the likelihood of database application restarts or failovers is higher. Also, some of the more database-specific administrative tasks—backups, scaling, tuning, etc.—are different due to the added abstractions that come with containerization.
Tips for running your database on Kubernetes

When choosing to go down the Kubernetes route, think about what database you will be running, and how well it will work given the trade-offs previously discussed. Since pods are mortal, the likelihood of failover events is higher than a traditionally hosted or fully managed database. It will be easier to run a database on Kubernetes if it includes concepts like sharding, failover elections and replication built into its DNA (for example, ElasticSearch, Cassandra, or MongoDB). Some open source projects provide custom resources and operators to help with managing the database.
Next, consider the function that database is performing in the context of your application and business. Databases that are storing more transient and caching layers are better fits for Kubernetes. Data layers of that type typically have more resilience built into the applications, making for a better overall experience.  
Finally, be sure you understand the replication modes available in the database. Asynchronous modes of replication leave room for data loss, because transactions might be committed to the primary database but not to the secondary database(s). So, be sure to understand whether you might incur data loss, and how much of that is acceptable in the context of your application.
After evaluating all of those considerations, you’ll end up with a decision tree looking something like this:

How to deploy a database on Kubernetes

Now, let’s dive into more details on how to deploy a database on Kubernetes using StatefulSets. With a StatefulSet, your data can be stored on persistent volumes, decoupling the database application from the persistent storage, so when a pod (such as the database application) is recreated, all the data is still there. Additionally, when a pod is recreated in a StatefulSet, it keeps the same name, so you have a consistent endpoint to connect to. Persistent data and consistent naming are two of the largest benefits of StatefulSets. You can check out the Kubernetes documentation for more details.
If you need to run a database that doesn’t perfectly fit the model of a Kubernetes-friendly database (such as MySQL or PostgreSQL), consider using Kubernetes Operators or projects that wrap those database with additional features. Operators will help you spin up those databases and perform database maintenance tasks like backups and replication. For MySQL in particular, take a look at the Oracle MySQL Operator and Crunchy Data for PostgreSQL. 
Operators use custom resources and controllers to expose application-specific operations through the Kubernetes API. For example, to perform a backup using Crunchy Data, simply execute pgo backup [cluster_name]. To add a Postgres replica, use pgo scale cluster [cluster_name].
There are some other projects out there that you might explore, such as Patroni for PostgreSQL. These projects use Operators, but go one step further. They’ve built many tools around their respective databases to aid their operation inside of Kubernetes. They may include additional features like sharding, leader election, and failover functionality needed to successfully deploy MySQL or PostgreSQL in Kubernetes.
While running a database in Kubernetes is gaining traction, it is still far from an exact science. There is a lot of work being done in this area, so keep an eye out as technologies and tools evolve toward making running databases in Kubernetes much more the norm. 
When you’re ready to get started, check out GCP Marketplace for easy-to-deploy SaaS, VM, and containerized database solutions and operators that can be deployed to GCP or Kubernetes clusters anywhere.


Popular posts from this blog

People are going wild for a handy new shortcut that will change the way you use Google Docs

- Google has introduced new URLs that can open up blank Google Docs with the click of a button. - To try it out, simply point your browser to  or other Google URLs. - Here's an incomplete list of these new URLs, along with a way to take the shortcut to the next level. Last month, Google rolled out a new time-saving shortcut for anyone who spends a lot of time in Google Docs. To open a new, blank document — or spreadsheet, or presentation — all you have to do is go to one of Google's handy new URLs. So if you want to start a new document, you just have to type " " into your browser. Google Docs ✔ @googledocs Introducing a .new time-saving trick for users. Type any of these .new domains to instantly create Docs, Sheets, Slides, Sites or Forms ↓ 9:35 PM - Oct 25, 2018 4,550 2,812 people are talking about this Twitter Ads info and privacy Here&#

Set start times and import reminders in Tasks

Here comes one of the most awaited features. Tasks is one of the goals to follow what you have to do in G Suite. These new updates will help ensure the majority of your to-dos are in Tasks, and guarantee that you can monitor the due dates related with them. Moreover, importing reminders to Tasks can support your users if your association is at present changing from Inbox to Gmail. Set a date and time for your tasks and receive notifications - You’ll find a place to add date & time. Create repeating tasks - Also you can make an event recur. Import reminders into Tasks This import tool will pull your reminders (from Inbox/Gmail, Calendar, or the Assistant) into Tasks.When importing reminders into Tasks, we’ll copy over the title, date, time and recurrence of the reminder. Please note, reminders with locations associated will not be imported. Additionally, this is a one-time import and not a constant sync. - When you open Tasks on the web or your mobile app, you’ll se

Use Vault for Gmail Confidential Messages and Jamboard Files

Google vault will be supporting two new formats in the future, Gmail confidential mode emails & Jamboard files stored in Google Drive. Google Vault gives you a chance to retain, hold, search, and export data to support your organization’s retention and eDiscovery needs. This dispatch includes support for new information types with the goal that you can thoroughly oversee your association's information. What happens when individuals in your association sends confidential messages? Vault can hold, retain, search, and export all confidential mode messages sent by users in your association. Messages are constantly accessible to Vault, notwithstanding when the sender sets a termination date or denies access to private messages. Here’s an example of what will see in Vault when they search for and preview this email sent by . But It’ll not work vise versa. Admins can hold, retain, search and export message headers and s