Using NodeSelector to Schedule Deployments with large volumes of Stateful Data on Kubernetes


I recently migrated an application that reads from a very substantial data store to Kubernetes; the big challenge was that my standard Kubernetes nodes looked something like:
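The original post showed the node specs in an image; as a rough, hypothetical illustration, a standard general-purpose node might report capacity like this (all values here are placeholders, not the actual cluster's):

```
$ kubectl describe node <node-name> | grep -A 4 Capacity
Capacity:
  cpu:                4
  memory:             16413812Ki
  ephemeral-storage:  102350Mi
  pods:               110
```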

However, its data store is something like 4–6TB, and since the application's primary function is to stream much of this data (almost entirely read activity — writes happen from a Kubernetes Job, which I won't cover here, but the principle is the same), the latency for something like an NFS-backed volume was too high. So I mounted the block devices (containing replicas of the data) to 2 of the hosts, and planned to schedule the Deployment to target only those two nodes.

In this case, these are block devices dedicated to this application, mounted to the Kube nodes I want to target, and while there are more involved/less-brittle configurations for this kind of setup, I elected to use the mountpoint for the devices (containing a directory called `pp-data`) as a volume.

The benefit here is that, with a volume, the Deployment will also fail if the device isn't mounted, and likewise, if the scheduling rule (which I'll cover in a moment) doesn't work, the Deployment won't run (because that path won't exist):
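A minimal sketch of that volume definition, assuming a hypothetical mountpoint of `/mnt/data` for the block device (the `Directory` type is what makes the kubelet refuse to start the Pod if the path is absent, rather than creating it):

```yaml
volumes:
  - name: pp-data
    hostPath:
      # Hypothetical mountpoint; the pp-data directory lives under
      # this path on the nodes where the block device is mounted.
      path: /mnt/data/pp-data
      # "Directory" requires the path to already exist on the node,
      # so the Pod fails to start if the device isn't mounted.
      type: Directory
```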

So, this hostPath volume will depend on the block device being mounted in order for the directory to be present, whereas other hostPath types may not detect this (e.g. `DirectoryOrCreate` will create a missing directory), and there are a couple of other types that can, indeed, target a specific block device — which may not be ideal for a few reasons, but does exist if you can count on consistent naming, UUIDs, etc.
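For comparison, sketches of those other hostPath types — paths and device names here are hypothetical:

```yaml
# DirectoryOrCreate silently creates the directory if it's missing,
# which would mask an unmounted device:
hostPath:
  path: /mnt/data/pp-data
  type: DirectoryOrCreate
```

```yaml
# BlockDevice targets the device node itself; only sensible if you can
# count on stable naming (or use a /dev/disk/by-uuid/... path):
hostPath:
  path: /dev/sdb
  type: BlockDevice
```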

Now that our manifest knows how to make use of the data volume on the host, we can use `nodeSelector` to tell the API server to only schedule this resource on nodes that contain this large volume:
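In the Pod template's spec, that looks like the following — the label key and value (`storage: pp-data`) are a hypothetical choice; use whatever you label your nodes with:

```yaml
spec:
  # Only schedule onto nodes carrying this (hypothetical) label:
  nodeSelector:
    storage: pp-data
```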

What `nodeSelector` does here is check for hosts with the label applied, matching on the label's value; your label and the accompanying value can be set to whatever you'd like. You can also similarly label and taint nodes to prevent, or explicitly set affinity for, scheduling rules:
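For example, to keep other workloads off these nodes entirely, you could taint them (the taint key/value here are hypothetical):

```shell
# Repel any Pods that don't tolerate this taint:
kubectl taint nodes node-1 dedicated=pp-data:NoSchedule

# The Deployment's Pod spec would then also need a matching toleration:
#   tolerations:
#     - key: dedicated
#       value: pp-data
#       effect: NoSchedule
```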

So, in my case, I took the node names from `kubectl get nodes`:
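Something like the following, where the two nodes with the block devices attached are (hypothetically) `node-1` and `node-2` — names, ages, and versions here are placeholders:

```shell
$ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
node-1   Ready    <none>   90d   v1.18.2
node-2   Ready    <none>   90d   v1.18.2
node-3   Ready    <none>   90d   v1.18.2
```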

I know that these two nodes have the big volume attached, so I’m going to apply my label to them:
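Assuming the hypothetical node names and label from above:

```shell
kubectl label nodes node-1 node-2 storage=pp-data
```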

and then you can validate by passing the label as a selector to your kubectl command to list the nodes with that label attached:
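With the hypothetical label from above, only the two labeled nodes should come back:

```shell
$ kubectl get nodes -l storage=pp-data
NAME     STATUS   ROLES    AGE   VERSION
node-1   Ready    <none>   90d   v1.18.2
node-2   Ready    <none>   90d   v1.18.2
```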

Let’s go back to our manifest, which should look like this when completed:
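A sketch of the completed Deployment, combining the `nodeSelector` and the `hostPath` volume — the name, image, label, and paths are all hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pp-streamer
  labels:
    app: pp-streamer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pp-streamer
  template:
    metadata:
      labels:
        app: pp-streamer
    spec:
      # Only schedule onto nodes carrying the (hypothetical) label:
      nodeSelector:
        storage: pp-data
      containers:
        - name: pp-streamer
          image: example/pp-streamer:latest  # hypothetical image
          volumeMounts:
            - name: pp-data
              mountPath: /data
      volumes:
        - name: pp-data
          hostPath:
            path: /mnt/data/pp-data  # hypothetical mountpoint
            type: Directory
```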

and go ahead and apply:
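Assuming the manifest above was saved to a file like `pp-streamer-deployment.yaml`:

```shell
kubectl apply -f pp-streamer-deployment.yaml
```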

Once it applies successfully, you can validate that the Pods were scheduled per your rules, using the labels on the application above:
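Using the hypothetical `app` label from the manifest sketch above, `-o wide` shows which node each Pod landed on (Pod names and node names here are placeholders):

```shell
$ kubectl get pods -o wide -l app=pp-streamer
NAME                           READY   STATUS    NODE
pp-streamer-5d8f7c9b4d-abcde   1/1     Running   node-1
pp-streamer-5d8f7c9b4d-fghij   1/1     Running   node-2
```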

You’ll see that the 2 replicas of the Pod you defined in your Deployment above ended up on the 2 nodes containing the host volume we specified, using that node label.

Written by

Systems Engineer
