another man's ramblings on code and tech

Jobs in Kubernetes

Unfortunately for any job seekers who ended up here, a Job in the Kubernetes (k8s) context is a type of controller (like a Deployment) that runs pods until a set number of them have completed successfully. That sets it apart from the other controller types: a Job focuses on the number of completed runs, while the others focus on the number of pods kept available. This makes Jobs a great option for batch workloads. Here I’ll go over some interesting uses of Jobs.

restartPolicy and backoffLimit

The major advantage of Jobs is that one can set restartPolicy: Never and backoffLimit: 0. The restartPolicy is a parameter that normally can’t be changed; in a Deployment and the other long-running controllers the only allowed value is Always, so if a pod randomly goes down, it is brought right back up as soon as possible. In a Job, however, the allowed values are Never and OnFailure, so one can tell it to give up restarting. backoffLimit goes hand in hand with this option: it says how many times to retry after a failure before the Job is marked as failed. So, if you do want it to retry sometimes, you can define how many retries with backoffLimit (the default is 6).

This is incredibly useful. With the limit set to 0 and restarts set to Never, you get a template that runs a set of containers from start to end only once, with no restarts and no retries.
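As a minimal sketch of the run-once setup described above, the manifest below sets both options on an otherwise bare Job. The Job name, container name, image, and command are placeholders made up for illustration, not taken from my lab.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
    name: run-once-example            # hypothetical name
spec:
    backoffLimit: 0                   # no retries: the first pod failure fails the Job
    template:
        spec:
            restartPolicy: Never      # a Job accepts only Never or OnFailure here
            containers:
                - name: task          # hypothetical container
                  image: busybox:1.36 # placeholder image
                  command: ["sh", "-c", "echo running once"]
```

Raising backoffLimit to, say, 3 would keep the same no-restart behavior inside each pod while letting the Job try replacement pods a few times before giving up.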

parallelism and completions

These two options control how much work to do and with how many resources. The parallelism option, as expected, controls how many pods are allowed to run at once in the Job. The completions option controls how many pods must finish successfully before the Job is considered complete. Together they let one control the flow of work.
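To make the interaction concrete, here is a small sketch, assuming a made-up batch task: the Job keeps at most 3 pods running at a time and is complete once 12 pods have finished successfully. All names, the image, and the numbers are illustrative assumptions.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
    name: batch-example               # hypothetical name
spec:
    completions: 12                   # total successful pod runs required
    parallelism: 3                    # at most 3 pods running at once
    template:
        spec:
            restartPolicy: Never
            containers:
                - name: worker        # hypothetical container
                  image: busybox:1.36 # placeholder image
                  command: ["sh", "-c", "echo processing one unit of work"]
```

With these values the work runs in roughly four waves of three pods, assuming every pod succeeds.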


An example from my lab to generate a one-time load would be:

apiVersion: batch/v1
kind: Job
metadata:
    name: load-test-tool-job
    labels:
        app: load-test-tool
spec:
    parallelism: 10
    backoffLimit: 0
    template:
        metadata:
            name: load-test-tool
            labels:
                app: load-test-tool
        spec:
            restartPolicy: Never
            hostNetwork: false
# ...

This type of Job is useful for one-time load tests, as I can tune parallelism to define how many pods should run at once to create different types of load.

Date: April 9th at 12:40pm