Jobs in Kubernetes
Unfortunately for any job seekers who ended up here, Jobs in the Kubernetes (k8s) context are a type of controller (like a Deployment) that runs pods until a specified number of them have completed successfully. This sets Jobs apart from the other controller types: a Job focuses on the number of completed runs, while the others focus on the number of pods kept available. This makes Jobs a great option for batch workloads. Here I’ll go over some interesting uses of Jobs.
restartPolicy and backoffLimit
The major advantage of Jobs is that one can set restartPolicy: Never and backoffLimit: 0. The restartPolicy is a parameter that normally can’t be changed; in the other k8s workload controllers, such as Deployments, the only allowed value is Always, so if a pod randomly goes down it is brought right back up as soon as possible. In the Job type, however, one can tell it to give up restarting. backoffLimit goes hand-in-hand with this option; it says how many times a failure should be retried before the Job gives up. So, if there is a failure and you do want it to retry a few times, you can define how many retries with backoffLimit.
This is incredibly useful. Together, with backoffLimit set to 0 and restartPolicy set to Never, you get the only controller template in k8s that will run a set of containers from start to end only once.
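For the opposite case, where some retrying is wanted, a minimal sketch might look like the manifest below. The Job name, image, and command are placeholders I made up for illustration; only restartPolicy and backoffLimit are the point here.

```yaml
# Hypothetical Job: name, image, and command are placeholders, not from my lab.
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-demo-job
spec:
  backoffLimit: 3            # allow up to 3 retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never   # a failed pod is not restarted in place; the Job creates a new pod for each retry
      containers:
        - name: worker
          image: busybox:1.36
          # Always fails, so the retries can be observed.
          command: ["sh", "-c", "echo working; exit 1"]
```

Swapping restartPolicy to OnFailure should make the failed container restart inside the same pod instead of spawning a new pod per attempt, which keeps fewer failed pods around but also makes the logs of earlier attempts harder to get at.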
parallelism and completions
These two options control how much work to do and with how many resources. The parallelism option, as expected, controls how many pods are allowed to run at once in this Job. The completions option controls how many successful runs must be counted before the Job finishes. This allows one to control the flow of work.
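As a rough sketch of how the two options combine, the manifest below asks for 20 successful runs with at most 5 pods in flight at any moment. The Job name, image, and command are placeholders chosen for illustration.

```yaml
# Hypothetical Job: name, image, and command are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-demo-job
spec:
  completions: 20    # the Job is done after 20 pods have exited successfully
  parallelism: 5     # at most 5 pods run at the same time
  backoffLimit: 0    # any failed pod fails the whole Job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo processing one unit of work"]
```

Raising parallelism finishes the same 20 runs sooner at the cost of more pods running concurrently; lowering it to 1 runs them one after another.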
Example
An example from my lab to generate a one-time load would be:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-tool-job
  labels:
    app: load-test-tool
spec:
  parallelism: 10
  backoffLimit: 0
  template:
    metadata:
      name: load-test-tool
      labels:
        app: load-test-tool
    spec:
      restartPolicy: Never
      hostNetwork: false
      containers:
      # ...
This type of Job is useful in one-time load tests, as I can tune parallelism to define how many containers should run at once to create different types of load.