KubeDojo

Rolling Updates and Rollbacks

by Alexis Kinsella · 20 min read

Introduction

You push a new container image, the rollout starts, and three minutes later your on-call phone rings. The new version is crash-looping. Half your pods are down. You need to get back to the last known good state in seconds, not minutes.

This is what kubectl rollout undo exists for. But the rollback is only as useful as your understanding of what happened during the rollout. If you don't know how Kubernetes manages ReplicaSet revisions, which knobs control the update pace, and what happens when a rollout stalls, you're guessing. The CKAD exam tests this under the Application Deployment domain (20%): expect tasks that require you to update a Deployment's image, check rollout status, and roll back to a specific revision.

What the docs don't tell you: CoreDNS, ingress-nginx, Istio, and cert-manager all make different bets on maxSurge and maxUnavailable, and each choice reflects a specific failure mode the authors wanted to avoid. This article explains those choices, the rounding behavior that makes percentage values behave unexpectedly at low replica counts, and the six kubectl rollout subcommands that give you control when a rollout goes sideways.

How Rolling Updates Work

What Triggers a Rollout

A rollout starts when you change anything inside .spec.template: the container image, environment variables, labels, resource limits, volume mounts. The Deployment controller detects the template change, creates a new ReplicaSet, and begins shifting pods from the old ReplicaSet to the new one.

Changes outside .spec.template do not trigger a rollout. Scaling the Deployment (changing .spec.replicas) adjusts the current ReplicaSet directly. Updating annotations on the Deployment object itself has no effect on pods. This distinction matters on the exam: if a question asks you to "update the deployment" by changing replicas, that is not a rollout.
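The boundary is easiest to see in a manifest. Here is a hypothetical Deployment, annotated with which edits do and do not create a new ReplicaSet (all names and values are illustrative):

```yaml
# Hypothetical Deployment illustrating what triggers a rollout
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  annotations:
    team: platform          # editing this: NO rollout (outside the pod template)
spec:
  replicas: 4               # editing this: NO rollout (scales the current ReplicaSet)
  selector:
    matchLabels:
      app: web-app
  template:                 # anything below here DOES trigger a rollout
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.27   # changing the image: rollout
        env:
        - name: LOG_LEVEL   # adding or changing env vars: rollout
          value: info
        resources:
          limits:
            memory: 256Mi   # changing limits: rollout
```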

The ReplicaSet Dance

Every Deployment manages one or more ReplicaSets. When a rollout starts, the Deployment controller creates a new ReplicaSet with zero replicas and begins scaling it up while scaling the old one down. The pace of this transition is controlled by two fields: maxSurge and maxUnavailable.

With the default settings (25%/25%), a 4-replica Deployment updates like this:

$ kubectl rollout status deployment/web-app
Waiting for deployment "web-app" rollout to finish: 1 out of 4 new replicas have been updated...
Waiting for deployment "web-app" rollout to finish: 2 out of 4 new replicas have been updated...
Waiting for deployment "web-app" rollout to finish: 3 out of 4 new replicas have been updated...
Waiting for deployment "web-app" rollout to finish: 1 old replicas are pending termination...
deployment "web-app" successfully rolled out

During the transition, both old and new ReplicaSets exist. You can see them:

$ kubectl get rs -l app=web-app
NAME                  DESIRED   CURRENT   READY   AGE
web-app-6b7f6d4bc     0         0         0       12d
web-app-7c8d9e5fa     4         4         4       45s

The old ReplicaSet is scaled to zero but not deleted. Kubernetes keeps it around for rollback, up to the number specified by revisionHistoryLimit (default 10).

Figure 1: Rolling update ReplicaSet dance — old RS scales down as new RS scales up across four steps. With maxSurge=1 and maxUnavailable=1 (the 25% defaults resolved at 4 replicas), Kubernetes transitions pods in steps; the old ReplicaSet scales to zero but is retained for rollback.

RollingUpdate vs. Recreate

Kubernetes supports two strategy types:

RollingUpdate (default): gradually replaces pods, maintaining availability throughout. This is what you'll use for most workloads.

Recreate: kills all existing pods before creating new ones. Zero availability during the switch. Use this when your application cannot tolerate two versions running simultaneously — a database migration that changes the schema, or a workload that holds an exclusive lock on a shared volume.

# Recreate strategy — all pods terminated before new ones start
spec:
  strategy:
    type: Recreate

CoreDNS, the cluster DNS addon, uses RollingUpdate with a conservative configuration to protect DNS availability:

# kubernetes/kubernetes — cluster/addons/dns/coredns/coredns.yaml.base
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  # replicas: not specified here; tuned by DNS horizontal auto-scaling
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  # ...
  template:
    # ...
    spec:
      priorityClassName: system-cluster-critical
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: k8s-app
                    operator: In
                    values: ["kube-dns"]
              topologyKey: kubernetes.io/hostname

Three things worth noting: maxUnavailable: 1 guarantees at most one DNS pod goes down at a time. The manifest omits replicas because the DNS autoscaler manages the count externally. And pod anti-affinity spreads replicas across nodes so a single node failure doesn't take out all DNS.

Tuning maxSurge and maxUnavailable

What Each Field Controls

maxSurge sets how many extra pods above the desired count can exist during an update. If your Deployment has 4 replicas and maxSurge: 1, Kubernetes can run up to 5 pods during the transition.

maxUnavailable sets how many pods can be down at the same time. With 4 replicas and maxUnavailable: 1, at least 3 pods are always running.

Both fields accept either an absolute integer or a percentage string. The defaults are 25% for both.
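Both forms side by side, as a sketch. The first fragment writes out the implicit defaults; the second pins absolute values that do not change with replica count:

```yaml
# Percentage form: the implicit defaults, written out explicitly
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
---
# Absolute form: fixed values, independent of the replica count
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1
```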

Rounding Behavior

Percentages round differently for each field. maxSurge rounds up (allowing more surge), while maxUnavailable rounds down (allowing fewer unavailable). With 3 replicas and 25% defaults:

  • maxSurge: ceil(3 × 0.25) = ceil(0.75) = 1 extra pod allowed
  • maxUnavailable: floor(3 × 0.25) = floor(0.75) = 0 pods can be unavailable

With 3 replicas and defaults, Kubernetes updates one pod at a time with zero downtime. It creates one new pod (surge), waits for it to become ready, then terminates one old pod.
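You can reproduce the controller's rounding with a few lines of shell — a sketch for building intuition, not the controller's actual code:

```shell
# Sketch: the controller's percentage rounding, reproduced with awk.
# maxSurge rounds up; maxUnavailable rounds down.
surge() {
  awk -v r="$1" -v p="$2" 'BEGIN { v = r * p / 100; if (v > int(v)) v = int(v) + 1; print int(v) }'
}
unavailable() {
  awk -v r="$1" -v p="$2" 'BEGIN { print int(r * p / 100) }'
}

surge 3 25        # 1  (ceil of 0.75: one extra pod allowed)
unavailable 3 25  # 0  (floor of 0.75: zero pods may be down)
unavailable 1 50  # 0  (the single-replica trap, covered below)
```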

warning: maxSurge and maxUnavailable cannot both be zero. At least one must allow movement, or the rollout has no way to make progress.

Four Patterns from Production

CNCF projects choose different strategies based on their failure characteristics. Here are four patterns, each reflecting a different operational constraint.

Pattern 1: Conservative (ingress-nginx, CoreDNS)

# kubernetes/ingress-nginx — deploy/static/provider/cloud/deploy.yaml
spec:
  minReadySeconds: 0
  revisionHistoryLimit: 10
  # ...
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Ingress-nginx uses an absolute maxUnavailable: 1 and lets maxSurge default to 25%. This guarantees exactly one controller pod goes down at a time regardless of how many replicas are running. For an ingress controller, losing multiple pods simultaneously creates traffic blackholes.

Pattern 2: Surge-first (Istio)

Istio's Helm chart defaults to aggressive surge values for istiod:

# istio/istio — manifests/charts/istio-control/istio-discovery/values.yaml
# (under _internal_defaults_do_not_set — Helm-managed defaults for istiod)
_internal_defaults_do_not_set:
  # ...
  rollingMaxSurge: 100%
  rollingMaxUnavailable: 25%

The result: maxSurge: 100% doubles the pod count during an update. All new pods come up before old ones drain. The rationale: when istiod restarts, every sidecar proxy in the mesh reconnects. With 100% surge, the new instances absorb reconnections without a thundering herd.

Pattern 3: No-surge (cert-manager)

# cert-manager/cert-manager — deploy/charts/cert-manager/values.yaml
# Deployment update strategy — ships empty, users configure for their environment.
#
# Recommended for resource-constrained clusters or leader-elected components:
#   strategy:
#     type: RollingUpdate
#     rollingUpdate:
#       maxSurge: 0
#       maxUnavailable: 1
strategy: {}

cert-manager ships with strategy: {} (Kubernetes defaults of 25%/25%) and recommends explicitly configuring maxSurge: 0 with maxUnavailable: 1 for resource-constrained environments. No extra pods are created: one old pod terminates, then one new pod takes its place. Since cert-manager uses leader election and only one controller is active at a time, losing the pod briefly during replacement is acceptable.

Pattern 4: Zero-downtime (metrics-server)

# kubernetes-sigs/metrics-server — manifests/base/deployment.yaml
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0

Metrics-server sets maxUnavailable: 0 — the new pod must be ready before the old one is terminated. Combined with the default maxSurge: 25%, this ensures the cluster always has a healthy metrics endpoint available. Without metrics, HPA decisions stall.

Strategy Comparison

Project         Component        maxSurge          maxUnavailable     Pattern
ingress-nginx   Controller       default (25%)     1                  Conservative
Istio           istiod           100%              25%                Surge-first
CoreDNS         DNS server       default (25%)     1                  Conservative
metrics-server  Metrics Server   default (25%)     0                  Zero-downtime
cert-manager    All components   0 (recommended)   1 (recommended)    No-surge

The choice depends on your workload. Infrastructure components that can't tolerate downtime (DNS, ingress, metrics) use conservative or zero-downtime patterns. Components that need to minimize reconnection storms surge aggressively. Resource-constrained environments avoid surge entirely.

The kubectl rollout Toolkit

Checking Rollout Status

kubectl rollout status watches a rollout in real time and exits when it completes or fails:

$ kubectl rollout status deployment/web-app
Waiting for deployment "web-app" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "web-app" rollout to finish: 2 of 3 updated replicas are available...
deployment "web-app" successfully rolled out

This is useful in CI/CD pipelines as a gate: it returns exit code 0 on success, non-zero on failure or timeout.

Viewing Rollout History

kubectl rollout history lists all revisions Kubernetes has tracked:

$ kubectl rollout history deployment/web-app
deployment.apps/web-app
REVISION  CHANGE-CAUSE
1         <none>
2         Update nginx to 1.25
3         Update nginx to 1.27

To inspect what changed in a specific revision:

$ kubectl rollout history deployment/web-app --revision=2
deployment.apps/web-app with revision #2
Pod Template:
  Labels:       app=web-app
                pod-template-hash=6b7f6d4bc
  Annotations:  kubernetes.io/change-cause: Update nginx to 1.25
  Containers:
   nginx:
    Image:      nginx:1.25
    Port:       80/TCP

The CHANGE-CAUSE column is populated from the kubernetes.io/change-cause annotation. It used to be set automatically by the --record flag, which was deprecated in Kubernetes 1.24. Set the annotation manually instead:

$ kubectl annotate deployment/web-app kubernetes.io/change-cause="Update nginx to 1.27"

Rolling Back

kubectl rollout undo reverts to the previous revision:

$ kubectl rollout undo deployment/web-app
deployment.apps/web-app rolled back

To roll back to a specific revision:

$ kubectl rollout undo deployment/web-app --to-revision=1
deployment.apps/web-app rolled back

Under the hood, a rollback scales up the old ReplicaSet associated with that revision and scales down the current one. The rollback itself becomes a new revision in the history.

Pausing and Resuming

kubectl rollout pause freezes a rollout. This lets you batch multiple changes into a single rollout instead of triggering one per change:

$ kubectl rollout pause deployment/web-app
deployment.apps/web-app paused

$ kubectl set image deployment/web-app nginx=nginx:1.27
$ kubectl set resources deployment/web-app -c nginx --limits=memory=256Mi

$ kubectl rollout resume deployment/web-app
deployment.apps/web-app resumed

Without pausing, each set command would trigger a separate rollout.

Restarting a Rollout

kubectl rollout restart triggers a fresh rollout without changing the spec. It sets a kubectl.kubernetes.io/restartedAt annotation on the pod template, which Kubernetes treats as a template change:

$ kubectl rollout restart deployment/web-app
deployment.apps/web-app restarted

This is useful when you need to cycle all pods after a ConfigMap or Secret change that wasn't automatically picked up, or when you want to pull an updated image with the same tag (not recommended, but common with latest).
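Under the hood, the restart is just a patch on the pod template, roughly equivalent to this fragment (the timestamp is illustrative):

```yaml
# Sketch of what kubectl rollout restart effectively patches
spec:
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2025-01-01T00:00:00Z"
```

Because the annotation lives inside .spec.template, changing its value counts as a template change and triggers a normal rolling update.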

Triggering Updates

Three ways to update a Deployment:

# Imperative — update the container image directly
$ kubectl set image deployment/web-app nginx=nginx:1.27

# Interactive — edit the full spec in your editor
$ kubectl edit deployment web-app

# Declarative — apply a modified manifest
$ kubectl apply -f deployment.yaml

For the CKAD exam, kubectl set image is the fastest. In production, kubectl apply with version-controlled manifests is the standard.

Fine-Tuning Rollout Behavior

minReadySeconds

minReadySeconds specifies how long a new pod must stay in Ready state before Kubernetes counts it as "available" and continues the rollout. The default is 0 — the pod is available the instant it passes its readiness probe.

Setting this to 10 or 30 seconds catches pods that start up, pass the initial readiness check, then crash shortly after. Without minReadySeconds, the rollout marks that pod as available and continues terminating old pods, potentially leaving you with a Deployment full of crash-looping replicas.

Most projects ship minReadySeconds: 0 (the default). You set it explicitly in .spec.minReadySeconds when your application has meaningful startup time after the readiness probe first succeeds — a JVM warming up caches, a service downloading configuration, or a component reconnecting to cluster state. Pair it with maxUnavailable: 0 for zero-downtime rollouts: no old pod terminates until the new one has been healthy for the full minReadySeconds window.
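That pairing looks like this as a manifest fragment (the values are illustrative):

```yaml
spec:
  minReadySeconds: 30        # a new pod must stay Ready for 30s before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # one extra pod during the transition
      maxUnavailable: 0      # never terminate an old pod before a new one is available
```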

progressDeadlineSeconds

progressDeadlineSeconds sets how long Kubernetes waits for a rollout to make progress before reporting it as failed. The default is 600 seconds. When the deadline passes, the Deployment's status includes a Progressing=False condition:

$ kubectl get deployment web-app -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'
ReplicaSet "web-app-7c8d9e5fa" has timed out progressing.

warning: progressDeadlineSeconds does NOT trigger an automatic rollback. It marks the rollout as failed, but the old ReplicaSet continues serving traffic. You must run kubectl rollout undo yourself. This surprises people who assume Kubernetes will auto-recover.
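To detect stalls faster than the 600-second default, lower the deadline explicitly (120 here is an illustrative value) — and remember the undo is still on you:

```yaml
spec:
  progressDeadlineSeconds: 120   # report Progressing=False after 2 minutes without progress
  # No automatic rollback happens when this fires:
  # you still have to run `kubectl rollout undo` yourself.
```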

revisionHistoryLimit

revisionHistoryLimit controls how many old ReplicaSets are kept for rollback. The default is 10. Each old ReplicaSet consumes an object in etcd, so in clusters with many Deployments, lowering this number reduces etcd storage pressure.

Setting it to 0 disables rollback entirely: all old ReplicaSets are garbage-collected, and kubectl rollout undo has nothing to revert to.

The Single-Replica Deadlock

Cilium's Helm chart contains a real-world lesson about single-replica Deployments. The operator Deployment template overrides maxUnavailable for single replicas:

# cilium/cilium — templates/cilium-operator/deployment.yaml (simplified)
# When replicas == 1, Cilium forces maxUnavailable: 100% to prevent deadlock
strategy:
  rollingUpdate:
    maxUnavailable: 100%
  type: RollingUpdate

The problem: with 1 replica and a percentage-based maxUnavailable (Cilium operator defaults to 50%), Kubernetes rounds down to 0. Combined with podAntiAffinity that prevents two operator pods on the same node, the new pod cannot be scheduled (the old one still occupies the slot), and the old pod cannot be terminated (maxUnavailable is 0). The rollout deadlocks indefinitely.

Cilium solves this by forcing maxUnavailable: 100% for single-replica Deployments. The lesson: always use absolute integers — not percentages — for low replica counts, or explicitly handle the single-replica case.
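One hedged alternative for single-replica Deployments, assuming scheduling constraints allow a second pod to run briefly, is to surge instead of going unavailable:

```yaml
# Strategy for replicas: 1 when a second pod CAN be scheduled
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # bring the new pod up alongside the old one
    maxUnavailable: 0   # never drop to zero replicas
# If anti-affinity or node capacity forbids a second pod, use
# maxSurge: 0 with maxUnavailable: 1 and accept a brief gap instead.
```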

Gotchas

progressDeadlineSeconds doesn't auto-rollback. The most common misconception. A timed-out rollout stays in its current state. The old ReplicaSet keeps serving, the new one stays partially rolled out, and nothing happens until you run kubectl rollout undo.

Scaling during a rollout distributes pods proportionally. If you scale a Deployment from 3 to 6 replicas while a rollout is in progress, the new pods are split proportionally between the old and new ReplicaSets. You won't get 6 pods on the new version; some will run the old one.

Percentage-based maxUnavailable rounds to 0 with low replica counts. With 1 replica and maxUnavailable: 25%, that's floor(0.25) = 0. The rollout cannot terminate the old pod, causing a deadlock. Use absolute integers for Deployments with fewer than 4 replicas.

The --record flag is deprecated. Removed from kubectl help text in Kubernetes 1.24. Old study materials and blog posts still reference it. Use kubectl annotate with kubernetes.io/change-cause instead.

Only .spec.template changes trigger rollouts. Updating the Deployment's own annotations, labels, or replica count does not create a new ReplicaSet. If an exam task asks you to "trigger a rollout," changing replicas won't do it.

revisionHistoryLimit: 0 erases rollback capability. Once set, all old ReplicaSets are deleted immediately. If you need to roll back afterward, there is nothing to roll back to.

Practice Scenarios

These exercises mirror the format of CKAD exam tasks. Each starts from an existing state, has a constraint, and ends with a verifiable outcome.

  1. A Deployment api-gateway in namespace production is running nginx:1.24 with 4 replicas. The platform team requires upgrading to nginx:1.27 with zero downtime (maxUnavailable: 0). Update the image, confirm all pods are on the new version, and annotate the revision with change-cause: security-patch.

  2. A Deployment web-frontend in namespace staging was updated this morning and is broken. The on-call engineer needs to roll back to revision 2 — not the previous revision. Identify the correct revision using kubectl rollout history, then execute a targeted rollback. Verify the image tag matches revision 2.

  3. A Deployment cache-proxy currently has maxSurge: 25% and maxUnavailable: 25% (defaults). You must reconfigure it to never exceed the desired replica count during updates (set maxSurge: 0, maxUnavailable: 1). Apply the change and trigger a rollout to verify the new strategy takes effect.

  4. During a planned maintenance window, you need to apply three config changes to Deployment metrics-collector as a single atomic rollout. Pause the Deployment, apply all three changes, then resume and verify only one new revision appears in the rollout history.

Wrap-up

Rolling updates are a kubectl rollout workflow backed by ReplicaSet versioning. The exam tests all six subcommands; choose your maxSurge and maxUnavailable values based on the specific failure mode you're protecting against, and use absolute integers for any Deployment with fewer than 4 replicas.

Blue/Green and Canary Deployments on Kubernetes covers what comes after rolling updates: implementing traffic-split strategies with native Kubernetes primitives.

Application Deployment (1 of 6)

Alexis Kinsella

Languages (Rust, Go & Python), container orchestration (Kubernetes), data and cloud providers (AWS & GCP) lover. Runner & Cyclist.
