KubeForge — Hands-on Kubernetes & EKS Learning

Scenario

The `web` Deployment was created by a junior engineer who made a one-character typo in the liveness probe path: `/heathz` instead of `/healthz`. Because the probe keeps failing, Kubernetes keeps restarting the container. Fix the probe path so the pod stays Running and becomes Ready.
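As a rough sketch of the fix, the corrected manifest might look like the following. The lab's actual manifest is not reproduced here, so the image name, labels, replica count, and port 8080 are placeholders assumed for illustration; only the Deployment name `web` and the probe path come from the scenario.

```yaml
# Hypothetical manifest: everything except the name `web` and the probe path is assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0   # placeholder image
          ports:
            - containerPort: 8080               # assumed port
          livenessProbe:
            httpGet:
              path: /healthz                    # was /heathz
              port: 8080
```

Assuming the file is saved as `manifest.yaml`, `kubectl apply -f manifest.yaml` rolls out a replacement pod, which should stop accumulating restarts once the probe succeeds.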

Why probes exist

Kubernetes cannot read your application's source code to know whether it is healthy. Probes are how you tell the kubelet what to check. Without them, a pod whose process is still running but is deadlocked or returning 500 errors keeps receiving traffic indefinitely.

The three probe types

| Probe | Action on failure | Typical use |
| --- | --- | --- |
| `livenessProbe` | Kill and restart the container | Detect deadlocks, infinite loops |
| `readinessProbe` | Remove the pod from Service endpoints | Signal "not ready for traffic yet" |
| `startupProbe` | Kill and restart the container; liveness and readiness checks are held off until it passes | Slow-starting apps (JVM warm-up, DB migration) |
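To make the division of labor concrete, here is a sketch of one container declaring all three probes with default timings. The `/readyz` path and port 8080 are assumptions for illustration, not taken from the lab.

```yaml
# Fragment of a container spec; the /readyz path and the port are illustrative assumptions.
startupProbe:            # on failure: container is killed; gates the two probes below
  httpGet:
    path: /healthz
    port: 8080
livenessProbe:           # on failure: container is killed and restarted
  httpGet:
    path: /healthz
    port: 8080
readinessProbe:          # on failure: pod is removed from Service endpoints
  httpGet:
    path: /readyz        # hypothetical readiness endpoint
    port: 8080
```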

Probe mechanisms

All three probe types support the same check mechanisms. The three most common are shown below; a `grpc` check is also available:

# HTTP GET — most common for web servers
livenessProbe:
  httpGet:
    path: /healthz   # must return 2xx–3xx
    port: 8080

# TCP socket — useful for non-HTTP services
livenessProbe:
  tcpSocket:
    port: 5432

# Exec command — runs inside the container
livenessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]

Key timing fields

livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10   # wait before first check
  periodSeconds: 10         # check every N seconds
  failureThreshold: 3       # restart after 3 consecutive failures
  timeoutSeconds: 1         # fail if no response within N seconds

`initialDelaySeconds` is critical for slow-starting containers: if the liveness probe starts failing before the app has finished booting, the container is restarted in a loop even though the app would have started correctly given more time.

Common mistakes

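Typo in the path: a common real-world bug. With `/heathz` instead of `/healthz`, the app typically answers 404, which the kubelet treats as a failed probe, so the container restarts constantly. Copy the path from your application's actual health endpoint definition rather than retyping it.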

Liveness = readiness (the same check used for both): use separate paths when possible. A liveness failure restarts the container (disruptive). A readiness failure only removes the pod from Service endpoints (graceful). Your `/healthz` (liveness) should fail only for truly unrecoverable states.

Too-aggressive thresholds: `failureThreshold: 1` with `periodSeconds: 5` means a single slow or failed response restarts your container. The default `failureThreshold: 3` is a safer starting point.
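One way to avoid the last two mistakes is to give each probe its own endpoint and its own tolerance. The sketch below assumes a hypothetical `/readyz` endpoint that reports dependency health; the port and all numbers are illustrative rather than recommendations.

```yaml
# Fragment of a container spec; /readyz, the port, and all numbers are assumptions.
livenessProbe:
  httpGet:
    path: /healthz       # should fail only when a restart is the right fix
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # 3 consecutive failures, about 30s, before a restart
  timeoutSeconds: 2
readinessProbe:
  httpGet:
    path: /readyz        # hypothetical: may fail while a dependency is down
    port: 8080
  periodSeconds: 5
  failureThreshold: 2    # stop routing traffic after about 10s of failures
```

Because a readiness failure is cheap to recover from, the readiness probe can afford to react faster than the liveness probe.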

Startup probes for slow apps

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30    # 30 × 10s = 5 minutes maximum startup window
  periodSeconds: 10

Once the startup probe succeeds, liveness and readiness probes take over. This prevents premature restarts during initialization.
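A sketch of that handoff, pairing the startup probe above with a liveness probe on the same endpoint. The liveness values are assumptions, chosen to show that a long `initialDelaySeconds` is no longer needed once a startup probe guards the boot phase.

```yaml
# Fragment of a container spec; the liveness timings are illustrative assumptions.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # up to 30 x 10s = 300s for the app to come up
livenessProbe:           # not run until the startup probe has succeeded
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # after startup, a hang is caught in about 30s
```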

Summary

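| Field | Purpose |
| --- | --- |
| `initialDelaySeconds` | Grace period before the first probe |
| `periodSeconds` | How often to check |
| `failureThreshold` | Consecutive failures before action is taken |
| `successThreshold` | Consecutive successes needed to count as healthy again after a failure (configurable for readiness; must be 1 for liveness and startup) |
| `timeoutSeconds` | Per-check deadline |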
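Putting the five fields together on a readiness probe, the one probe type where `successThreshold` can be raised above 1. The `/readyz` path, the port, and the values are assumptions for illustration.

```yaml
# Fragment of a container spec; path, port, and values are illustrative assumptions.
readinessProbe:
  httpGet:
    path: /readyz          # hypothetical readiness endpoint
    port: 8080
  initialDelaySeconds: 5   # wait 5s after the container starts
  periodSeconds: 10        # then check every 10s
  timeoutSeconds: 2        # each check must answer within 2s
  failureThreshold: 3      # 3 consecutive failures: marked not ready
  successThreshold: 2      # 2 consecutive successes: marked ready again
```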

Fix the broken liveness probe

Beginner · ~20 min
