

Sarthak Varshney is a Docker Captain, 5x C# Corner MVP, and 2x Alibaba Cloud MVP, with over six years of hands-on experience in the IT industry, specializing in cloud computing, DevOps, and modern application infrastructure. He is an Author and Associate Consultant, known for working extensively with cloud platforms and container-based technologies in real-world environments.
So you've got your Docker container running. The terminal shows it's "Up." You feel good. You grab a chai, sit back, and think everything's fine.
But is it really?
Here's the thing — Docker saying a container is "Up" doesn't mean your app inside it is actually working. It just means the container process didn't crash. Your Node.js server could be stuck in an infinite loop, your database might have lost its connection, or your Flask app could be returning 500 errors on every single request — and Docker would still happily report: "Status: Up 3 hours."
That's exactly the gap that HEALTHCHECK fills. And once you pair it with restart policies, you've got a self-healing setup that can recover from failures without you having to wake up at 3am.
Let's dig in.
Imagine you've deployed a web app. The container starts, the process is running, Docker is happy. But behind the scenes, the app failed to connect to the database on startup. It's sitting there, alive but completely useless — like a TV that's plugged in but has no signal.
Without health checks, Docker has no idea. It won't restart the container. Your load balancer (if you have one) will keep sending traffic to a broken instance. Users get errors. Your phone starts ringing.
With a health check, Docker periodically knocks on the door — "Hey, are you actually functional?" — and if the answer is silence or an error, Docker marks that container as unhealthy and can act on it.
The HEALTHCHECK instruction goes inside your Dockerfile. It defines a command that Docker will run inside the container at regular intervals to check if the app is working.
Here's the basic syntax:
HEALTHCHECK [OPTIONS] CMD <command>
And here's a real example for a web app:
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
Let's break down what each option does:
--interval=30s
This is how often Docker runs the health check. Every 30 seconds, Docker will run the CMD. Think of it like a doctor taking your pulse every 30 seconds.
--timeout=10s
If the health check command doesn't respond within 10 seconds, Docker counts it as a failure. Your doctor isn't going to wait forever — if there's no response in 10 seconds, something's wrong.
--start-period=15s
This is a grace period when the container first starts. Maybe your app needs 10-12 seconds to warm up — initialize connections, load config, etc. During the start period, failed health checks don't count against the container. Docker won't panic-mark it as unhealthy just because it took a moment to boot.
--retries=3
How many consecutive failures before Docker marks the container as "unhealthy." One bad check could be a fluke — network blip, CPU spike. Three failures in a row? That's a real problem.
CMD curl -f http://localhost:8080/health || exit 1
This is the actual check command. It hits the /health endpoint on your app. The -f flag on curl makes it return a non-zero exit code if the HTTP response is an error (4xx, 5xx). If curl fails OR the endpoint returns an error, we exit with 1 (failure). If it works, curl exits with 0 (success).
The golden rule: Exit code 0 = healthy. Any other exit code = unhealthy. That's it.
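To see that rule in isolation, here's a toy Python sketch (not Docker's actual implementation) that mimics the decision Docker makes: run the check command, then map its exit code to a health status.

```python
import subprocess

def health_status(check_cmd):
    """Mimic Docker's rule: exit code 0 means healthy,
    any other exit code means unhealthy."""
    result = subprocess.run(check_cmd, shell=True, capture_output=True)
    return "healthy" if result.returncode == 0 else "unhealthy"

# 'true' exits 0 and 'false' exits 1 -- the same contract
# your health check command has to follow.
print(health_status("true"))   # healthy
print(health_status("false"))  # unhealthy
```

That's the entire contract: Docker doesn't care what the command does internally, only what code it exits with.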
Let's say you're building a simple Node.js Express app. Here's how you'd write the Dockerfile with a proper health check:
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
# Health check: ping the /health route every 30 seconds
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "server.js"]
Notice I used wget here instead of curl — both work, and wget is often already installed in Alpine-based images. If your base image doesn't have either, you can use a Node.js one-liner:
HEALTHCHECK CMD node -e "require('http').get('http://localhost:3000/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) }).on('error', () => process.exit(1))"
And in your server.js, make sure you actually have that /health endpoint:
app.get('/health', (req, res) => {
  // You can add real checks here — DB connection, memory, etc.
  res.status(200).json({ status: 'ok' });
});
Once your container is running with a health check configured, you can inspect it:
docker ps
The STATUS column now shows something like:
STATUS
Up 2 minutes (healthy)
Up 5 minutes (unhealthy)
Up 10 seconds (health: starting)
Those three states are the health lifecycle:
health: starting — The start period is still running. Docker is being patient.
healthy — The check passed. Everything's good.
unhealthy — The check failed (retries exhausted). Something's broken.
For detailed information, use docker inspect:
docker inspect --format='{{json .State.Health}}' <container_name>
This gives you a JSON output with the last few health check results, the time they ran, and the exit codes. Super useful for debugging.
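If you want to script against that output, here's a sketch of summarizing it in Python. The field names (Status, FailingStreak, Log, ExitCode, Output) come from Docker's inspect schema; the sample values below are invented for illustration.

```python
import json

# Sample shaped like `docker inspect --format='{{json .State.Health}}'`
# output (field names per Docker's inspect schema; values made up here).
sample = '''{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {"Start": "2026-04-25T10:44:22+05:30", "End": "2026-04-25T10:44:32+05:30",
     "ExitCode": 1, "Output": "curl: (22) The requested URL returned error: 503"},
    {"Start": "2026-04-25T10:44:52+05:30", "End": "2026-04-25T10:45:02+05:30",
     "ExitCode": 1, "Output": "curl: (22) The requested URL returned error: 503"}
  ]
}'''

def summarize_health(raw):
    """Condense the health JSON into a one-line summary."""
    h = json.loads(raw)
    failed = sum(1 for e in h["Log"] if e["ExitCode"] != 0)
    return (f"{h['Status']}: {failed}/{len(h['Log'])} recent checks failed, "
            f"failing streak {h['FailingStreak']}")

print(summarize_health(sample))
```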
Or for something more human-readable:
docker inspect <container_name> | grep -A 20 '"Health"'
If you don't control the Dockerfile (using a third-party image, for example), you can add a health check when running the container:
docker run -d \
  --name my-nginx \
  --health-cmd="curl -f http://localhost/ || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=10s \
  nginx
This overrides or adds a health check without touching any image. Handy for quick setups.
Okay, so now Docker knows your container is unhealthy. But what does it do about it?
By default — nothing. Docker just marks it unhealthy and moves on. If you want automatic recovery, you need restart policies.
A restart policy tells Docker what to do when a container exits or becomes unhealthy. You set it with the --restart flag:
docker run -d --restart <policy> my-image
There are four restart policies:
no (default)
docker run -d --restart no my-image
Docker does nothing when the container exits. The default behavior — you have to restart it manually. Fine for development, bad for production.
always
docker run -d --restart always my-image
Docker always restarts the container if it stops, regardless of why. Even if you manually docker stop it, it'll restart when the Docker daemon restarts. This is useful for critical services but can be annoying if you deliberately stop a container and it keeps coming back.
on-failure
docker run -d --restart on-failure:5 my-image
Docker restarts the container only if it exits with a non-zero exit code — meaning something actually crashed. The :5 limits restarts to 5 attempts. Great for apps where you want recovery from crashes but don't want Docker endlessly restarting something that's fundamentally broken.
unless-stopped
docker run -d --restart unless-stopped my-image
This is probably the most practical policy for production. It's like always, except if you manually stop the container with docker stop, it stays stopped — even after a Docker daemon restart. You have explicit control. This is the one most production teams reach for.
Here's something that trips up a lot of beginners: Docker does NOT automatically restart containers just because they're marked "unhealthy."
Read that again.
Health checks update the container's health status — but the restart policy only kicks in when the container exits (the main process stops). If your app is stuck and hanging (returning errors but the process is still running), Docker will mark it unhealthy but the restart policy won't trigger because the container never exited.
So how do you handle a truly stuck container? A few approaches:
Option 1: Make your app exit on critical failures
The cleanest solution. If your app detects it's in a broken state — lost DB connection, corrupted state, whatever — just process.exit(1). Docker sees the non-zero exit, the restart policy kicks in, container restarts. Clean and simple.
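Here's a minimal Python sketch of that pattern (the check names and the way you'd call it are hypothetical): run your critical dependency checks, and exit non-zero the moment one fails so the restart policy can take over.

```python
import sys

def ensure_critical(checks):
    """Run critical dependency checks (name -> callable).
    Exit with a non-zero code on the first failure so Docker
    sees the container die and the restart policy kicks in."""
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            print(f"FATAL: {name} failed: {exc}", file=sys.stderr)
            sys.exit(1)

# Hypothetical usage: call this from a periodic background task,
# or after catching an unrecoverable error in a request handler.
ensure_critical({"config loaded": lambda: None})  # passes, app keeps running
```

The Node.js equivalent is the process.exit(1) mentioned above; the principle is identical in any language.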
Option 2: Use Docker Swarm or Kubernetes
Docker Swarm actually does respond to unhealthy containers by rescheduling them. If you're running in Swarm mode, unhealthy containers get replaced. This is one reason orchestrators exist.
Option 3: External monitoring + scripted intervention
A monitoring script watches for unhealthy containers and calls docker restart on them. Not elegant, but it works for simple setups.
You've got health checks running. Now how do you actually monitor this stuff?
# See health status for all running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
# Live watch — runs every 2 seconds
watch -n 2 docker ps
Docker emits events you can subscribe to. This is great for lightweight monitoring:
docker events --filter event=health_status
This streams events like:
2026-04-25T10:32:01.123456789+05:30 container health_status my-app (status=healthy)
2026-04-25T10:45:22.987654321+05:30 container health_status my-app (status=unhealthy)
You can pipe this into a log file or hook it up to alerting tools.
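For instance, a small Python filter could turn that stream into structured alerts. The regex below matches the event format shown above; note that the exact line format of docker events can vary between Docker versions, so treat this as a sketch to adapt.

```python
import re

# Matches the health_status event lines shown above.
EVENT_RE = re.compile(r"container health_status (\S+) \(status=(\w+)\)")

def parse_health_event(line):
    """Return (container_name, status) for a health_status event line,
    or None for lines that don't match."""
    m = EVENT_RE.search(line)
    return (m.group(1), m.group(2)) if m else None

line = "2026-04-25T10:45:22.987654321+05:30 container health_status my-app (status=unhealthy)"
print(parse_health_event(line))  # ('my-app', 'unhealthy')
```

You could feed it by piping docker events output into a script that reads stdin line by line and fires a webhook whenever the status is unhealthy.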
Here's a simple shell script to check for unhealthy containers and restart them:
#!/bin/bash
# healthcheck-monitor.sh
UNHEALTHY=$(docker ps --filter health=unhealthy --format "{{.Names}}")
if [ -n "$UNHEALTHY" ]; then
  echo "Found unhealthy containers: $UNHEALTHY"
  for container in $UNHEALTHY; do
    echo "Restarting $container..."
    docker restart "$container"
  done
else
  echo "All containers healthy."
fi
Run this via cron every minute:
# Add to crontab (crontab -e)
* * * * * /path/to/healthcheck-monitor.sh >> /var/log/docker-health.log 2>&1
Not a full-blown monitoring solution, but for a small VPS setup? It gets the job done.
Mistake 1: Health check endpoint doesn't actually check anything
A lot of devs add a /health route that just returns { status: "ok" } unconditionally. That's better than nothing, but you're missing the point. Your health check should verify that the app is actually functional — at minimum, check that database connections are alive:
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1'); // real check
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'error', message: err.message });
  }
});
Mistake 2: Forgetting curl or wget isn't in the base image
Especially with minimal images like alpine or distroless, curl and wget might not be installed. Either install them in your Dockerfile or use a language-native check:
RUN apk add --no-cache curl # for Alpine
Mistake 3: Setting --interval too short
If your interval is 5 seconds and the check itself takes 3-4 seconds, you're hammering your own app with health checks. Set the interval reasonably — 30 seconds is a solid default for most apps.
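As a rough back-of-the-envelope (ignoring Docker's exact scheduling details, which vary), the longest an outage can go unflagged is about the number of retries times the interval, plus the time each probe is allowed to hang:

```python
def worst_case_detection_s(interval, timeout, retries):
    """Rough upper bound on seconds before Docker flags a container:
    `retries` consecutive probes, spaced `interval` apart, each
    allowed up to `timeout` seconds to fail."""
    return retries * (interval + timeout)

# With the values from the earlier web app example:
print(worst_case_detection_s(interval=30, timeout=10, retries=3))  # 120
```

Two minutes of undetected downtime is fine for most apps; if it isn't for yours, tighten the interval and timeout together rather than just one of them.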
Mistake 4: No --start-period on slow-starting apps
Java apps, apps with heavy initialization, anything that takes >10 seconds to start — if you don't set a start period, Docker will immediately start failing health checks and might never get the container into a healthy state. Always add a --start-period that's longer than your expected startup time.
Mistake 5: Using always restart policy in development
You make a code change, the container crashes (expected), and Docker immediately restarts it running the old image. Confusing. Use --restart no in development.
Everything we've covered works in Docker Compose too. Here's how you'd write it in docker-compose.yml:
version: '3.8'
services:
  webapp:
    build: .
    ports:
      - "3000:3000"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
  db:
    image: postgres:15
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
Notice two things here:
The db service uses pg_isready — a PostgreSQL-native command that checks if the server is ready to accept connections. Much better than a generic TCP check.
You can make your webapp wait for db to be healthy before starting:
webapp:
  depends_on:
    db:
      condition: service_healthy
This is one of the most underused features of Docker Compose. The condition: service_healthy means Compose will wait until the database's health check passes before even starting the web app. No more race conditions where your app crashes because the DB wasn't ready.
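Keep in mind that service_healthy only gates startup ordering; if the database drops out later, your app still has to cope on its own. A common complement (a sketch here, with a hypothetical connect callable) is retrying flaky dependencies with a delay:

```python
import time

def connect_with_retry(connect, attempts=5, delay=2.0):
    """Call `connect` until it succeeds, sleeping `delay` seconds
    between attempts; re-raise the last error when out of attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Hypothetical usage at startup:
# db = connect_with_retry(lambda: psycopg2.connect(DATABASE_URL))
```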
Here's your hands-on challenge. Build this and see it in action.
Step 1: Create a simple Python Flask app with a /health endpoint that checks a counter. After 3 requests, make the health endpoint return 500 (simulating a failure).
# app.py
from flask import Flask, jsonify

app = Flask(__name__)
request_count = 0

@app.route('/')
def index():
    return "Hello from Docker!"

@app.route('/health')
def health():
    global request_count
    request_count += 1
    if request_count > 3:
        return jsonify({"status": "error"}), 500
    return jsonify({"status": "ok"}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Step 2: Write a Dockerfile with a HEALTHCHECK that checks this endpoint every 10 seconds with a 3-second timeout and 2 retries.
Step 3: Build and run the container with --restart on-failure:3.
Step 4: Watch docker ps every few seconds. See when the status changes from healthy to unhealthy. Does Docker restart the container? Why or why not?
Bonus: Modify the Flask app so that when /health fails, it also calls sys.exit(1). What changes now?
This exercise will make the behavior of health checks and restart policies click in a way that no amount of reading can.
Let's pull it all together. Here's what you now know:
Docker's "Up" status only means the process is running, not that the app inside works. Closing that gap is what HEALTHCHECK is for.
The HEALTHCHECK instruction runs a command periodically and marks the container healthy or unhealthy based on the exit code.
Use --interval, --timeout, --start-period, and --retries to tune the check for your app's behavior.
Restart policies (no, always, on-failure, unless-stopped) tell Docker what to do when a container exits.
Docker never restarts a container just for being unhealthy; the app must exit on failure, or an orchestrator or external script must step in.
In Docker Compose, use condition: service_healthy to control startup order properly.
Getting health checks right is one of those things that separates hobbyist Docker usage from production-grade deployments. It's not complicated — but most people skip it until something breaks at 2am. Don't be that person.
Add a health check. Set a restart policy. Sleep better.