

Sarthak Varshney is a Docker Captain, 5x C# Corner MVP, and 2x Alibaba Cloud MVP, with over six years of hands-on experience in the IT industry, specializing in cloud computing, DevOps, and modern application infrastructure. He is an Author and Associate Consultant, known for working extensively with cloud platforms and container-based technologies in real-world environments.
Let me ask you something. Have you ever pulled a Docker image, typed docker pull nginx, and just… trusted that it worked? No questions asked. Nginx is there, it runs, life is good.
But what actually happened when you ran that command? What is an "image" even made of? And that :latest tag — is it actually the latest thing? Spoiler: often it's not.
This one's a deep dive. Grab some chai, open a terminal, and let's actually understand what Docker images are made of — not just use them blindly.
Here's the thing — an image is not a big fat single file like a .zip or an .iso. It's not one monolithic blob sitting on your disk. It's a stack of layers, each layer representing a set of file system changes.
Think of it like this: imagine you're making a transparency presentation (the old OHP kind, if you've seen those). You have a blank transparent sheet. You write "Ubuntu base OS" on it. That's your first layer. Now you put a second sheet on top, and write "Install Python 3.11". Third sheet — "Copy my app code". Fourth sheet — "Set the startup command".
When you look at the whole stack from above, you see one complete picture — a running container. But underneath, it's made of those individual transparent sheets, stacked on top of each other.
That's a Docker image. Each instruction in your Dockerfile creates one of those sheets.
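If you want to feel the stacking idea with your hands, here's a toy simulation using plain directories (an analogy only, not real Docker or overlayfs internals): each "layer" is a set of file changes, and the image you see is the union, with later layers winning.

```shell
# Toy simulation of layer stacking (NOT real Docker internals):
# each layer is a directory of file changes; the final view is the
# union of all layers, with later layers overriding earlier ones.
mkdir -p layer1 layer2 merged
echo "base config" > layer1/app.conf    # written in layer 1
echo "tuned config" > layer2/app.conf   # overridden in layer 2
cp -r layer1/. merged/                  # stack sheet 1
cp -r layer2/. merged/                  # stack sheet 2 on top
cat merged/app.conf                     # prints: tuned config — the top sheet wins
```

Looking down through the stack, you only ever see the topmost version of each file. That's exactly what a container's filesystem view is.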
Run this and look closely at the output:
docker pull python:3.11-slim
You'll see something like:
3.11-slim: Pulling from library/python
6e909acdb790: Pull complete
...
b1f3fc9d1b28: Pull complete
...
Digest: sha256:abc123...
Status: Downloaded newer image for python:3.11-slim
See those lines with hashes and Pull complete? Those are individual layers being downloaded. Not one big file — several smaller ones. That's Docker telling you it's pulling the transparent sheets one by one.
Now run:
docker image history python:3.11-slim
This gives you a breakdown of every layer in that image — what command created it, and how much space it takes. You'll see lines like apt-get install, COPY, ENV — each one is a layer.
Here's where it gets clever. Docker caches every layer on your machine. So if two images share the same base — say, both start with ubuntu:22.04 — Docker downloads that base layer only once.
Picture this scenario: you have five different Python apps. All five start with:
FROM python:3.11-slim
You'd think Docker downloads the Python base five times. Absolutely not. It downloads it once, and all five images share it. This saves disk space and download time in a massive way.
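As a rough analogy for "stored once, referenced many times" (plain files and hard links standing in for Docker's storage driver — not the real mechanism), you can see how one base layer serves several images without being duplicated:

```shell
# Toy analogy: one base layer on disk, referenced by several images.
# Hard links point at the same bytes — no second copy is made.
mkdir -p store imageA imageB
echo "python:3.11-slim base layer (pretend this is 120 MB)" > store/base
ln store/base imageA/base   # image A references the base layer
ln store/base imageB/base   # image B references the same bytes
stat -c %h store/base       # link count: 3 references, 1 copy on disk
```

Delete "image A" and the bytes are still there for image B; only when the last reference goes away does the space come back. Docker's layer store behaves the same way, which is why `docker image prune` exists.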
Now say you update your app code and rebuild:
FROM python:3.11-slim # Layer 1 — already cached ✓
RUN pip install flask # Layer 2 — already cached ✓
COPY app.py /app/app.py # Layer 3 — CHANGED, rebuilds from here
CMD ["python", "/app/app.py"] # Layer 4 — also rebuilds
Docker sees that Layers 1 and 2 haven't changed, so it reuses them from cache instantly. Only Layers 3 and 4 get rebuilt. This is why layer order in your Dockerfile matters a lot — more on that in a minute.
Let's talk about image naming. When you write:
nginx:1.25.3
That breaks down into:
nginx — the image name (also called the repository)
1.25.3 — the tag
Tags are just human-readable labels pointing to a specific image. They're not magic version numbers tracked by Docker — they're labels that the image publisher decides. Think of them as sticky notes on a filing cabinet. The note says "1.25.3", but someone put that note there, and someone can also move it.
You can even create your own tags:
docker tag nginx:1.25.3 my-nginx:production
Now you have two tags pointing to the same image. It's the same image — same layers, same content — just two different labels.
Okay, this is the part everyone gets wrong. Including people who've been using Docker for years.
When you run:
docker pull nginx
Docker silently adds :latest and pulls nginx:latest. Most people assume latest means "the most recent version". That's a reasonable assumption. It's also completely wrong — or at least, unreliable.
Here's the truth: latest is just a tag. It's a convention. The image publisher can attach :latest to literally any version they want. Historically, most publishers point it to the most recent stable release. But there's no enforcement. No guarantee. Nothing automatic.
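A quick way to internalize "a tag is just a movable label" is with symlinks as stand-ins for tags (an analogy, not how registries are actually implemented):

```shell
# Toy analogy: a tag is a movable pointer; the content doesn't change,
# only what the pointer resolves to.
echo "nginx 1.25.3 bits" > image-1.25.3
echo "nginx 1.26.0 bits" > image-1.26.0
ln -sf image-1.25.3 latest
cat latest                   # resolves to the 1.25.3 content today
ln -sf image-1.26.0 latest   # the publisher "moves" the tag
cat latest                   # same name, different content tomorrow
```

Nothing about the name `latest` changed — only where it points.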
What this means in practice:
nginx:latest today might be nginx 1.25.3. Next week, the publisher can release a newer version and move latest to it. So docker pull nginx on Monday gives you something different than on Friday.
This is called non-determinism, and it is the enemy of reliable software. You never want "it depends on when you ran it" to be the answer to "which version are we running?"
Let's prove this. Run:
docker pull nginx:latest
docker inspect nginx:latest --format='{{.Id}}'
Note that image ID. Come back after nginx releases a new version and run it again. It'll be different.
Beyond tags, every Docker image has an immutable identifier called a digest. It looks like:
sha256:e4a0d3f6b2c981...
This is a cryptographic hash of the image contents. It never changes. If you pull by digest, you are guaranteed to get exactly that image, every single time, forever.
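You can demo the core property yourself with plain sha256sum — the same bytes always hash to the same value, and changing a single character produces a completely different one:

```shell
# A digest is just a cryptographic hash of the content. Identical
# content always hashes to the same value; any change produces a
# completely different one.
printf 'layer contents v1' | sha256sum
printf 'layer contents v1' | sha256sum   # identical input, identical hash
printf 'layer contents v2' | sha256sum   # one character changed, new hash
```

That's why a digest can't silently drift the way a tag can: to change the digest, you'd have to change the content.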
You can see the digest in a pull output:
docker pull nginx:1.25.3
# Digest: sha256:abc123def456...
Or get it from an existing image:
docker inspect nginx:1.25.3 --format='{{index .RepoDigests 0}}'
And then pull by digest:
docker pull nginx@sha256:abc123def456...
That image will never silently change. It's the most precise way to pin an image. For production environments, this is gold.
Alright, so latest is scary. What should you use instead? Here's a practical breakdown.
Use specific version tags for production and CI/CD.
Instead of:
FROM node:latest
Use:
FROM node:20.11.1-alpine3.19
Yes, it's more verbose. But now your build is deterministic. Anyone who clones your repo and builds this image gets exactly the same result — today, six months from now, on a different machine, in a different country.
Use image digests for absolute certainty.
In high-stakes environments — financial systems, healthcare, anything where "it suddenly behaved differently" is unacceptable — pin by digest:
FROM node@sha256:abc123def456...
Tag your own images meaningfully.
When you build images for your own services, don't just tag them latest and call it a day. A solid tagging strategy looks like:
# Semantic version
docker build -t myapp:2.4.1 .
# Git commit SHA (great for traceability)
docker build -t myapp:$(git rev-parse --short HEAD) .
# Environment tag pointing to a version
docker tag myapp:2.4.1 myapp:production
docker tag myapp:2.4.1 myapp:latest # Optional, point latest to current stable
This way you can always answer: "What exact version is running in production right now?" You run docker inspect on the container, check the image tag, and know immediately. No guessing.
Keep a latest tag if you want — just make it intentional.
There's nothing wrong with having a latest tag if you consciously move it whenever you release a new stable version. The problem is when latest is the only tag, and your infrastructure depends on it without anything else to fall back on.
Remember how Docker caches layers? Here's what that means for writing Dockerfiles.
Bad approach — code changes invalidate everything:
FROM python:3.11-slim
COPY . /app # If ANY file changes, everything below re-runs
RUN pip install -r requirements.txt # This runs EVERY time you change any .py file
The problem: COPY . /app copies everything — including your source code. Every time you change a single Python file, Docker invalidates this layer, and everything below it (including pip install) runs again from scratch. That pip install could take minutes.
Good approach — install dependencies first, code second:
FROM python:3.11-slim
COPY requirements.txt /app/requirements.txt # Only this file
RUN pip install -r /app/requirements.txt # Runs only when requirements change
COPY . /app # Your code — changes often, but cheaply
Now pip install only runs when requirements.txt actually changes. Your code changes? Docker uses the cached pip layer, copies the new code, and the build takes seconds instead of minutes.
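One related habit worth pairing with this: a .dockerignore file keeps files that change constantly but don't belong in the image from busting the COPY layer's cache at all. A sketch for a typical Python project (entries are illustrative — adjust to your repo):

```
# .dockerignore — illustrative entries for a typical Python project
.git
__pycache__/
*.pyc
.venv/
```

With this in place, a commit or a local virtualenv change no longer invalidates COPY . /app.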
This is one of those small habits that saves enormous time over a project's lifetime.
Mistake 1: Using latest in docker-compose.yml for production
# ❌ This is asking for trouble
services:
  web:
    image: nginx:latest
You can use latest locally for quick experiments. In a production compose file, pin to a version.
Mistake 2: Tagging only as latest when pushing to a registry
# ❌ Don't do this alone
docker push myrepo/myapp:latest
# ✅ Do this
docker push myrepo/myapp:1.3.0
docker push myrepo/myapp:latest # Optional, but back it up with a version tag
Mistake 3: Adding COPY . . early in the Dockerfile
Leads to cache-busting on every change. Move it as late as possible.
Mistake 4: Treating layers as free
Each layer adds a tiny overhead. Chaining RUN commands is better than separating them when they're related:
# ❌ Three layers for no reason
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
# ✅ One clean layer
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
Also — that rm -rf /var/lib/apt/lists/* matters. If you clean up in a separate layer, the space doesn't actually disappear from the image (earlier layers are immutable). Cleaning up in the same RUN command actually saves disk space.
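Here's a toy illustration of why, with plain directories standing in for layers (real Docker uses whiteout files in overlayfs, but the principle is the same): a deletion in a later layer only hides the file; it doesn't reclaim the earlier layer's bytes.

```shell
# Toy illustration: layer 1 writes 100 KB of apt cache; layer 2
# "deletes" it. The union hides the file, but layer 1 is immutable,
# so the image still carries those 100 KB.
mkdir -p layer1 layer2
head -c 102400 /dev/zero > layer1/apt-cache   # 100 KB landed in layer 1
touch layer2/.wh.apt-cache                    # deletion marker in layer 2
wc -c < layer1/apt-cache                      # layer 1 still stores 102400 bytes
```

Cleaning up inside the same RUN means the cache never makes it into any committed layer in the first place.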
Mistake 5: Pulling an image and assuming it's still what it was
If you pulled node:20 three weeks ago and pull it again today, you might get a different image. Tags can move. Run docker pull before a new build if you want to make sure you have the current version for that tag.
These commands will become your best friends:
# See all layers and their sizes
docker image history myapp:1.0.0
# See full metadata — env vars, entrypoint, ports, etc.
docker inspect nginx:1.25.3
# Check image size
docker images nginx
# See all tags you have locally for an image
docker images node
# Remove dangling images (untagged ones, usually left over from builds)
docker image prune
# Remove a specific image
docker rmi myapp:old-version
Run docker image history python:3.11-slim right now if you have it pulled. Look at which layers are large — that tells you where the bulk of the image size is coming from, and where you might want to optimize.
Here's a hands-on challenge to cement everything in this article.
Step 1: Create a simple Dockerfile
Make a new folder, create a file called Dockerfile, and paste this in:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
RUN echo "Hello from my image" > /welcome.txt
CMD ["cat", "/welcome.txt"]
Step 2: Build and tag it properly
docker build -t myimage:1.0.0 .
docker tag myimage:1.0.0 myimage:latest
Step 3: Inspect the layers
docker image history myimage:1.0.0
See the layers? Try to match each layer to the Dockerfile instruction that created it.
Step 4: Check the digest
docker inspect myimage:1.0.0 --format='{{index .RepoDigests 0}}'
(This might be empty if you haven't pushed to a registry — that's fine. The image ID serves the same purpose locally.)
Step 5: Change the Dockerfile and rebuild
Add a line before CMD:
RUN echo "version 2" >> /welcome.txt
Rebuild:
docker build -t myimage:2.0.0 .
Watch the output carefully. Which layers say CACHED? Which ones rebuilt? Can you explain why?
Bonus challenge: Rearrange the Dockerfile so CMD comes before the RUN echo line. What happens?
If you walked away from this with just three things, let it be these:
Images are layered, and layers are cached — structure your Dockerfile to exploit that. Put things that change rarely at the top, things that change often at the bottom.
latest is a lie — or at least, an unreliable promise. Pin to specific version tags in anything that matters. Use digests when you want absolute certainty.
Tags are just labels — they can be moved, reassigned, or ignored. The digest is the true immutable identity of an image.
Docker images look like magic from the outside. Once you understand layers and tags, they're one of the most elegantly designed systems in modern infrastructure. And now you actually know what's happening when you run docker pull.
Go explore docker image history on a few images you use daily. You'll be surprised what you find inside.