

Sarthak Varshney is a Docker Captain, 5x C# Corner MVP, and 2x Alibaba Cloud MVP, with over six years of hands-on experience in the IT industry, specializing in cloud computing, DevOps, and modern application infrastructure. He is an Author and Associate Consultant, known for working extensively with cloud platforms and container-based technologies in real-world environments.
Let me be direct with you: the moment I first ran claude --dangerously-skip-permissions on my actual development machine, my stomach dropped a little. Not because I didn't trust the model — but because "dangerously" is right there in the flag name. One wrong prompt, one misunderstood context, and your project files could look very different very fast.
That's the exact problem Docker Sandboxes solves. And after spending time with the sbx CLI, I think this is genuinely one of the more important tools Docker has shipped for developers working with AI coding agents in 2025.
Docker Sandboxes is a product from Docker that lets you run AI coding agents — like Claude Code, Gemini CLI, GitHub Copilot CLI, Codex, Kiro, and OpenCode — inside isolated microVM environments. Each sandbox gets its own Docker daemon, its own filesystem, and its own network stack. The agent can do whatever it wants in there: install packages, build images, modify configs, run containers. Your host machine stays completely untouched.
Think of it as a disposable VM that spins up in seconds, does the AI's work, and gets thrown away when you're done.
The tool you interact with is called sbx. And importantly — it doesn't require Docker Desktop. This is its own standalone binary.
+-----------------------------+
|          Your Host          |
| +-----------------------+   |
| |    microVM Sandbox    |   |
| | - Own Docker daemon   |   |
| | - Own filesystem      |   |
| | - Own network         |   |
| | - AI Agent running    |   |
| +-----------------------+   |
|   Your files: untouched     |
+-----------------------------+
This is the first question I hear from developers. We already have Docker containers for isolation — why add another layer?
The answer is the isolation model. Containers share the host kernel. If an agent does something destructive at the kernel level, it can still impact your system. A microVM, on the other hand, runs a completely separate OS kernel with a hardware boundary between it and your host.
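The container side of that tradeoff is easy to see for yourself. Inside any container, the kernel version matches the host exactly, because there is no second kernel; a microVM boots its own. The `docker run` line below is illustrative and assumes Docker is installed, while `uname -r` alone works on any Linux box:

```shell
# Containers share the host kernel: the version reported inside
# a container matches the host exactly. A microVM, by contrast,
# runs its own separate kernel.
uname -r
# Illustrative comparison, if Docker is installed:
# docker run --rm alpine uname -r   # same version as the host
```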
This matters when you're running agents in --dangerously-skip-permissions mode (called YOLO mode), which is actually the default in Docker Sandboxes. In this mode, the agent never pauses to ask for permission before executing a tool. No approval prompts, no interruptions. The agent installs what it needs, modifies what it wants, commits its work — all without stopping to check with you.
Without a hard isolation boundary, YOLO mode on your real machine is exactly as dangerous as it sounds. Inside a sandbox, it's just fast.
The Docker team actually built their own VMM for this rather than reusing Firecracker (a question that came up in community discussions): Firecracker targets Linux server environments, and Docker wanted something that works equally well on macOS, Windows, and Linux.
Prerequisites are minimal but platform-specific.
macOS requires Apple Silicon (M1/M2/M3/M4) and macOS Tahoe (26) or later.
Windows requires 64-bit x86_64, Windows 11, and the Hypervisor Platform feature enabled:
Enable-WindowsOptionalFeature -Online -FeatureName HypervisorPlatform -All
Linux (Ubuntu) requires Ubuntu 24.04+, 64-bit x86_64, and KVM hardware virtualization. You can verify KVM availability like this:
$ lsmod | grep kvm
If you see kvm_intel or kvm_amd in the output, you're good. Then add your user to the kvm group:
$ sudo usermod -aG kvm $USER
$ newgrp kvm
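If `lsmod` shows nothing, it's worth checking whether the CPU advertises hardware virtualization at all. The `vmx` (Intel VT-x) and `svm` (AMD-V) flags in /proc/cpuinfo are the standard signal; this check is plain Linux, nothing sbx-specific:

```shell
# Count CPU threads that advertise hardware virtualization.
# vmx = Intel VT-x, svm = AMD-V. Zero usually means the feature
# is missing or disabled in firmware, so KVM can't work.
grep -Ec 'vmx|svm' /proc/cpuinfo || true
```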
# macOS
brew install docker/tap/sbx
# Windows
winget install -h Docker.sbx
# Linux (Ubuntu)
curl -fsSL https://get.docker.com | sudo REPO_ONLY=1 sh
sudo apt-get install docker-sbx
Then log in:
$ sbx login
This opens a browser for Docker OAuth. On first login, you'll be asked to choose a default network policy:
Choose a default network policy:
1. Open — All network traffic allowed, no restrictions.
2. Balanced — Default deny, with common dev sites allowed.
3. Locked Down — All network traffic blocked unless you allow it.
Balanced is what I recommend starting with. It allows traffic to npm, PyPI, GitHub, and common dev services while blocking everything else. You can always adjust individual rules later.
For Claude Code with a Claude Max, Team, or Enterprise subscription, no API key setup is required upfront. Just use /login inside the sandbox to authenticate with OAuth. The session token lives on your host and is injected by a proxy — it's never stored inside the sandbox itself.
For API key-based authentication (any agent, or if you prefer API keys for Claude Code):
$ sbx secret set -g anthropic
This stores the key in your OS keychain. A proxy on your host injects it into outbound requests, so the API key is never exposed inside the sandbox. Pretty clean security model.
To also give the agent GitHub access for creating pull requests:
$ sbx secret set -g github -t "$(gh auth token)"
This is the part I genuinely enjoy. Navigate to your project directory and run:
$ cd ~/my-project
$ sbx run claude
Replace claude with whatever agent you want — gemini, codex, copilot, kiro, opencode. The first run pulls the agent image (takes a minute). Subsequent runs reuse the cache and start in seconds.
You can check what's running:
$ sbx ls
SANDBOX             AGENT    STATUS    PORTS   WORKSPACE
claude-my-project   claude   running           ~/my-project
Running sbx with no arguments opens an interactive terminal dashboard showing all your sandboxes with live CPU and memory usage. From there you can attach to agents, open shells, and manage network rules without touching the CLI.
There are two ways the agent can interact with your repository.
The agent edits your working tree directly. If you're working on something where you want instant visibility into what the agent is doing, this works fine. Just stage, commit, and push as you normally would.
This is what I use for anything non-trivial. Pass --branch to give the agent its own Git worktree:
$ sbx run claude --branch my-feature
This creates a Git worktree under .sbx/ in your repository root. The agent works on its own branch and directory — completely separate from your main working tree. You're free to keep coding on main while the agent works on its branch.
my-project/
├── src/                    ← your main working tree
├── .sbx/
│   └── claude-my-project-worktrees/
│       └── my-feature/     ← agent works here
└── .gitignore
When the agent finishes, review the work:
$ cd .sbx/claude-my-project-worktrees/my-feature
$ git log
$ git diff main
If you're happy with it, push and open a PR:
$ git push -u origin my-feature
$ gh pr create
You can also let sbx generate the branch name automatically:
$ sbx run claude --branch auto
And if you want multiple agents working on the same repo in parallel, branch mode keeps them isolated — each gets its own worktree:
$ sbx run claude --branch feature-a
$ sbx run claude --branch feature-b
Add .sbx/ to your .gitignore to keep your git status clean:
$ echo '.sbx/' >> .gitignore
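Running that `echo` twice leaves a duplicate entry, so a guarded version is safer. This is plain shell, nothing sbx-specific:

```shell
# Add .sbx/ to .gitignore only if it isn't already listed
# (-x: match the whole line, -F: literal string, -q: quiet).
touch .gitignore
grep -qxF '.sbx/' .gitignore || echo '.sbx/' >> .gitignore
```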
One of the more practical things about sbx is the network governance. If an agent tries to reach something blocked by your policy, the request fails outright, which surfaces the issue immediately instead of letting it slip by silently.
Check current rules:
$ sbx policy ls
Allow a specific host:
$ sbx policy allow network registry.npmjs.org
If you're running a local service on your host (say, an Ollama server at port 11434), reach it from inside the sandbox using host.docker.internal:
$ sbx policy allow network localhost:11434
$ curl http://host.docker.internal:11434
Sandboxes are network-isolated by default, so if your agent starts a dev server, you can't just open localhost:3000. Use sbx ports to forward traffic:
$ sbx ports my-sandbox --publish 8080:3000
$ open http://localhost:8080
One thing worth knowing: services inside the sandbox must bind to 0.0.0.0, not 127.0.0.1. Most dev servers default to loopback, so you'll need to pass something like --host 0.0.0.0 when starting them. Port mappings also don't persist across sandbox restarts — you'll need to re-publish.
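The loopback-versus-all-interfaces distinction is easy to demonstrate with plain sockets, independent of sbx. This sketch assumes python3 is available and lets the OS pick the ports:

```shell
# Show the difference between loopback-only and all-interfaces binding.
python3 - <<'EOF'
import socket

lo = socket.socket()
lo.bind(("127.0.0.1", 0))    # loopback only: invisible to port forwarding
print("loopback bind:", lo.getsockname()[0])

all_if = socket.socket()
all_if.bind(("0.0.0.0", 0))  # all interfaces: what forwarded traffic needs
print("all-interfaces bind:", all_if.getsockname()[0])

lo.close(); all_if.close()
EOF
```

A dev server bound to 127.0.0.1 inside the sandbox behaves like the first socket: traffic forwarded by `sbx ports` never reaches it.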
This is where Docker Sandboxes gets really powerful for teams and repeated workflows.
A template is a Docker image baked ahead of time. If your project needs .NET, Rust, or some non-default toolchain, build a Dockerfile that extends the default sandbox template, push it to a registry, and use it with --template:
$ sbx run claude --template ghcr.io/myorg/my-dotnet-sandbox:1.0
Templates are ideal for heavy dependencies — things you don't want to reinstall every single sandbox session.
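As a sketch of what such a template might look like: the Dockerfile extends a base sandbox image and pre-installs the toolchain. The base image name and the package below are placeholders, not real references; check the Sandboxes docs for the actual base template to extend.

```dockerfile
# Placeholder base image: substitute the default sandbox template
# referenced in the Docker Sandboxes documentation.
FROM docker/sandbox-base:latest

# Bake the heavy toolchain into the image once, so every sandbox
# session starts warm instead of reinstalling it.
RUN apt-get update && apt-get install -y --no-install-recommends \
        dotnet-sdk-8.0 \
    && rm -rf /var/lib/apt/lists/*
```

Build and push it like any other image (`docker build -t ghcr.io/myorg/my-dotnet-sandbox:1.0 .`, then `docker push`), and point `--template` at the pushed tag.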
A kit is a YAML artifact applied at runtime, handling the flexible, per-run side of customization that you don't want baked into an image:
# Use a kit from a local directory
sbx run claude --kit ./my-kit/
# Use a kit from a git repo with a specific tag
sbx run claude --kit "git+https://github.com/docker/sbx-kits-contrib.git#ref=v0.1.0&dir=code-server"
# Use a kit from an OCI registry
sbx run claude --kit ghcr.io/myorg/my-kit:1.0
The key distinction: templates bake things into the image (installed once, reused). Kits apply things at startup (flexible, per-run customization). They're designed to work together — a heavy template as the base, thin kits layered on top for per-project or per-team config.
One thing the community has flagged (and Docker has acknowledged): kits currently don't support automatic port publishing or user-configurable parameters. Those are on the roadmap.
Sandboxes persist after the agent exits. Stop one without deleting it:
$ sbx stop my-sandbox
When you're done:
$ sbx rm my-sandbox
This deletes everything inside the sandbox — packages, Docker images, branch mode worktrees — but your actual project files on the host are untouched.
Want to show someone why microVM isolation matters? Skip the architecture diagrams. Open a sandbox and run this:
$ sudo rm -rf *
Watch every file disappear. Then type exit. Open your project folder on the host.
Everything is exactly where you left it.
That single moment does more explaining than any slide deck. The sandbox absorbed a catastrophic command and your machine never flinched. That's the whole point of Docker Sandboxes — the agent gets full freedom inside, and you get a hard guarantee that none of it reaches your host.
Here's my honest take as someone who works with containers daily:
Use Docker Sandboxes when:
- You run agents in YOLO mode and want a hard isolation boundary while they work autonomously.
- The agent needs to install packages, modify configs, build images, or run containers.
- You want several agents working on the same repo in parallel, each in its own branch-mode worktree.
You probably don't need it when:
- You review and approve every agent action anyway, with permission prompts left on.
- The agent only reads code and answers questions, never executing or modifying anything.
This comes up a lot: Docker Sandboxes is not tied to Docker Desktop licensing. sbx is a separate binary. Docker's position here is that of a neutral platform — sbx works with Claude, Gemini, Copilot, Codex, Kiro, and OpenCode. No vendor lock-in on the agent side.
The official documentation lives at https://docs.docker.com/ai/sandboxes/ and covers everything I've touched on here, plus a full CLI reference (sbx commands and flags). The sbx-releases GitHub repository is also where you can file bug reports and follow the project.
The developer workflow is shifting fast. AI coding agents are no longer just autocomplete — they're writing functions, refactoring modules, running tests, and committing code. Giving them the freedom to work efficiently and keeping your environment safe used to be a tradeoff. Docker Sandboxes takes that tradeoff off the table.
The isolation is real, the workflow is clean, and the Git branch mode is genuinely useful for teams. If you're already using Claude Code or any other AI coding agent, this is worth setting up this week.