Docker Agent: Multi-Agent AI Teams in Sandboxed Environments

Introduction
You want your AI coding agent to build images, run containers, and test deployments autonomously. But giving it access to your host Docker daemon is like handing a stranger your house keys. They can see everything, touch everything, and leave traces everywhere.
This is the autonomy paradox: agents need Docker to be useful, but can't be trusted with your Docker.
Docker Desktop 4.63+ solves this with two complementary tools. Docker Agent is an open-source framework for building teams of specialized AI agents with declarative YAML configuration. Docker Sandboxes runs those agents in isolated microVMs with private Docker daemons, providing hypervisor-level isolation that keeps your host untouched. (Previously called cagent in Docker Desktop 4.49-4.62 — same tool, renamed CLI.)
Instead of prompting one generalist agent that context-switches constantly, you define a team: a product manager that breaks down requirements, a designer that creates wireframes, a full-stack engineer that implements features. Each agent has specialized instructions, minimal necessary tools, and clear handoff protocols. The root agent coordinates the workflow automatically.
Docker Agent — Declarative AI Teams
Single generalist agents try to do everything: research, design, code, test. The result is constant context-switching and jack-of-all-trades output. Docker Agent shifts the model from prompting to orchestration.
You define agents in YAML — their models, instructions, tools, and relationships. A root agent delegates tasks to sub-agents with specialized roles. Each agent stays focused on its lane.
Agent anatomy:
```yaml
# Agent configuration format (Docker Agent 0.3+)
models:
  model:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

agents:
  root:
    model: model
    description: Product Manager - Leads the development team
    instruction: |
      You are the Product Manager leading a development team...
      # ... [detailed workflow and iteration principles]
      Break down requirements into small iterations
      Coordinate between designer → frontend → fullstack → QA
    sub_agents: [designer, awesome_engineer]
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
      - type: memory
        path: dev_memory.db
      - type: mcp
        ref: docker:context7
```
In earlier Docker Desktop releases (4.49-4.62), when the tool was still called cagent, configurations use model_name at the agent level instead of a separate models block.
The dev-team.yaml example from Docker's repository defines a three-agent software development workflow. The root agent acts as product manager, breaking down user requirements into iterations and coordinating handoffs. The designer creates wireframes and responsive layouts. The "awesome engineer" implements both frontend and backend, integrating with the design specs.
Tool ecosystem: Built-in tools cover common needs — filesystem for code access, shell for command execution, think for reasoning, todo for task tracking, memory for persistent state. MCP (Model Context Protocol) servers extend this with external integrations like DuckDuckGo search, GitHub API, or documentation lookups via Context7.
Model providers: Agent is provider-agnostic. Configure OpenAI, Anthropic, Gemini, AWS Bedrock, Mistral, xAI, or Docker Model Runner for local models. Each agent can use a different model — your research agent might use a cheaper model for web searches while your code reviewer uses a premium model for complex analysis.
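A sketch of this mixed-model setup in the same configuration format — the specific model names, provider choices, and agent names below are illustrative assumptions, not prescriptions:

```yaml
# Illustrative sketch: two named models, assigned per agent.
# Model names (gpt-4o-mini, claude-sonnet-4-0) are assumptions.
models:
  cheap:
    provider: openai
    model: gpt-4o-mini        # lower-cost model for high-volume searches
  premium:
    provider: anthropic
    model: claude-sonnet-4-0  # stronger model for complex analysis

agents:
  researcher:
    model: cheap
    description: Web research agent
  reviewer:
    model: premium
    description: Code review agent
```

The same pattern extends to local models via Docker Model Runner, keeping inference entirely on your machine.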
Packaging and distribution: Agents are OCI artifacts. Push them to any registry like container images:
```shell
docker agent push my-agent:latest
docker agent pull my-agent:latest
```
This makes agent teams portable and versionable. Your organization can maintain a catalog of pre-configured agents for common workflows: code review, security auditing, documentation generation.
Quick Start: Run Your First Agent
```shell
# Create a sandbox
docker sandbox create my-agent

# Run an agent from the examples
docker sandbox exec my-agent "docker agent run github.com/docker/docker-agent/examples/blog.yaml"

# Clean up
docker sandbox rm my-agent
```
This gives you an immediate hands-on path before diving into multi-agent patterns.
Multi-Agent Patterns
The power of Docker Agent emerges in how you structure agent relationships and workflows. Three patterns cover most use cases.
Hierarchical delegation — The root agent acts as coordinator, delegating to specialized sub-agents. This mirrors real team structures: a product manager doesn't write code, but coordinates designers and engineers. The dev-team.yaml example uses this pattern:
```yaml
# Simplified example showing hierarchical structure
agents:
  root:
    description: Product Manager
    sub_agents: [designer, awesome_engineer]
    instruction: |
      Break down requirements into iterations
      Coordinate handoffs between team members

  designer:
    description: UI/UX Designer
    instruction: Create wireframes and design specs

  awesome_engineer:
    description: Full Stack Engineer
    instruction: Implement backend APIs, integrate frontend
```
Sequential workflows — Some workflows require strict ordering: research before writing, design before implementation. The root agent enforces this by calling agents one at a time. The blog.yaml example demonstrates:
```yaml
# blog.yaml (excerpt)
agents:
  root:
    instruction: |
      1. Call web_search_agent to research topic
      2. Call writer to generate 750-word post
      # ... [workflow enforcement: ONE AGENT AT A TIME]
    sub_agents: [web_search_agent, writer]
    toolsets:
      - type: think

  web_search_agent:
    model: anthropic
    description: Search the web for information
    toolsets:
      - type: mcp
        ref: docker:duckduckgo

  writer:
    model: anthropic
    description: Write technical blog posts
    instruction: |
      You are an agent that receives a single technical writing prompt...
      # ... [750-word blog post generation with code examples]
```
The root agent's instructions explicitly enforce sequential execution. Only the web search agent has DuckDuckGo access — the writer focuses purely on content generation.
Shared state coordination — Agents don't share context automatically. For coordination, use shared memory and files. The dev-team.yaml agents all access dev_memory.db for persistent state and write decisions to .dev-team/dev-team.md:
```yaml
toolsets:
  - type: memory
    path: dev_memory.db
```
This coordination file becomes the team's shared brain — tracking iteration progress, design decisions, and handoff notes. When an agent starts, it reads the file to understand the current state.
Tool isolation — Give agents only the tools they need. Your research agent needs web access but shouldn't write files. Your code reviewer needs filesystem access but shouldn't execute shell commands. This minimizes blast radius if an agent goes off the rails.
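As a sketch of this principle in the document's YAML format — the agent names and toolset assignments here are illustrative assumptions:

```yaml
# Illustrative: each agent gets only the tools its role requires
agents:
  researcher:
    description: Web research only - no file or shell access
    toolsets:
      - type: mcp
        ref: docker:duckduckgo   # web search is the only capability
  code_reviewer:
    description: Reads code, never executes it
    toolsets:
      - type: filesystem         # read the code under review
      - type: think              # reasoning scratchpad
      # deliberately no shell toolset: a compromised reviewer
      # can read files, but cannot run commands
```

If a prompt-injected instruction reaches the researcher, the worst it can do is issue searches; it has no path to your files or shell.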
Docker Sandboxes — Security Architecture
Docker Agent solves orchestration. Docker Sandboxes solves security. Together, they enable safe autonomous execution.
The fundamental constraint: AI agents need to build images, run containers, and use Docker Compose. Giving an agent access to your host Docker daemon means it can see your containers, pull images, and run workloads directly on your system. That's too much access for autonomous code execution.
Why containers fail: Running the agent in a container doesn't solve this. Containers share the host kernel (or in Docker Desktop's case, the same virtual machine) and can't safely isolate something that needs its own Docker daemon. Docker-in-Docker approaches either require privileged mode or mounting the host socket — both compromise isolation.
MicroVMs provide the boundary: Each sandbox runs in a dedicated microVM with:
- Private Docker daemon: isolated from your host and other sandboxes
- Hypervisor-level isolation: macOS uses virtualization.framework, Windows uses Hyper-V
- Process and filesystem isolation: separate kernel, can't access host resources
- Bidirectional file sync: your workspace syncs at the same absolute path
- Disposable by default: delete the sandbox, everything inside is gone
When an agent runs docker build or docker compose up inside a sandbox, those commands execute against the sandbox's private daemon. The agent sees only containers it creates. It cannot access your host containers, images, or volumes.
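A minimal sketch of that boundary using the sandbox CLI shown in the quick start; the sandbox name my-agent and image name test-app are placeholder assumptions, and the session requires Docker Desktop with Sandboxes enabled:

```shell
# Build an image against the sandbox's private daemon
docker sandbox exec my-agent "docker build -t test-app ."

# The image is visible inside the sandbox...
docker sandbox exec my-agent "docker images"

# ...but the host daemon never sees it
docker images | grep test-app || echo "not present on host"
```

The same applies in reverse: containers running on your host are invisible to docker ps inside the sandbox.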
Architecture overview:
```
Host Machine
└── Docker Desktop
    └── Hypervisor (virtualization.framework / Hyper-V)
        └── MicroVM (Sandbox)
            ├── Private Docker Daemon
            ├── Agent Process
            └── Workspace (synced to host)
```
Bidirectional file sync preserves absolute paths. Your workspace at /Users/alice/projects/myapp on the host appears at the same path inside the sandbox. Changes sync both ways — edit a file on the host, the agent sees it. The agent modifies a file, you see the change. This is file synchronization, not volume mounting, which works across different filesystems.
Network isolation adds another control layer. Sandboxes have outbound internet access through your host's connection. An HTTP/HTTPS filtering proxy runs at host.docker.internal:3128. You can configure network policies to control which destinations are allowed.
Credential injection keeps API keys off the sandbox entirely. Docker Desktop runs a local HTTP proxy at host.docker.internal:3128 that intercepts requests to known AI provider endpoints (api.openai.com, api.anthropic.com, etc.). When a request matches, the proxy injects the Authorization header using the value from your host environment. The sandbox never sees the credential. It only sees the proxied response. When you remove the sandbox, no credentials remain inside.
What persists between sessions, as long as the sandbox exists: Docker images and containers built by the agent, installed system packages, agent state (credentials, configuration, history), and workspace changes (synced to host).
What's ephemeral: The entire VM and its contents when you run docker sandbox rm. Images built inside, packages installed, state not synced — all gone.
Running Agents in Production
Installation depends on your setup:
- Docker Desktop 4.63+ — the `docker-agent` CLI plugin is pre-installed. Run `docker agent` directly.
- Homebrew — `brew install docker-agent`. Symlink to `~/.docker/cli-plugins/docker-agent` to use `docker agent`, or run `docker-agent` standalone.
- Binary releases — download from GitHub Releases and copy to `~/.docker/cli-plugins/`.
Set at least one API key:
```shell
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```
Or use Docker Model Runner for local models without API calls.
Credential management in sandboxes uses the proxy injection mechanism. Set environment variables on your host — the proxy injects them into API requests. For agents that require environment variables at startup (some custom installations), set them to placeholder values like proxy-managed — the proxy injects actual credentials regardless.
Resource overhead is the trade-off for isolation. Each sandbox consumes disk space for:
- VM disk image (grows as you build images and install packages)
- Docker images pulled or built inside
- Container layers and volumes
Multiple sandboxes don't share images or layers. Each has its own isolated daemon and storage. Expect several GB per active sandbox.
Sandboxes trade higher resource overhead for complete isolation. Use containers when you need lightweight packaging without Docker access. Use sandboxes when you need to give something autonomous full Docker capabilities without trusting it with your host environment.
When to use what:
| Approach | Isolation | Agent Docker Access | Host Impact | Use Case |
|---|---|---|---|---|
| Sandboxes (microVMs) | Hypervisor-level | Private daemon | None — fully isolated | Autonomous agents building/running containers |
| Container with socket mount | Kernel namespaces | Host daemon (shared) | Agent sees all host containers | Trusted tools that need Docker CLI |
| Host execution | None | Host daemon | Full — direct system access | Manual development by trusted humans |
CI/CD integration leverages the disposable nature. Spin up a sandbox for each test run. Execute your agent-driven tests, then delete the sandbox. No cleanup required, no state leakage between runs.
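A hedged sketch of that pattern as a CI script — the sandbox name scheme and the test-team.yaml agent file are placeholder assumptions:

```shell
#!/bin/sh
set -e  # fail fast if any step errors

# Unique sandbox name per CI run (placeholder naming scheme)
SANDBOX="ci-run-$$"

docker sandbox create "$SANDBOX"           # fresh, empty microVM
trap 'docker sandbox rm "$SANDBOX"' EXIT   # always deleted, pass or fail

# Agent-driven tests run against the sandbox's private daemon
docker sandbox exec "$SANDBOX" "docker agent run ./test-team.yaml"
```

Because the trap removes the sandbox on exit, a failed run leaves nothing behind to clean up or to contaminate the next run.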
Gotchas
Sandboxes cannot communicate with each other. Each VM has its own private network namespace. An agent in one sandbox cannot reach services or containers in another sandbox. Design workflows assuming complete isolation.
No access to host localhost services. The VM boundary prevents direct access to services running on your host machine. Your agent can't connect to localhost:5432 for a PostgreSQL database running on the host. Use public networking or set up port forwarding explicitly.
Disk usage grows unbounded. Each sandbox accumulates images, containers, and packages. Monitor disk usage and remove unused sandboxes with docker sandbox rm. There's no automatic garbage collection.
MCP server configuration requires Docker Desktop integration. Docker-based MCP servers (like docker:duckduckgo) need Docker Desktop's MCP gateway. Self-hosted MCP servers require manual configuration and network access from inside the sandbox.
Credential injection supports limited providers. The proxy automatically injects credentials for OpenAI, Anthropic, Google, GitHub, and a few others. Custom providers require manual credential management inside the sandbox.
Wrap-up
Docker Agent and Sandboxes solve complementary problems: orchestration and isolation. Agent gives you declarative multi-agent workflows with specialized roles and automated handoffs. Sandboxes gives you hypervisor-level isolation with private Docker daemons for safe autonomous execution.
The shift is from prompting one generalist agent to orchestrating teams of specialists. Each focused, each with minimal necessary tools, each running in its own isolated environment. Your host stays untouched. Your agents get the freedom to build, run, and test without permission prompts.
Run docker agent new to generate your first agent team interactively. Start with two agents — a researcher and a writer — before scaling to full development teams. The YAML is composable: once you understand delegation and tool isolation, you can build workflows as simple or sophisticated as your use case demands.