KubeDojo

Docker Agent: Multi-Agent AI Teams in Sandboxed Environments

by KubeDojo · 15 min read

Introduction

You want your AI coding agent to build images, run containers, and test deployments autonomously. But giving it access to your host Docker daemon is like handing a stranger your house keys. They can see everything, touch everything, and leave traces everywhere.

This is the autonomy paradox: agents need Docker to be useful, but can't be trusted with your Docker.

Docker Desktop 4.63+ solves this with two complementary tools. Docker Agent is an open-source framework for building teams of specialized AI agents with declarative YAML configuration. Docker Sandboxes runs those agents in isolated microVMs with private Docker daemons, providing hypervisor-level isolation that keeps your host untouched. (Previously called cagent in Docker Desktop 4.49-4.62 — same tool, renamed CLI.)

Instead of prompting one generalist agent that context-switches constantly, you define a team: a product manager that breaks down requirements, a designer that creates wireframes, a full-stack engineer that implements features. Each agent has specialized instructions, minimal necessary tools, and clear handoff protocols. The root agent coordinates the workflow automatically.

Docker Agent — Declarative AI Teams

Single generalist agents try to do everything: research, design, code, test. The result is constant context-switching and jack-of-all-trades output. Docker Agent shifts the model from prompting to orchestration.

You define agents in YAML — their models, instructions, tools, and relationships. A root agent delegates tasks to sub-agents with specialized roles. Each agent stays focused on its lane.

Agent anatomy:

# Agent configuration format (Docker Agent 0.3+)
models:
  model:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

agents:
  root:
    model: model
    description: Product Manager - Leads the development team
    instruction: |
      You are the Product Manager leading a development team...
      # ... [detailed workflow and iteration principles]
      Break down requirements into small iterations
      Coordinate between designer → frontend → fullstack → QA
    sub_agents: [designer, awesome_engineer]
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
      - type: memory
        path: dev_memory.db
      - type: mcp
        ref: docker:context7

Earlier Docker Agent versions (4.49-4.62, when called cagent) use model_name at the agent level instead of a separate models block.

The dev-team.yaml example from Docker's repository defines a three-agent software development workflow. The root agent acts as product manager, breaking down user requirements into iterations and coordinating handoffs. The designer creates wireframes and responsive layouts. The "awesome engineer" implements both frontend and backend, integrating with the design specs.

Tool ecosystem: Built-in tools cover common needs — filesystem for code access, shell for command execution, think for reasoning, todo for task tracking, memory for persistent state. MCP (Model Context Protocol) servers extend this with external integrations like DuckDuckGo search, GitHub API, or documentation lookups via Context7.
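Pulling those pieces together, a toolsets block that mixes built-ins with an MCP server might look like the following sketch (the agent name and comments are illustrative; the toolset types and the docker:duckduckgo ref are the ones used in Docker's example configs):

```yaml
# Illustrative fragment: one agent combining built-in tools with an MCP server.
agents:
  reviewer:
    description: Code reviewer with workspace access and task tracking
    toolsets:
      - type: filesystem      # read and write code in the workspace
      - type: shell           # run linters and tests
      - type: think           # structured reasoning scratchpad
      - type: todo            # track open review items
      - type: memory
        path: review_memory.db   # persistent state across runs
      - type: mcp
        ref: docker:duckduckgo   # external search via an MCP server
```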

Model providers: Agent is provider-agnostic. Configure OpenAI, Anthropic, Gemini, AWS Bedrock, Mistral, xAI, or Docker Model Runner for local models. Each agent can use a different model — your research agent might use a cheaper model for web searches while your code reviewer uses a premium model for complex analysis.
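Mixing providers per agent can be sketched like this (the model names and role split are illustrative; the models/agents structure follows the anatomy example above):

```yaml
# Two named models; each agent references the one that fits its job.
models:
  cheap:
    provider: openai
    model: gpt-4o-mini        # illustrative: inexpensive model for research
  premium:
    provider: anthropic
    model: claude-sonnet-4-0  # illustrative: stronger model for code review
    max_tokens: 64000

agents:
  researcher:
    model: cheap
    description: Web researcher
  reviewer:
    model: premium
    description: Code reviewer for complex analysis
```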

Packaging and distribution: Agents are OCI artifacts. Push them to any registry like container images:

docker agent push my-agent:latest
docker agent pull my-agent:latest

This makes agent teams portable and versionable. Your organization can maintain a catalog of pre-configured agents for common workflows: code review, security auditing, documentation generation.

Quick Start: Run Your First Agent

# Create a sandbox
docker sandbox create my-agent

# Run an agent from the examples
docker sandbox exec my-agent "docker agent run github.com/docker/docker-agent/examples/blog.yaml"

# Clean up
docker sandbox rm my-agent

This gives you an immediate hands-on path before diving into multi-agent patterns.

Multi-Agent Patterns

The power of Docker Agent emerges in how you structure agent relationships and workflows. Three patterns cover most use cases.

Hierarchical delegation — The root agent acts as coordinator, delegating to specialized sub-agents. This mirrors real team structures: a product manager doesn't write code, but coordinates designers and engineers. The dev-team.yaml example uses this pattern:

# Simplified example showing hierarchical structure
agents:
  root:
    description: Product Manager
    sub_agents: [designer, awesome_engineer]
    instruction: |
      Break down requirements into iterations
      Coordinate handoffs between team members
  
  designer:
    description: UI/UX Designer
    instruction: Create wireframes and design specs
  
  awesome_engineer:
    description: Full Stack Engineer  
    instruction: Implement backend APIs, integrate frontend

Sequential workflows — Some workflows require strict ordering: research before writing, design before implementation. The root agent enforces this by calling agents one at a time. The blog.yaml example demonstrates:

# blog.yaml (excerpt)
agents:
  root:
    instruction: |
      1. Call web_search_agent to research topic
      2. Call writer to generate 750-word post
      # ... [workflow enforcement: ONE AGENT AT A TIME]
    sub_agents: [web_search_agent, writer]
    toolsets:
      - type: think
  
  web_search_agent:
    model: anthropic
    description: Search the web for information
    toolsets:
      - type: mcp
        ref: docker:duckduckgo
  
  writer:
    model: anthropic
    description: Write technical blog posts
    instruction: |
      You are an agent that receives a single technical writing prompt...
      # ... [750-word blog post generation with code examples]

The root agent's instructions explicitly enforce sequential execution. Only the web search agent has DuckDuckGo access — the writer focuses purely on content generation.

Shared state coordination — Agents don't share context automatically. For coordination, use shared memory and files. The dev-team.yaml agents all access dev_memory.db for persistent state and write decisions to .dev-team/dev-team.md:

toolsets:
  - type: memory
    path: dev_memory.db

This coordination file becomes the team's shared brain — tracking iteration progress, design decisions, and handoff notes. When an agent starts, it reads the file to understand the current state.

Tool isolation — Give agents only the tools they need. Your research agent needs web access but shouldn't write files. Your code reviewer needs filesystem access but shouldn't execute shell commands. This minimizes blast radius if an agent goes off the rails.
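That principle maps directly onto YAML: an agent's toolsets list is its entire capability surface. A sketch, reusing the toolset types from the examples above:

```yaml
# Least-privilege toolsets: each agent gets only what its role requires.
agents:
  researcher:
    description: Web research only - no filesystem, no shell
    toolsets:
      - type: mcp
        ref: docker:duckduckgo
  code_reviewer:
    description: Reads code and tracks findings - cannot execute commands
    toolsets:
      - type: filesystem
      - type: todo
```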

Docker Sandboxes — Security Architecture

Docker Agent solves orchestration. Docker Sandboxes solves security. Together, they enable safe autonomous execution.

The fundamental constraint: AI agents need to build images, run containers, and use Docker Compose. Giving an agent access to your host Docker daemon means it can see your containers, pull images, and run workloads directly on your system. That's too much access for autonomous code execution.

Why containers fail: Running the agent in a container doesn't solve this. Containers share the host kernel (or in Docker Desktop's case, the same virtual machine) and can't safely isolate something that needs its own Docker daemon. Docker-in-Docker approaches either require privileged mode or mounting the host socket — both compromise isolation.

MicroVMs provide the boundary: Each sandbox runs in a dedicated microVM with:

  • Private Docker daemon: isolated from your host and other sandboxes
  • Hypervisor-level isolation: macOS uses virtualization.framework, Windows uses Hyper-V
  • Process and filesystem isolation: separate kernel, can't access host resources
  • Bidirectional file sync: your workspace syncs at the same absolute path
  • Disposable by default: delete the sandbox, everything inside is gone

When an agent runs docker build or docker compose up inside a sandbox, those commands execute against the sandbox's private daemon. The agent sees only containers it creates. It cannot access your host containers, images, or volumes.

Architecture overview:

Host Machine
├── Docker Desktop
│   └── Hypervisor (virtualization.framework / Hyper-V)
│       └── MicroVM (Sandbox)
│           ├── Private Docker Daemon
│           ├── Agent Process
│           └── Workspace (synced to host)

Bidirectional file sync preserves absolute paths. Your workspace at /Users/alice/projects/myapp on the host appears at the same path inside the sandbox. Changes sync both ways — edit a file on the host, the agent sees it; the agent modifies a file, you see the change. Because this is file synchronization rather than volume mounting, it works across different filesystems.

Network isolation adds another control layer. Sandboxes have outbound internet access through your host's connection. An HTTP/HTTPS filtering proxy runs at host.docker.internal:3128. You can configure network policies to control which destinations are allowed.

Credential injection keeps API keys off the sandbox entirely. The same proxy at host.docker.internal:3128 intercepts requests to known AI provider endpoints (api.openai.com, api.anthropic.com, etc.). When a request matches, the proxy injects the Authorization header using the value from your host environment. The sandbox never sees the credential, only the proxied response. When you remove the sandbox, no credentials remain inside.

What persists between sessions, as long as the sandbox exists: Docker images and containers built by the agent, installed system packages, agent state (configuration, history), and workspace changes (synced to host).

What's ephemeral: The entire VM and its contents when you run docker sandbox rm. Images built inside, packages installed, state not synced — all gone.

Running Agents in Production

Installation depends on your setup:

  • Docker Desktop 4.63+ — the docker-agent CLI plugin is pre-installed. Run docker agent directly.
  • Homebrew — brew install docker-agent. Symlink to ~/.docker/cli-plugins/docker-agent to use docker agent, or run docker-agent standalone.
  • Binary releases — download from GitHub Releases and copy to ~/.docker/cli-plugins/.

Set at least one API key:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

Or use Docker Model Runner for local models without API calls.

Credential management in sandboxes uses the proxy injection mechanism. Set environment variables on your host — the proxy injects them into API requests. For agents that require environment variables at startup (some custom installations), set them to placeholder values like proxy-managed — the proxy injects actual credentials regardless.

Resource overhead is the trade-off for isolation. Each sandbox consumes disk space for:

  • VM disk image (grows as you build images and install packages)
  • Docker images pulled or built inside
  • Container layers and volumes

Multiple sandboxes don't share images or layers. Each has its own isolated daemon and storage. Expect several GB per active sandbox.

Sandboxes trade higher resource overhead for complete isolation. Use containers when you need lightweight packaging without Docker access. Use sandboxes when you need to give something autonomous full Docker capabilities without trusting it with your host environment.

When to use what:

| Approach | Isolation | Agent Docker Access | Host Impact | Use Case |
|---|---|---|---|---|
| Sandboxes (microVMs) | Hypervisor-level | Private daemon | None — fully isolated | Autonomous agents building/running containers |
| Container with socket mount | Kernel namespaces | Host daemon (shared) | Agent sees all host containers | Trusted tools that need Docker CLI |
| Host execution | None | Host daemon | Full — direct system access | Manual development by trusted humans |


CI/CD integration leverages the disposable nature. Spin up a sandbox for each test run. Execute your agent-driven tests, then delete the sandbox. No cleanup required, no state leakage between runs.
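As a sketch, a CI job could wrap each test run in a throwaway sandbox. The pipeline syntax below is a hypothetical GitHub Actions step and test-team.yaml is a placeholder agent config; the docker sandbox commands are the ones shown in the quick start:

```yaml
# Hypothetical CI step: one disposable sandbox per run, no cleanup debt.
- name: Agent-driven tests in a disposable sandbox
  run: |
    docker sandbox create ci-run-${{ github.run_id }}
    docker sandbox exec ci-run-${{ github.run_id }} "docker agent run ./test-team.yaml"

- name: Remove the sandbox even if tests failed
  if: always()
  run: docker sandbox rm ci-run-${{ github.run_id }}
```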

Gotchas

Sandboxes cannot communicate with each other. Each VM has its own private network namespace. An agent in one sandbox cannot reach services or containers in another sandbox. Design workflows assuming complete isolation.

No access to host localhost services. The VM boundary prevents direct access to services running on your host machine. Your agent can't connect to localhost:5432 for a PostgreSQL database running on the host. Use public networking or set up port forwarding explicitly.

Disk usage grows unbounded. Each sandbox accumulates images, containers, and packages. Monitor disk usage and remove unused sandboxes with docker sandbox rm. There's no automatic garbage collection.

MCP server configuration requires Docker Desktop integration. Docker-based MCP servers (like docker:duckduckgo) need Docker Desktop's MCP gateway. Self-hosted MCP servers require manual configuration and network access from inside the sandbox.

Credential injection supports limited providers. The proxy automatically injects credentials for OpenAI, Anthropic, Google, GitHub, and a few others. Custom providers require manual credential management inside the sandbox.

Wrap-up

Docker Agent and Sandboxes solve complementary problems: orchestration and isolation. Agent gives you declarative multi-agent workflows with specialized roles and automated handoffs. Sandboxes gives you hypervisor-level isolation with private Docker daemons for safe autonomous execution.

The shift is from prompting one generalist agent to orchestrating teams of specialists. Each focused, each with minimal necessary tools, each running in its own isolated environment. Your host stays untouched. Your agents get the freedom to build, run, and test without permission prompts.

Run docker agent new to generate your first agent team interactively. Start with two agents — a researcher and a writer — before scaling to full development teams. The YAML is composable: once you understand delegation and tool isolation, you can build workflows as simple or sophisticated as your use case demands.
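A minimal researcher-plus-writer starter team, assembled from the config elements shown earlier (the instruction text is illustrative; the structure mirrors the blog.yaml excerpt):

```yaml
models:
  model:
    provider: anthropic
    model: claude-sonnet-4-0

agents:
  root:
    model: model
    description: Coordinator
    instruction: |
      1. Call researcher to gather sources on the topic
      2. Call writer to draft the article from those notes
      Call ONE agent at a time.
    sub_agents: [researcher, writer]
    toolsets:
      - type: think
  researcher:
    model: model
    description: Gathers background information
    toolsets:
      - type: mcp
        ref: docker:duckduckgo
  writer:
    model: model
    description: Drafts the article - no web access, content only
```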
