jiongan mu
Note 02 · Published Apr 18, 2026

Subagent designs.

Three paradigms of subagent orchestration are emerging right now: preset profiles, runtime construction, and dependency graphs. A field note on where each one is and where it’s heading.

Every coding agent hits the same wall: you ask it to do something complex, and halfway through it forgets what it was doing. The context window fills up with file contents and test output, and suddenly it’s lost.

Subagents fix this. Spin up a separate context, let it do the messy work, get back a summary.

Simple idea, but the implementations vary a lot. I think there are roughly three levels of sophistication emerging.
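The core mechanic fits in a few lines. A minimal sketch, with `call_llm` as a hypothetical stand-in for any chat-completion API:

```python
def call_llm(messages):
    # Placeholder: a real implementation would call a model API here.
    return "summary: found 3 relevant files, tests pass"

def run_subagent(task, parent_history):
    # Fresh context: the subagent sees only its task, not the parent's
    # full (and possibly bloated) conversation.
    sub_history = [{"role": "user", "content": task}]
    result = call_llm(sub_history)
    # Only the summary returns; intermediate tool output stays behind
    # in sub_history and is simply discarded.
    parent_history.append({"role": "assistant", "content": result})
    return result
```

Everything interesting in the rest of this note is about who decides what `task` is, which tools the subagent gets, and what `result` should contain.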


Level 1 · Pick a preset

This is where production tools are today.

Claude Code, Anthropic’s coding CLI, leans into automatic delegation. Each subagent has a description, and Claude just decides when to use one. It ships with Explore (fast read-only search on Haiku), Plan (gathers context before strategizing), and General-purpose (full tool access). You can define custom ones as Markdown files. Subagents can’t spawn other subagents. Deliberately flat.

It solves context pollution well. But the subagent configs are static, decided before the conversation starts. The main agent picks which subagent, never what it can do. Works great for known workflows, less so when the task doesn’t fit any predefined template.
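Custom subagents live as Markdown files with YAML frontmatter (in a `.claude/agents/` directory); a sketch of what one might look like, with the field names from the docs but the prompt body and tool list purely illustrative:

```markdown
---
name: test-runner
description: Runs the test suite and reports failures. Use after any code change.
tools: Bash, Read, Grep
---

You are a test runner. Run the relevant tests, then report only the
failing tests with a one-line diagnosis for each. Do not paste full logs.
```

The `description` is what the main agent reads when deciding whether to delegate, which is why these files read half like config, half like advertising.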

Level 2 · Build on the fly

Instead of picking from a catalog, the main agent constructs subagents at runtime, choosing tools, models, and instructions based on the actual task.
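A minimal sketch of what that construction step looks like, with a hypothetical `design_subagent` function and made-up tool and model names (no specific framework's API):

```python
from dataclasses import dataclass

# The full toolkit the orchestrator could hand out. Names illustrative.
FULL_TOOLKIT = {"read_file", "write_file", "run_tests", "grep", "web_search"}

@dataclass
class SubagentSpec:
    instructions: str
    tools: set
    model: str

def design_subagent(task: str) -> SubagentSpec:
    # Scope tools to the task rather than exposing everything.
    if "search" in task:
        tools = {"grep", "read_file"}
        model = "small-fast-model"       # cheap model for lookups
    else:
        tools = FULL_TOOLKIT - {"web_search"}
        model = "large-capable-model"    # stronger model for edits
    return SubagentSpec(
        instructions=f"Complete this task: {task}",
        tools=tools,
        model=model,
    )
```

A real orchestrator would make this decision with an LLM call rather than string matching, but the shape is the same: the spec is an output of the task, not a lookup.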

Slate, a coding agent from Random Labs, is squarely subagent-focused. The main agent dynamically hands each subagent a subset of tools scoped to the task, rather than exposing the full toolkit. Subagents can run different models, and return summaries rather than raw transcripts. Worth trying; they offer $10 in free usage to start.

OpenSage, a research paper on dynamic agent orchestration, pushes further. It doesn’t just dynamically assign a subset of tools to each subagent — it can also dynamically create tools. The LLM writes them at runtime (actual executable functions) and assigns them per-task. Working on a database migration? The parent might mint a subagent with custom schema-inspection tools that didn’t exist in the original toolkit. Two topology patterns emerge naturally: vertical (sequential pipeline) and horizontal (multiple agents trying different strategies, results ensembled).
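The tool-minting step can be sketched in a few lines: the parent takes model-written source for a function and registers it as a per-task tool. This is illustrative of the idea, not OpenSage's actual implementation, and a real system would sandbox the `exec`:

```python
def mint_tool(source: str, name: str, registry: dict):
    namespace = {}
    exec(source, namespace)           # run the model-written definition
    registry[name] = namespace[name]  # make it callable as a tool
    return registry[name]

# Example: a schema-inspection tool that didn't exist in the original
# toolkit, written at runtime for a database-migration subagent.
tool_src = '''
def list_columns(schema):
    return sorted(schema.keys())
'''
registry = {}
mint_tool(tool_src, "list_columns", registry)
```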

AgentFactory, another paper, optimizes across sessions. Successful subagents get saved as Python code. Next time a similar task appears, the system reuses and refines them. After a few tasks you’ve got a growing library that cuts token usage roughly in half. The Meta-Agent dynamically scopes tools per subagent rather than dumping the full toolkit on every worker.
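The cross-session reuse loop might look like this sketch: successful subagents saved as source keyed by a task signature, with future tasks checking the library first. The keyword-overlap matching here is a crude stand-in, purely illustrative:

```python
library = {}  # task signature -> saved subagent source

def signature(task: str) -> frozenset:
    return frozenset(task.lower().split())

def save_subagent(task: str, source: str):
    library[signature(task)] = source

def find_reusable(task: str, threshold: float = 0.5):
    # Jaccard overlap between task signatures; a real system would
    # use embeddings or an LLM judgment here.
    sig = signature(task)
    best, best_score = None, 0.0
    for saved_sig, source in library.items():
        score = len(sig & saved_sig) / max(len(sig | saved_sig), 1)
        if score > best_score:
            best, best_score = source, score
    return best if best_score >= threshold else None
```

The token savings come from the hit path: a library hit skips the whole design-a-subagent-from-scratch conversation.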

The orchestrator designs the worker at runtime. Right tools, right model, right instructions, shaped by the task.

But it’s still hierarchical. One boss, multiple workers. Workers don’t talk to each other.

Level 3 · The DAG (mostly theoretical)

Instead of one orchestrator delegating ad hoc, the system decomposes the task into a dependency graph where nodes are agents and edges are data dependencies.

Another way to picture it: each agent acts like a team member with a specific role, and the structure mirrors an actual team hierarchy. Take shipping a new feature. A tech lead agent scopes the work and fans out three tracks in parallel: backend, frontend, and QA. Each delegates further. Backend spawns a migration specialist alongside a handler-writer. Frontend splits component work from state logic. QA, once the diff is ready, runs unit and integration testers side by side. Many branches execute concurrently. Each parent fans out further work based on what its own context reveals.

```mermaid
flowchart TD
    L["Tech Lead<br/>scope & delegate"]
    B["Backend<br/>build API"]
    F["Frontend<br/>build UI"]
    Q["QA Lead<br/>plan coverage"]
    M["Migration<br/>schema changes"]
    HD["Handlers<br/>endpoint logic"]
    C["Components<br/>UI parts"]
    S["State<br/>store & sync"]
    UT["Unit Tests<br/>per-module"]
    IT["Integration<br/>end-to-end"]
    L --> B
    L --> F
    L --> Q
    B --> M
    B --> HD
    F --> C
    F --> S
    Q --> UT
    Q --> IT
```
Shipping a new feature as a team hierarchy. Tech Lead fans out three tracks in parallel; each mid-level agent spawns parallel children as the work surfaces new needs.
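The execution model behind a graph like this is simple to sketch: run every node whose parents have finished, in parallel, until the graph is exhausted. `run_agent` is a stand-in for an actual agent invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(name, inputs):
    # Stand-in: a real call would spin up an agent with its parents'
    # outputs as context.
    return f"{name}-output"

def execute_dag(deps: dict):
    # deps: node -> set of parent nodes it depends on
    results, remaining = {}, dict(deps)
    while remaining:
        # Every node whose dependencies are all satisfied can run now.
        ready = [n for n, ps in remaining.items() if ps <= results.keys()]
        if not ready:
            raise ValueError("dependency cycle")
        with ThreadPoolExecutor() as pool:
            outputs = pool.map(
                lambda n: run_agent(n, [results[p] for p in deps[n]]), ready)
            for name, out in zip(ready, outputs):
                results[name] = out
        for n in ready:
            del remaining[n]
    return results
```

In the feature-shipping example, the three tracks under the tech lead land in the same `ready` batch and run concurrently, which is the whole point of declaring the dependencies upfront.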

A DAG is more flexible than its one-way diagram suggests. A common variant gives each agent its own directory on a shared filesystem, with write access to its own and read access to everyone else’s. Agents pull partial outputs from their dependencies as those appear, and coordination effectively becomes bidirectional without losing the declarative structure.

A close cousin is nested subagents, where a subagent can itself decide to spawn further subagents. Same underlying idea as the DAG: control flow shaped by the task rather than fixed ahead of time. The real distinction is when the structure gets formed. A DAG is decomposed upfront. Nested subagents grow recursively at runtime, with each parent deciding what to spawn based on what its own context just revealed.
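The recursive version is just a recursive call: each agent decides mid-task whether to decompose further. A sketch, with `decompose` as an illustrative stand-in for the agent's own judgment:

```python
def decompose(task: str):
    # Stand-in for the agent deciding whether the task needs splitting;
    # a real agent would make this call from its own context.
    return [f"{task}/part{i}" for i in (1, 2)] if "feature" in task else []

def run(task: str, depth=0, max_depth=2):
    subtasks = decompose(task) if depth < max_depth else []
    if not subtasks:
        return f"done: {task}"
    # The parent spawns children based on what it just learned,
    # then merges their results.
    child_results = [run(t, depth + 1, max_depth) for t in subtasks]
    return f"merged({task}): " + "; ".join(child_results)
```

The `max_depth` guard matters in practice: without some budget, a recursive spawner can fan out indefinitely.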

AdaptOrch, a paper on DAG-based routing, shows that picking topologies based on task characteristics consistently beats fixed pipelines. DyTopo, a companion line of work on semantic message routing, has agents publish what they need and what they can offer, with a manager routing messages only where the match is useful.

One open idea: not all subagent outputs are equally important. A quick lookup might collapse cleanly into a single line. A piece of subtle reasoning that the parent needs to verify might warrant the full transcript. So the subagent should be able to choose the granularity of what it returns, up to promoting its entire context back to the parent when a summary would lose too much.
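A sketch of what that choice could look like at the return boundary, with a confidence-based heuristic that is purely illustrative:

```python
def package_result(transcript: list, summary: str, confidence: float):
    # High confidence: the work collapses cleanly into a summary.
    if confidence >= 0.9:
        return {"kind": "summary", "content": summary}
    # Subtle or uncertain reasoning: promote the full transcript so
    # the parent can verify the steps, not just the conclusion.
    return {"kind": "transcript", "content": transcript}
```

The interesting design question is who sets `confidence`: the subagent self-reporting, or the parent asking for escalation after reading the summary.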

Another: reusability. Every run is data. If we log the call graphs, the inputs, the outputs, and especially the error paths, we can refine subagents that keep failing the same way. Winning ones get saved and reused. Broken ones get evaluated and rewritten. Over enough tasks, the system starts to learn what a good subagent for this kind of job actually looks like.
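The logging side of that loop is cheap to sketch: record every invocation, then surface specs that keep failing the same way as rewrite candidates. Field names here are illustrative:

```python
from collections import Counter

runs = []  # each: {"spec": ..., "ok": bool, "error": str | None}

def log_run(spec, ok, error=None):
    runs.append({"spec": spec, "ok": ok, "error": error})

def repeat_offenders(min_failures=2):
    # Group failures by (spec, error): a spec that hits the same
    # error repeatedly is a candidate for evaluation and rewriting.
    fails = Counter((r["spec"], r["error"]) for r in runs if not r["ok"])
    return [spec for (spec, err), n in fails.items() if n >= min_failures]
```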

Where are we?

Production is solidly Level 1, inching into Level 2. Research is heading toward Level 3.

We aren’t at Level 3 yet. But given how fast these frameworks are moving, we might be entering it sooner than I would have guessed.