May 25, 2026

I Built 12,000 Lines of Agent Orchestration. Then I Deleted Them.

How Guild went from a multi-agent runtime competing with Claude Code to a template library that works with it — and why the code you delete matters more than the code you keep.

The Beginning

When I started Guild in February 2026, Claude Code was powerful but raw. It had no structured workflows, no agent roles, no session persistence. Ask it to build a feature and it would start writing code immediately — no evaluation, no design phase, no review gate.

So I built what seemed obvious: an orchestration runtime. A workflow engine that would parse YAML step definitions, dispatch agent tasks, manage state transitions, handle retries and failure routing, track token usage, and record execution traces.

By v1.4.0, Guild had 12,000+ lines of JavaScript: an orchestrator with a pure state machine, an executor with parallel dispatch and recursive delegation, a dispatch protocol with model tier resolution and fallback chains, a trace system, a learnings engine, and a provider layer that spawned Claude CLI subprocesses.

It was well-architected. Thoroughly tested (626 tests). The orchestrator was the crown jewel — zero I/O, pure functions, clean state transitions.

And nobody used it.

The Realization

Here's what actually happened when someone typed /build-feature in Claude Code:

Claude Code loaded the SKILL.md file
The LLM read the workflow steps as prose instructions
Claude Code followed the phases using its native Agent tool

The YAML workflow? The LLM read it as structured documentation. The executor? Never touched. The orchestrator? Completely bypassed. Users were getting value from the templates — the agent role definitions and skill workflows. The runtime was a parallel universe no one visited.

Meanwhile, Claude Code had evolved. It now had native parallel agent execution, worktree isolation, background mode, a memory system, plan mode, task tracking. Every feature I'd built in the executor, Claude Code did natively — and better, because it had conversation context that my subprocess-spawning executor didn't.

The Decision

I sat down and listed what Guild actually provided that Claude Code didn't:

Agent role templates — well-crafted .md files with identity, responsibilities, and boundaries
Skill workflows — structured slash commands that enforce evaluation, design, and review phases
The eval system — structural assertions, trigger accuracy tests, semantic matching, regression detection
Session continuity — SESSION.md for ephemeral state, complementing Claude Code's long-term memory

And what Guild had that was redundant:

An executor competing with Claude Code's Agent tool
An orchestrator competing with how the LLM naturally follows instructions
A dispatch system competing with Claude Code's model selection
A trace system, accounting, pricing — infrastructure for a runtime nobody ran

The Delete

-10,700 lines deleted

86 files removed

4,500 lines remaining

In one session, I deleted the executor, orchestrator, dispatch protocol, trace system, accounting, pricing, learnings engine, provider layer, and four CLI commands. 86 files. 10,700 lines. Gone.

What remained was exactly what provided value: 6 role definitions, 10 skill workflows, the eval system, and a CLI for setup and validation.

What I Learned

Build for the platform, not against it. When I started, Claude Code couldn't orchestrate multi-step workflows. Building a runtime made sense. But platforms evolve, and what was a gap becomes a feature. The right move is to complement the platform, not compete with it.

Content beats infrastructure. Users didn't care about my elegant state machine. They cared that /build-feature made Claude Code evaluate their idea before writing code. The value was in the 50 lines of a well-written SKILL.md, not the 500 lines of executor that loaded it.

Measure what matters. The eval system — the part I almost considered secondary — turned out to be Guild's most unique contribution. No other framework measures whether its own skills actually work correctly. That's the kind of differentiation that survives platform evolution.

Session continuity fills a real gap. Claude Code's memory system stores long-term knowledge but explicitly excludes ephemeral work state. SESSION.md captures where you stopped — what branch, what phase, what's next. The combination of both gives you something neither provides alone.

Guild Today

Guild v2 is a Claude Code plugin. Install with /plugin install Guild-Agents/guild and you get 10 skill workflows and 6 role definitions. No npm install, no init command, no configuration. Claude Code reads the templates natively.

The eval CLI is still there for anyone who wants to validate skill quality. But the core product is the content — the structured workflows that make Claude Code think before it builds.

Sometimes the best engineering decision is knowing what to delete.

Try Guild

/plugin install Guild-Agents/guild