The Architecture Behind Task Delegation: Pools, Routing, and Dependencies
Most multi-agent systems start simple: one agent gets a task, does the work. But what happens when you have 6 agents, 50 concurrent tasks, and dependencies between them? Here's how we built the delegation system behind Agent Swarm — and the hard lessons from 3,000+ completed tasks.
If you're building a multi-agent system, task delegation is the first problem that becomes non-trivial. A single agent with a single task is straightforward. But the moment you have multiple agents with different capabilities, concurrent work, and tasks that depend on each other, you need an actual system.
We've been running Agent Swarm in production for over 90 days. Six agents. Over 3,000 completed tasks. The delegation architecture has been rewritten twice. Here's what we landed on and why.
The Task Lifecycle
Every piece of work in Agent Swarm flows through a state machine. This sounds obvious, but getting the states right took multiple iterations. Our first version had three states (pending, running, done). The current version has ten — but the core flow looks like this:
The key insight was separating assignment from execution. A task can be assigned to an agent (pending) but not yet started. This matters because agents run in Docker containers that poll for work — there is a real gap between "this task is yours" and "the agent has picked it up."
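To make the assignment/execution split concrete, here's a minimal sketch of the core flow as an explicit transition table. The state names are assumptions for illustration (the real system has ten states, and only `pending` is named above as the "assigned but not started" state).

```python
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical names covering only the core flow, not all ten states.
    UNASSIGNED = "unassigned"
    PENDING = "pending"          # assigned to an agent, not yet picked up
    IN_PROGRESS = "in_progress"  # the agent's runner has started the work
    COMPLETED = "completed"
    FAILED = "failed"


# Which states each state may move to. Terminal states map to an empty set.
TRANSITIONS = {
    TaskStatus.UNASSIGNED: {TaskStatus.PENDING},
    TaskStatus.PENDING: {TaskStatus.IN_PROGRESS},
    TaskStatus.IN_PROGRESS: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
}


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place means a task can never jump from `pending` straight to `completed` without an agent having picked it up.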
Task Pools vs. Direct Assignment
We support two delegation models, and you need both. Direct assignment is when the Lead agent knows exactly who should do the work: "Picateclas, implement this PR." Task pools are when work is posted without a specific assignee and agents claim it.
Direct Assignment
The Lead creates a task and assigns it to a specific agent by ID. The target agent's runner picks it up on next poll. Best for specialized work where only one agent has the right context — like assigning a PR review to the Reviewer agent.
Task Pools
Tasks are created as "unassigned" with tags (e.g., implementation, research). Idle agents poll the pool, filter by their capabilities, and claim tasks. This is how we load-balance — if one coder agent is busy, another can pick up the work.
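Here's a rough sketch of what claiming from a pool can look like. `PoolTask`, `TaskPool`, and `claim` are hypothetical names, and the in-memory list stands in for whatever shared store the agents actually poll.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PoolTask:
    task_id: str
    tags: set[str]
    claimed_by: Optional[str] = None


@dataclass
class TaskPool:
    tasks: list[PoolTask] = field(default_factory=list)

    def claim(self, agent_id: str, capabilities: set[str]) -> Optional[PoolTask]:
        """Claim the oldest unclaimed task whose tags the agent covers."""
        for task in self.tasks:
            # An agent only qualifies if it has every tag on the task.
            if task.claimed_by is None and task.tags <= capabilities:
                task.claimed_by = agent_id
                return task
        return None  # nothing suitable; the agent stays idle until next poll
```

With several agents polling at once, the check-and-set inside `claim` would need to be atomic (for example, a conditional update in the backing store) so two agents can't claim the same task.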
Offer/Accept Pattern
A middle ground: the Lead offers a task to a specific agent, but the agent can reject it (e.g., if it lacks context or is overloaded). Rejected tasks go back to the pool or get re-offered. This prevents forcing work onto agents that would fail at it.
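The offer/accept loop can be sketched in a few lines. This assumes the Lead keeps an ordered list of candidate agents and that each agent exposes some accept/reject decision. Both `offer_task` and the `accepts` callback are illustrative, not the project's API.

```python
from typing import Callable, Optional


def offer_task(
    task_id: str,
    candidates: list[str],
    accepts: Callable[[str, str], bool],
) -> Optional[str]:
    """Offer a task to each candidate in order.

    Returns the ID of the first agent that accepts, or None if every
    candidate rejects, meaning the task should go back to the pool.
    """
    for agent_id in candidates:
        if accepts(agent_id, task_id):
            return agent_id
    return None
```

For example, if the first candidate is overloaded and rejects, the task is re-offered to the next one instead of being forced onto an agent that would likely fail at it.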
How the Lead Routes Work
The Lead agent is itself an AI — it reads Slack messages, interprets intent, breaks down complex requests into sub-tasks, and decides who does what. This is both the most powerful and most fragile part of the system.
The Lead has access to every agent's profile: their name, role description, current status, and task history. When a new request comes in, it evaluates which agent is the best fit based on role specialization and current workload. A bug fix goes to the coder. A PR review goes to the reviewer. A research question goes to the researcher.
But the Lead also handles task decomposition. A Slack message like "add a new blog post to the landing page" becomes: (1) research the existing blog format, (2) write the post, (3) test the build, (4) create a PR. The Lead creates these as separate tasks with dependencies.
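The Lead makes its routing call with a model, not a lookup table, but the role-and-workload matching described above can be approximated by a deterministic stand-in. The `AgentProfile` fields and `pick_agent` below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentProfile:
    name: str
    role: str          # e.g. "coder", "reviewer", "researcher"
    active_tasks: int  # current workload


def pick_agent(profiles: list[AgentProfile], required_role: str) -> Optional[AgentProfile]:
    """Pick the least-loaded agent with the required role, if any."""
    candidates = [p for p in profiles if p.role == required_role]
    if not candidates:
        return None  # no specialist available; the Lead has to decide otherwise
    return min(candidates, key=lambda p: p.active_tasks)
```

A rule like this is a useful fallback or sanity check next to the Lead's own judgement, since the AI-driven routing is the fragile part.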
Task Dependencies
Real work has order. You can't review a PR that doesn't exist yet. You can't deploy code that hasn't been tested. Agent Swarm supports dependsOn — a list of task IDs that must complete before a task becomes eligible for execution.
When a task's dependencies are all completed, it transitions from blocked to its normal lifecycle. If any dependency fails, the dependent task can be automatically cancelled or left for the Lead to decide.
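As a sketch of that gating logic: the function below runs one pass over blocked tasks and either unblocks or cancels them. The dict layout, the status strings, and the `cancel_on_failure` flag are assumptions. Only the `dependsOn` field name comes from the system described here.

```python
def resolve_blocked(tasks: dict[str, dict], cancel_on_failure: bool = True) -> None:
    """Run one pass over blocked tasks, updating their status in place.

    Each task is a dict with a "status" string and a "dependsOn" list of IDs.
    """
    for task in tasks.values():
        if task["status"] != "blocked":
            continue
        dep_statuses = [tasks[dep]["status"] for dep in task["dependsOn"]]
        if any(s in ("failed", "cancelled") for s in dep_statuses):
            # With cancel_on_failure=False the task stays blocked for the Lead.
            if cancel_on_failure:
                task["status"] = "cancelled"
        elif all(s == "completed" for s in dep_statuses):
            task["status"] = "pending"  # eligible for the normal lifecycle
```

Running this after every task completion or failure keeps the blocked set current without the agents needing to know about dependencies at all.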
We also built Workflows on top of this — declarative DAGs of tasks with interpolation. A workflow definition says "run task A, pass its output to task B, then fan out to tasks C and D in parallel." Workflows are how we handle recurring multi-step processes like daily content generation or scheduled health checks.
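The "A, then B, then C and D in parallel" shape falls out of a topological sort. Here's a minimal sketch using Python's standard-library `graphlib`. This is not the workflow engine itself, and `parallel_batches` is a name made up for the example.

```python
from graphlib import TopologicalSorter


def parallel_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; tasks within a batch can run in parallel.

    depends_on maps each task ID to the set of task IDs it waits for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


workflow = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}}
print(parallel_batches(workflow))  # [['A'], ['B'], ['C', 'D']]
```

A real scheduler would call `done` per task as each one finishes rather than per batch, so fast tasks don't wait for slow siblings.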
// Workflow node with cross-step data access
{
  "id": "write-post",
  "type": "agent-task",
  "inputs": { "research": "gather-data" },
  "config": {
    "template": "Write a post using: {{research.taskOutput}}"
  },
  "next": ["review-post"]
}
When Tasks Fail
With 3,000+ completed tasks, we've also seen hundreds of failures. The delegation system needs to handle failure as a first-class outcome, not an exception.
Structured failure reasons
Every failed task includes a failureReason field. This isn't just for humans — the Lead reads it to decide whether to retry, reassign, or escalate.
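In the real system the Lead reads the reason as free text and decides with a model. As a simplified stand-in, the sketch below maps a failure reason to one of the three outcomes with keyword rules. The keyword lists and the `next_action` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailedTask:
    task_id: str
    failure_reason: str  # mirrors the failureReason field; free text
    attempts: int


# Hypothetical keyword lists; tune these to the failures you actually see.
TRANSIENT = ("timeout", "rate limit", "connection reset")
WRONG_AGENT = ("missing context", "lacks access", "unsupported")


def next_action(task: FailedTask, max_attempts: int = 2) -> str:
    """Return "retry", "reassign", or "escalate" for a failed task."""
    reason = task.failure_reason.lower()
    if any(marker in reason for marker in TRANSIENT):
        # Transient errors are worth retrying, but only a bounded number of times.
        return "retry" if task.attempts < max_attempts else "escalate"
    if any(marker in reason for marker in WRONG_AGENT):
        return "reassign"
    return "escalate"  # unknown failures go to a human or the Lead
```

The point is less the rules themselves than the shape: a structured reason lets the decision be made by code or by a model, instead of by someone reading logs.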
Automatic memory indexing
When a task fails, the failure reason is indexed into the swarm's memory. Next time a similar task comes up, agents can search for past failures and avoid the same mistakes.
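How the swarm's memory search works isn't described here, so the sketch below swaps in the simplest possible technique: keyword overlap between task descriptions. `FailureMemory` and its methods are made-up names, and a production system would more likely use embeddings or full-text search.

```python
import re


class FailureMemory:
    """Toy failure index using word overlap (a stand-in for real memory search)."""

    def __init__(self) -> None:
        self._entries: list[tuple[set[str], str]] = []  # (description tokens, reason)

    @staticmethod
    def _tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index(self, description: str, failure_reason: str) -> None:
        self._entries.append((self._tokens(description), failure_reason))

    def search(self, description: str, min_overlap: int = 2) -> list[str]:
        """Return past failure reasons for similar tasks, best match first."""
        query = self._tokens(description)
        scored = [(len(query & tokens), reason) for tokens, reason in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [reason for score, reason in scored if score >= min_overlap]
```

An agent would call `search` with the new task's description before starting and fold any hits into its plan.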
Concurrency limits
Each agent has a MAX_CONCURRENT_TASKS setting. Overloading an agent causes context window pressure, leading to degraded output. We run most agents at 1-2 concurrent tasks.
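A limit like this is easy to enforce at the point where an agent picks up work. In the sketch below, `MAX_CONCURRENT_TASKS` is the setting named above; reading it from a string-keyed mapping (such as environment variables) and the `AgentSlots` class are assumptions.

```python
from typing import Mapping


def read_limit(settings: Mapping[str, str], default: int = 1) -> int:
    """Read MAX_CONCURRENT_TASKS from a settings mapping, e.g. os.environ."""
    limit = int(settings.get("MAX_CONCURRENT_TASKS", default))
    if limit < 1:
        raise ValueError("MAX_CONCURRENT_TASKS must be at least 1")
    return limit


class AgentSlots:
    """Tracks running tasks and refuses new ones once the limit is hit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.active: set[str] = set()

    def try_start(self, task_id: str) -> bool:
        if len(self.active) >= self.limit:
            return False  # leave the task for another agent or a later poll
        self.active.add(task_id)
        return True

    def finish(self, task_id: str) -> None:
        self.active.discard(task_id)
```

Checking the slot count before claiming, not after, keeps an overloaded agent from holding tasks it can't start.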
Pause and resume
Long-running tasks can be paused (freeing the agent for urgent work) and resumed later with full context. The session state is preserved so the agent picks up exactly where it left off.
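At its simplest, pause and resume is a serialization round trip. The sketch below assumes the session can be captured as a small JSON-friendly record. The `Session` fields are invented for illustration, and real agent session state is likely much richer.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Session:
    task_id: str
    messages: list[str] = field(default_factory=list)  # conversation so far
    step: int = 0                                      # progress marker


def pause(session: Session) -> str:
    """Snapshot the session so the agent can be freed for other work."""
    return json.dumps(asdict(session))


def resume(snapshot: str) -> Session:
    """Rebuild the session from a snapshot produced by pause()."""
    return Session(**json.loads(snapshot))
```

Because the snapshot is plain data, it can be stored alongside the task record and picked up later by the same agent.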
Key Takeaways for Builders
Model your task states explicitly
Don't collapse assignment and execution into one state. Distributed agents need at least: unassigned, assigned, running, completed, failed. Add more as you discover gaps.
Support both direct assignment and pools
Direct assignment for specialized work, pools for load balancing. The offer/accept pattern is the bridge between them — the Lead suggests, the agent decides.
Keep dependency DAGs wide and shallow
Fan out work early, join results late. Deep chains are fragile. Our most reliable workflows have a fan-out of 3-5 parallel tasks with 2-3 levels of depth.
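If you want to check a workflow against a guideline like this, depth and width are cheap to compute. The helper below is a hypothetical sketch. It assumes the graph is acyclic and measures width as the largest number of tasks sharing a dependency level.

```python
from collections import Counter


def dag_shape(depends_on: dict[str, list[str]]) -> tuple[int, int]:
    """Return (depth, width) for a DAG given as task -> list of dependencies.

    Assumes the graph has no cycles; a cycle would recurse forever.
    """
    levels: dict[str, int] = {}

    def level(task: str) -> int:
        # A task sits one level below its deepest dependency.
        if task not in levels:
            deps = depends_on.get(task, [])
            levels[task] = 1 + max((level(dep) for dep in deps), default=0)
        return levels[task]

    for task in depends_on:
        level(task)
    if not levels:
        return (0, 0)
    depth = max(levels.values())
    width = max(Counter(levels.values()).values())
    return depth, width
```

A workflow that fans out to three tasks and joins once comes back as depth 3, width 3, while a four-step chain comes back as depth 4, width 1.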
Index failures into memory
Every failed task is a learning opportunity. If agents can search past failures before starting similar work, they are less likely to repeat the same mistake on a retry.
Context is everything
The single biggest lever for task success is the quality of the task description. Include the repo, the files, the intent, and any constraints. An agent with good context succeeds; one without it wastes cycles.
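One way to enforce this is to give task descriptions a shape and flag what's missing before the task is assigned. `TaskBrief` below is a hypothetical structure built from the four items just listed, not the project's task schema.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    intent: str                                           # what should change and why
    repo: str = ""                                        # where the work happens
    files: list[str] = field(default_factory=list)        # known starting points
    constraints: list[str] = field(default_factory=list)  # optional guardrails

    def missing_context(self) -> list[str]:
        """List the required pieces of context that are still empty."""
        missing = []
        if not self.intent.strip():
            missing.append("intent")
        if not self.repo.strip():
            missing.append("repo")
        if not self.files:
            missing.append("files")
        return missing  # constraints are optional, so they are never flagged
```

A Lead (or a pre-assignment check) can refuse to dispatch a task until `missing_context()` comes back empty.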
Task delegation is the backbone of any multi-agent system. Get it wrong and your agents step on each other, starve for work, or fail silently. Get it right and you have a system that scales — add more agents, handle more work, without changing the architecture.
Agent Swarm is open source. The full task lifecycle, pool implementation, and workflow engine are in the repo. If you're building your own agent orchestrator, start there.