January 13, 2025 · 12 min read

Why Your AI Agent Needs a Job Description: How SOUL.md and Role Templates Turn Generic LLMs Into Reliable Specialists

Every LLM is a generalist by default—and generalists make unreliable autonomous workers. Here's the lightweight file-based architecture that creates persistent specialist personas without fine-tuning.

Tags: SOUL.md, identity architecture, agent specialization, LLM orchestration, AI agents

Figure: Agent identity architecture diagram showing the SOUL.md and IDENTITY.md files

You wouldn't hire a software engineer by telling them "be helpful" and expecting consistent output across six months. Yet that's exactly how most teams deploy AI agents — via ephemeral system prompts that evaporate when the context window fills or the container restarts.

After running production agent swarms for 18 months, we've learned that identity is the fundamental primitive of reliable autonomy. Without persistent persona definitions, agents drift. The researcher starts writing code. The coder adopts marketing speak. The lead agent forgets it can delegate. This isn't a capability problem — it's an identity anchoring problem.

Why Do Generic System Prompts Fail at Scale?

Generic prompts like "you are a helpful coding assistant" create three critical failure modes:

Style drift: Without explicit constraints, past interactions slowly morph the agent's communication style. What starts as terse technical output becomes verbose explanations after 50 interactions because the model perceives helpfulness as thoroughness.
Scope creep: A Researcher agent, told only to "find information," will eventually start suggesting implementations because the boundary between research and solutioning was never architected.
Inconsistent decision heuristics: Without defined values to weight trade-offs, the same agent makes different architectural decisions on Tuesday than it did on Monday, given identical inputs.

In our swarm deployments, 73% of behavioral drift incidents traced back to identity ambiguity rather than model capability limitations. When we implemented the SOUL.md pattern, incident resolution time dropped from 4.2 hours to 12 minutes because we could diff the agent's current identity against its baseline.
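Diffing an identity against its baseline needs little more than a line-level comparison. What follows is a hypothetical sketch, not a description of any specific tool. `diffIdentity` is an illustrative helper, and it assumes one directive per line, as in the SOUL.md and IDENTITY.md examples in this post. In a git-versioned setup, `git diff` against a baseline commit answers the same question with more context.

```typescript
// Hypothetical helper: compare an identity file against a known-good baseline.
// Lines are compared as an unordered set, so reordering is not reported as
// drift, while a changed value shows up as one removal plus one addition.
interface IdentityDiff {
  added: string[];
  removed: string[];
}

function normalizedLines(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function diffIdentity(baseline: string, current: string): IdentityDiff {
  const before = new Set(normalizedLines(baseline));
  const after = new Set(normalizedLines(current));
  return {
    added: Array.from(after).filter((line) => !before.has(line)),
    removed: Array.from(before).filter((line) => !after.has(line)),
  };
}

// Example: the PREFERENCE value has drifted since the baseline was recorded.
const baseline = `## Values
- PREFERENCE: Clarity over cleverness
- BOUNDARY: Never commit directly to main`;

const drifted = `## Values
- PREFERENCE: Thoroughness over brevity
- BOUNDARY: Never commit directly to main`;

const drift = diffIdentity(baseline, drifted);
console.log(drift.added, drift.removed);
```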

The SOUL.md + IDENTITY.md Architecture

We separate agent persona into two distinct documents that live in your repository, not your prompt buffer:

SOUL.md defines who the agent is — its values, behavioral directives, boundaries, and self-evolution rules. This is the immutable (or slowly evolving) DNA. IDENTITY.md defines what it does — current expertise, working style preferences, tool proficiencies, and track record. This evolves as the agent learns.

Together, they create a persistent persona that survives session restarts, context compaction, and even model swaps (GPT-4 to Claude to Llama — same identity, different substrate).

# SOUL.md - Core Behavioral Architecture
## Values (Non-negotiable)
- PREFERENCE: Clarity over cleverness
- BOUNDARY: Never commit directly to main; always use feature branches
- SAFETY: Validate inputs before tool execution; fail closed on ambiguity

## Behavioral Directives
- COMMUNICATION: Use structured output when confidence < 0.8; prose otherwise
- CONFLICT_RESOLUTION: When task boundaries overlap with other agents,
  yield to the Lead and document the edge case
- ERROR_HANDLING: Blame yourself first, tools second, external APIs third

## Self-Evolution Rules
- May append to IDENTITY.md without approval
- Requires human sign-off to modify SOUL.md values
- Archive learnings to EVOLUTION_LOG.md weekly; summarize quarterly

Notice the taxonomy: Values are declarative and absolute. Directives are procedural. Evolution rules define the agent's autonomy boundary regarding its own identity. This structure prevents the "runaway prompt" scenario where an agent rewrites its own goals to maximize paperclip production.
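The taxonomy is regular, so a SOUL.md can be parsed and checked before it is loaded. Here is a minimal sketch. It assumes exactly the layout above: `## ` section headings, `- KEY: text` entries, and indented continuation lines. `parseSoul`, `Directive`, and `Soul` are hypothetical names. Evolution rules stay free text because they carry no KEY.

```typescript
// Hypothetical typed view of SOUL.md, mirroring its three sections.
interface Directive {
  key: string;
  text: string;
}

interface Soul {
  values: Directive[];      // declarative and absolute
  directives: Directive[];  // procedural
  evolutionRules: string[]; // free text: the agent's autonomy over itself
}

function parseSoul(markdown: string): Soul {
  const soul: Soul = { values: [], directives: [], evolutionRules: [] };
  let section = "";
  let last: Directive | undefined;

  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      section = line.slice(3).toLowerCase();
      last = undefined;
    } else if (line.startsWith("- ")) {
      const body = line.slice(2);
      last = undefined;
      if (section.startsWith("self-evolution")) {
        soul.evolutionRules.push(body);
        continue;
      }
      const match = /^([A-Z_]+):\s*(.*)$/.exec(body);
      if (!match) continue;
      last = { key: match[1], text: match[2] };
      if (section.startsWith("values")) soul.values.push(last);
      else if (section.startsWith("behavioral")) soul.directives.push(last);
    } else if (line.length > 0 && last && raw.startsWith(" ")) {
      // Indented continuation line: join it onto the previous entry.
      last.text += " " + line;
    }
  }
  return soul;
}

const soulText = `# SOUL.md - Core Behavioral Architecture
## Values (Non-negotiable)
- PREFERENCE: Clarity over cleverness
- BOUNDARY: Never commit directly to main; always use feature branches

## Behavioral Directives
- CONFLICT_RESOLUTION: When task boundaries overlap with other agents,
  yield to the Lead and document the edge case

## Self-Evolution Rules
- Requires human sign-off to modify SOUL.md values`;

const soul = parseSoul(soulText);
console.log(soul.values.length, soul.directives.length, soul.evolutionRules.length);
```

Once the file is parsed, structural rules such as a cap on the number of values become simple length checks.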

The Template System: From Weeks to Minutes

We ship 9 official role templates that define the specialist archetypes needed for most software teams: Lead, Coder, Researcher, Reviewer, Tester, FDE (Full-Stack Design Engineer), Content-Writer, Content-Reviewer, and Content-Strategist.

Each template is a pre-validated IDENTITY.md file. Instead of iterating on prompts for three weeks to get a Coder agent that doesn't refactor working code for style points, you copy templates/coder.IDENTITY.md, adjust three lines for your stack, and deploy.

# IDENTITY.md - Coder Specialist Template

## Expertise Domain
- PRIMARY: TypeScript/Python backend services
- SECONDARY: Infrastructure as Code (Terraform, CDK)
- EXPLICIT_NON_EXPERTISE: Frontend CSS styling (delegate to FDE)

## Working Style
- COMMIT_GRANULARITY: Single concern per commit; max 50 lines changed
- COMMENT_PHILOSOPHY: Explain "why" not "what"; code should be self-documenting
- REFACTOR_THRESHOLD: Only refactor when cyclomatic complexity > 10

## Tool Preferences
- DEFAULT_LINTER: Biome (not ESLint/Prettier)
- TEST_FRAMEWORK: Vitest for unit, Playwright for e2e

## Track Record
- RECENT_LEARNING: Zod schema validation catches 40% of runtime errors
- AVOIDED_MISTAKES: ["Forced push to main", "Ignored lint error in hotfix"]
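Adjusting a template for your stack can be made mechanical. That way, a mistyped key fails instead of quietly adding a directive nobody asked for. This sketch rests on a few assumptions. `instantiateTemplate` is a hypothetical helper. Templates are assumed to use the `- KEY: value` layout shown above. Reading and writing the template files is left out.

```typescript
// Hypothetical helper: customize a role template by overriding KEY values.
// Only keys that already exist in the template can be overridden, so a typo
// fails loudly instead of silently adding an unknown directive.
function instantiateTemplate(template: string, overrides: Record<string, string>): string {
  const unused = new Set(Object.keys(overrides));
  const lines = template.split("\n").map((line) => {
    const match = /^(\s*-\s*)([A-Z_]+):\s.*$/.exec(line);
    if (!match || !Object.prototype.hasOwnProperty.call(overrides, match[2])) return line;
    unused.delete(match[2]);
    return `${match[1]}${match[2]}: ${overrides[match[2]]}`;
  });
  if (unused.size > 0) {
    throw new Error(`Unknown template keys: ${Array.from(unused).join(", ")}`);
  }
  return lines.join("\n");
}

// Example: adapt part of the Coder template for a Go shop (three lines changed).
const coderTemplate = `## Expertise Domain
- PRIMARY: TypeScript/Python backend services
- SECONDARY: Infrastructure as Code (Terraform, CDK)

## Tool Preferences
- DEFAULT_LINTER: Biome (not ESLint/Prettier)
- TEST_FRAMEWORK: Vitest for unit, Playwright for e2e`;

const goCoder = instantiateTemplate(coderTemplate, {
  PRIMARY: "Go backend services",
  DEFAULT_LINTER: "golangci-lint",
  TEST_FRAMEWORK: "go test for unit, Playwright for e2e",
});
console.log(goCoder);
```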

The Track Record section is crucial — it's how agents maintain continuity. When a Coder agent knows it already learned the Zod lesson last Tuesday, you don't pay the context tokens for that discovery again. When it records a mistake, that becomes a permanent behavioral guardrail.
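One way to make recorded mistakes act as guardrails is to lift them out of the Track Record. They can then be restated as explicit prohibitions in the prompt. This hypothetical sketch assumes AVOIDED_MISTAKES is stored as a single-line JSON array, as in the template above. `guardrailsFromTrackRecord` and the `GUARDRAIL:` prefix are illustrative, not part of any standard.

```typescript
// Hypothetical helper: turn the Track Record's AVOIDED_MISTAKES entry (a JSON
// array, as in the Coder template) into explicit guardrail lines that can be
// placed near the top of the system prompt.
function guardrailsFromTrackRecord(identityMd: string): string[] {
  const match = /^\s*-\s*AVOIDED_MISTAKES:\s*(\[.*\])\s*$/m.exec(identityMd);
  if (!match) return [];
  // Malformed JSON throws on purpose: silently dropping guardrails would be worse.
  const parsed: unknown = JSON.parse(match[1]);
  if (!Array.isArray(parsed)) return [];
  return parsed
    .filter((item): item is string => typeof item === "string")
    .map((mistake) => `GUARDRAIL: never repeat "${mistake}"`);
}

const trackRecord = `## Track Record
- RECENT_LEARNING: Zod schema validation catches 40% of runtime errors
- AVOIDED_MISTAKES: ["Forced push to main", "Ignored lint error in hotfix"]`;

console.log(guardrailsFromTrackRecord(trackRecord));
```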

Dimension	Generic System Prompt	SOUL.md + IDENTITY.md
Persistence	Session-bound; lost on restart	Git-versioned; survives indefinitely
Consistency	Drifts with context window pressure	Anchored; explicit evolution only
Auditability	Inaccessible; buried in logs	Full git history; diffable changes
Specialization Time	2–3 weeks of prompt iteration	Minutes using templates
Self-Improvement	None; static instructions	Agents refine IDENTITY.md; SOUL.md changes need human sign-off

How Does Self-Evolution Work in Practice?

Self-evolution is where this architecture transcends traditional prompt engineering. Our Researcher agent recently discovered it produced better results when requesting structured JSON from search APIs rather than parsing HTML. It autonomously appended this to its IDENTITY.md:

## Working Style Updates
- API_PREFERENCE: Always request application/json via Accept headers;
  fallback to HTML parsing only if JSON unavailable
- NOTE: Added 2025-01-08 after a 23% accuracy increase in source extraction
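The directive maps onto ordinary HTTP content negotiation. The request prefers JSON, and the response's Content-Type decides how the body is handled. A JSON body is parsed as JSON; anything else goes down the HTML fallback path. This is a minimal sketch, and `SEARCH_ACCEPT_HEADER` and `interpretSearchResponse` are hypothetical names.

```typescript
// Sketch of the Researcher's API_PREFERENCE directive. The Accept header asks
// for JSON first and HTML at lower priority (q=0.5), so a server without a JSON
// representation can still answer rather than rejecting with 406 Not Acceptable.
const SEARCH_ACCEPT_HEADER = "application/json, text/html;q=0.5";

type SearchPayload =
  | { kind: "json"; data: unknown }
  | { kind: "html"; text: string };

// Decide how to read a response body based on the Content-Type the server sent.
function interpretSearchResponse(contentType: string | null, body: string): SearchPayload {
  const mediaType = (contentType ?? "").split(";")[0].trim().toLowerCase();
  if (mediaType === "application/json" || mediaType.endsWith("+json")) {
    return { kind: "json", data: JSON.parse(body) };
  }
  // Fallback path: naive tag stripping stands in for a real HTML parser.
  const text = body.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
  return { kind: "html", text };
}

const fromJson = interpretSearchResponse("application/json; charset=utf-8", '{"title":"SOUL.md"}');
const fromHtml = interpretSearchResponse("text/html", "<p>Hello <b>world</b></p>");
console.log(fromJson, fromHtml);
```

A client built on the standard Fetch API would send `headers: { Accept: SEARCH_ACCEPT_HEADER }`. It would then pass `response.headers.get("content-type")` and `await response.text()` to `interpretSearchResponse`.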

Updates like this one pass through a lightweight approval workflow:

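// TypeScript: Self-evolution controller
interface IdentityUpdate {
  agentId: string;
  targetFile: 'SOUL.md' | 'IDENTITY.md';
  diff: string;
  changeType: 'value' | 'expertise' | 'working_style';
  confidence: number; // 0 to 1, compared against the auto-merge threshold below
  justification: string;
}

// Hypothetical integration points; implement these against your own file and git tooling.
declare function applyDiff(update: IdentityUpdate): Promise<void>;
declare function commitToGit(message: string): Promise<void>;
declare function createPullRequest(pr: { title: string; body: string; diff: string }): Promise<void>;

async function proposeIdentityUpdate(update: IdentityUpdate): Promise<void> {
  // High-confidence working-style changes to IDENTITY.md auto-merge
  if (update.targetFile === 'IDENTITY.md' &&
      update.changeType === 'working_style' &&
      update.confidence > 0.85) {
    await applyDiff(update);
    await commitToGit(`[AUTO] ${update.agentId} identity update`);
    return;
  }

  // Everything else (value or expertise changes, low-confidence edits,
  // and any SOUL.md edit) goes to human review
  await createPullRequest({
    title: `[PENDING] Identity change for ${update.agentId}`,
    body: update.justification,
    diff: update.diff,
  });
}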

This isn't unchecked self-modification; it's structured learning. The agent suggests, the system validates, and humans approve boundary changes. Over six months, our Lead agent's IDENTITY.md grew from 40 lines to 180 lines, while its decision consistency improved from 68% to 94% alignment with senior engineer judgments.

The Lead-Worker Dynamic

Multi-agent orchestration fails when every agent tries to be the smartest in the room. We enforce specialization through identity constraints:

The Lead's SOUL.md includes orchestration primitives: task decomposition, priority assessment, agent capability matching, and conflict resolution. It knows it doesn't write code; it delegates. The Coder's identity explicitly forbids architectural decisions affecting other services — it executes within boundaries set by the Lead.
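A deterministic version of the capability-matching step can be sketched as follows, for example as a pre-filter before the Lead's own judgment. It assumes each worker's IDENTITY.md declares primary, secondary, and explicit non-expertise domains. `CapabilityProfile` and `matchAgent` are hypothetical names, and domains are reduced to short tags where the real IDENTITY.md entries are free text.

```typescript
// Hypothetical capability profile distilled from a worker's IDENTITY.md
// (PRIMARY, SECONDARY, EXPLICIT_NON_EXPERTISE), with domains as short tags.
interface CapabilityProfile {
  agentId: string;
  primary: string[];
  secondary: string[];
  nonExpertise: string[];
}

// Explicit non-expertise disqualifies; a primary match outranks a secondary one.
// Returns undefined when nobody fits, leaving the Lead to decide or escalate.
function matchAgent(domain: string, profiles: CapabilityProfile[]): string | undefined {
  let best: { agentId: string; score: number } | undefined;
  for (const profile of profiles) {
    if (profile.nonExpertise.indexOf(domain) !== -1) continue;
    const score =
      profile.primary.indexOf(domain) !== -1 ? 2 : profile.secondary.indexOf(domain) !== -1 ? 1 : 0;
    if (score > 0 && (best === undefined || score > best.score)) {
      best = { agentId: profile.agentId, score };
    }
  }
  return best?.agentId;
}

const team: CapabilityProfile[] = [
  { agentId: "coder", primary: ["backend"], secondary: ["iac"], nonExpertise: ["css"] },
  { agentId: "fde", primary: ["frontend", "css"], secondary: ["backend"], nonExpertise: [] },
];

console.log(matchAgent("css", team), matchAgent("backend", team), matchAgent("marketing", team));
```

Ties go to the agent listed first, because a later profile must score strictly higher to replace the current best.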

These identity constraints create natural routing without complex DAG engines. A Researcher's identity file, for example, contains the rule IF found_implementation_bug THEN escalate_to Lead, do not fix, so when it encounters a bug in the codebase it hands off instead of patching. No hand-coded routing logic is required; the agent's identity determines the control flow.
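Rules written in that shape can be parsed into a lookup table, which is all the "routing" amounts to. This sketch assumes a grammar inferred from the single rule quoted above, since no formal grammar is given. `parseEscalationRules`, `nextHop`, and the `## Escalation` section name are hypothetical.

```typescript
// Hypothetical grammar, inferred from the single rule quoted above:
//   [- ]IF <event> THEN escalate_to <Agent>[, do not <action>]
interface EscalationRule {
  event: string;
  escalateTo: string;
  forbidden?: string;
}

const RULE_PATTERN = /^(?:-\s*)?IF\s+(\w+)\s+THEN\s+escalate_to\s+(\w+)(?:,\s*do not\s+(.+))?$/;

function parseEscalationRules(identityMd: string): Map<string, EscalationRule> {
  const rules = new Map<string, EscalationRule>();
  for (const raw of identityMd.split("\n")) {
    const match = RULE_PATTERN.exec(raw.trim());
    if (match) {
      rules.set(match[1], { event: match[1], escalateTo: match[2], forbidden: match[3] });
    }
  }
  return rules;
}

// Routing is a lookup in the agent's own identity, not an edge in a DAG engine.
// undefined means no rule applies and the agent proceeds within its own scope.
function nextHop(event: string, rules: Map<string, EscalationRule>): string | undefined {
  const rule = rules.get(event);
  return rule ? rule.escalateTo : undefined;
}

const researcherRules = parseEscalationRules(
  "## Escalation\n- IF found_implementation_bug THEN escalate_to Lead, do not fix",
);
console.log(nextHop("found_implementation_bug", researcherRules));
```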

Anti-Patterns and Edge Cases

We've learned these failure modes the hard way:

The Micromanaged Identity: Specifying every possible decision in SOUL.md creates agents that can't adapt to novel situations. We limit SOUL.md to 5-7 core values and 10-12 behavioral directives. Everything else belongs in IDENTITY.md where it's mutable.
The Vague Identity: "Be professional" is worse than no identity. It's unenforceable and untestable. Every directive must be observable: "Use sentence case for commit messages" is verifiable; "write good commits" is not.
Identity Bloat: After 3 months of self-evolution, one agent's IDENTITY.md hit 4k tokens, consuming 15% of the context window. We now implement quarterly identity compaction — archiving learnings over 90 days old to a history file.
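Compaction by age can be sketched as a partition of dated learnings. This assumes each learning carries an ISO date, like the "Added 2025-01-08" note in the Researcher example above. `Learning` and `compactLearnings` are hypothetical, and writing the archived entries to the history file is left to the caller.

```typescript
// Hypothetical entry shape: one dated learning per line of the Track Record.
interface Learning {
  text: string;
  added: string; // ISO date, e.g. "2025-01-08"
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Partition learnings into those that stay in IDENTITY.md and those that move
// to the history file. Entries with an unparseable date stay put, so nothing
// is archived by accident.
function compactLearnings(
  learnings: Learning[],
  now: Date,
  maxAgeDays = 90,
): { keep: Learning[]; archive: Learning[] } {
  const keep: Learning[] = [];
  const archive: Learning[] = [];
  for (const learning of learnings) {
    const addedMs = Date.parse(learning.added);
    const ageDays = (now.getTime() - addedMs) / DAY_MS;
    if (!Number.isNaN(addedMs) && ageDays > maxAgeDays) archive.push(learning);
    else keep.push(learning);
  }
  return { keep, archive };
}

const compacted = compactLearnings(
  [
    { text: "API_PREFERENCE: Always request application/json via Accept headers", added: "2025-01-08" },
    { text: "Example learning recorded in March", added: "2025-03-20" },
    { text: "Example learning without a usable date", added: "unknown" },
  ],
  new Date("2025-04-15T00:00:00Z"),
);
console.log(compacted.archive.length, compacted.keep.length);
```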

Git Conflicts in Self-Editing: When multiple agents share a base template and edit their individual IDENTITY.md files simultaneously, merging upstream template updates turns into merge hell. We solve this by making IDENTITY.md a generated file, composed from an immutable base template plus an agent-specific delta file. The delta is what the agent edits; the composite is what gets loaded into context.
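Here is a minimal sketch of composing the generated file from a base template plus a delta. The merge rules are assumptions for illustration. Sections match by `## ` heading. A delta `- KEY:` line overrides the base line with the same KEY. Anything else is appended. `composeIdentity` is a hypothetical name. Under these rules a delta can override or add lines but never delete base lines, which keeps the base template authoritative.

```typescript
// Hypothetical merge rules for "immutable base template + agent-specific delta":
//   - sections are keyed by their "## " heading
//   - a delta "- KEY:" line replaces the base line with the same KEY
//   - any other delta line is appended to its section
//   - delta sections missing from the base are appended at the end
type Sections = Map<string, string[]>;

function toSections(markdown: string): Sections {
  const sections: Sections = new Map();
  let current = ""; // lines before the first "## " heading land here
  sections.set(current, []);
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      current = line;
      if (!sections.has(current)) sections.set(current, []);
    } else if (line.trim().length > 0) {
      sections.get(current)!.push(line);
    }
  }
  return sections;
}

function keyOf(line: string): string | undefined {
  const match = /^\s*-\s*([A-Z_]+):/.exec(line);
  return match ? match[1] : undefined;
}

function composeIdentity(baseTemplate: string, agentDelta: string): string {
  const merged = toSections(baseTemplate);
  toSections(agentDelta).forEach((deltaLines, heading) => {
    const lines = merged.get(heading) ?? [];
    for (const deltaLine of deltaLines) {
      const key = keyOf(deltaLine);
      const index = key === undefined ? -1 : lines.findIndex((line) => keyOf(line) === key);
      if (index >= 0) lines[index] = deltaLine;
      else lines.push(deltaLine);
    }
    merged.set(heading, lines);
  });
  const out: string[] = [];
  merged.forEach((lines, heading) => {
    if (heading !== "") out.push(heading);
    out.push(...lines, "");
  });
  return out.join("\n").trim() + "\n";
}

const coderBase = `# IDENTITY.md - Coder Specialist Template
## Working Style
- COMMIT_GRANULARITY: Single concern per commit; max 50 lines changed
## Tool Preferences
- DEFAULT_LINTER: Biome (not ESLint/Prettier)`;

const pythonDelta = `## Tool Preferences
- DEFAULT_LINTER: Ruff
## Track Record
- AVOIDED_MISTAKES: ["Forced push to main"]`;

const composed = composeIdentity(coderBase, pythonDelta);
console.log(composed);
```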

Context Window Pressure: In long-running sessions, the combination of SOUL.md + IDENTITY.md + conversation history can exceed limits. We prioritize: keep SOUL.md in full, summarize IDENTITY.md to recent entries + key values, and never truncate the current task context. If compression is needed, archive older conversation turns before touching identity files.
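The priority order above can be expressed as a small budgeting function. The sketch makes two labeled assumptions: tokens are estimated at roughly four characters each, and the IDENTITY.md summary is produced elsewhere. `assembleContext` and its types are hypothetical. The caller is responsible for actually archiving the dropped turns (`parts.turns.slice(0, archivedTurns)`).

```typescript
// Rough token estimate (~4 characters per token for English text). This is an
// assumption for the sketch; use your model's tokenizer for real budgets.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface ContextParts {
  soul: string;            // SOUL.md, always kept in full
  identity: string;        // full IDENTITY.md
  identitySummary: string; // recent entries + key values, produced elsewhere
  turns: string[];         // conversation history, oldest first
  currentTask: string;     // never truncated
}

interface AssembledContext {
  identity: string;
  turns: string[];
  archivedTurns: number;
  fitsBudget: boolean;
}

// Cuts happen in a fixed order: oldest turns first, then the IDENTITY.md
// summary swap. SOUL.md and the current task are never touched.
function assembleContext(parts: ContextParts, budget: number): AssembledContext {
  const fixedCost = estimateTokens(parts.soul) + estimateTokens(parts.currentTask);
  const cost = (identity: string, turns: string[]): number =>
    fixedCost + estimateTokens(identity) + turns.reduce((sum, turn) => sum + estimateTokens(turn), 0);

  let archivedTurns = 0;
  while (archivedTurns < parts.turns.length &&
         cost(parts.identity, parts.turns.slice(archivedTurns)) > budget) {
    archivedTurns++;
  }
  const turns = parts.turns.slice(archivedTurns);
  const identity = cost(parts.identity, turns) > budget ? parts.identitySummary : parts.identity;
  return { identity, turns, archivedTurns, fitsBudget: cost(identity, turns) <= budget };
}

const exampleParts: ContextParts = {
  soul: "S".repeat(40),                    // ~10 tokens
  identity: "I".repeat(80),                // ~20 tokens
  identitySummary: "i".repeat(20),         // ~5 tokens
  turns: ["a".repeat(40), "b".repeat(40)], // ~10 tokens each
  currentTask: "T".repeat(40),             // ~10 tokens
};

console.log(assembleContext(exampleParts, 50), assembleContext(exampleParts, 35));
```

This follows the stated order literally, archiving every older turn before the identity summary is used. A production version might pin the most recent few turns.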

Implementation Checklist

Ready to implement identity architecture? Start here:

Create agents/{agent_id}/SOUL.md with core values and evolution rules
Copy the appropriate template to IDENTITY.md and customize expertise domains
Load both files into system prompt context before user messages
Implement the proposeIdentityUpdate handler with your git workflow
Set up metrics tracking to validate that identity changes improve performance
Schedule quarterly reviews to compact identity files and audit evolution logs
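The loading step in the checklist is plain string assembly, which is also why a model swap leaves the persona intact. This minimal sketch assumes both files have already been read from `agents/{agent_id}/`. `buildIdentityMessages` and the `ChatMessage` shape are hypothetical, so map them onto whatever message format your provider's SDK expects.

```typescript
// Hypothetical provider-neutral message shape; most chat APIs accept
// something equivalent to a list of { role, content } pairs.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// SOUL.md goes first so the non-negotiable values precede the mutable
// working-style details; both precede any user message.
function buildIdentityMessages(
  soulMd: string,
  identityMd: string,
  userMessages: ChatMessage[],
): ChatMessage[] {
  const persona = [
    "[SOUL.md: who you are]",
    soulMd.trim(),
    "[IDENTITY.md: what you do]",
    identityMd.trim(),
  ].join("\n\n");
  return [{ role: "system", content: persona }, ...userMessages];
}

const messages = buildIdentityMessages(
  "## Values\n- PREFERENCE: Clarity over cleverness",
  "## Expertise Domain\n- PRIMARY: TypeScript/Python backend services",
  [{ role: "user", content: "Add retry logic to the billing client." }],
);
console.log(messages[0].content);
```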

The result? Agents that know who they are, what they do, and how they've failed before. Specialists that don't drift into each other's lanes. A codebase where git log agents/ shows you exactly how your AI workforce is maturing.

That's the difference between hiring generalists and building a team.