May 6, 2026 · 13 min read

Memory Poisoning: Why Persistent Agent Memory Is a Time Bomb

Persistent memory without decay, provenance, and quarantine is not a learning system. It is shared mutable global state dressed in vector embeddings, and at swarm scale it produces failures that hide in plain sight.

agent memory · memory poisoning · vector search · AI orchestration · temporal decay · agent-swarm
Cartoon dog sitting calmly in a burning room saying 'this is fine' — a metaphor for an agent confidently consulting poisoned memory
Your agent retrieving a 23-day-old API endpoint at top-1 cosine similarity.

It started with a Researcher agent documenting an external API endpoint. The endpoint worked perfectly when written to memory—a standard task completion note stored in our SQLite-backed vector store with a 512-dim text-embedding-3-small vector. Twenty-three days later, the third-party provider deprecated the endpoint without warning.

Six different agents subsequently retrieved that memory over the course of a week. Each agent was working on unrelated tasks: one fetching market data, another updating a dashboard, another running a compliance check. But they all issued queries semantically related to that API—“fetch user metrics,” “update analytics,” “pull quarterly data.” And because the memory contained the exact endpoint description, it ranked top-1 in cosine similarity every time.

The failure modes were different for each agent. The dashboard agent threw a 404 and retried until timeout. The compliance agent got a 200 with a different payload shape and silently parsed garbage. The market data agent hit a redirect loop. Because each failure looked task-specific, we spent days debugging network policies, retry logic, and prompt engineering before tracing the common factor: they had all consulted the same 23-day-old memory.

The Compounding Cost

Direct cost: 47 hours of engineering time, 1,200+ wasted API calls, and corrupted output in three client-facing reports. Meta-cost: we stopped trusting memory recall. For two weeks afterward, every agent log showing “retrieved from long-term memory” triggered a manual verification. The system had taught us to distrust it.

Why Semantic Search Amplifies Poisoning Instead of Mitigating It

Here is the uncomfortable truth about vector embeddings: they encode topic, not truth. A memory describing a working endpoint and a memory describing a broken endpoint produce vectors with nearly identical cosine similarity to a query about that endpoint. The embedding model does not know the endpoint was deprecated. It only knows that both memories discuss the same HTTP methods and URL patterns.

Worse, most memory implementations—including ours initially—use access-count boosting as a relevance signal. Frequently accessed memories rank higher. This creates a positive feedback loop: once a poisoned memory is consulted once, it rises in rank, gets consulted more, and entrenches itself. We audited our swarm’s behavior over 90 days and found the median age of top-3 search results increased from 4 days to 41 days, while the share referencing deprecated tools grew from under 5% to roughly 17%.

This is a slow-motion accuracy regression that no per-task metric catches. Individual tasks still complete successfully; they just draw on increasingly stale data. It is like rot in a wooden foundation: invisible until the floor collapses.
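The feedback loop is easy to reproduce in miniature. The toy ranker below is illustrative, not our production scorer: it adds a small logarithmic access-count bonus to raw similarity, which is enough for a stale memory with an access-count head start to outrank a fresher, more similar one.

```typescript
// Toy model of access-count boosting (illustrative; not our production ranker).
interface Mem {
  id: string;
  sim: number;         // raw cosine similarity to the query
  accessCount: number; // how often this memory has been retrieved
}

// Rank by similarity plus a logarithmic access-count bonus.
function rank(mems: Mem[], accessBoost: number): Mem[] {
  const score = (m: Mem) => m.sim + accessBoost * Math.log1p(m.accessCount);
  return [...mems].sort((a, b) => score(b) - score(a));
}

// A stale memory consulted five times beats a fresher, more similar one:
const results = rank(
  [
    { id: "stale-endpoint-doc", sim: 0.90, accessCount: 5 },
    { id: "fresh-endpoint-doc", sim: 0.91, accessCount: 0 },
  ],
  0.02
);
// results[0].id === "stale-endpoint-doc" — and every retrieval widens the gap.
```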

The Infrastructure Gap You Probably Have

We audited our agent_memory schema (SQLite + 512-dim vectors via text-embedding-3-small) and found we had created accessedAt and accessCount columns in our migration (migrations/001_initial.sql). We were updating them on every read. But our retrieval logic in src/be/db.ts was pure cosine similarity—brute force, unweighted, oblivious to the temporal metadata sitting right there in the same row.

This pattern repeats across the ecosystem. Mem0, Letta, LangGraph’s persistent memory layers, the various “memory layer” libraries on GitHub—they store recency and access metadata but default to semantic retrieval. The decay primitives are sitting unused like fire extinguishers in a house where nobody knows how to pull the pin.

The Four Decay Primitives (And Why None Work Alone)

We now treat four primitives as a non-negotiable unit. Shipping any one without the others creates a different failure mode.

1. Time-Based Exponential Decay

Every memory has a half-life. At retrieval time, we calculate effective relevance as:

effective_score = cosine_sim × 0.5^(age / half_life) × quarantine_factor

We tune half-lives per source: ~14 days for task completions (volatile), ~90 days for manual entries (curated), indefinite for file indices (stable unless the file changes). Without this, you keep retrieving six-month-old API documentation that “feels” relevant because it mentions the right nouns.
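A minimal sketch of the per-source tuning, assuming hypothetical source-type names; the decay function halves the score once per half-life, and an infinite half-life means no decay.

```typescript
// Per-source half-lives in days. The source-type names are illustrative.
const HALF_LIFE_DAYS: Record<string, number> = {
  task_completion: 14,  // volatile: endpoints, tool output, task notes
  manual_entry: 90,     // curated by a human, ages more slowly
  file_index: Infinity, // stable unless the file itself changes
};

// Decay factor that halves every halfLifeDays; Infinity yields 1 (no decay).
function decayFactor(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// decayFactor(14, 14) === 0.5; decayFactor(28, 14) === 0.25
// decayFactor(100, Infinity) === 1
```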

2. Provenance Tagging

Every memory carries sourceTaskId, sourceAgent, and sourceType. When a downstream task fails, we backward-trace: which memories did this agent consult, and which of those should be quarantined? Without provenance, you cannot attribute failure to memory. You are debugging blind.
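Backward tracing requires logging each retrieval. A sketch, assuming a hypothetical consultation log with one entry per (task, memory) pair; in production this is a table joined against agent_memory rather than an in-memory array.

```typescript
// Hypothetical consultation log: one entry per retrieval event.
interface Consultation {
  taskId: string;
  memoryId: number;
}

// Given a failed task, return the distinct memory ids it consulted,
// so each can be reviewed or have a quarantine penalty applied.
function backwardTrace(log: Consultation[], failedTaskId: string): number[] {
  const ids = log
    .filter((c) => c.taskId === failedTaskId)
    .map((c) => c.memoryId);
  return [...new Set(ids)]; // de-duplicate, preserving first-seen order
}
```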

3. Failure-Driven Quarantine

When a task fails after consulting a top-ranked memory, that memory’s similarity weight drops 50% for 30 days. Two failures and it is hidden from search entirely, pending human review. This is your immune system. Without it, poisoned memories that “look” correct by embedding standards persist indefinitely.
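A sketch of the failure handler's update, assuming the retrieval scorer applies 0.5^failureCount as the soft penalty; field names are illustrative.

```typescript
interface PenaltyState {
  failureCount: number;
}

// Called for each memory a failed task consulted at top rank.
function applyFailurePenalty(prev: PenaltyState): PenaltyState {
  // One failure halves the memory's effective weight; a second trips the
  // scorer's hard-quarantine check (failureCount >= 2), hiding it from
  // search pending human review.
  return { failureCount: prev.failureCount + 1 };
}

// The corresponding soft penalty applied at retrieval time.
function failurePenalty(failureCount: number): number {
  return Math.pow(0.5, failureCount);
}
```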

4. Embedding Outlier Detection

Memories whose embeddings drift far from their cluster centroid—detected via Mahalanobis distance on the 512-dim space—get flagged. These are likely hallucinated content, model drift artifacts, or topic-creep (a memory that started as API docs but the agent appended unrelated thoughts). We quarantine automatically if distance exceeds 3σ.
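A sketch of the flagging check. A true Mahalanobis distance needs the inverse covariance matrix of the full 512-dim space; the diagonal approximation below uses only per-dimension variance as a cheap stand-in, and all names are illustrative.

```typescript
// Approximate Mahalanobis distance using only per-dimension variance.
function diagonalMahalanobis(
  v: number[],
  centroid: number[],
  variance: number[]
): number {
  let sum = 0;
  for (let i = 0; i < v.length; i++) {
    const d = v[i] - centroid[i];
    sum += (d * d) / Math.max(variance[i], 1e-12); // guard zero variance
  }
  return Math.sqrt(sum);
}

// Flag memories beyond a cutoff (the 3-sigma rule in the text maps to a
// threshold calibrated on the observed distance distribution).
function isOutlier(distance: number, cutoff: number): boolean {
  return distance > cutoff;
}
```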

Why the Unit Matters

  • Time-decay alone deletes good old knowledge that is still valid.
  • Provenance alone does not help until something fails.
  • Quarantine alone fires too late for the first victim.
  • Outlier detection misses correct-looking-but-stale memories.

The combination is the architecture. Any subset is a liability.

What Persistent Memory Really Is

Persistent agent memory has reinvented all the problems that programming languages spent fifty years solving. Shared mutable state? You have it—every agent reads and writes the same vector space. Unbounded retention? Check—memories accumulate until your database chokes. Entrenched bad actors? Absolutely—poisoned memories that rank high. Opaque provenance? By default, yes—you do not know which task wrote what.

The right mental model is not “a database the agent reads.” It is a perishable inventory. Every memory has a freshness date, a confidence score that decays, a chain of custody, and a return policy. Treat it like a grocery store, not a library. Librarians preserve; grocers rotate stock before it spoils.

The Implementation: Schema and Scoring

Here is the schema diff we wish we had shipped on day one. The columns below are the minimum viable decay infrastructure:

-- migrations/002_add_decay_columns.sql
-- SQLite rejects non-constant defaults in ALTER TABLE ... ADD COLUMN, so add
-- the column with a constant default and backfill existing rows; the
-- application stamps createdAt on every new insert.
ALTER TABLE agent_memory ADD COLUMN createdAt INTEGER NOT NULL DEFAULT 0;
UPDATE agent_memory SET createdAt = unixepoch() WHERE createdAt = 0;
ALTER TABLE agent_memory ADD COLUMN accessedAt INTEGER;
ALTER TABLE agent_memory ADD COLUMN accessCount INTEGER DEFAULT 0;
ALTER TABLE agent_memory ADD COLUMN sourceTaskId TEXT;
ALTER TABLE agent_memory ADD COLUMN quarantineUntil INTEGER; -- unixepoch, NULL = not quarantined
ALTER TABLE agent_memory ADD COLUMN failureCount INTEGER DEFAULT 0;
ALTER TABLE agent_memory ADD COLUMN halfLifeDays REAL DEFAULT 14.0; -- tunable per source

-- Index for fast decay calculations
CREATE INDEX idx_memory_temporal ON agent_memory(createdAt, quarantineUntil);

And the retrieval logic we now use in src/be/db.ts:

function calculateEffectiveScore(
  cosineSim: number,
  createdAt: number,
  halfLifeDays: number,
  failureCount: number,
  quarantineUntil: number | null
): number {
  const now = Date.now() / 1000;

  // Check quarantine
  if (quarantineUntil && now < quarantineUntil) return 0;
  if (failureCount >= 2) return 0; // Hard quarantine

  // Time decay
  const ageDays = (now - createdAt) / 86400;
  const decayFactor = Math.pow(0.5, ageDays / halfLifeDays); // halves every halfLifeDays

  // Failure penalty (soft quarantine)
  const failurePenalty = Math.pow(0.5, failureCount);

  return cosineSim * decayFactor * failurePenalty;
}

// In the retrieval query, we sort by effective_score, not raw cosine_sim

The Audit Script You Should Run Today

We wrote a diagnostic script that surfaces the “most poisonous” memories currently in your store—those that are old, frequently accessed, and likely to be wrong. Run this on any swarm using an agent_memory schema:

-- audit_poisoned_memories.sql
SELECT
  id,
  content_preview,
  createdAt,
  (unixepoch() - createdAt) / 86400 as age_days,
  accessCount,
  sourceType,
  failureCount,
  -- "Poison score": old, accessed often, never failed (so not yet quarantined)
  (accessCount * (unixepoch() - createdAt) / 86400) / (failureCount + 1) as poison_score
FROM agent_memory
WHERE quarantineUntil IS NULL
  AND failureCount < 2
  AND accessCount > 5
  AND (unixepoch() - createdAt) / 86400 > 30
ORDER BY poison_score DESC
LIMIT 10;

If the median age of your top-10 results here is over 60 days and they concern APIs, endpoints, or third-party services, you are sitting on a time bomb.

What Does Not Work (And Why)

We tried simpler fixes before building the full decay system. They failed.

Manual review queues: At swarm scale, agents generate hundreds of memories daily. Human review becomes a bottleneck, and reviewers miss stale technical details just as easily as agents do.

Hard expiration dates: Deleting memories after 30 days loses valuable long-term context. Some knowledge—architectural decisions, stable business logic—should persist. Blanket expiration destroys signal with noise.

Confidence thresholds in the LLM: Asking the agent “are you sure this memory is accurate” is useless. The agent has no ground truth. It will confidently confirm that the deprecated endpoint is correct because the memory says it worked last time.

The Prediction

The entire “agentic memory” thesis is currently broken at the foundation. The valuable lesson from running a 7-agent swarm for 90+ days is not that persistent memory is amazing; it is that persistent memory is necessary and dangerous, and the dangerous part surfaces exactly when teams scale past the threshold where one human can manually review every memory write.

Within 12 months, we predict at least one well-funded “agent memory layer” startup will publicly post-mortem a customer cascade failure traced to memory poisoning. The industry will rapidly converge on decay+provenance+quarantine as table-stakes—the same way password hashing became mandatory after the early 2010s breaches.

Teams shipping persistent memory without these primitives in 2026 are doing the AI equivalent of storing passwords as unsalted MD5 hashes. It works, until it catastrophically does not.

What You Can Do Today

Regardless of your stack:

  • Audit your memory schema for unused recency/access columns and start scoring with them.
  • Instrument the age distribution of your top-K results over time. Watch for the slow climb.
  • Require provenance metadata at write time: which task, which agent, which source. Failure attribution is impossible without this.
  • Wire your task-failure handler to backward-trace memory consultations and apply quarantine penalties.
  • Set a per-source half-life and refuse to store memories with no decay policy. Volatile data must decay faster than stable knowledge.
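For the age-distribution bullet, the instrumentation can start as small as a median-age tracker over each query's top-K results; a sketch, to be wired into whatever metrics pipeline you already have.

```typescript
// Median age, in days, of a result set's createdAt timestamps (unix seconds).
// Emit this per query and chart it over time: the slow climb from single
// digits toward 40+ days is the signature described above.
function medianAgeDays(createdAts: number[], nowSec: number): number {
  const ages = createdAts
    .map((t) => (nowSec - t) / 86400)
    .sort((a, b) => a - b);
  const mid = Math.floor(ages.length / 2);
  return ages.length % 2 === 1 ? ages[mid] : (ages[mid - 1] + ages[mid]) / 2;
}
```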

If your agents have started behaving subtly worse over time without any obvious cause, or if you are about to ship persistent memory in your own framework, treat this as the foundation—not the polish. The unease you felt reading “agents that learn” marketing copy was correct. Now you know why.

FAQ

What is memory poisoning in AI agent systems?

Memory poisoning occurs when stale, incorrect, or deprecated information stored in an agent’s long-term memory gets retrieved via semantic search and used in downstream tasks, causing cascading failures that are difficult to trace because the memory appears relevant by embedding similarity but is factually wrong or outdated.

Why does semantic similarity fail to filter out bad memories?

Embeddings encode topical relevance, not truth or freshness. A wrong-but-relevant memory and a right-but-relevant memory have nearly identical cosine similarity scores. Without temporal decay or provenance tracking, semantic search has no mechanism to distinguish between current and stale information.

What are the four decay primitives for safe agent memory?

Time-based exponential decay (reducing relevance scores as memories age), provenance tagging (tracking which task created each memory), failure-driven quarantine (penalizing memories consulted before task failures), and embedding outlier detection (flagging memories that drift from cluster centroids). Used together, they prevent memory poisoning.

How do you implement memory decay in a vector database?

Store metadata columns for createdAt, accessedAt, accessCount, and sourceTaskId. At retrieval time, calculate effective relevance as cosine_similarity × 0.5^(age / half_life) × quarantine_multiplier. Tune half-lives per memory source—shorter for volatile API documentation, longer for stable architectural decisions.

When should an agent memory be quarantined?

When a task fails after consulting a memory that ranked in the top-K results, apply a 50% similarity penalty for 30 days. If a memory is associated with two distinct task failures, remove it from search entirely pending human review. This prevents entrenched poisoned memories from causing recurring failures.
