What are the four main memory patterns for AI agents described in this post?

The four patterns are: in-context memory (conversation history injected into the prompt), external semantic memory using vector databases like pgvector or Pinecone, structured episodic memory using traditional databases for queryable facts and preferences, and procedural memory for learned skills or tools the agent can store. Each pattern differs in scope, persistence, and infrastructure requirements.

What is the two-tier context strategy for in-context memory, and why does it save tokens?

The strategy keeps the last 5–10 conversation turns verbatim for immediate coherence, while compressing older history into a structured summary of facts rather than reproducing dialogue. Storing a fact like 'User manages 3 inventory warehouses in Jakarta and prefers metric units' is far more token-efficient than replaying the conversation in which that fact was established, which matters especially at scale.

What production pitfall with vector similarity search does the post warn about, and how was it fixed?

The post describes an HR ERP agent that incorrectly retrieved leave-approval conversations when users asked about inventory, because the word 'approval' had high semantic overlap across both topics. The fix was to add structured metadata filters — specifically user_id and a domain tag — as a pre-filter before running the vector similarity search, so semantic search is never used in isolation.

How does the post recommend mitigating memory poisoning in production agents?

Mitigations include adding confidence scores to stored facts, giving users a memory correction tool, periodically re-validating stored facts against authoritative sources, and logging all memory writes for audit. The post also recommends implementing memory TTLs so that outdated preferences or facts are expired or refreshed rather than influencing future agent behaviour indefinitely.

AI Agent Memory Persistence Patterns: From Volatile to Long-Term

Q: Why does the post recommend PostgreSQL with pgvector over a separate vector database service?

Using PostgreSQL with the pgvector extension keeps the stack to a single database that handles both structured data and vector embeddings, with familiar tooling for backups, replication, and access control. This simplifies production operations considerably compared to running a separate vector database service alongside a relational database.

The most common failure mode in AI agent deployments is not intelligence — it is amnesia. An agent that cannot remember what it did yesterday, what the user prefers, or what happened in the last conversation is fundamentally limited. Memory is what separates a stateless chatbot from an agent that actually gets smarter about your specific context over time. I have built memory systems for several AI integrations at Commsult Indonesia, and the pattern choices you make early have significant architectural implications. This post covers the four main memory patterns, when to use each, and the production pitfalls I have run into.

The Four Memory Patterns

AI agent memory falls into four categories based on scope and persistence. In-context memory (conversation history injected into the prompt) is simplest but limited by context window size and costs tokens on every call. External semantic memory (vector databases like Pinecone, pgvector, or Chroma) enables long-term recall based on semantic similarity. Structured episodic memory (traditional databases storing facts, preferences, events) gives you queryable, auditable history. Procedural memory (learned skills or tools the agent can create and store) is the most advanced pattern, used by systems like Hermes Agent.

In-Context Memory: The Starting Point

For most applications, start with conversation history trimming. Keep the last N turns in context, summarize older content using a lightweight model call, and inject the summary at the top. This works well for single-session agents and requires no external infrastructure. The challenge is cost: every token in history costs money on every API call. With GPT-4o at $2.50/M input tokens, a 50-turn conversation history (roughly 10K tokens) adds $0.025 per subsequent message — acceptable for enterprise use, significant for consumer apps at scale.

┌─────────────────────────────────────────────────────────────┐
│              AI Agent Memory Architecture                    │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Context Assembly Pipeline (runs on every message)   │  │
│  │                                                      │  │
│  │  1. User Profile (structured DB)                     │  │
│  │     └─ preferences, role, tenant_id                  │  │
│  │                                                      │  │
│  │  2. Semantic Memory (pgvector)                       │  │
│  │     └─ top-3 relevant past interactions              │  │
│  │                                                      │  │
│  │  3. Recent History (sliding window)                  │  │
│  │     └─ last 8 conversation turns verbatim            │  │
│  │                                                      │  │
│  │  4. Current Message                                  │  │
│  └──────────────────────────────────────────────────────┘  │
│                        │                                    │
│                        ▼                                    │
│              LLM (with full context)                        │
│                        │                                    │
│                        ▼                                    │
│  Memory Write Tools: store_fact, update_preference          │
└─────────────────────────────────────────────────────────────┘

From my experience: implement a two-tier context strategy. Keep the last 5-10 turns verbatim (for immediate context coherence) and maintain a structured summary for older history. The summary should store facts, not dialogue — 'User manages 3 inventory warehouses in Jakarta and prefers metric units' is more token-efficient than reproducing the conversation where that was established.

Vector Store Memory for Semantic Recall

Vector databases store embeddings of text and enable retrieval by semantic similarity — the agent can recall relevant past interactions even if they use different words. This is ideal for knowledge bases, document retrieval, and remembering user preferences expressed in natural language. For production, I use PostgreSQL with the pgvector extension rather than a separate vector database service. It simplifies the stack considerably: one database for structured data and vector embeddings, with familiar tooling for backups, replication, and access control.

pgvector Implementation Pattern

The pattern I use: on every agent interaction, embed the user message and store it alongside metadata (user ID, session ID, timestamp, extracted entities). At the start of each new session, retrieve the top-K most relevant past interactions using cosine similarity on the embedding. Inject these as a 'relevant history' block in the system prompt. For entity extraction, run a lightweight structured extraction call (or use regex for simple cases) to pull out names, dates, and domain-specific terms before embedding.

-- pgvector setup
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     UUID NOT NULL REFERENCES users(id),
  session_id  UUID,
  domain      TEXT NOT NULL DEFAULT 'general',
  content     TEXT NOT NULL,
  embedding   vector(1536),  -- OpenAI ada-002 dimensions
  metadata    JSONB,
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  expires_at  TIMESTAMPTZ  -- optional TTL
);

-- Index for fast similarity search with metadata filter
CREATE INDEX idx_memories_embedding
  ON agent_memories USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

CREATE INDEX idx_memories_user_domain
  ON agent_memories (user_id, domain);

-- Retrieve top-K relevant memories
SELECT content, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM agent_memories
WHERE user_id = $2
  AND domain = $3
  AND (expires_at IS NULL OR expires_at > NOW())
ORDER BY embedding <=> $1::vector
LIMIT 3;

Structured Database Memory

For facts that need to be reliable and queryable — user preferences, completed tasks, decisions made — use a structured database rather than a vector store. Define a schema for what your agent needs to remember: user preferences table (key-value), task history table (task, result, timestamp), entity registry (names, IDs, relationships mentioned by user). The agent uses tools to read and write this memory store explicitly. This pattern gives you auditability, easy deletion (GDPR compliance), and precise retrieval.

Vector similarity recall has a precision problem: if your embedding model is not tuned to your domain, it will retrieve semantically similar but contextually irrelevant memories. I had an agent for an HR ERP system that kept retrieving leave approval conversations when users asked about inventory — because 'approval' has high semantic overlap. The fix was to add metadata filtering (user_id + domain tag) as a pre-filter before vector similarity search. Never rely on semantic search alone; always combine with structured metadata filters.

Memory Architecture for Production Agents

The architecture I use for production agents combines all three patterns: PostgreSQL with pgvector for both structured facts and semantic embeddings, a sliding window context manager that assembles the prompt from multiple memory tiers, and explicit memory write tools the agent can call to store important information. The context assembly pipeline runs on every message: fetch user profile from structured store, retrieve top-3 semantically relevant past interactions, inject last 8 conversation turns verbatim. Total added context per message: roughly 1,500-2,500 tokens.

When Memory Goes Wrong

Memory poisoning is a real risk: if the agent stores incorrect information (because a user lied, the agent misunderstood, or an indirect injection occurred), that bad memory persists and influences future behavior. Mitigations: add confidence scores to stored facts, implement a memory correction tool users can invoke, periodically re-validate stored facts against authoritative sources, and log all memory writes for audit. Also implement memory TTLs — preferences from 2 years ago may no longer be valid. Build the infrastructure to expire or refresh stale memories.

Sources & Further Reading

Frequently Asked Questions

AI Agent Memory Persistence Patterns: From Volatile to Long-Term

Frequently Asked Questions

AI Agent Memory Persistence Patterns: From Volatile to Long-Term

The Four Memory Patterns

In-Context Memory: The Starting Point

Vector Store Memory for Semantic Recall

pgvector Implementation Pattern

Structured Database Memory

Memory Architecture for Production Agents

When Memory Goes Wrong

Related Articles

The Four Memory Patterns

In-Context Memory: The Starting Point

Vector Store Memory for Semantic Recall

pgvector Implementation Pattern

Structured Database Memory

Memory Architecture for Production Agents

When Memory Goes Wrong

Related Articles