Semantic Search with pgvector: Skip the Dedicated Vector Database

Photo by Google DeepMind

Photo by Google DeepMind
Every semantic search project I have scoped in the past two years started with the same assumption: we will need a vector database. And in almost every one of them — internal document search, ERP record matching, support-article retrieval — the right answer turned out to be the PostgreSQL instance that was already running, with the pgvector extension enabled. One CREATE EXTENSION away from vector search, zero new services to operate.
I compared pgvector against Pinecone head-to-head in an earlier post; this one is the build guide. Schema, index choices, the hybrid-search query I reuse everywhere, and the operational numbers that tell you when you have genuinely outgrown Postgres.
The argument is not that pgvector beats dedicated engines at every benchmark — it is that for the workloads most teams actually have, the operational math dominates:
Here is the complete schema I use, sized for OpenAI's text-embedding-3-small at its default 1,536 dimensions — the cost-effective choice for most retrieval workloads, at roughly 62,500 pages per dollar:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
source TEXT NOT NULL,
chunk TEXT NOT NULL,
-- text-embedding-3-small: 1536 dims by default
embedding VECTOR(1536) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- HNSW: best query speed/recall tradeoff; build after bulk-loading
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Top-8 nearest chunks by cosine distance
SELECT id, source, chunk, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 8;The operator matters: pgvector ships six distance operators, and for normalized text embeddings you want cosine distance — the spaceship-like operator in the query above — with a matching vector_cosine_ops index. Mixing the operator and the index opclass is the classic silent mistake: queries still work, they just never use the index.
pgvector gives you two approximate-nearest-neighbor index types. The official guidance is clear and matches my experience:
| Factor | HNSW | IVFFlat |
|---|---|---|
| Query speed and recall | Better speed-recall tradeoff — the default choice | Lower, tunable via probes at query time |
| Build time and memory | Slower builds, more memory | Faster builds, lighter on memory |
| Training data needed | None — can be created on an empty table | Needs existing rows for k-means training |
| When I reach for it | Production search, anything user-facing | Huge batch-loaded corpora where build cost dominates |
Pure vector search misses exact identifiers — invoice numbers, SKUs, error codes — because embeddings blur precise strings. Postgres has the fix built in: combine pgvector with native full-text search and blend the scores. No extra infrastructure, just SQL:
-- Hybrid search: combine vector similarity with full-text rank
WITH semantic AS (
SELECT id, 1 - (embedding <=> $1) AS sim
FROM documents ORDER BY embedding <=> $1 LIMIT 30
),
keyword AS (
SELECT id, ts_rank(to_tsvector('english', chunk),
plainto_tsquery('english', $2)) AS rank
FROM documents
WHERE to_tsvector('english', chunk) @@ plainto_tsquery('english', $2)
LIMIT 30
)
SELECT d.id, d.chunk,
COALESCE(s.sim, 0) * 0.7 + COALESCE(kw.rank, 0) * 0.3 AS score
FROM documents d
LEFT JOIN semantic s ON s.id = d.id
LEFT JOIN keyword kw ON kw.id = d.id
WHERE s.id IS NOT NULL OR kw.id IS NOT NULL
ORDER BY score DESC LIMIT 8;The 70/30 weighting is a starting point, not gospel — tune it against a small labeled set of real queries. For my ERP-document use case, hybrid lifted answer quality more than any embedding model swap, because users search for codes and names as often as for concepts.
The underrated superpower is the JOIN. Tenant isolation, role-based filtering, soft-deletes, and joining retrieved chunks back to their source records all happen in the same query plan as the vector scan. In a dedicated vector store, each of those becomes metadata-filter gymnastics and a second round trip to your database of record.
Honest limits: if you are past roughly fifty million vectors, need multi-region replication of the index itself, sustain thousands of vector queries per second, or your recall requirements force exhaustive tuning that fights the rest of your database workload, a dedicated engine earns its operational cost. Those numbers describe a small minority of products — and migrating later is straightforward, because your embeddings pipeline does not change, only the storage target.
Semantic search is a feature, not a platform decision. Start with the database you already run: one extension, one migration, one hybrid query, and you have production-grade retrieval with your existing backups, auth, and monitoring. Buy a dedicated vector database when your measurements say so, not when a vendor's architecture diagram does.
Sources and further reading