Why use pgvector instead of a dedicated vector database like Pinecone?

For most teams, the operational math favors pgvector: your embeddings live next to your relational data, so tenant filtering, permission joins, and deletions are plain SQL with no synchronization protocol between two systems. You already operate PostgreSQL with backups, replication, and monitoring — a dedicated vector database duplicates all of those obligations as a second stateful service.

When should you choose HNSW over IVFFlat for the vector index?

HNSW is the default choice for production, user-facing search because it offers a better speed-recall tradeoff and can be built on an empty table with no training data required. IVFFlat is better suited for huge batch-loaded corpora where build cost dominates, though its recall is tunable at query time via probes and it builds faster with less memory.

Why add full-text search on top of pure vector search?

Pure vector search misses exact identifiers — invoice numbers, SKUs, error codes — because embeddings blur precise strings. Combining pgvector with PostgreSQL's native full-text search (hybrid search) blends semantic and keyword scores in a single SQL query with no extra infrastructure, and the author found it improved answer quality for ERP document retrieval more than any embedding model swap.

What are the practical storage implications of storing embeddings in PostgreSQL?

A 1,536-dimension vector (the default for OpenAI's text-embedding-3-small) is roughly 6 KB per row, so one million chunks consumes about 6 GB before the index. You must plan storage and shared_buffers accordingly, and bulk-loading data before building the HNSW index — rather than inserting through an existing index — is dramatically faster for large backfills.

At what scale should you migrate away from pgvector to a dedicated vector engine?

The post puts the honest threshold at roughly fifty million vectors, multi-region index replication requirements, thousands of vector queries per second, or recall needs that force exhaustive tuning conflicting with the rest of the database workload. The author notes this describes a small minority of products, and migration is straightforward because only the storage target changes — the embedding pipeline stays the same.

Semantic Search with pgvector: Skip the Dedicated Vector Database

Every semantic search project I have scoped in the past two years started with the same assumption: we will need a vector database. And in almost every one of them — internal document search, ERP record matching, support-article retrieval — the right answer turned out to be the PostgreSQL instance that was already running, with the pgvector extension enabled. One CREATE EXTENSION away from vector search, zero new services to operate.

I compared pgvector against Pinecone head-to-head in an earlier post; this one is the build guide. Schema, index choices, the hybrid-search query I reuse everywhere, and the operational numbers that tell you when you have genuinely outgrown Postgres.

Why Postgres First Is the Right Default

The argument is not that pgvector beats dedicated engines at every benchmark — it is that for the workloads most teams actually have, the operational math dominates:

Your embeddings live next to your data. Filtering by tenant, joining to a permissions table, and deleting a customer's vectors are plain SQL, not a synchronization protocol between two systems that can drift.
You already operate Postgres: backups, replication, monitoring, point-in-time recovery. A vector database is a second stateful system with all of those obligations duplicated.
The scale ceiling is higher than the marketing suggests. pgvector handles vectors up to 16,000 dimensions and millions of rows comfortably on ordinary hardware — most internal corpora are tens of thousands of chunks, not billions.

The Whole Setup Is One Migration

Here is the complete schema I use, sized for OpenAI's text-embedding-3-small at its default 1,536 dimensions — the cost-effective choice for most retrieval workloads, at roughly 62,500 pages per dollar:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  source     TEXT NOT NULL,
  chunk      TEXT NOT NULL,
  -- text-embedding-3-small: 1536 dims by default
  embedding  VECTOR(1536) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- HNSW: best query speed/recall tradeoff; build after bulk-loading
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);

-- Top-8 nearest chunks by cosine distance
SELECT id, source, chunk, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 8;

The operator matters: pgvector ships six distance operators, and for normalized text embeddings you want cosine distance — the spaceship-like operator in the query above — with a matching vector_cosine_ops index. Mixing the operator and the index opclass is the classic silent mistake: queries still work, they just never use the index.

HNSW or IVFFlat: The Only Index Decision

pgvector gives you two approximate-nearest-neighbor index types. The official guidance is clear and matches my experience:

Factor	HNSW	IVFFlat
Query speed and recall	Better speed-recall tradeoff — the default choice	Lower, tunable via probes at query time
Build time and memory	Slower builds, more memory	Faster builds, lighter on memory
Training data needed	None — can be created on an empty table	Needs existing rows for k-means training
When I reach for it	Production search, anything user-facing	Huge batch-loaded corpora where build cost dominates

Hybrid Search: The Upgrade That Costs One Query

Pure vector search misses exact identifiers — invoice numbers, SKUs, error codes — because embeddings blur precise strings. Postgres has the fix built in: combine pgvector with native full-text search and blend the scores. No extra infrastructure, just SQL:

-- Hybrid search: combine vector similarity with full-text rank
WITH semantic AS (
  SELECT id, 1 - (embedding <=> $1) AS sim
  FROM documents ORDER BY embedding <=> $1 LIMIT 30
),
keyword AS (
  SELECT id, ts_rank(to_tsvector('english', chunk),
                     plainto_tsquery('english', $2)) AS rank
  FROM documents
  WHERE to_tsvector('english', chunk) @@ plainto_tsquery('english', $2)
  LIMIT 30
)
SELECT d.id, d.chunk,
       COALESCE(s.sim, 0) * 0.7 + COALESCE(kw.rank, 0) * 0.3 AS score
FROM documents d
LEFT JOIN semantic s ON s.id = d.id
LEFT JOIN keyword kw ON kw.id = d.id
WHERE s.id IS NOT NULL OR kw.id IS NOT NULL
ORDER BY score DESC LIMIT 8;

The 70/30 weighting is a starting point, not gospel — tune it against a small labeled set of real queries. For my ERP-document use case, hybrid lifted answer quality more than any embedding model swap, because users search for codes and names as often as for concepts.

The underrated superpower is the JOIN. Tenant isolation, role-based filtering, soft-deletes, and joining retrieved chunks back to their source records all happen in the same query plan as the vector scan. In a dedicated vector store, each of those becomes metadata-filter gymnastics and a second round trip to your database of record.

Operational Notes from Production

Bulk-load first, index second. Building HNSW on a populated table is dramatically faster than inserting through an existing index — drop and recreate for large backfills.
Watch maintenance_work_mem during index builds; an HNSW build that does not fit spills and slows badly. I set it per-session before big builds.
Embeddings columns are fat. A 1,536-dim vector is roughly 6 KB per row, so a million chunks is about 6 GB before the index — plan storage and shared_buffers accordingly.
Re-embedding is a migration. When you change embedding models, write the new vectors to a second column, backfill in batches, swap the index, then drop the old column — zero-downtime, plain SQL.

When You Have Actually Outgrown It

Honest limits: if you are past roughly fifty million vectors, need multi-region replication of the index itself, sustain thousands of vector queries per second, or your recall requirements force exhaustive tuning that fights the rest of your database workload, a dedicated engine earns its operational cost. Those numbers describe a small minority of products — and migrating later is straightforward, because your embeddings pipeline does not change, only the storage target.

The Takeaway

Semantic search is a feature, not a platform decision. Start with the database you already run: one extension, one migration, one hybrid query, and you have production-grade retrieval with your existing backups, auth, and monitoring. Buy a dedicated vector database when your measurements say so, not when a vendor's architecture diagram does.

Sources and further reading

Frequently Asked Questions

Semantic Search with pgvector: Skip the Dedicated Vector Database

Frequently Asked Questions

Semantic Search with pgvector: Skip the Dedicated Vector Database

Why Postgres First Is the Right Default

The Whole Setup Is One Migration

HNSW or IVFFlat: The Only Index Decision

Hybrid Search: The Upgrade That Costs One Query

Operational Notes from Production

When You Have Actually Outgrown It

The Takeaway

Why Postgres First Is the Right Default

The Whole Setup Is One Migration

HNSW or IVFFlat: The Only Index Decision

Hybrid Search: The Upgrade That Costs One Query

Operational Notes from Production

When You Have Actually Outgrown It

The Takeaway