Every RAG application needs a vector database. The choice between pgvector, Pinecone, Qdrant, and Weaviate isn't just a performance decision — it's an operational complexity, cost, and integration decision. I've used pgvector in my AI Gymbro project and experimented with Pinecone and Qdrant for ERP document search.
The vector database market consolidated significantly in 2024-2025. pgvector with HNSW indexing achieves ~95% of Pinecone's query performance at 1M vectors with zero additional infrastructure cost. The 2025 landscape: pgvector for existing PostgreSQL users, Qdrant for self-hosted performance-critical workloads, Pinecone for fully managed large-scale deployments.
pgvector adds vector data types and similarity search operators to PostgreSQL. The HNSW index (available since pgvector 0.5.0) dramatically improves query performance. The killer advantage: your vector embeddings live in the same database as your application data. You can JOIN embeddings with user data and run everything in a single transaction.
Pinecone is a purpose-built vector database as a service. No infrastructure to manage, automatic scaling, sub-10ms query latency at billion-scale. The tradeoff: cost. Pinecone's Starter tier handles ~100K vectors. The Standard tier starts at ~$70/month for serverless.
Vector Database Performance Comparison (1M vectors, 1536 dimensions)
Database         P95 Latency   QPS         Cost/mo      Self-host?
──────────────────────────────────────────────────────────────────
pgvector HNSW    45ms          500         $50 (pg)     Yes
Qdrant           8ms           2,000       $40 (VPS)    Yes
Pinecone         12ms          unlimited   $70+         No
Weaviate Cloud   15ms          ~1,000      $80+         No
pgvector HNSW Setup:
CREATE EXTENSION vector;
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);
→ At query time: SET hnsw.ef_search = 64;
Recommended by use case:
≤ 5M vectors + PostgreSQL already → pgvector
Need best filtering + self-host → Qdrant
Billion-scale + fully managed → Pinecone
For PostgreSQL users considering pgvector: the HNSW index parameters ef_construction and m dramatically affect the quality/performance tradeoff. I run ef_construction=128 and m=32 for my fitness content embeddings (pgvector's defaults are ef_construction=64 and m=16): a higher-quality index that takes longer to build but gives better recall. At query time, set ef_search=64 or higher (the default is 40) for better recall.
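To check whether a given m / ef_construction / ef_search combination actually improves recall on your own data, you can compare the IDs the HNSW index returns against an exact scan for a sample of queries. The following is a minimal sketch, assuming both ID lists have already been fetched; the function name and the sample IDs are illustrative and not part of pgvector.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the ANN index also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    # Set intersection so duplicate IDs in the approximate list are not double-counted.
    return len(set(approx_ids[:k]) & exact_top) / len(exact_top)


# Hypothetical IDs: `exact` from a scan without the index, `approx` from the HNSW index.
exact = [11, 42, 7, 93, 58]
approx = [11, 7, 42, 58, 60]
print(recall_at_k(approx, exact, k=5))  # 0.8
```

Averaging this over a few hundred representative queries gives a rough recall figure to weigh against the extra latency of a larger ef_search.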
Qdrant is an open-source vector database written in Rust with exceptional performance and filtering capabilities. It supports both dense and sparse vectors in the same collection, enabling native hybrid search without external tooling. Self-hosted on a $20/month DigitalOcean droplet, Qdrant handles 1M vectors comfortably.
At 1M vectors with 1536-dimension OpenAI embeddings: pgvector (HNSW, m=32): P95 45ms, handles ~500 QPS on a $50/month managed PostgreSQL instance. Qdrant (self-hosted): P95 8ms, handles ~2,000 QPS on a $40/month VPS. Pinecone (serverless): P95 12ms, handles unlimited QPS (auto-scales).
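One rough way to read these numbers is cost per unit of sustained throughput. The sketch below only divides the monthly prices by the QPS figures quoted above; it leaves out Pinecone because the source gives no fixed QPS ceiling for it, and it ignores operational effort entirely, so treat it as back-of-the-envelope arithmetic rather than a benchmark.

```python
# Monthly cost (USD) and sustained QPS as quoted in the comparison above.
benchmarks = {
    "pgvector (HNSW, m=32)": {"cost_per_month": 50, "qps": 500},
    "Qdrant (self-hosted)": {"cost_per_month": 40, "qps": 2000},
}


def dollars_per_qps(entry):
    """Monthly dollars paid per query-per-second of capacity."""
    return entry["cost_per_month"] / entry["qps"]


cost_table = {name: dollars_per_qps(entry) for name, entry in benchmarks.items()}
for name, value in cost_table.items():
    print(f"{name}: ${value:.2f} per QPS per month")
# pgvector (HNSW, m=32): $0.10 per QPS per month
# Qdrant (self-hosted): $0.02 per QPS per month
```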
-- pgvector: vector search with metadata filter
-- JOIN with application data in same query
SELECT
    d.id,
    d.content,
    d.metadata,
    u.username AS author, -- JOIN with users table!
    1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
JOIN users u ON d.user_id = u.id
WHERE
    d.metadata->>'category' = 'exercise' -- metadata filter
    AND d.created_at > NOW() - INTERVAL '30 days'
ORDER BY d.embedding <=> $1::vector -- vector similarity
LIMIT 5;
# Pinecone equivalent (Python)
# No JOIN possible: user data must come from a separate database call
import os
from pinecone import Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("my-index")
results = index.query(
    vector=query_embedding,
    filter={"category": "exercise"},  # metadata filter
    top_k=5,
    include_metadata=True,
)
# Then separately: fetch user data from your database
user_ids = [m["metadata"]["user_id"] for m in results["matches"]]
# db is your own async data layer; this line must run inside an async function
users = await db.fetch_users(user_ids)  # second DB call!
Pure vector search misses exact keyword matches. Hybrid search combines vector similarity with BM25 keyword scoring. Qdrant supports this natively. For pgvector, you combine vector search with PostgreSQL's built-in full-text search and merge the result sets in application code.
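For the pgvector case, the merge step in application code can be as small as a rank-fusion function. The sketch below uses Reciprocal Rank Fusion (RRF), which is one common choice and not necessarily what any particular project uses. The constant k=60 is the conventional default, and the document IDs are hypothetical stand-ins for the results of the vector query and the full-text query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60, top_n=5):
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical IDs returned by the two separate PostgreSQL queries.
vector_hits = [3, 1, 7, 9]    # ordered by embedding distance
keyword_hits = [7, 3, 12, 1]  # ordered by full-text rank
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # [3, 7, 1, 12, 9]
```

RRF only looks at rank positions, so it sidesteps the problem that cosine similarity and full-text scores live on different scales.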
pgvector's sequential scan (no index) becomes dangerously slow above 100K vectors: scan time grows linearly with row count, so a query that takes 5ms with 10K vectors takes 500ms+ with 1M vectors. Always create an HNSW index before storing more than 50K vectors. Also: REINDEX operates on the index, not the column, so run REINDEX on the HNSW index during low-traffic periods (REINDEX INDEX CONCURRENTLY avoids blocking writes) to rebuild it and restore recall quality.
If you start on pgvector and need to scale beyond its limits, migrating to Pinecone or Qdrant is straightforward: your embeddings are just floating-point arrays. Export from PostgreSQL, import into Pinecone/Qdrant via their bulk upload APIs, update your search code. Total migration time for 500K vectors: 2-3 days including testing.
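The export/import step mostly comes down to reshaping rows and sending them in batches. Here is a minimal sketch, assuming rows arrive as (id, embedding, metadata) tuples from a PostgreSQL export; the helper name, the batch size of 100, and the sample rows are illustrative assumptions. The id/values/metadata record shape matches what Pinecone's upsert accepts, while Qdrant uses a different point layout, so adapt the field names there.

```python
from itertools import islice


def batched_records(rows, batch_size=100):
    """Yield lists of upsert-ready dicts built from (id, embedding, metadata) rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield [
            {"id": str(row_id), "values": list(embedding), "metadata": dict(metadata)}
            for row_id, embedding, metadata in chunk
        ]


# Stand-in for rows exported from PostgreSQL (hypothetical data).
rows = [(i, [0.1 * i, 0.2 * i], {"category": "exercise"}) for i in range(5)]
batches = list(batched_records(rows, batch_size=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each yielded batch can then be handed to the target database's bulk upload call, which keeps memory flat even when the source table is large.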
Use pgvector if: you're on PostgreSQL, you have under 5M vectors, or you need transactional consistency. Use Qdrant if: you're not on PostgreSQL, you need the best filtering performance, or you want hybrid search built-in. Use Pinecone if: you need billion-scale with zero operational overhead, or you need enterprise SLAs.