PostgreSQL Performance: Indexing & Query Tuning

Photo by Unsplash

Photo by Unsplash
PostgreSQL performance optimization is one of the highest-leverage skills a backend developer can have. A single missing index can turn a 5ms query into a 5-second table scan. This post covers the essential indexing strategies, EXPLAIN ANALYZE interpretation, connection pooling with PgBouncer, and VACUUM strategies that will help you systematically diagnose and fix slow queries in production PostgreSQL databases.
An index is a data structure that lets PostgreSQL find rows without scanning every row in a table. Getting indexing right is the single most impactful optimization you can make. The wrong indexes — or missing indexes — are responsible for the vast majority of PostgreSQL performance problems. Understanding when and how to use B-tree indexes, composite indexes, partial indexes, and expression indexes will cover 95% of your use cases.
The B-tree index is PostgreSQL's default and handles equality (=) and range queries (>, <, BETWEEN, LIKE 'prefix%') efficiently. Composite indexes on multiple columns are powerful but order matters — put the most selective column first, and match the left-prefix in your queries. Partial indexes index only rows matching a WHERE clause, making them dramatically smaller and faster for filtered queries. Expression indexes index the result of a function, enabling fast lookups on computed values.
-- B-tree index for equality and range queries
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- Composite index: column order matters — most selective first
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index: only index active records (much smaller, faster)
CREATE INDEX idx_active_sessions ON sessions(user_id)
WHERE expires_at > NOW();
-- Expression index: for computed WHERE clauses
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;Use pg_stat_user_indexes to identify unused indexes. Indexes have a write cost — every INSERT, UPDATE, and DELETE must update every index on the table. An unused index is pure overhead. Audit and drop them regularly, especially after schema evolution.
EXPLAIN ANALYZE is your most powerful diagnostic tool. It shows the query plan PostgreSQL chose, the estimated vs actual row counts, and the actual execution time for each node in the plan. Understanding how to read EXPLAIN ANALYZE output will let you identify the exact cause of a slow query in minutes rather than hours of guessing.
Watch for these patterns: Seq Scan on a large table (missing index), a large discrepancy between estimated and actual rows (stale statistics — run ANALYZE), Hash Join with many batches (work_mem too low), Nested Loop with many loops (N+1 pattern from your ORM), and high 'Rows Removed by Filter' relative to 'rows' returned (poor index selectivity or missing index).
-- EXPLAIN ANALYZE: shows actual execution time and row estimates
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 20;
-- Look for these warning signs in the output:
-- Seq Scan on large tables → add an index
-- Rows Removed by Filter: high number → poor selectivity
-- Hash Join with large batches → work_mem too low
-- "actual rows=1 loops=10000" → N+1 query pattern
-- Tune work_mem for complex sorts and hash joins
SET work_mem = '64MB'; -- session-level
-- Find the slowest queries in pg_stat_statements
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;PostgreSQL's query planner uses column statistics (collected by ANALYZE) to estimate row counts and choose join strategies. If your statistics are stale — for example after a large batch insert — the planner may choose a poor plan. Run ANALYZE after significant data changes. Increase the statistics target (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 500) for highly non-uniform distributions.
PostgreSQL connections are heavyweight — each one spawns a new backend process using ~5-10MB of RAM. At scale, hundreds of application threads all trying to hold persistent connections will exhaust your PostgreSQL server's memory before you hit any query performance limits. PgBouncer solves this by pooling connections at the proxy layer, multiplexing many application connections onto a small pool of real PostgreSQL connections.
PgBouncer offers three pool modes. Session mode: one real connection per client session (similar to no pooling). Transaction mode: a real connection is held only for the duration of a transaction, then returned to the pool — this is the recommended mode for most web applications and supports the highest multiplexing. Statement mode: most restrictive, returns the connection after each statement — incompatible with multi-statement transactions.
In transaction pooling mode, you cannot use session-level features like advisory locks (pg_advisory_lock), LISTEN/NOTIFY, or prepared statements between transactions — because each transaction may run on a different real connection. If your application uses any of these, you must configure PgBouncer to use session mode for those connections, or use a separate PgBouncer pool with session mode.
-- PgBouncer configuration (pgbouncer.ini)
[databases]
myapp = host=postgres port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
log_connections = 0
-- VACUUM strategies
-- Auto-vacuum handles routine bloat cleanup
-- Manual VACUUM ANALYZE after large batch operations
VACUUM (ANALYZE, VERBOSE) orders;
-- For heavily updated tables, check dead tuple ratio
SELECT relname, n_dead_tup, n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC;A common starting point: set PostgreSQL's max_connections to 100-200, and PgBouncer's default_pool_size to 20-25 per application server. The formula is: pool size = (number of CPUs * 2) + effective_spindle_count. For SSDs with no spinning disks, effective_spindle_count is ~1-2. Benchmarking with pgbench under realistic load is the only way to validate your pool size for your specific workload.
PostgreSQL uses Multi-Version Concurrency Control (MVCC): updates don't overwrite rows in-place but write new versions, leaving dead tuples that must be cleaned up by VACUUM. In write-heavy tables, dead tuple bloat can grow significantly, inflating table and index size and slowing down sequential scans. Understanding and tuning autovacuum is essential for sustained performance in production.
PostgreSQL's autovacuum runs automatically but may not be aggressive enough for high-write tables. Check pg_stat_user_tables for tables with high n_dead_tup or last_autovacuum that is hours old on a busy table. For hot tables, lower the autovacuum_vacuum_scale_factor (how much of the table must be dead before autovacuum runs) and increase autovacuum_vacuum_cost_delay to allow more I/O.
Use pg_stat_statements with RESET to measure query performance before and after an index change. pg_stat_statements tracks total_exec_time, mean_exec_time, and calls for every distinct query fingerprint, giving you objective evidence of the improvement.
Start with pg_stat_statements to identify the slowest queries by mean execution time. Run EXPLAIN (ANALYZE, BUFFERS) on each. Look for missing indexes, sequential scans on large tables, and poor row estimates. Add indexes targeted at the specific query patterns. Measure before and after. Monitor autovacuum health weekly. Tune connection pooling as your application load grows. This cycle — measure, diagnose, optimize, measure — is the foundation of sustainable PostgreSQL performance.
Key PostgreSQL performance concepts in this post include B-tree index, EXPLAIN ANALYZE, PgBouncer, VACUUM, Seq Scan, and pg_stat_statements.