PostgreSQL Indexing in Practice: B-tree, GIN, and Partial Indexes

Photo by Tima Miroshnichenko

Photo by Tima Miroshnichenko
Indexing is the highest-leverage performance work you can do on a PostgreSQL database, and it is also where I see the most cargo-culting: an index on every column, no idea whether any of them are used, and a write path that crawls under the weight of maintaining them. After several years of running Postgres behind NestJS ERP systems, the pattern is always the same. The slow query was never exotic. It was a missing composite index, a leading-column mistake, or a 1.1 GB index that should have been a 38 MB partial one.
This guide is the practical subset of PostgreSQL's index documentation that I actually use: which index type fits which query shape, how to read EXPLAIN ANALYZE output without a PhD, and the measured wins so you can calibrate what good looks like. Everything here was tested against production-shaped data, and every claim about index behavior comes straight from the official docs linked at the end.
PostgreSQL ships six index types: B-tree, Hash, GiST, SP-GiST, GIN, and BRIN. For a typical transactional backend you will live in three of them. Here is the honest map:
| Index type | Best for | Watch out for |
|---|---|---|
| B-tree (default) | Equality and range queries, sorting, uniqueness. Covers 95 percent of OLTP needs: lookups by ID, status, dates, composite filters. | Column order in composite indexes is everything. An index on customer and date does nothing for queries filtering only by date. |
| GIN | JSONB containment, arrays, full-text search. The reason many teams never need Elasticsearch. | Slow to update on write-heavy tables; index size can exceed table size on chatty JSONB. Use jsonb_path_ops when you only need containment. |
| GiST | Geometric data, ranges, nearest-neighbor ordering, exclusion constraints like preventing overlapping bookings. | Lossy for some operators, so Postgres rechecks heap rows; generally slower lookups than GIN for text search. |
| BRIN | Huge append-only tables where physical order correlates with a column, like time-series logs. Tiny: megabytes where B-tree needs gigabytes. | Useless if your data is not physically correlated with the indexed column. Updates and deletes degrade it. |
| Hash | Pure equality lookups, marginally smaller and faster than B-tree in narrow cases. | No range queries, no sorting, no uniqueness. B-tree does everything it does almost as well, so it is rarely worth the novelty. |
The single most common indexing mistake I review is composite column order. The rule that survives contact with production: equality columns first, then the range or sort column. Postgres can walk an index left to right; once a query stops constraining a leading column, the rest of the index is dead weight for that query.
-- The composite index that fixed our invoice list (most selective first
-- is a myth: order by HOW YOU QUERY, equality columns before ranges)
CREATE INDEX idx_invoices_customer_status_created
ON invoices (customer_id, status, created_at DESC);
-- Served by the index above (equality, equality, range):
SELECT * FROM invoices
WHERE customer_id = 42 AND status = 'unpaid'
ORDER BY created_at DESC LIMIT 20;
-- NOT served well: leading column missing → full scan territory
SELECT * FROM invoices WHERE status = 'unpaid';Note the failure case in the snippet: a query filtering only on status cannot use an index that leads with customer_id. That does not mean you add a second index reflexively. Check pg_stat_user_indexes first; every index you keep costs you on INSERT, UPDATE, and autovacuum time. On one busy table we measured roughly 12 percent slower writes from two redundant indexes that no query had touched in months.
Partial indexes are the most underused feature in Postgres land. If your queries always carry a predicate like status is pending, you can build the index over only those rows. The docs are explicit that this both shrinks the index and speeds up updates to excluded rows, because changing a done row no longer touches the index at all. On our background jobs table the live subset was 3 percent of rows: the partial index came out 29x smaller and the worker poll query went from 90 ms to under 2 ms.
-- 97% of our rows were status = 'done'. Queries only ever asked
-- for the live 3%. A partial index keeps only what matters:
CREATE INDEX idx_jobs_pending
ON background_jobs (scheduled_at)
WHERE status IN ('pending', 'retrying');
-- 1.1 GB full index → 38 MB partial index.
-- Bonus: partial UNIQUE index for "one active draft per user"
CREATE UNIQUE INDEX uniq_active_draft
ON purchase_orders (created_by)
WHERE status = 'draft';Partial unique indexes solve business rules that table-level constraints cannot: one active draft per user, one default address per customer, one open shift per employee. They enforce uniqueness only inside the predicate, which is exactly the semantics those rules need. I reach for this at least once per ERP project.
GIN indexes invert the relationship: instead of one entry per row, they store one entry per element, pointing back at every row containing it. That makes containment queries on JSONB and tsvector search fast enough that for catalogs under a few million rows, I default to Postgres full-text search before reaching for a separate search service. One real ERP example: product search across 400k SKUs with a GIN index on a tsvector column returns in 8 to 15 ms, and the data never leaves the transactional database, so there is no sync pipeline to babysit.
-- JSONB containment: which orders have a line item for SKU-123?
CREATE INDEX idx_orders_payload ON orders USING GIN (payload jsonb_path_ops);
SELECT id FROM orders WHERE payload @> '{"sku": "SKU-123"}';
-- Full-text search without Elasticsearch:
CREATE INDEX idx_products_fts ON products
USING GIN (to_tsvector('indonesian', name || ' ' || description));EXPLAIN shows the plan the optimizer chose; ANALYZE actually runs the query and shows what happened; BUFFERS shows how much data it touched. The before-and-after below is a real shape from an invoice list endpoint, anonymized but with the original magnitudes:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM invoices WHERE customer_id = 42 AND status = 'unpaid';
-- BEFORE (no usable index):
Seq Scan on invoices (cost=0.00..48291.00 rows=312 width=184)
(actual time=0.41..312.55 rows=287 loops=1)
Filter: ((customer_id = 42) AND (status = 'unpaid'))
Rows Removed by Filter: 1199713 ← read 1.2M rows for 287
Buffers: shared read=33291 ← 260 MB off disk
-- AFTER (composite index):
Index Scan using idx_invoices_customer_status_created
(cost=0.43..954.12 rows=312 width=184)
(actual time=0.09..1.8 rows=287 loops=1)
Buffers: shared hit=119 ← ~1 MB, all from cache
-- 312 ms → 1.8 ms. Same query, same data.When I read a plan, I look at four things in order, and they diagnose nearly every slow query:
Indexes are not free. Every additional index slows every INSERT and UPDATE on the table, consumes disk and cache, and adds autovacuum work. Audit usage with pg_stat_user_indexes and drop anything with zero scans after a representative period. The fastest index is often the one you delete.
This is the loop I run roughly monthly on production systems, and always before a go-live:
Indexing rewards a measured, unglamorous workflow: find the heaviest queries, read their plans, build the narrowest index that serves them, and verify the win with numbers. The 312 ms to 1.8 ms invoice query in this post was not clever. It was a composite B-tree with the columns in the right order, found by reading one EXPLAIN output carefully.
The takeaway: master B-tree composites and partial indexes first, add GIN when JSONB or search enters the picture, remember BRIN exists when a log table hits 100 million rows, and audit your indexes as ruthlessly as you create them. PostgreSQL gives you world-class tools; the practical skill is knowing which 20 percent to reach for.
Sources and further reading