Which PostgreSQL index types should I learn first for a typical transactional backend?

For most OLTP workloads, B-tree, GIN, and partial indexes cover the vast majority of cases. B-tree handles equality and range queries and accounts for roughly 95 percent of OLTP needs, GIN is the right choice for JSONB containment and full-text search, and partial indexes dramatically shrink index size when your queries always target a fixed subset of rows. BRIN is worth knowing for large append-only tables like time-series logs, but it only works when physical row order correlates with the indexed column.

Why does column order matter so much in a composite B-tree index?

PostgreSQL traverses a composite index from left to right, so a query that skips the leading column cannot use the index efficiently. The practical rule is to put equality-filter columns first, then the range or sort column. A real cost of getting this wrong is illustrated in the post: a busy table suffered roughly 12 percent slower writes from two redundant indexes where the leading column did not match the most common query filter.

When should I use a partial index instead of a full-column index?

Use a partial index when your queries consistently carry a fixed predicate, such as filtering only on rows where status is pending. The index is built only over matching rows, so it is much smaller and updates to excluded rows do not touch it at all. In one background-jobs example from the post, the live subset was 3 percent of total rows, the partial index came out 29 times smaller, and the worker poll query dropped from 90 ms to under 2 ms.

How can I tell whether an existing index is actually being used?

Query the pg_stat_user_indexes view, which tracks scan counts per index. Any index showing zero or near-zero scans after a representative production period is a candidate for removal. The post also recommends enabling pg_stat_statements to identify the top 10 queries by total execution time before deciding which indexes to add or drop, since optimizing by aggregate cost is more effective than reacting to individual complaints.

What are the four key numbers to look for in an EXPLAIN ANALYZE output?

The post identifies: Rows Removed by Filter (a high number signals a missing index), estimated rows versus actual rows (a 10x or greater divergence means stale statistics or an un-indexed expression), buffers shared read versus hit (reads indicate disk I/O that will be slow under real load), and loops on nested nodes (an inner scan with 50,000 loops executes 50,000 times regardless of how cheap each iteration looks).

PostgreSQL Indexing in Practice: B-tree, GIN, and Partial Indexes

Indexing is the highest-leverage performance work you can do on a PostgreSQL database, and it is also where I see the most cargo-culting: an index on every column, no idea whether any of them are used, and a write path that crawls under the weight of maintaining them. After several years of running Postgres behind NestJS ERP systems, the pattern is always the same. The slow query was never exotic. It was a missing composite index, a leading-column mistake, or a 1.1 GB index that should have been a 38 MB partial one.

This guide is the practical subset of PostgreSQL's index documentation that I actually use: which index type fits which query shape, how to read EXPLAIN ANALYZE output without a PhD, and the measured wins so you can calibrate what good looks like. Everything here was tested against production-shaped data, and every claim about index behavior comes straight from the official docs linked at the end.

The Index Types That Matter (and Two That Rarely Do)

PostgreSQL ships six index types: B-tree, Hash, GiST, SP-GiST, GIN, and BRIN. For a typical transactional backend you will live in three of them. Here is the honest map:

Index type	Best for	Watch out for
B-tree (default)	Equality and range queries, sorting, uniqueness. Covers 95 percent of OLTP needs: lookups by ID, status, dates, composite filters.	Column order in composite indexes is everything. An index on customer and date does nothing for queries filtering only by date.
GIN	JSONB containment, arrays, full-text search. The reason many teams never need Elasticsearch.	Slow to update on write-heavy tables; index size can exceed table size on chatty JSONB. Use jsonb_path_ops when you only need containment.
GiST	Geometric data, ranges, nearest-neighbor ordering, exclusion constraints like preventing overlapping bookings.	Lossy for some operators, so Postgres rechecks heap rows; generally slower lookups than GIN for text search.
BRIN	Huge append-only tables where physical order correlates with a column, like time-series logs. Tiny: megabytes where B-tree needs gigabytes.	Useless if your data is not physically correlated with the indexed column. Updates and deletes degrade it.
Hash	Pure equality lookups, marginally smaller and faster than B-tree in narrow cases.	No range queries, no sorting, no uniqueness. B-tree does everything it does almost as well, so it is rarely worth the novelty.

Composite B-tree Indexes: Order Columns by How You Query

The single most common indexing mistake I review is composite column order. The rule that survives contact with production: equality columns first, then the range or sort column. Postgres can walk an index left to right; once a query stops constraining a leading column, the rest of the index is dead weight for that query.

-- The composite index that fixed our invoice list (most selective first
-- is a myth: order by HOW YOU QUERY, equality columns before ranges)
CREATE INDEX idx_invoices_customer_status_created
  ON invoices (customer_id, status, created_at DESC);

-- Served by the index above (equality, equality, range):
SELECT * FROM invoices
WHERE customer_id = 42 AND status = 'unpaid'
ORDER BY created_at DESC LIMIT 20;

-- NOT served well: leading column missing → full scan territory
SELECT * FROM invoices WHERE status = 'unpaid';

Note the failure case in the snippet: a query filtering only on status cannot use an index that leads with customer_id. That does not mean you add a second index reflexively. Check pg_stat_user_indexes first; every index you keep costs you on INSERT, UPDATE, and autovacuum time. On one busy table we measured roughly 12 percent slower writes from two redundant indexes that no query had touched in months.

Partial Indexes: Index the 3 Percent You Actually Query

Partial indexes are the most underused feature in Postgres land. If your queries always carry a predicate like status is pending, you can build the index over only those rows. The docs are explicit that this both shrinks the index and speeds up updates to excluded rows, because changing a done row no longer touches the index at all. On our background jobs table the live subset was 3 percent of rows: the partial index came out 29x smaller and the worker poll query went from 90 ms to under 2 ms.

-- 97% of our rows were status = 'done'. Queries only ever asked
-- for the live 3%. A partial index keeps only what matters:
CREATE INDEX idx_jobs_pending
  ON background_jobs (scheduled_at)
  WHERE status IN ('pending', 'retrying');

-- 1.1 GB full index → 38 MB partial index.
-- Bonus: partial UNIQUE index for "one active draft per user"
CREATE UNIQUE INDEX uniq_active_draft
  ON purchase_orders (created_by)
  WHERE status = 'draft';

Partial unique indexes solve business rules that table-level constraints cannot: one active draft per user, one default address per customer, one open shift per employee. They enforce uniqueness only inside the predicate, which is exactly the semantics those rules need. I reach for this at least once per ERP project.

GIN: JSONB and Full-Text Search Without Extra Infrastructure

GIN indexes invert the relationship: instead of one entry per row, they store one entry per element, pointing back at every row containing it. That makes containment queries on JSONB and tsvector search fast enough that for catalogs under a few million rows, I default to Postgres full-text search before reaching for a separate search service. One real ERP example: product search across 400k SKUs with a GIN index on a tsvector column returns in 8 to 15 ms, and the data never leaves the transactional database, so there is no sync pipeline to babysit.

-- JSONB containment: which orders have a line item for SKU-123?
CREATE INDEX idx_orders_payload ON orders USING GIN (payload jsonb_path_ops);
SELECT id FROM orders WHERE payload @> '{"sku": "SKU-123"}';

-- Full-text search without Elasticsearch:
CREATE INDEX idx_products_fts ON products
  USING GIN (to_tsvector('indonesian', name || ' ' || description));

Reading EXPLAIN ANALYZE: the Four Numbers That Matter

EXPLAIN shows the plan the optimizer chose; ANALYZE actually runs the query and shows what happened; BUFFERS shows how much data it touched. The before-and-after below is a real shape from an invoice list endpoint, anonymized but with the original magnitudes:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM invoices WHERE customer_id = 42 AND status = 'unpaid';

-- BEFORE (no usable index):
Seq Scan on invoices (cost=0.00..48291.00 rows=312 width=184)
  (actual time=0.41..312.55 rows=287 loops=1)
  Filter: ((customer_id = 42) AND (status = 'unpaid'))
  Rows Removed by Filter: 1199713        ← read 1.2M rows for 287
  Buffers: shared read=33291             ← 260 MB off disk

-- AFTER (composite index):
Index Scan using idx_invoices_customer_status_created
  (cost=0.43..954.12 rows=312 width=184)
  (actual time=0.09..1.8 rows=287 loops=1)
  Buffers: shared hit=119                ← ~1 MB, all from cache
-- 312 ms → 1.8 ms. Same query, same data.

When I read a plan, I look at four things in order, and they diagnose nearly every slow query:

Rows Removed by Filter: the smoking gun. Reading 1.2 million rows to return 287 means the database is doing your WHERE clause the hard way. That is an index request written in plan language.
Estimated rows vs actual rows: when they diverge by 10x or more, statistics are stale or the planner cannot see through an expression. Run ANALYZE on the table, or index the expression itself.
Buffers shared read vs hit: read means disk, hit means cache. A query reading tens of thousands of buffers cold will be fast in your test and slow at 9 AM Monday when the cache is full of other tenants' data.
Loops on nested nodes: an inner index scan with loops equal to 50000 executes 50000 times. Cheap node, expensive total. This is where N-plus-one queries hide inside a single SQL statement.

Indexes are not free. Every additional index slows every INSERT and UPDATE on the table, consumes disk and cache, and adds autovacuum work. Audit usage with pg_stat_user_indexes and drop anything with zero scans after a representative period. The fastest index is often the one you delete.

My Indexing Workflow on Every Project

This is the loop I run roughly monthly on production systems, and always before a go-live:

Turn on pg_stat_statements and pull the top 10 queries by total execution time. Optimize by aggregate cost, not by which query annoyed someone today.
For each candidate, run EXPLAIN with ANALYZE and BUFFERS on production-shaped data. A 10k-row dev database will lie to you; the planner chooses different plans at different scales.
Design the smallest index that serves the query: equality columns, then range or sort. Consider partial if the query always carries a fixed predicate, and a covering INCLUDE clause if heap fetches dominate.
Create it with CREATE INDEX CONCURRENTLY so you do not lock writes, then re-run the EXPLAIN and record before and after numbers in the migration commit message. Future you will want that evidence.
A month later, check pg_stat_user_indexes. If the new index shows zero or trivial scans, you misdiagnosed; drop it and look again rather than letting it tax writes forever.

Boring Indexes, Measured Wins

Indexing rewards a measured, unglamorous workflow: find the heaviest queries, read their plans, build the narrowest index that serves them, and verify the win with numbers. The 312 ms to 1.8 ms invoice query in this post was not clever. It was a composite B-tree with the columns in the right order, found by reading one EXPLAIN output carefully.

The takeaway: master B-tree composites and partial indexes first, add GIN when JSONB or search enters the picture, remember BRIN exists when a log table hits 100 million rows, and audit your indexes as ruthlessly as you create them. PostgreSQL gives you world-class tools; the practical skill is knowing which 20 percent to reach for.

Sources and further reading

Frequently Asked Questions

PostgreSQL Indexing in Practice: B-tree, GIN, and Partial Indexes

Frequently Asked Questions

PostgreSQL Indexing in Practice: B-tree, GIN, and Partial Indexes

The Index Types That Matter (and Two That Rarely Do)

Composite B-tree Indexes: Order Columns by How You Query

Partial Indexes: Index the 3 Percent You Actually Query

GIN: JSONB and Full-Text Search Without Extra Infrastructure

Reading EXPLAIN ANALYZE: the Four Numbers That Matter

My Indexing Workflow on Every Project

Boring Indexes, Measured Wins

The Index Types That Matter (and Two That Rarely Do)

Composite B-tree Indexes: Order Columns by How You Query

Partial Indexes: Index the 3 Percent You Actually Query

GIN: JSONB and Full-Text Search Without Extra Infrastructure

Reading EXPLAIN ANALYZE: the Four Numbers That Matter

My Indexing Workflow on Every Project

Boring Indexes, Measured Wins