Why use PostgreSQL full-text search instead of Elasticsearch for ERP projects?

For datasets under 5 million rows, PostgreSQL FTS with a GIN index on a tsvector column matches Elasticsearch's query latency for most query types. The operational advantage is significant: no additional infrastructure, no synchronization lag between your database and a separate search cluster, and no Elasticsearch cluster to manage. For ERP datasets typically ranging from 10K to 500K rows, PostgreSQL FTS is fast enough and dramatically simpler.

How does a GIN index improve full-text search performance in PostgreSQL?

A GIN (Generalized Inverted Index) stores a mapping from each lexeme to the rows that contain it, turning full-text queries from O(N) table scans into O(log N) index lookups. The index is rebuilt incrementally on inserts and updates rather than from scratch. For a table with 1 million rows, a GIN index on a tsvector column enables sub-10ms full-text queries.

How do you handle typos and misspellings in PostgreSQL full-text search?

PostgreSQL's tsvector handles stemmed exact-word matches well but fails on misspellings like 'recieve' instead of 'receive'. The solution is to add a gin_trgm_ops index on the raw text column (using the pg_trgm extension) alongside the tsvector GIN index, then use similarity() or word_similarity() as a secondary ranking signal. A WHERE clause combining @@ plainto_tsquery() with OR similarity() > 0.3 covers both exact and fuzzy matches.

When does PostgreSQL FTS break down and Elasticsearch becomes necessary?

PostgreSQL FTS struggles when queries match millions of rows because ts_rank must read the tsvector of every matching row — broad queries can take 25+ seconds at that scale. The threshold is roughly 1 million or more rows with high-cardinality queries. Other reasons to add Elasticsearch include needing semantic/vector search, real-time cross-service search aggregation across multiple databases, or built-in spelling correction features.

How do you implement autocomplete with PostgreSQL full-text search?

Standard plainto_tsquery does not support prefix matching, so 'inv' will not match 'invoice'. For autocomplete, use to_tsquery('english', $1 || ':*') — the :* operator matches any lexeme starting with the given prefix and works efficiently on GIN indexes. Combining this with LIMIT 10 produces a fast autocomplete endpoint with response times under 20ms even on tables with 100K or more rows.

PostgreSQL Full-Text Search Without Elasticsearch: What I Use in ERP Projects

My ERP projects need search — search across customers, invoices, products, and employees. My instinct was to reach for Elasticsearch. Then I ran benchmarks: for datasets under 5 million rows, PostgreSQL full-text search with a GIN index on tsvector matched Elasticsearch's latency for most query types and was dramatically simpler to operate. No additional infrastructure, no synchronization lag, no Elasticsearch cluster to manage. I've been shipping PostgreSQL FTS in production ERP systems for two years and it handles everything short of fuzzy search and complex relevance ranking.

How PostgreSQL Full-Text Search Works

PostgreSQL's FTS system converts text into a `tsvector` — a sorted list of lexemes (normalized word stems) with positional information. A query is converted to a `tsquery` — a boolean expression of lexemes. The `@@` operator matches a tsvector against a tsquery. PostgreSQL ships with dictionaries for 20+ languages that handle stemming (running → run), stop words (the, a, is), and accents. For Indonesian content in my ERP projects, I use the `english` dictionary as a fallback (Indonesian FTS dictionaries exist but require custom configuration).

The tsvector Column and GIN Index

The production pattern is to add a generated `tsvector` column to your table and build a GIN index on it. The generated column is maintained automatically by PostgreSQL — no triggers or external sync needed. GIN (Generalized Inverted Index) is the index type designed for tsvectors: it stores a mapping from each lexeme to the rows containing it, making full-text queries O(log N) rather than O(N). The index is rebuilt incrementally on inserts and updates, not from scratch. For a table with 1 million rows, a GIN index on tsvector enables sub-10ms full-text queries.

Ranking Results with ts_rank

PostgreSQL's `ts_rank()` function scores each matching row by relevance — how many times the query terms appear, their positions in the document, and their proximity. `ts_rank_cd()` (cover density) additionally considers how close the matching terms are to each other. The limitation: ts_rank requires reading the tsvector of every matching row, which is expensive at scale. For 1 million matches on a broad query, ranking can take seconds. The optimization: add WHERE filters to reduce the match set before ranking, or limit to the first N matches and rank only those.

PostgreSQL FTS Pipeline:
─────────────────────────────────────────────────────────
Text input: "Running PostgreSQL queries efficiently"
                    │
                    ▼ to_tsvector('english', text)
tsvector: 'effici':4 'postgresql':2 'queri':3 'run':1
  (stemmed, stop words removed, positions recorded)
                    │
                    ▼ stored in generated column + GIN index
GIN Index: lexeme → [row_ids]
  'postgresql' → [1, 5, 23, 44]
  'queri'      → [1, 3, 5, 12]
  'run'        → [1, 8, 19]
                    │
Query: plainto_tsquery('english', 'postgresql queries')
       → 'postgresql' & 'queri'  (AND logic)
                    │
                    ▼ GIN index intersection
Matching rows: [1, 5]  → O(log N), not O(N)
                    │
                    ▼ ts_rank() for ordering
Ranked results with relevance score

Performance benchmark (1M rows, GIN index):
  Simple phrase search:        ~8ms
  Boolean AND (2 terms):       ~12ms
  Autocomplete (prefix :*):    ~5ms
  Broad match (many results):  can be slow → use LIMIT early

From my experience with ERP search: combine full-text search with trigram similarity (pg_trgm extension) for typo-tolerance. tsvector handles stemmed exact word matches well, but fails on misspellings like 'recieve' instead of 'receive'. Add a `gin_trgm_ops` index on the raw text column alongside the tsvector GIN index and use `similarity()` or `word_similarity()` as a secondary ranking signal. A query like: `WHERE search_vector @@ plainto_tsquery('english', $1) OR similarity(name, $1) > 0.3 ORDER BY ts_rank(...) DESC` handles both exact and fuzzy matches.

The SQL Pattern I Use in Production

Here's the complete production pattern I use in NestJS + PostgreSQL ERP systems. The pattern includes: a generated tsvector column combining multiple text fields (with different weights using setweight), a GIN index on the tsvector, a search function using plainto_tsquery (which handles multi-word queries without requiring users to know tsquery syntax), ts_rank for ordering, and ts_headline for generating search result snippets with highlighted matches.

Multi-Field Weighted Search

In an ERP context, a customer name match should rank higher than a match in the notes field. PostgreSQL's `setweight()` function assigns weights A-D to tsvectors before concatenating them. Weight A is the highest — a match on an A-weighted field scores higher than a match on a B-weighted field. I use: A for name/title/code, B for description, C for tags/categories, D for notes/remarks. This mimics the basic TF-IDF weighting that Elasticsearch uses, without the complexity of a dedicated search cluster.

-- Production FTS setup for ERP (customers table)
-- 1. Add generated tsvector column with weighted fields
ALTER TABLE customers ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(code, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(email, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(notes, '')), 'D')
  ) STORED;

-- 2. GIN index for fast searching
CREATE INDEX idx_customers_search ON customers USING GIN(search_vector);

-- 3. Trigram index for autocomplete + fuzzy matching
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_customers_name_trgm ON customers USING GIN(name gin_trgm_ops);

-- 4. Search query (used in NestJS service)
SELECT
  id, name, code, email,
  ts_rank(search_vector, query) AS rank,
  ts_headline('english', name, query, 'MaxWords=10, MinWords=5') AS snippet
FROM customers, plainto_tsquery('english', $1) AS query
WHERE
  search_vector @@ query
  OR similarity(name, $1) > 0.3  -- fuzzy fallback
ORDER BY rank DESC, similarity(name, $1) DESC
LIMIT 20;

-- 5. Autocomplete endpoint (prefix match)
SELECT id, name
FROM customers
WHERE to_tsvector('english', name) @@ to_tsquery('english', $1 || ':*')
LIMIT 10;

Implementing Search in NestJS

In a NestJS service, full-text search with raw SQL gives more control than an ORM-abstracted query. I use the TypeORM DataSource or a raw pg client with parameterized queries. The key is to use `plainto_tsquery` rather than `to_tsquery` for user-facing search — `plainto_tsquery` converts a plain text query string (as a user types it) to a tsquery, handling AND logic automatically and ignoring syntax. `to_tsquery` requires users to know tsquery operators like & and |.

Benchmarks show PostgreSQL FTS is competitive with Elasticsearch on phrase search and boolean queries. But for queries that match millions of rows (e.g., a common word in a large dataset), PostgreSQL's ranking step reads every matching tsvector and can take 25+ seconds. The threshold where Elasticsearch outperforms PostgreSQL FTS is roughly 1 million+ rows with high-cardinality queries. For my ERP datasets (10K-500K rows), PostgreSQL FTS is fast enough. If your search needs to rank across millions of records with complex relevance, move to Elasticsearch or a PostgreSQL extension like ParadeDB.

Autocomplete with prefix matching

Standard `plainto_tsquery` doesn't do prefix matching — 'inv' won't match 'invoice'. For autocomplete, use `to_tsquery('english', $1 || ':*')` with the `:*` operator, which matches any lexeme starting with the prefix. Combine this with LIMIT 10 and you have a fast autocomplete endpoint. The `:*` operator works on GIN indexes and is fast even on large tables. I've shipped this pattern for customer name autocomplete and product code lookup in ERP systems — the response time is under 20ms even with 100K+ rows.

When to Add Elasticsearch

PostgreSQL FTS covers 90% of ERP search requirements. The cases where I'd add Elasticsearch: dataset exceeds 5 million rows with broad-match queries, you need semantic/vector search (searching by meaning, not keywords), you need real-time cross-service search aggregation (search across multiple databases simultaneously), or your stakeholders require features like 'did you mean?' spelling correction using the Elasticsearch dictionary. For everything else, PostgreSQL FTS plus pg_trgm is simpler, cheaper, and sufficient.

Sources & Further Reading

Xata — Postgres vs Elasticsearch Full-Text Search — https://xata.io/blog/postgres-full-text-search-postgres-vs-elasticsearch
Neon — Comparing Native Postgres, ElasticSearch, and pg_search — https://neon.com/blog/postgres-full-text-search-vs-elasticsearch
ParadeDB — Elasticsearch vs Postgres for Full-Text Search — https://www.paradedb.com/blog/elasticsearch-vs-postgres

Frequently Asked Questions

PostgreSQL Full-Text Search Without Elasticsearch: What I Use in ERP Projects

Frequently Asked Questions

PostgreSQL Full-Text Search Without Elasticsearch: What I Use in ERP Projects

How PostgreSQL Full-Text Search Works

The tsvector Column and GIN Index

Ranking Results with ts_rank

The SQL Pattern I Use in Production

Multi-Field Weighted Search

Implementing Search in NestJS

Autocomplete with prefix matching

When to Add Elasticsearch

Sources & Further Reading

How PostgreSQL Full-Text Search Works

The tsvector Column and GIN Index

Ranking Results with ts_rank

The SQL Pattern I Use in Production

Multi-Field Weighted Search

Implementing Search in NestJS

Autocomplete with prefix matching

When to Add Elasticsearch

Sources & Further Reading