
Adding BM25 to Improve AI Agent Retrieval

How we layered indexed lexical search into a Postgres-based AI agent retrieval system using ParadeDB and scoped BM25 queries.

BM25 ParadeDB AI Retrieval


BM25 solves a different problem than embeddings do. Embeddings are good at meaning, and BM25 is good at matching the literal words in a query. An AI agent product needs both, because users often type the exact term they are looking for even when the rest of the question is loose.

We added BM25 to catch the cases where vector search can miss exact terms: product names, acronyms, rare domain language, IDs, and integration labels.
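To make the distinction concrete, below is a toy BM25 scorer in TypeScript. It is a sketch of the textbook Okapi BM25 formula with a Lucene-style IDF and assumed defaults k1 = 1.2 and b = 0.75, not how pg_search scores internally; the `tokenize` rule, the `bm25Scores` helper, and the sample chunks are hypothetical. A query containing an exact error code matches only the chunk that contains that code.

```typescript
// Toy Okapi BM25 scorer for illustration; not pg_search's implementation.
// k1 = 1.2 and b = 0.75 are common textbook defaults, assumed here.
type Doc = { id: string; text: string };

// Hypothetical tokenizer: lowercase, split on anything that is not a letter,
// digit, or hyphen, so an identifier like "ERR-4021" stays one token.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9-]+/).filter((t) => t.length > 0);

function bm25Scores(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Map<string, number> {
  const tokenized = docs.map((d) => ({ id: d.id, terms: tokenize(d.text) }));
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.terms.length, 0) / n;

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>();
  for (const d of tokenized) {
    for (const t of Array.from(new Set(d.terms))) df.set(t, (df.get(t) ?? 0) + 1);
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const scores = new Map<string, number>();
  for (const d of tokenized) {
    let score = 0;
    for (const q of queryTerms) {
      const tf = d.terms.filter((t) => t === q).length;
      if (tf === 0) continue; // term absent: contributes nothing
      const dfq = df.get(q) ?? 0;
      // Lucene-style IDF: rare terms weigh a lot, very common terms weigh little.
      const idf = Math.log(1 + (n - dfq + 0.5) / (dfq + 0.5));
      // Term-frequency saturation (k1) and document-length normalization (b).
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * d.terms.length) / avgLen));
    }
    scores.set(d.id, score);
  }
  return scores;
}

// Made-up chunks for the example.
const docs: Doc[] = [
  { id: "a", text: "Connect the HubSpot integration from the settings page" },
  { id: "b", text: "Open the settings page to connect an integration" },
  { id: "c", text: "Error code ERR-4021 means the sync token expired" },
];

// None of the other query words occur in these chunks, so only "c",
// which contains the literal identifier, scores above 0.
const scores = bm25Scores(docs, "what does ERR-4021 mean");
console.log(scores);
```

The IDF term is what makes rare tokens such as IDs and acronyms decisive: a term that appears in one chunk out of many carries far more weight than one that appears almost everywhere.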

The migration uses ParadeDB’s pg_search extension and creates a BM25 index on document_chunks.

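```typescript
import type { MigrationFn } from 'umzug';
import type { QueryInterface } from 'sequelize';

export const up: MigrationFn<QueryInterface> = async ({ context: qi }) => {
  // pg_search is ParadeDB's extension that provides the bm25 index access method.
  await qi.sequelize.query('CREATE EXTENSION IF NOT EXISTS pg_search');
  // key_field must name a unique column; project_id and agent_id are indexed
  // alongside content so tenant filters can be applied during the BM25 search.
  await qi.sequelize.query(`
    CREATE INDEX document_chunks_bm25_idx ON document_chunks
    USING bm25 (id, content, project_id, agent_id)
    WITH (key_field = 'id')
  `);
};
```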

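The index includes project_id and agent_id because retrieval is tenant-scoped, so the tenant filter can be evaluated as part of the BM25 search. Filtering after the search would cost both performance and correctness: the engine would score chunks from every tenant, and a global top-N could be filled with other tenants' chunks, leaving too few results for the tenant actually asking.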
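The correctness half of that trade-off shows up even in a toy model. The sketch below uses made-up hits, already sorted by score, and hypothetical `postFilter` and `preFilter` helpers to compare filtering a global top-N by tenant against restricting to the tenant before taking the top-N.

```typescript
// Hypothetical ranked hits across all tenants, sorted best first.
type Hit = { id: string; projectId: string; score: number };

// Post-filtering: take the global top-N, then drop other tenants' rows.
function postFilter(hits: Hit[], projectId: string, n: number): Hit[] {
  return hits.slice(0, n).filter((h) => h.projectId === projectId);
}

// Pre-filtering: restrict to the tenant first, then take the top-N.
function preFilter(hits: Hit[], projectId: string, n: number): Hit[] {
  return hits.filter((h) => h.projectId === projectId).slice(0, n);
}

const hits: Hit[] = [
  { id: "x1", projectId: "other", score: 9.1 },
  { id: "x2", projectId: "other", score: 8.7 },
  { id: "x3", projectId: "other", score: 8.2 },
  { id: "p1", projectId: "acme", score: 4.0 },
  { id: "p2", projectId: "acme", score: 3.5 },
];

// The global top 3 all belong to "other", so post-filtering returns nothing.
const afterSearch = postFilter(hits, "acme", 3);
// Filtering first keeps both of the tenant's matching chunks.
const duringSearch = preFilter(hits, "acme", 3);
console.log(afterSearch.length, duringSearch.map((h) => h.id));
```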

The BM25 branch of the retrieval query is a CTE that ranks lexical matches within the tenant's scope before they are fused with the vector results.

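```sql
-- One CTE of the hybrid retrieval query: BM25 candidates for this tenant.
-- :queryText and :pool are named replacements; ${tenantWhere} is
-- interpolated by the surrounding TypeScript template string.
bm25 AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY pdb.score(id) DESC) AS rank,
         pdb.score(id) AS bm25_score
  FROM document_chunks
  WHERE content ||| :queryText   -- ||| matches chunks containing any query token
        AND ${tenantWhere}       -- tenant predicate on project_id / agent_id
  ORDER BY pdb.score(id) DESC
  LIMIT :pool                    -- candidate pool size handed to fusion
)
```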

BM25 here is additive. It gives exact terms a path into the candidate set without displacing semantic retrieval.
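The `rrf` column in the final query suggests the two rankings are combined with reciprocal rank fusion. The `fused` CTE is not shown, so the sketch below is an assumption: plain RRF with k = 60 and equal weights, over hypothetical chunk ids. It shows why the BM25 branch is additive: a chunk only BM25 finds (`c4`) joins the candidate set alongside the vector results instead of replacing them, and a chunk found by both (`c2`) rises to the top.

```typescript
// Reciprocal rank fusion (RRF) over ranked lists of chunk ids, best first.
// Assumptions: k = 60 (the constant commonly used for RRF) and equal weights;
// the article does not show how its `fused` CTE is parameterized.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const fused = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      const rank = i + 1; // 1-based, like ROW_NUMBER()
      // Each list a chunk appears in adds 1 / (k + rank) to its fused score.
      fused.set(id, (fused.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return fused;
}

// Hypothetical rankings for one query.
const vectorRanking = ["c7", "c2", "c9"];
const bm25Ranking = ["c4", "c2"]; // c4 contains an exact term the embeddings missed

const fusedScores = rrf([vectorRanking, bm25Ranking]);
const ordered = Array.from(fusedScores.entries())
  .sort((x, y) => y[1] - x[1])
  .map(([id]) => id);
console.log(ordered); // c2 first (found by both), c4 now in the pool, c9 last
```

Fusing on ranks rather than raw scores avoids having to put unbounded BM25 scores on the same scale as cosine similarity.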

Keep BM25 as a Signal, Not the Whole System

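BM25 can also produce noisy matches: with a disjunctive match, a chunk that shares only a common word with the query still enters the candidate pool, even if it scores low. That is why the repository returns the raw BM25 score, the vector score, and the final fused score as separate fields instead of collapsing them into one number.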

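```sql
-- Final step: join fused ids back to chunks and keep each signal separately.
SELECT c.id,
       c.document_id AS "documentId",
       c.content,
       c.metadata,
       (1 - (c.embedding <=> :vec::vector)) AS "vectorScore",  -- <=> is pgvector cosine distance
       f.bm25_score AS "bm25Score",                             -- raw BM25 score from the bm25 CTE
       f.rrf::double precision AS score                          -- fused (RRF) score
FROM fused f
JOIN document_chunks c ON c.id = f.id
ORDER BY f.rrf DESC
LIMIT :limit
```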
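The `vectorScore` expression relies on pgvector's `<=>` operator, which returns cosine distance, defined as 1 minus cosine similarity. The sketch below reproduces that arithmetic in TypeScript with hypothetical helper names, so `1 - (c.embedding <=> :vec::vector)` can be read as plain cosine similarity.

```typescript
// Cosine distance as pgvector's <=> defines it: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Mirrors the SQL expression 1 - (embedding <=> vec), i.e. cosine similarity.
const vectorScore = (a: number[], b: number[]): number => 1 - cosineDistance(a, b);

console.log(vectorScore([1, 0], [1, 0])); // 1: same direction
console.log(vectorScore([1, 0], [0, 1])); // 0: orthogonal
```

Cosine similarity lies between -1 and 1, while raw BM25 scores have no fixed upper bound, so the `vectorScore` and `bm25Score` columns are not directly comparable.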

With the scores kept apart, downstream code can inspect why a chunk appeared: whether it was semantically close, lexically strong, or surfaced by both retrievers at once.
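As a hedged sketch of what that downstream inspection could look like, the TypeScript below types the rows returned by the final SELECT and labels each one. The `RetrievedChunk` type, the `explain` helper, the convention that `bm25Score` is null when only the vector branch found a chunk, and the 0.75 threshold are all assumptions for illustration.

```typescript
// Row shape of the final SELECT. Assumptions: ids come back as strings,
// and bm25Score is null for chunks that only the vector branch found.
type RetrievedChunk = {
  id: string;
  documentId: string;
  content: string;
  metadata: Record<string, unknown>;
  vectorScore: number; // 1 - cosine distance
  bm25Score: number | null; // raw BM25 score, if the lexical branch matched
  score: number; // fused score used for ordering
};

type Reason = "both" | "lexical" | "semantic" | "weak";

// Hypothetical cutoff for "semantically close"; the article gives no value.
const SEMANTIC_THRESHOLD = 0.75;

// Explains a result from its separate signals instead of the fused score alone.
function explain(chunk: RetrievedChunk): Reason {
  const lexical = chunk.bm25Score !== null && chunk.bm25Score > 0;
  const semantic = chunk.vectorScore >= SEMANTIC_THRESHOLD;
  if (lexical && semantic) return "both";
  if (lexical) return "lexical";
  if (semantic) return "semantic";
  return "weak"; // ranked by fusion but strong on neither signal
}

const row: RetrievedChunk = {
  id: "42",
  documentId: "d1",
  content: "Error code ERR-4021 means the sync token expired",
  metadata: {},
  vectorScore: 0.41,
  bm25Score: 7.3,
  score: 0.016,
};
console.log(explain(row)); // lexical
```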

Why This Matters for Product Teams

In product terms, BM25 is a reliability feature. When the agent cannot answer a question from a document that obviously contains the answer, the user does not care whether the miss came from embeddings, tokenization, or ranking. They only see that the agent failed, and BM25 removes a common reason for that failure.

What We Learned

Hybrid retrieval becomes necessary once the knowledge base fills up with real company language: brand names, plan names, internal terminology, and integration-specific phrases that embeddings alone tend to smear together.

Reaching for a separate search platform this early would have been premature. For this project, BM25 inside Postgres gave us a lexical layer and kept the architecture compact.

