How an AI Agent Turns Low Confidence and Feedback Into Better Knowledge
The dangerous failure mode for a customer-facing agent is not the one that says “I don’t know.” It is the one that answers confidently and wrong. Every confident-wrong answer is a trust liability, and you usually find out from a customer, not a dashboard.
The agent you actually want does two things most demos skip. It knows when it is unsure and refuses cleanly. And it treats that moment of uncertainty as a signal — a record of exactly what it needs to be taught next. Done well, the same machinery that prevents bad answers also generates your knowledge roadmap.
Score every chunk before trusting it
Retrieval returns candidates, but candidates are not evidence. A confidence score decides how much each retrieved chunk is allowed to influence the answer, and it can draw on several signals: how strongly a chunk overlaps the query’s actual terms, whether independent retrievers agree on it, and — as the system matures — weights for freshness and source trust so a stale or low-authority chunk cannot dominate even when the words match.
Today our score rests on the first two signals, which are the ones that move the needle most. We run two independent rankers — vector similarity and BM25 — and fuse them with Reciprocal Rank Fusion. Lexical overlap is what BM25 measures; retriever agreement is the consistency signal, because a chunk both rankers surface outranks one only a single ranker liked.
export function rrf(rank: number, k = 60): number {
return 1.0 / (k + rank);
}
// vectorScore, bm25Score, and the fused score are kept as separate fields
// so downstream code can see *why* a chunk ranked where it did.
Two thresholds turn that score into policy. RETRIEVAL_MIN_SCORE = 0.35 is the floor to reach the model at all; anything weaker is dropped before generation. LOW_CONFIDENCE_SCORE = 0.5 flags an answer whose best evidence was thin, so it can be captured for review later. Low-confidence context never gets to quietly steer the response.
Constrain the model to what it retrieved
A score is only useful if the model respects it. Constrained generation means the model may answer using the retrieved sources and nothing else — no filling gaps from training data.
const CONSTRAINT_INSTRUCTIONS = [
'Answer ONLY using the facts in the <sources> block below.',
'If the answer is not present in the sources, reply exactly: "I don\'t have enough information..."',
'Cite every factual claim by appending the matching [S#] tag(s) immediately after the claim...',
'Do not invent source IDs. Only cite IDs that appear in <sources>.',
].join('\n');
The qualifying chunks are assembled into a <sources> block, and untrusted input — like a visitor’s form fields — is fenced off with an explicit warning so it cannot smuggle in instructions. The model is boxed in by construction, not by hope.
Make every claim cite its source
Each chunk that survives filtering is tagged in confidence order — S1, S2, S3 — and the model is required to append the matching [S#] tag after every factual claim. Because each tag carries the document ID it came from, a reader (or your own validation layer) can trace any sentence back to the exact chunk and document that supports it. A claim with no citation is a claim the system should not have made.
Refuse when the evidence isn’t there
Refusal is the safety valve, and it fires at two points. The hard refusal runs before the model: if retrieval returns zero qualifying chunks and there is no tool to fall back on, the system skips the LLM call entirely and returns a fixed “insufficient evidence” reply. The soft refusal runs after: if the model, given weak sources, emits the exact refusal line, we detect it and persist the turn as a refusal rather than a normal answer.
Both paths do the same important thing — they record metadata (refusalType: 'hard' | 'soft') and open a help-desk ticket. The agent does not just decline; it reports that it declined.
Turn refusals and feedback into a knowledge roadmap
This is where uncertainty becomes an asset. Every hard refusal, soft refusal, low-confidence answer, and thumbs-down becomes a knowledge-gap event. Those events are clustered by query similarity — anything above 0.85 cosine collapses into one cluster — so the same missing answer asked fifty different ways becomes a single ranked gap with counts attached:
sourceCounts: { refusal_hard, refusal_soft, thumbs_down, low_confidence }
A diagnostic pass then classifies each cluster’s failure mode — no_relevant_docs, wrong_docs_retrieved, almost_matched, split_chunk, or over_refusal — which tells you whether the fix is new content, better chunking, or a threshold tweak. The output is not a vague “the bot is bad” complaint. It is a prioritized backlog of exactly what to write or ingest next, ordered by how often real users hit it.
Verify, then keep evaluating
A backlog is only worth it if you can prove the fix worked. After you add content, verification re-runs the real past queries from a cluster against the current corpus and counts how many now clear RETRIEVAL_MIN_SCORE. The same questions that failed last week become a regression test this week.
That reactive loop is the foundation. The discipline you build on top of it is proactive: a standing suite of adversarial queries, retrieval-recall benchmarks, and hallucination probes — questions about unindexed documents, answers that changed between versions, real entities with invented policies — run continuously so a knowledge change never silently breaks an answer that used to work.
Why this matters for product teams
Most teams treat agent quality as a model problem and argue about prompts. The leverage is somewhere less glamorous: scoring evidence honestly, refusing when it is thin, and mining every refusal for what to teach next. An agent wired this way compounds. It refuses safely on day one, and by the questions it could not answer, it tells you precisely how to make it answer them. Confidence scoring stops being just a guardrail and starts being a roadmap generator — which is exactly the kind of system that earns user trust instead of spending it.