Your team asks your internal knowledge system a question.
It returns 20 results. The answer is somewhere in there.
But result #1 is about a different topic. Result #4 is outdated. The actual answer? It's buried at #17.
Your system retrieved the information. It just put it in the wrong order.
The first result should be the best result. Every time.
INTELLIGENCE LAYER: Reranking happens after retrieval, before the AI generates a response.
When you search a knowledge base, the first retrieval uses fast, approximate matching. Vector similarity gets you in the ballpark. But 'similar embedding' doesn't always mean 'best answer to this specific question.'
Reranking takes those initial results and scores them again. This time with a more sophisticated model that actually reads the query and each result together. It asks: 'Given what the user is asking, how relevant is this specific piece of content?' Then it reorders based on those scores.
The difference matters. Without reranking, your AI gets the top 5 results from fast retrieval. With reranking, it gets the 5 most relevant results. Same data, different selection, dramatically better answers.
Reranking solves a universal problem: fast initial filtering gets you candidates, but you need deeper evaluation to pick the best ones.
First cast a wide net quickly (cheap, approximate). Then evaluate the candidates carefully (expensive, precise). This pattern appears whenever you need both speed and accuracy.
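To make the pattern concrete, here is a minimal Python sketch. Both scoring functions are hypothetical stand-ins (crude token overlap in place of vector similarity, a crude phrase check in place of a real reranking model); the two-stage shape of the pipeline is the point.

```python
def fast_similarity(query: str, doc: str) -> float:
    """Stage 1 stand-in: cheap and approximate, like cosine over embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def careful_relevance(query: str, doc: str) -> float:
    """Stage 2 stand-in: swap in a real reranking model here."""
    # Reward candidates that contain the query's word pairs in order,
    # a crude proxy for "reading" query and document together.
    words = query.lower().split()
    pairs = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]
    return sum(p in doc.lower() for p in pairs) / max(len(pairs), 1)

def retrieve_and_rerank(query: str, corpus: list[str],
                        wide_n: int = 50, final_n: int = 5) -> list[str]:
    # Stage 1: cast a wide net quickly over the whole corpus.
    candidates = sorted(corpus, key=lambda d: fast_similarity(query, d),
                        reverse=True)[:wide_n]
    # Stage 2: evaluate only those candidates carefully, keep the best few.
    return sorted(candidates, key=lambda d: careful_relevance(query, d),
                  reverse=True)[:final_n]
```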
Initial retrieval uses fast similarity matching. Reranking evaluates actual relevance. Here's the difference on a concrete query:
"What is our refund policy for customers?"
All employees receive 15 days of PTO annually. Unused days roll over up to 5 days.
Passwords must be 12+ characters with special characters. Change every 90 days.
Starter tier at $99/month, Professional at $299/month, Enterprise custom pricing.
Standard returns within 30 days. Enterprise customers have 60-day window. Full refund to original payment method.
Process refund requests through the dashboard. Approval required for amounts over $500.
Cross-encoders: most accurate, most expensive.
A cross-encoder feeds the query and each result together into a transformer model. The model sees both and outputs a relevance score. Because it processes them jointly, it understands subtle relationships that embeddings miss.
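As a sketch of what this looks like in practice: the sentence-transformers library ships a CrossEncoder class, and public MS MARCO checkpoints are a common starting point. The model name below is one example, not a recommendation.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is our refund policy for customers?"
candidates = [
    "All employees receive 15 days of PTO annually.",
    "Standard returns within 30 days. Enterprise customers have a 60-day window.",
    "Process refund requests through the dashboard.",
]

# The model reads each (query, candidate) pair jointly and scores it.
scores = model.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```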
LLM rerankers: flexible, reuse a model you already call.
This approach sends the query and candidate results to an LLM with instructions to rank them; the LLM returns ordered results with explanations. Good when you're already calling an LLM and want to add reranking without new infrastructure.
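A sketch of the prompt-based version, using the OpenAI client as one example. The model name and prompt wording are illustrative assumptions, and you would still need to parse the ordering out of the returned text.

```python
from openai import OpenAI

client = OpenAI()

def llm_rerank(query: str, candidates: list[str]) -> str:
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(candidates, 1))
    prompt = (
        f"Question: {query}\n\nCandidate passages:\n{numbered}\n\n"
        "Rank the passages from most to least relevant to the question. "
        "Return the numbers in order, one per line, each with a one-line reason."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use whatever model you already call
        messages=[{"role": "user", "content": prompt}],
    )
    # The ordering and explanations come back as text to be parsed.
    return response.choices[0].message.content
```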
Lightweight scoring models: fastest, good for high-volume workloads.
These use simpler architectures that output sparse relevance signals, which makes them faster than cross-encoders. They're often combined with keyword-matching signals for hybrid scoring.
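A sketch of hybrid scoring under assumptions: the rank_bm25 package supplies the keyword signal, and light_model_score is a hypothetical stand-in for whatever fast relevance model you deploy.

```python
from rank_bm25 import BM25Okapi

def light_model_score(query: str, doc: str) -> float:
    """Stand-in for a lightweight relevance model; returns a value in [0, 1]."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def hybrid_rerank(query: str, candidates: list[str], alpha: float = 0.5) -> list[str]:
    # Keyword signal: BM25 over the candidate set.
    bm25 = BM25Okapi([doc.lower().split() for doc in candidates])
    keyword = bm25.get_scores(query.lower().split())
    max_kw = max(keyword.max(), 1e-9)  # normalize so the two signals are comparable
    # Blend the model signal and the keyword signal.
    blended = [
        alpha * light_model_score(query, doc) + (1 - alpha) * (kw / max_kw)
        for doc, kw in zip(candidates, keyword)
    ]
    return [doc for _, doc in sorted(zip(blended, candidates), reverse=True)]
```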
Reranking sits between retrieval and context assembly. It takes the candidates from vector search and hybrid search, scores them for actual relevance to the query, and passes the best ones to context window management for use in generation.
Your vector search returns garbage. You add reranking hoping it'll fix things. It won't. Reranking can only reorder what retrieval gives it. If the right answer isn't in the top 50 candidates, no amount of reranking will find it.
Instead: Fix your chunking and embedding strategy first. Reranking refines good retrieval. It does not rescue bad retrieval.
You run every result through your reranker. Now queries take 3 seconds instead of 200ms. The system feels sluggish. Users abandon before getting answers.
Instead: Rerank only the top N results from initial retrieval (typically 20-50). The reranker refines, the fast retrieval filters.
Reranking gives you ordered results with confidence scores. You take the top 5 and ignore the scores. One of those 5 has a score of 0.12 while the others are 0.85+. You fed the AI garbage anyway.
Instead: Use relevance thresholds. Drop results below a minimum score, even if you have slots left. Empty is better than wrong.
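In code, that check is a few lines. The 0.5 cutoff below is an illustrative assumption; calibrate it against your reranker's actual score distribution.

```python
def filter_by_threshold(scored_results: list[tuple[float, str]],
                        top_k: int = 5, min_score: float = 0.5) -> list[tuple[float, str]]:
    """scored_results: (score, document) pairs, sorted descending by score."""
    # Keep at most top_k results, but drop anything below the floor,
    # even if that leaves slots unfilled. Empty is better than wrong.
    return [(s, doc) for s, doc in scored_results[:top_k] if s >= min_score]
```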
You've learned how reranking improves result quality by reordering candidates based on actual relevance. The natural next step is understanding how to decide which reranked results are good enough to use.