You uploaded 50 documents to your knowledge base. Asked a question. Got the wrong answer.
The information is in there. You can see it. But the AI cannot find it.
The problem is not the documents. It is how they were split.
FOUNDATIONAL - Essential for any AI retrieval system. Get this wrong and nothing downstream works.
When you feed documents to a retrieval system, you cannot give it the whole document at once. The AI has context limits. More importantly, shoving an entire 50-page PDF into a prompt produces terrible results. The AI needs focused chunks.
Chunking is the process of deciding where to draw the lines. Do you split every 500 tokens? Every paragraph? Every section? The choice dramatically affects what the AI retrieves when someone asks a question.
Get it wrong and the AI finds garbage. Get it right and it finds exactly what the user needs.
Chunking is not just about RAG systems. It is a pattern that appears whenever you need to balance context size against retrieval precision.
The unit of retrieval determines what can be found. Too large and you get noise. Too small and you lose context. The art is matching chunk boundaries to meaning boundaries.
[Interactive demo: a sample refund policy document, split live as you drag the chunk-size and overlap sliders. Moderate chunk sizes give a good balance of context and precision, and overlap preserves context at chunk boundaries.]
Fixed-size chunking: split every N tokens, regardless of content
The simplest approach. You pick a size (say, 500 tokens) and split documents at that boundary, usually with some overlap (50-100 tokens) so you do not cut sentences in half.
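Here is a minimal sketch of a fixed-size chunker in Python. The whitespace split is a stand-in for a real tokenizer (in practice you would count model tokens, e.g. with tiktoken), and the default sizes are just the values from above:

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` tokens, with `overlap`
    tokens shared between neighboring chunks."""
    tokens = text.split()  # crude whitespace "tokens"; swap in a real tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window reached the end; avoid a redundant tail chunk
    return chunks
```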
Semantic chunking: split at meaning boundaries using embeddings
Uses embeddings to detect when topics shift: compare consecutive sentence embeddings and split where similarity drops below a threshold. The chunks follow the document's actual topic flow.
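A sketch of the same idea in code, assuming the sentence-transformers library is available; the model name and the 0.7 threshold are illustrative, not canonical. This variant compares each sentence to its immediate predecessor; production implementations often compare against a rolling window instead:

```python
import re

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

def semantic_chunks(text: str, threshold: float = 0.7) -> list[str]:
    """Start a new chunk wherever the similarity between adjacent
    sentences drops below `threshold` (a topic shift)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for prev, curr, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        similarity = float(np.dot(prev, curr))  # cosine similarity: vectors are unit-length
        if similarity < threshold:
            chunks.append(" ".join(current))  # topic shifted: close the chunk
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```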
Recursive chunking: split by document structure, then subdivide
Respects document hierarchy. First split by headers, then by paragraphs, then by sentences if still too large. Keeps context about where chunks came from.
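A sketch of the recursive strategy, in the spirit of splitters like LangChain's RecursiveCharacterTextSplitter. The separators assume markdown-style headers, and unlike production splitters this version does not merge small pieces back up to the size limit:

```python
def recursive_chunks(text: str, max_tokens: int = 500,
                     separators: tuple[str, ...] = ("\n## ", "\n\n", ". ")) -> list[str]:
    """Split on the coarsest separator first (headers, then paragraphs,
    then sentences), recursing into any piece still over `max_tokens`."""
    if len(text.split()) <= max_tokens or not separators:
        return [text.strip()] if text.strip() else []
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(first):  # note: drops the separator itself
        if len(piece.split()) <= max_tokens:
            if piece.strip():
                chunks.append(piece.strip())
        else:
            chunks.extend(recursive_chunks(piece, max_tokens, rest))
    return chunks
```

A fuller version would also prepend the header path to each chunk, which is how the strategy keeps context about where chunks came from.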
A support engineer searches your 200+ internal docs. The error code is in one doc, the root cause is in another, and the fix is buried in a third. Bad chunking returns 50-page PDFs or sentence fragments missing context. Good chunking returns the exact paragraph with the fix, plus enough surrounding context to apply it.
A 50-page technical manual has chapters, sections, and procedures. Splitting it every 500 tokens cuts procedures in half. The AI retrieves half a procedure and generates an incomplete answer.
Instead: Use recursive chunking that respects document structure. Split by headers first, then subdivide large sections.
Zero overlap means sentences get cut mid-thought. "The solution requires..." ends one chunk while "...three specific steps" starts the next. Neither chunk is useful alone.
Instead: Add 10-20% overlap. A 500-token chunk should overlap 50-100 tokens with neighbors.
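To see what that buys you, here is the fixed_size_chunks sketch from earlier run on a numbered stand-in document:

```python
doc = " ".join(str(i) for i in range(1200))  # 1,200 numbered "tokens"

for chunk in fixed_size_chunks(doc, chunk_size=500, overlap=50):
    tokens = chunk.split()
    print(tokens[0], "...", tokens[-1])
# 0 ... 499
# 450 ... 949
# 900 ... 1199
```

Tokens 450-499 and 900-949 land in two chunks each, so a sentence that straddles a boundary survives intact in at least one of them.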
Your documents change. New versions, new formats, new content. But your chunking was designed for the originals. Retrieval quality degrades silently.
Instead: Treat chunking as part of your document pipeline. Re-chunk when documents update. Monitor retrieval quality over time.
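One lightweight way to notice staleness is to hash each document's content and re-chunk only when the hash changes. A minimal sketch, assuming a dict-backed hash store (a real pipeline would persist this alongside the vector index):

```python
import hashlib

def needs_rechunk(doc_id: str, text: str, hash_store: dict[str, str]) -> bool:
    """Return True if `doc_id` is new or its content changed since last chunking."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if hash_store.get(doc_id) == digest:
        return False  # unchanged: keep existing chunks and embeddings
    hash_store[doc_id] = digest
    return True  # new or changed: re-chunk, re-embed, replace the stale chunks
```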
You have learned how chunk boundaries affect retrieval quality. The natural next step is understanding how those chunks become searchable through embeddings.