You uploaded 50 documents to your knowledge base. Asked a question. Got the wrong answer.
The information is in there. You can see it. But the AI cannot find it.
The problem is not the documents. It is how they were split.
FOUNDATIONAL - Essential for any AI retrieval system. Get this wrong and nothing downstream works.
When you feed documents to a retrieval system, you cannot give it the whole document at once. The AI has context limits. More importantly, shoving an entire 50-page PDF into a prompt produces terrible results. The AI needs focused chunks.
Chunking is the process of deciding where to draw the lines. Do you split every 500 tokens? Every paragraph? Every section? The choice dramatically affects what the AI retrieves when someone asks a question.
Get it wrong and the AI finds garbage. Get it right and it finds exactly what the user needs.
Chunking is not just about RAG systems. It is a pattern that appears whenever you need to balance context size against retrieval precision.
The unit of retrieval determines what can be found. Too large and you get noise. Too small and you lose context. The art is matching chunk boundaries to meaning boundaries.
[Interactive demo: a sample refund policy document, split live as you drag the chunk-size and overlap sliders. Moderate chunk sizes give a good balance of context and precision, and overlap preserves context at chunk boundaries.]
Fixed-size chunking: split every N tokens, regardless of content
The simplest approach. You pick a size (say, 500 tokens) and split documents at that boundary, usually with some overlap (50-100 tokens) so you do not cut sentences in half.
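Here is a minimal sketch of a fixed-size chunker in Python. The whitespace split is a stand-in for a real tokenizer (in practice you would count model tokens, e.g. with tiktoken), and the default sizes are just the values from above:

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` tokens, with `overlap`
    tokens shared between neighboring chunks."""
    tokens = text.split()  # crude whitespace "tokens"; swap in a real tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window reached the end; avoid a redundant tail chunk
    return chunks
```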
Semantic chunking: split at meaning boundaries using embeddings
Uses embeddings to detect when topics shift: compare consecutive sentence embeddings and split where similarity drops below a threshold. The chunks follow the document's actual topic flow.
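A sketch of the same idea in code, assuming the sentence-transformers library is available; the model name and the 0.7 threshold are illustrative, not canonical. This variant compares each sentence to its immediate predecessor; production implementations often compare against a rolling window instead:

```python
import re

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

def semantic_chunks(text: str, threshold: float = 0.7) -> list[str]:
    """Start a new chunk wherever the similarity between adjacent
    sentences drops below `threshold` (a topic shift)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for prev, curr, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        similarity = float(np.dot(prev, curr))  # cosine similarity: vectors are unit-length
        if similarity < threshold:
            chunks.append(" ".join(current))  # topic shifted: close the chunk
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```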
Recursive chunking: split by document structure, then subdivide
Respects document hierarchy. First split by headers, then by paragraphs, then by sentences if still too large. Keeps context about where chunks came from.
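A sketch of the recursive strategy, in the spirit of splitters like LangChain's RecursiveCharacterTextSplitter. The separators assume markdown-style headers, and unlike production splitters this version does not merge small pieces back up to the size limit:

```python
def recursive_chunks(text: str, max_tokens: int = 500,
                     separators: tuple[str, ...] = ("\n## ", "\n\n", ". ")) -> list[str]:
    """Split on the coarsest separator first (headers, then paragraphs,
    then sentences), recursing into any piece still over `max_tokens`."""
    if len(text.split()) <= max_tokens or not separators:
        return [text.strip()] if text.strip() else []
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(first):  # note: drops the separator itself
        if len(piece.split()) <= max_tokens:
            if piece.strip():
                chunks.append(piece.strip())
        else:
            chunks.extend(recursive_chunks(piece, max_tokens, rest))
    return chunks
```

A fuller version would also prepend the header path to each chunk, which is how the strategy keeps context about where chunks came from.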
A support engineer searches your 200+ internal docs. The error code is in one doc, the root cause is in another, and the fix is buried in a third. Bad chunking returns 50-page PDFs or sentence fragments missing context. Good chunking returns the exact paragraph with the fix, plus enough surrounding context to apply it.
A 50-page technical manual has chapters, sections, and procedures. Splitting it every 500 tokens cuts procedures in half. The AI retrieves half a procedure and generates an incomplete answer.
Instead: Use recursive chunking that respects document structure. Split by headers first, then subdivide large sections.
Zero overlap means sentences get cut mid-thought. "The solution requires..." ends one chunk while "...three specific steps" starts the next. Neither chunk is useful alone.
Instead: Add 10-20% overlap. A 500-token chunk should overlap 50-100 tokens with neighbors.
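To see what that buys you, here is the fixed_size_chunks sketch from earlier run on a numbered stand-in document:

```python
doc = " ".join(str(i) for i in range(1200))  # 1,200 numbered "tokens"

for chunk in fixed_size_chunks(doc, chunk_size=500, overlap=50):
    tokens = chunk.split()
    print(tokens[0], "...", tokens[-1])
# 0 ... 499
# 450 ... 949
# 900 ... 1199
```

Tokens 450-499 and 900-949 land in two chunks each, so a sentence that straddles a boundary survives intact in at least one of them.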
Your documents change. New versions, new formats, new content. But your chunking was designed for the originals. Retrieval quality degrades silently.
Instead: Treat chunking as part of your document pipeline. Re-chunk when documents update. Monitor retrieval quality over time.
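One lightweight way to notice staleness is to hash each document's content and re-chunk only when the hash changes. A minimal sketch, assuming a dict-backed hash store (a real pipeline would persist this alongside the vector index):

```python
import hashlib

def needs_rechunk(doc_id: str, text: str, hash_store: dict[str, str]) -> bool:
    """Return True if `doc_id` is new or its content changed since last chunking."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if hash_store.get(doc_id) == digest:
        return False  # unchanged: keep existing chunks and embeddings
    hash_store[doc_id] = digest
    return True  # new or changed: re-chunk, re-embed, replace the stale chunks
```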
You have learned how chunk boundaries affect retrieval quality. The natural next step is understanding how those chunks become searchable through embeddings.