
Embedding Model Selection

You searched for "refund policy" and got nothing.

Changed it to "return policy" and found exactly what you needed.

Same concept. Different words. Completely different results.

Your search is only as smart as the model that understands meaning.

8 min read · Intermediate
Relevant If You're
  • Building RAG or semantic search systems
  • Choosing between OpenAI, Cohere, or open-source models
  • Wondering why your search results feel "off"

INTELLIGENCE LAYER - The quality of everything downstream depends on picking the right model here.

Where This Sits

Category 2.3: Retrieval Architecture
Layer 2: Intelligence Infrastructure

Related topics in this layer:

  • Chunking Strategies
  • Citation & Source Tracking
  • Embedding Model Selection
  • Hybrid Search
  • Query Transformation
  • Relevance Thresholds
  • Reranking
What It Is

The model that decides what "similar" means

Embedding models convert text into numbers. Specifically, they turn words, sentences, or paragraphs into long lists of numbers (vectors) that capture meaning. Two pieces of text with similar meanings end up close together in this number space. That's how semantic search works.

But here's the thing: not all models understand meaning the same way. A model trained on legal documents will have a different sense of "similar" than one trained on product reviews. A model optimized for short queries will struggle with long documents. A multilingual model might be slower but work across languages.

The embedding model you choose determines what your system thinks is relevant. Pick the wrong one and your search results will be consistently mediocre. Pick the right one and everything downstream just works.

The core insight: The embedding model is the lens through which your AI sees meaning. The wrong lens means blurry vision, no matter how good everything else is.
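That "close together in number space" idea comes down to cosine similarity between vectors. A minimal sketch with made-up 4-dimensional vectors (illustrative only; real models output anywhere from 384 to 3072 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- the numbers are invented for illustration.
refund_policy = [0.9, 0.1, 0.3, 0.0]
return_policy = [0.85, 0.15, 0.35, 0.05]  # same concept, different words
pizza_recipe  = [0.0, 0.9, 0.0, 0.8]      # unrelated content

print(cosine_similarity(refund_policy, return_policy))  # close to 1.0
print(cosine_similarity(refund_policy, pizza_recipe))   # much lower
```

A good embedding model is one that produces exactly this pattern for your content: "refund policy" and "return policy" land near each other, unrelated text lands far away.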

The Lego Block Principle

Embedding model selection is really about matching the model's "understanding" to your specific domain, query type, and operational constraints.

The core pattern:

  1. Identify what makes your use case unique (domain, query length, language, latency needs).
  2. Find models trained or fine-tuned for similar patterns.
  3. Test with real queries from your users.
  4. Measure retrieval quality, not just speed or cost.

Where else this applies:

Hiring a specialist - You match the expert's background to the problem domain.
Choosing a translator - You pick someone who knows both languages AND the subject matter.
Selecting a search engine - You use different tools for academic papers vs. product shopping.
Picking a consultant - Industry experience matters more than generic expertise.

About text-embedding-ada-002

General-purpose model, good baseline performance

Strengths
  • Easy to use
  • Consistent quality
  • Great documentation
Trade-offs
  • External API required
  • Cost per token
How It Works

Three dimensions that determine the right choice

Domain Fit

Does it understand your content?

General-purpose models work okay for general content. But if you have legal contracts, medical records, or technical documentation, a model trained on similar data will understand nuances that general models miss.

  • Upside: domain-specific models catch subtleties
  • Caveat: may not exist for niche domains

Query-Document Match

Symmetric vs asymmetric search

Some models expect the query and document to be similar in length and style (symmetric). Others are designed for short queries matching long documents (asymmetric). Using the wrong type tanks retrieval quality.

  • Upside: the right match dramatically improves results
  • Caveat: the wrong match fails silently

Operational Constraints

Speed, cost, privacy trade-offs

OpenAI's models are convenient but send data externally. Open-source models run locally but need infrastructure. Larger models are more accurate but slower and more expensive. You have to balance these.

  • Upside: clear trade-offs you can optimize for
  • Caveat: no single "best" answer
Connection Explorer

Find the right information instantly, even when users use different words

This flow powers semantic search across your knowledge base. The embedding model translates both queries and documents into a shared meaning space, letting users find relevant content even when their words don't exactly match. Pick the wrong model and users miss documents that are right there. Pick the right one and search just works.

The flow: Chunking → Embedding Model (you are here) → Embedding Generation → Hybrid Search → Reranking → Accurate Retrieval

Upstream (Requires)

Chunking Strategies

Downstream (Enables)

Hybrid Search
Reranking
Common Mistakes

What breaks when you pick the wrong model

Don't pick based on benchmarks alone

The MTEB leaderboard shows you which model scores highest on standardized tests. But your data isn't standardized. A model that ranks #3 overall might be #1 for your specific domain. Test with your actual queries.

Instead: Create a test set of 50-100 real queries. Measure which model returns the best results for YOUR content.
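That measurement loop is small enough to sketch in a few lines. Here `fake_search` and the test set are placeholders; in practice, the search function would embed the query with a candidate model and query your vector store:

```python
def recall_at_k(search, test_set, k=5):
    """Fraction of queries whose expected document appears in the top k.

    search(query) returns a ranked list of document IDs;
    test_set maps each real user query to the ID it should retrieve.
    """
    hits = sum(
        1 for query, expected in test_set.items()
        if expected in search(query)[:k]
    )
    return hits / len(test_set)

# Placeholder standing in for "candidate model A" -- swap in a real
# embed-and-search call for each model you want to compare.
def fake_search(query):
    return ["doc_refunds", "doc_shipping", "doc_faq"]

test_set = {
    "refund policy": "doc_refunds",
    "how do I send something back": "doc_refunds",
    "where is my package": "doc_shipping",
    "cancel my order": "doc_cancellations",  # fake_search never finds this
}
print(recall_at_k(fake_search, test_set, k=3))  # 0.75
```

Run the same test set against each candidate model and the comparison becomes a single number per model, grounded in your content instead of a public leaderboard.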

Don't use the same model for queries and documents

Your users type short, messy queries. Your documents are long and well-structured. A symmetric model treats them the same, which is wrong. Asymmetric models handle this mismatch correctly.

Instead: If queries are shorter than documents, use an asymmetric model or add a query prefix like "query: " to help the model adjust.
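A sketch of that prefix convention, assuming an E5-style asymmetric model; the `embed` function below is a stand-in for whatever embedding client you actually use:

```python
def embed(text):
    # Stand-in for a real embedding call (OpenAI, sentence-transformers,
    # etc.). Returns a fake vector so the sketch runs without a model.
    return [float(ord(c)) for c in text]

def embed_query(text):
    # Short, messy user input gets a "query: " prefix so an
    # asymmetric model knows to treat it as a query.
    return embed("query: " + text)

def embed_document(text):
    # Long, structured content gets the matching "passage: " prefix.
    return embed("passage: " + text)

# Both sides go through the same model; the prefixes tell it
# which role each text plays.
query_vec = embed_query("refund policy")
doc_vec = embed_document("Customers may return items within 30 days.")
```

Check your model's documentation for the exact prefixes it was trained with; using the wrong ones (or none) quietly degrades retrieval quality.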

Don't ignore dimension count

Higher-dimension embeddings (1536, 3072) capture more nuance but cost more to store and search. For simple use cases, 384 or 768 dimensions work fine. You're paying for precision you might not need.

Instead: Start with a smaller model. Only upgrade dimensions when you can measure the quality improvement justifying the cost.
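The storage side of that trade-off is simple arithmetic: chunks × dimensions × 4 bytes for float32 vectors. A quick back-of-envelope sketch:

```python
def embedding_storage_mb(num_chunks, dimensions, bytes_per_value=4):
    """Raw vector storage in MB (float32 = 4 bytes), ignoring index overhead."""
    return num_chunks * dimensions * bytes_per_value / 1_000_000

# One million chunks at three common dimension counts:
for dims in (384, 1536, 3072):
    print(f"{dims} dims: {embedding_storage_mb(1_000_000, dims):,.0f} MB")
```

At a million chunks, 3072 dimensions costs 8x the raw storage of 384, and that same gap shows up in vector-index memory and search latency.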

Next Steps

Now that you understand embedding model selection

You know how to match models to your domain, query patterns, and constraints. The next step is understanding how to combine semantic search with keyword search for even better results.

Recommended

Hybrid Search

Combining keyword and semantic search for the best of both worlds

Continue learning