Hybrid Search
+
Apr 16, 2025
Optimizing the R in RAG
© Doug Turnbull (http://softwaredoug.com), all opinions my own, not my employer
Can’t cover in 45 mins…
Intuitive sense of “close” good enough for today :)
Lexical Search
Key Points:
Vector Search
Key Points:
Also won’t cover
Assumption: embeddings good first pass search
Embeddings get you close but not all the way
ID | Title | Vector (256? 512? Or more dimensions) |
0 | mary had a little lamb | [0.9, 0.8, -0.5, 0.75, ..] |
1 | mary had a little ham | [0.6, 0.4, -0.4, 0.60, ..] |
2 | a little ham | [-0.2, 0.5, 0.9, -0.45, ..] |
3 | little mary had a scam | [0.4, -0.5, 0.25, 0.14, ..] |
4 | ham it up with mary | [0.2, 0.5, 0.2, 0.45, ..] |
5 | Little red riding hood had a baby sheep? | [0.95, 0.79, -0.49, 0.65, ..] |
Similar!
(despite sharing few terms)
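To make “close” concrete, here’s a minimal cosine-similarity sketch over the toy vectors above, using only the four dimensions shown (real embeddings have hundreds); the numbers are purely illustrative.

import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

lamb  = [0.9, 0.8, -0.5, 0.75]     # "mary had a little lamb"
sheep = [0.95, 0.79, -0.49, 0.65]  # "Little red riding hood had a baby sheep?"
scam  = [0.4, -0.5, 0.25, 0.14]    # "little mary had a scam"

print(cosine(lamb, sheep))  # high: similar meaning despite sharing few terms
print(cosine(lamb, scam))   # much lower: shared terms, different meaning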
Chunked
You’ve chunked your data into a meaningful “search document” with important metadata:
📕
{
  "Book_title": "Nursery Rhymes",
  "Section": "Mary Had a Little Lamb",
  "Text": "..."
}
Embedding for whole document
We want an embedding capturing as much of the document as is reasonable
(Not just a title embedding)
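As a sketch of what that might look like (the sentence-transformers library and the "all-MiniLM-L6-v2" checkpoint are assumptions for illustration, not a recommendation):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

chunk = {
    "Book_title": "Nursery Rhymes",
    "Section": "Mary Had a Little Lamb",
    "Text": "Mary had a little lamb, its fleece was white as snow...",
}

# Embed the whole chunk (title + section + body), not just the title,
# so the vector captures as much of the document as is reasonable.
doc_text = f'{chunk["Book_title"]} | {chunk["Section"]} | {chunk["Text"]}'
doc_embedding = model.encode(doc_text, normalize_embeddings=True)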
Embedding is ~ two-towerable
Short text (i.e., queries) and long text (e.g., paragraphs) can be mapped into the same similarity space
QUERY: Kid story about sheep
Document:
Mary had a little lamb, little lamb, little lamb.
Mary had a little lamb, its fleece was white as snow.
And everywhere that Mary went, Mary went, Mary went.
And everywhere that Mary went, the lamb was sure to go.
It followed her to school one day, school one day, school one day. It followed her to school one day, which was against the rule. It made the children laugh and play, laugh and play, laugh and play. It made the children laugh and play to see the lamb at school. And so the teacher sent it out, sent it out, sent it out. And so the teacher sent it out, but still it lingered near. It stood and waited round about, round about, round about. It stood and waited round about, till Mary did appear. “Why does the lamb love Mary so, Mary so, Mary so? Why does the lamb love Mary so?” the little children cry.
Similar
Bonus: embedding is a two tower model!
Query Features
Document Features
(Biencoder, learned on labeled data)
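Continuing the sketch above: with a biencoder, the short query and the long document each go through their own encode call and only meet at the similarity score (again, the model choice is an assumption):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

# Query tower: encode the short query text
query_embedding = model.encode("Kid story about sheep", normalize_embeddings=True)

# Document tower: encode the long paragraph (the full rhyme above)
doc_embedding = model.encode(
    "Mary had a little lamb, its fleece was white as snow...",
    normalize_embeddings=True,
)

# The two towers meet only here, at the similarity score
print(util.cos_sim(query_embedding, doc_embedding))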
After embedding we boost/rerank/…
Exact name match?
Query mentions color?
(Different query types == different treatments!)
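One way this stage can look, as a hedged sketch; the features and weights below are made up for illustration, not any particular system’s rules:

COLORS = {"red", "blue", "green", "white", "black"}

def exact_name_match(query, doc):
    # Hypothetical feature: the query exactly matches the document/product name
    return query.strip().lower() == doc.get("name", "").strip().lower()

def mentions_color(query):
    # Hypothetical feature: the query mentions a color term
    return bool(COLORS & set(query.lower().split()))

def boost(query, candidates):
    """Re-score first-pass embedding candidates with query-type-specific boosts.
    candidates: list of (doc_dict, vector_score) pairs from embedding retrieval."""
    rescored = []
    for doc, vector_score in candidates:
        score = vector_score
        if exact_name_match(query, doc):
            score += 2.0   # made-up weight, for illustration
        if mentions_color(query) and doc.get("color", "") in query.lower():
            score += 0.5   # made-up weight, for illustration
        rescored.append((doc, score))
    return sorted(rescored, key=lambda pair: -pair[1])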
Ideal:
Query Understanding
First Pass Embeddings
Boost / Rerank
(depending on needs of query)
Reality:
Query Understanding
Fetch ~top 100 nearest embeddings
Boost / Rerank
Reality:
Query Understanding
Fetch ~top 100 nearest embeddings
Boost / Rerank
(Do we have the right top 100 to boost?)
Reality:
Query Understanding
Fetch ~top 100 nearest embeddings
Boost / Rerank
Need to filter this to “good” 100 or so
Chicken and egg problem:
Query Understanding
Fetch ~top 100 nearest embeddings
Boost / Rerank
If I want to boost exact product name matches here…
🐓
🥚
Chicken and egg problem:
Query Understanding
Fetch top N=~100 embeddings
Boost / Rerank / Model?
Must retrieve good product name matches from vector index…
🐓
🥚
…if I want to boost exact product name matches here
Chicken and egg problem:
Query Understanding
Fetch ~top 100 nearest embeddings
Boost / Rerank
The good product name matches better be in the candidates!
🐓
🥚
~2021 vector DB
SELECT * FROM <search_engine>
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
No WHERE!
👎 Can’t guarantee product name matches promoted
2025 vector DB (search engine)
SELECT * FROM <search>
WHERE [trowel] in product_name
...
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
BEFORE vector_similarity
Get candidates matching “trowel”
👍 Now I have matches!
~2025 era vector DB (search engine)
SELECT * FROM <search>
WHERE [trowel] in product_name
...
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
BEFORE vector_similarity
Get candidates matching "trowel"
🚨 How does your vector DB pre-filter? Can you do this at scale?
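A minimal sketch of what pre-filtering buys you, in Python rather than the pseudo-SQL above; product_names and doc_vectors (a NumPy array of unit-normalized embeddings) are assumptions, and a real engine would do this inside its index rather than brute force:

import numpy as np

def prefiltered_top_k(query_vec, query_term, product_names, doc_vectors, k=100):
    # 1) The WHERE clause: keep only docs whose product_name contains the term
    candidate_ids = [
        i for i, name in enumerate(product_names)
        if query_term.lower() in name.lower()
    ]
    if not candidate_ids:
        return []
    # 2) ORDER BY vector_similarity ... LIMIT k, but only over the filtered set
    scores = doc_vectors[candidate_ids] @ query_vec   # assumes unit-norm vectors
    order = np.argsort(-scores)[:k]
    return [(candidate_ids[i], float(scores[i])) for i in order]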
… and “where” could be anything
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
Search for “garden trowel”
Somehow we turn the query into this dept / item type
… and “where” could be anything
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
Search for “garden trowel”
And also match query terms in tokenized title/description
… and “where” could be anything
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
Search for “garden trowel”
And also match query terms
(yes you search nerds, I’m ignoring BM25 and lexical scoring for now)
Practically: there’s a vector index
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100
Search for “garden trowel”
Get top 100 from this set via an index
(otherwise we scan all results to score them)
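At scale, that top 100 usually comes from a vector index rather than a scan; a sketch assuming the faiss library (a flat index here for simplicity, where a production system would use an approximate index like HNSW or IVF, and how, or whether, a given engine applies the WHERE clause before this step varies):

import faiss
import numpy as np

dim = 256
doc_vectors = np.random.rand(10_000, dim).astype("float32")  # stand-in corpus
faiss.normalize_L2(doc_vectors)

index = faiss.IndexFlatIP(dim)   # inner product == cosine on normalized vectors
index.add(doc_vectors)

query_embedding = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_embedding)

# Top 100 by vector similarity via the index, instead of scoring every doc
scores, ids = index.search(query_embedding, 100)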
We can reasonably get top K...
There’s more than one “top K” we care about
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100
UNION ALL
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100
What about “pure” vector matches?
100 from this set
There’s more than one candidate set
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
AND (garden in title OR garden in description OR
     trowel in title OR trowel in description)
ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100
UNION ALL
SELECT * FROM <search>
WHERE "lawn_and_garden" in department
AND "trowel" in item_type
ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100
What about “pure” vector matches?
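In application code, the UNION ALL above amounts to fetching each candidate set separately and merging by doc id; a sketch (the commented-out calls below are hypothetical stand-ins for whichever retrieval calls you actually have):

def merge_candidate_sets(*candidate_sets):
    """Merge several (doc_id, score) lists, keeping each doc's best score.
    Each input list plays the role of one SELECT branch in the UNION ALL."""
    best = {}
    for candidates in candidate_sets:
        for doc_id, score in candidates:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    return sorted(best.items(), key=lambda item: -item[1])

# Hypothetical usage: one lexically filtered set, one pure-vector set
# lexical_set = prefiltered_top_k(query_vec, "trowel", product_names, doc_vectors)
# vector_set  = pure_vector_top_k(query_vec, doc_vectors)
# candidates  = merge_candidate_sets(lexical_set, vector_set)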
With squiggly lines…
Candidate Set A (lexically filtered)
Candidate Set B (pure vector)
(candidates ordered by vector sim)
Boost lexical matches?
Why do we do it this way?
Candidate Set A (lexically filtered)
Candidate Set B (pure vector)
(candidates ordered by vector sim)
Boost lexical matches?
Should we just get these?
Why do we do it this way?
Candidate Set A (lexically filtered)
(candidates ordered by vector sim)
Boost lexical matches?
Should we just get these?
(Higher precision / lower recall)
Candidate Set B (pure vector)
(Higher recall / lower precision)
With squiggly lines…
Candidate Set A (filtered to lexical)
Candidate Set B (pure vector)
(candidates ordered by vector sim)
Some reranker, boosting, tie-breaking, etc
L0 Retrieval
L1 Ranking
…
More rankers / post-filters
A retrieval “Arm”
And many retrieval arms
Candidate Arm A (one term matches)
Candidate Arm C (same category as query)
(candidates ordered by vector sim)
Some reranker, boosting, tie-breaking, etc
…
More rankers / post-filters
Candidate Arm B (all terms match)
Candidate Arm D (image embedding)
Candidate Arm E (just lexical scores)
Candidate Arm A (one term matches)
Candidate Arm C (same category as query)
(candidates ordered by vector sim)
Candidate Arm B (all terms match)
Candidate Arm D (image embedding)
Candidate Arm E (just lexical scores)
Boost / Rerank
L0
Retrieval Arms
🥚🥚🥚🥚🥚
L1 boost/reranking
🐓
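One common way to fuse many L0 arms before the L1 step (not necessarily what’s pictured here) is reciprocal rank fusion, which only needs each arm’s ranking, not comparable scores; a sketch:

def reciprocal_rank_fusion(arms, k=60):
    """arms: list of ranked doc_id lists, one per retrieval arm.
    Each doc's fused score sums 1 / (k + rank) over the arms that returned it."""
    fused = {}
    for ranked_ids in arms:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: -item[1])

# Hypothetical usage with the arms above:
# fused = reciprocal_rank_fusion([arm_a_ids, arm_b_ids, arm_c_ids, arm_d_ids, arm_e_ids])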
Or depending on the query
Candidate Arm A (one term matches)
Candidate Arm C (same category as query)
(candidates ordered by vector sim)
Some reranker, boosting, tie-breaking, etc
…
More rankers / post-filters
Candidate Arm B (all terms match)
Candidate Arm D (image embedding)
Candidate Arm E (just lexical scores)
Then the boost
Candidate Arm A (one term matches)
Candidate Arm C (same category as query)
(candidates ordered by vector sim)
Candidate Arm B (all terms match)
Candidate Arm D (image embedding)
Candidate Arm E (just lexical scores)
score += product_name_index[l0_matches].score("garden trowel")
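Spelled out a bit, with a crude term-overlap score standing in for whatever the product_name index would return (BM25 or similar in practice); all names here are illustrative:

def boost_lexical(query, l0_candidates, product_names, weight=0.5):
    """Add a lexical score over only the L0 candidates.
    l0_candidates: (doc_id, vector_score) pairs already retrieved by the arms."""
    query_terms = set(query.lower().split())
    boosted = []
    for doc_id, vector_score in l0_candidates:
        name_terms = set(product_names[doc_id].lower().split())
        lexical = len(query_terms & name_terms)   # crude stand-in for BM25
        boosted.append((doc_id, vector_score + weight * lexical))
    return sorted(boosted, key=lambda pair: -pair[1])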
Or a model
Candidate Arm A (one term matches)
Candidate Arm C (same category as query)
(candidates ordered by vector sim)
Candidate Arm B (all terms match)
Candidate Arm D (image embedding)
Candidate Arm E (just lexical scores)
Ranking model given query + document features
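Or, instead of hand-tuned boosts, feed the same signals to a learned ranker; a sketch where ranker is any previously fitted scikit-learn-style model, and the features are assumptions about shape rather than any specific system:

import numpy as np

def l1_rank(query, query_category, l0_candidates, docs, ranker):
    """Score L0 candidates with a trained model over query + document features.
    l0_candidates: (doc_id, vector_score) pairs; docs: per-id metadata dicts."""
    query_terms = set(query.lower().split())
    features, ids = [], []
    for doc_id, vector_score in l0_candidates:
        title_terms = set(docs[doc_id]["title"].lower().split())
        features.append([
            vector_score,                                          # first-pass similarity
            len(query_terms & title_terms),                        # lexical overlap
            float(docs[doc_id].get("category") == query_category), # category match
        ])
        ids.append(doc_id)
    scores = ranker.predict(np.array(features))
    return sorted(zip(ids, scores), key=lambda pair: -pair[1])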
That’s the theory at least