1 of 80

Hybrid Search

+

Apr 16, 2025

Optimizing the R in RAG

© Doug Turnbull (http://softwaredoug.com), all opinions my own, not my employer’s

2 of 80

Can’t cover in 45 mins…

  • How lexical search actually works (ask ChatGPT about: inverted index, read “Relevant Search” 😉)

  • What is an embedding

  • Lexical scoring, vector scoring (cosine, Euclidean, and other similarities), etc.

Intuitive sense of “close” good enough for today :)

3 of 80

4 of 80

5 of 80

6 of 80

Lexical Search

Key Points:

  • Definition: Finds documents with exact query words.
  • Core Structure: Inverted Index → term → list of doc IDs.
    • Example: "apple" → [Doc2, Doc5, Doc9]
  • Process (see the sketch after this list):
    • Tokenize query
    • Look up tokens in index
    • Retrieve matching docs
    • Apply scoring
  • Scoring Methods:
    • TF-IDF: weighs term frequency × rarity
    • BM25: length normalization + term-frequency saturation
  • Pros: Fast, explainable
  • Cons: No semantic understanding (“phone” ≠ “mobile”)
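A minimal sketch of that flow in Python (my own illustrative code, not from the talk): build an inverted index, tokenize the query, look up postings, and apply a toy IDF-style weight rather than a full BM25.

import math
from collections import defaultdict

docs = {1: "mary had a little lamb", 2: "a little ham", 3: "ham it up with mary"}

# Build the inverted index: term -> set of doc IDs containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    scores = defaultdict(float)
    for term in query.lower().split():             # 1. tokenize query
        postings = index.get(term, set())          # 2. look up term in index
        if not postings:
            continue
        idf = math.log(len(docs) / len(postings))  # rarer terms weigh more
        for doc_id in postings:                    # 3. retrieve matching docs
            scores[doc_id] += idf                  # 4. apply (toy) scoring
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("little mary"))  # doc 1 matches both terms, so it ranks first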

7 of 80

Vector Search

Key Points:

  • Definition: Matches based on semantic meaning, not exact words.
  • Embedding: Vector representation of meaning
    • Similar meaning → vectors close together
    • Example: "dog" ≈ "puppy", far from "laptop"
  • Embedding Generation: Word2Vec, GloVe, Transformers, etc.
  • Vector Scoring Methods (see the sketch after this list):
    • Cosine Similarity: Angle between vectors
    • Euclidean Distance: Straight-line distance
    • Dot Product: Magnitude + direction
  • Pros: Finds synonyms, concept matches (“TV” ≈ “television”)
  • Cons: Needs embedding computation, slower than lexical search
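For intuition, the three scoring methods above in a few lines of numpy. The vectors are made up for illustration; only their relative closeness matters.

import numpy as np

dog    = np.array([0.9, 0.1, 0.0])
puppy  = np.array([0.8, 0.2, 0.1])
laptop = np.array([0.0, 0.1, 0.9])

def cosine(a, b):        # angle between vectors (1.0 = same direction)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):     # straight-line distance (smaller = closer)
    return float(np.linalg.norm(a - b))

def dot(a, b):           # magnitude + direction
    return float(a @ b)

print(cosine(dog, puppy), cosine(dog, laptop))   # "dog" ≈ "puppy", far from "laptop"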

8 of 80

Also won’t cover

  • RRF - Reciprocal Rank Fusion

9 of 80

Assumption: embeddings are a good first-pass search

Embeddings get you close but not all the way

ID | Title                                    | Vector (256? 512? or more dimensions)
0  | mary had a little lamb                   | [0.9, 0.8, -0.5, 0.75, ..]
1  | mary had a little ham                    | [0.6, 0.4, -0.4, 0.60, ..]
2  | a little ham                             | [-0.2, 0.5, 0.9, -0.45, ..]
3  | little mary had a scam                   | [0.4, -0.5, 0.25, 0.14, ..]
4  | ham it up with mary                      | [0.2, 0.5, 0.2, 0.45, ..]
5  | Little red riding hood had a baby sheep? | [0.95, 0.79, -0.49, 0.65, ..]

Docs 0 and 5: Similar! (despite sharing few terms)
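A sketch of "embeddings as first-pass search" over this toy table (the 4-d vectors are the truncated ones shown above; a real system would use 256+ dimensions and an approximate-nearest-neighbor index):

import numpy as np

titles = ["mary had a little lamb", "mary had a little ham", "a little ham",
          "little mary had a scam", "ham it up with mary",
          "Little red riding hood had a baby sheep?"]
vectors = np.array([[0.9, 0.8, -0.5, 0.75], [0.6, 0.4, -0.4, 0.60],
                    [-0.2, 0.5, 0.9, -0.45], [0.4, -0.5, 0.25, 0.14],
                    [0.2, 0.5, 0.2, 0.45], [0.95, 0.79, -0.49, 0.65]])

def top_k(query_vec, k=3):
    # cosine similarity of the query against every document vector
    sims = vectors @ query_vec / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec))
    return [(titles[i], round(float(sims[i]), 3)) for i in np.argsort(-sims)[:k]]

# A query embedding near doc 0 also surfaces doc 5, despite sharing few terms
print(top_k(np.array([0.9, 0.8, -0.5, 0.75])))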

10 of 80

Chunked

You’ve chunked your data into a meaningful “search document” with important metadata:

📕

{
  "Book_title": "Nursery Rhymes",
  "Section": "Mary Had a Little Lamb",
  "Text": "..."
}
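A sketch of what that chunking step might produce. The field names follow the slide; the one-chunk-per-section rule is just an assumption for illustration.

def chunk_book(book_title, sections):
    # one "search document" per section, carrying the metadata we'll filter/boost on later
    return [{"Book_title": book_title, "Section": name, "Text": text}
            for name, text in sections.items()]

chunks = chunk_book("Nursery Rhymes",
                    {"Mary Had a Little Lamb": "Mary had a little lamb, ...",
                     "Little Red Riding Hood": "..."})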

11 of 80

Embedding for whole document

We want an embedding capturing as much of the document as is reasonable

(Not just a title embedding)

12 of 80

Embedding is ~ two-towerable

Short text (i.e. queries) and long text (paragraphs) can be mapped into the same similarity space

QUERY: Kid story about sheep

Document:

Mary had a little lamb, little lamb, little lamb.

Mary had a little lamb, its fleece was white as snow.

And everywhere that Mary went. Mary went. Mary went.

And everywhere that Mary went, the lamb was sure to go.

It followed her to school one day, school one day, school one day. It followed her to school one day, which was against the rule. It made the children laugh and play, laugh and play, laugh and play. It made the children laugh and play to see the lamb at school. And so the teacher sent it out, sent it out, sent it out. And so the teacher sent it out, but still it lingered near. It stood and waited round about, round about, round about. It stood and waited round about, till Mary did appear. “Why does the lamb love Mary so, Mary so, Mary so? Why does the lamb love Mary so?” the little children cry.

Similar
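A sketch of that claim using a sentence-transformers bi-encoder. Assumes the library and the all-MiniLM-L6-v2 checkpoint are available; any text-embedding model that handles both short and long text works the same way.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")    # one encoder for both towers

query = "Kid story about sheep"
document = "Mary had a little lamb, its fleece was white as snow. ..."

q_vec, d_vec = model.encode([query, document])     # short and long text -> same space
similarity = float(q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
print(similarity)   # relatively high despite little word overlap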

13 of 80

Bonus: embedding is a two tower model!

Query Features:

  • Query
  • Query embedding

Document Features:

  • Name
  • Description
  • Product image embedding
  • ???

(Bi-encoder, learned on labeled data)
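A schematic two-tower model as a PyTorch sketch: one tower encodes query features, the other encodes document features (name, description, image embedding, ...), and the dot product of the two towers is the relevance score trained on labeled data. The dimensions and layer sizes here are made up.

import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))
    def forward(self, x):
        return torch.nn.functional.normalize(self.net(x), dim=-1)

class TwoTower(nn.Module):
    def __init__(self, query_dim, doc_dim):
        super().__init__()
        self.query_tower = Tower(query_dim)   # e.g. query text features
        self.doc_tower = Tower(doc_dim)       # e.g. name + description + image embedding, concatenated
    def forward(self, query_feats, doc_feats):
        return (self.query_tower(query_feats) * self.doc_tower(doc_feats)).sum(-1)  # dot-product score

model = TwoTower(query_dim=384, doc_dim=900)
score = model(torch.randn(4, 384), torch.randn(4, 900))   # in practice trained on click/label data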

14 of 80

After embedding we boost/rerank/…

Exact name match?

  • Move these to the top!

Query mentions color?

  • Ensure color matches are boosted

(Different query types == different treatments!)
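A sketch of that boost/rerank step. All the field names and boost weights here are hypothetical; the point is that different query types get different treatments after the embedding pass.

def rerank(query, candidates):
    # candidates: list of dicts with "name", "color", "vector_score" from the first pass
    q = query.lower()
    for doc in candidates:
        doc["score"] = doc["vector_score"]
        if q == doc["name"].lower():                   # exact name match? move it to the top
            doc["score"] += 10.0
        for color in ("red", "blue", "green"):
            if color in q and color == doc.get("color"):
                doc["score"] += 1.0                    # query mentions a color? boost color matches
    return sorted(candidates, key=lambda d: -d["score"])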

29 of 80

Ideal:

Query Understanding

First Pass Embeddings

Boost / Rerank

(depending on needs of query)

30 of 80

Reality:

Query Understanding

Like ~top 100 embeddings

Boost / Rerank

31 of 80

Reality:

Query Understanding

Like ~top 100 embeddings

Boost / Rerank

(Do we have the right top 100 to boost?)

32 of 80

Reality:

Query Understanding

Like ~top 100 embeddings

Boost / Rerank

Need to filter this to “good” 100 or so

33 of 80

Chicken and egg problem:

Query Understanding

Like ~top 100 embeddings

Boost / Rerank

If I want to boost exact product name matches here..

🐓

🥚

34 of 80

Chicken and egg problem:

Query Understanding

Fetch top N=~100 embeddings

Boost / Rerank / Model?

Must retrieve good product name matches from vector index…

🐓

🥚

…if I want to boost exact product name matches here

35 of 80

Chicken and egg problem:

Query Understanding

Like ~top 100 embeddings

Boost / Rerank

The good product name matches better be in the candidates!

🐓

🥚
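The chicken-and-egg in code form (a toy sketch): a boost applied after retrieval can only promote documents the first-pass top-N actually contains.

import numpy as np

def first_pass(query_vec, docs, n=2):
    # pretend vector search: keep only the top-n docs by embedding similarity
    for d in docs:
        d["vector_score"] = float(np.dot(query_vec, d["vector"]))
    return sorted(docs, key=lambda d: -d["vector_score"])[:n]

def boost_exact_name(query, candidates):
    # L1 boost: exact product-name matches go to the top -- but only among retrieved candidates
    for d in candidates:
        d["score"] = d["vector_score"] + (10.0 if d["name"].lower() == query.lower() else 0.0)
    return sorted(candidates, key=lambda d: -d["score"])

docs = [{"name": "garden trowel", "vector": np.array([0.1, 0.9])},
        {"name": "garden gnome", "vector": np.array([0.9, 0.2])},
        {"name": "lawn flamingo", "vector": np.array([0.8, 0.4])}]

candidates = first_pass(np.array([1.0, 0.0]), docs, n=2)   # embedding pass misses the trowel
print(boost_exact_name("garden trowel", candidates))       # boost can't rescue a doc that wasn't retrieved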

36 of 80

~2021 vector DB

SELECT * FROM <search_engine>

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

No WHERE!

👎 Can’t guarantee product name matches promoted

37 of 80

2025 vector DB (search engine)

SELECT * FROM <search>

WHERE [trowel] in product_name

...

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

BEFORE vector_similarity

Get candidates matching “trowel”

👍 Now I have matches!

38 of 80

~2025 era vector DB (search engine)

SELECT * FROM <search>

WHERE [trowel] in product_name

...

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

BEFORE vector_similarity

Get candidates matching “trowel”

🚨 How does your vector DB pre-filter? Can you do this at scale?
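What the WHERE-before-similarity pattern looks like outside of SQL, as a brute-force sketch: filter first, then rank the survivors by vector similarity. Real engines push the filter into the vector index; how well they do that at scale is exactly the question above. Field names are illustrative.

import numpy as np

def prefiltered_search(query_terms, query_vec, docs, k=100):
    # 1. pre-filter: keep only docs whose product_name contains a query term
    candidates = [d for d in docs
                  if any(t in d["product_name"].lower().split() for t in query_terms)]
    # 2. then rank the survivors by embedding similarity
    candidates.sort(key=lambda d: -float(np.dot(query_vec, d["title_embedding"])))
    return candidates[:k]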

39 of 80

… and “where” could be anything

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

Search for “garden trowel”

Somehow we turn the query to this dept / item type

40 of 80

… and “where” could be anything

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

Search for “garden trowel”

And also match query terms in tokenized title/description

41 of 80

… and “where” could be anything

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

Search for “garden trowel”

And also match query terms

(yes you search nerds, I’m ignoring BM25 and lexical scoring for now)
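One way the "somehow we turn the query into this dept / item type" step could look, sketched as a lookup table plus term matching. A real system would use a query classifier or entity tagger; everything here is hypothetical.

# Hypothetical query-understanding output for "garden trowel"
KNOWN_ITEM_TYPES = {"trowel": ("lawn_and_garden", "trowel")}

def understand(query):
    tokens = query.lower().split()
    filters = {"terms": tokens}
    for tok in tokens:
        if tok in KNOWN_ITEM_TYPES:
            filters["department"], filters["item_type"] = KNOWN_ITEM_TYPES[tok]
    return filters

print(understand("garden trowel"))
# {'terms': ['garden', 'trowel'], 'department': 'lawn_and_garden', 'item_type': 'trowel'}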

42 of 80

Practically: there’s a vector index

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY vector_similarity(query_embedding, title_embedding)
LIMIT 100

Search for “garden trowel”

Get top 100 from this set via an index

(otherwise we scan all results to score them)

We can reasonably get top K...
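A sketch of "get top 100 from this set via an index", assuming FAISS is installed. A flat inner-product index over L2-normalized vectors gives exact cosine top-K; HNSW/IVF variants trade exactness for speed. Pre-filtering support is where engines differ.

import numpy as np
import faiss

dim = 256
doc_vectors = np.random.rand(10000, dim).astype("float32")
faiss.normalize_L2(doc_vectors)              # normalized vectors: inner product == cosine

index = faiss.IndexFlatIP(dim)               # exact search; an HNSW/IVF index approximates this faster
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 100)       # top-100 doc ids by similarity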

43 of 80

There’s more than one “top K” we care about

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100

UNION ALL

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100

What about “pure” vector matches?

100 from this set

44 of 80

There’s more than one candidate set

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

AND (garden in title OR garden in description OR

trowel in title OR trowel in description)

ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100

UNION ALL

SELECT * FROM <search>

WHERE “lawn_and_garden” in department

AND “trowel” in item_type

ORDER BY similarity(query_embedding, title_embedding)
LIMIT 100

What about “pure” vector matches?

  • 100 from this set
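The UNION ALL idea as a sketch: run both retrievals, dedupe by id, and hand the combined pool to the reranker. Names are illustrative.

def union_candidates(*candidate_sets):
    # merge several top-K lists (e.g. lexically filtered + pure vector), one entry per doc id
    seen, merged = set(), []
    for candidates in candidate_sets:
        for doc in candidates:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged

# pool = union_candidates(filtered_top_100, pure_vector_top_100); then boost / rerank the pool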

45 of 80

With squiggly lines…

Candidate Set A (lexically filtered)

Candidate Set B (pure vector)

(candidates ordered by vector sim)

Boost lexical matches?

46 of 80

Why do we do it this way?

Candidate Set A (lexically filtered)

Candidate Set B (pure vector)

(candidates ordered by vector sim)

Boost lexical matches?

Should we just get these?

47 of 80

Why do we do it this way?

Candidate Set A (lexically filtered)

(candidates ordered by vector sim)

Boost lexical matches?

Should we just get these?

(Higher precision / lower recall)

Candidate Set B (pure vector)

(Higher recall / lower precision)

48 of 80

With squiggly lines…

Candidate Set A (filtered to lexical)

Candidate Set B (pure vector)

(candidates ordered by vector sim)

Some reranker, boosting, tie-breaking, etc

L0 Retrieval

L1 Ranking

More rankers / post-filters

A retrieval “Arm”

49 of 80

And many retrieval arms

Candidate Arm A (one term matches)

Candidate Arm C (same category as query)

(candidates ordered by vector sim)

Some reranker, boosting, tie-breaking, etc

More rankers / post-filters

Candidate Arm B (all terms match)

Candidate Arm D (image embedding)

Candidate Arm E (just lexical scores)

50 of 80

Candidate Arm A (one term matches)

Candidate Arm C (same category as query)

(candidates ordered by vector sim)

Candidate Arm B (all terms match)

Candidate Arm D (image embedding)

Candidate Arm E (just lexical scores)

Boost / Rerank

L0

Retrieval Arms

🥚🥚🥚🥚🥚

L1 boost/reranking

🐓
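Putting the picture into code: several L0 retrieval arms (the eggs), one L1 boost/rerank over their union (the chicken). Each arm is just a function returning candidates; everything here is schematic.

def retrieve(query, arms, rerank, k=10):
    # L0: each retrieval arm contributes its own candidate set
    pool, seen = [], set()
    for arm in arms:          # e.g. all-terms-match, same-category, image-embedding, pure-lexical arms
        for doc in arm(query):
            if doc["id"] not in seen:
                seen.add(doc["id"])
                pool.append(doc)
    # L1: one boost/rerank pass over the combined pool
    return rerank(query, pool)[:k]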

51 of 80

Or depending on the query

Candidate Arm A (one term matches)

Candidate Arm C (same category as query)

(candidates ordered by vector sim)

Some reranker, boosting, tie-breaking, etc

More rankers / post-filters

Candidate Arm B (all terms match)

Candidate Arm D (image embedding)

Candidate Arm E (just lexical scores)

52 of 80

Then the boost

Candidate Arm A (one term matches)

Candidate Arm C (same category as query)

(candidates ordered by vector sim)

Candidate Arm B (all terms match)

Candidate Arm D (image embedding)

Candidate Arm E (just lexical scores)

score += product_name_index[l0_matches].score(“garden trowel”)
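Spelled out, that boost is roughly: look up each L0 candidate in a lexical index over product_name and add its lexical score for "garden trowel" to the candidate's score. A sketch with an injectable scoring function standing in for the hypothetical product_name_index:

def boost_name_matches(query, l0_candidates, name_score):
    # name_score(query, doc) stands in for a lexical (e.g. BM25) score over the product_name field
    for doc in l0_candidates:
        doc["score"] = doc.get("score", 0.0) + name_score(query, doc)
    return sorted(l0_candidates, key=lambda d: -d["score"])

# toy stand-in for product_name_index[...].score(...): fraction of query terms in the name
def term_overlap(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc["product_name"].lower().split())) / len(q)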

53 of 80

Or a model

Candidate Arm A (one term matches)

Candidate Arm C (same category as query)

(candidates ordered by vector sim)

Candidate Arm B (all terms match)

Candidate Arm D (image embedding)

Candidate Arm E (just lexical scores)

Ranking model given query + document features
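And the "or a model" version: instead of hand-written boosts, build a feature vector per (query, document) pair from the arms' signals and let a learned model score it. A minimal sketch; the features and hand-set weights are illustrative, and in practice the weights come from a learning-to-rank model trained on labels or clicks.

import numpy as np

def features(query, doc):
    # signals collected during L0 retrieval / boosting; all illustrative
    return np.array([doc["vector_score"],            # embedding similarity
                     doc["name_bm25"],               # lexical score on product name
                     float(doc["all_terms_match"]),  # which arm(s) it came from
                     float(doc["same_category"])])

weights = np.array([1.0, 0.7, 2.0, 0.5])   # learned from labeled/click data in a real system

def model_rank(query, candidates):
    return sorted(candidates, key=lambda d: -float(features(query, d) @ weights))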

54 of 80

That’s the theory at least
