Exploring Re-Ranking Strategies for E-commerce Search

Coen Baars, Arian Stolwijk

26 November 2024

Coen Baars

CTO & Co-founder at Giftomatic

Tech-enthusiast with 20 years of experience in Search Relevance, Web Development, and UX

Rotterdam, the Netherlands

Arian Stolwijk

Head of Engineering at Giftomatic

Amersfoort, the Netherlands

About Giftomatic

  • Startup founded in 2019

  • Currently developing solutions for gift card providers

  • Our main product is an e-commerce search engine for optimised product search that matches specific gift cards
  • Our goal is to enhance gift card holders’ search experience while increasing gift card providers’ margins

  • Active in 20+ countries including The Netherlands, Germany, US, UK, Canada, Australia

Today’s Agenda

A “simple” search example

How to improve search results using rerankers

Conclusion

Alice

Alice is a clever and trendy 16-year-old who loves experimenting with makeup and perfecting her style. She has a passion for beauty and fashion.

Amsterdam, the Netherlands

Alice's Christmas Wish List

  • Beautyblender
  • Red backpack for school
  • Apple watch
  • Frozen Pyjama
  • Book about horses


The Elves Quest

  • The Challenge: Find the perfect presents for Alice.

  • Keys to success:
    • Understand User Context: Ensure results align with Alice’s personal preferences and needs.
    • Capture Intent: Look beyond the literal query to uncover what Alice truly wants (e.g., distinguishing between "Apple Watch" the brand and a themed design).
    • Provide a Balanced and Diverse Result Set: Present a mix of highly relevant options and diverse choices to keep the selection engaging and comprehensive.

Time for Santa’s little helpers to start their search for Alice!

Part 1

  • Retrieval (BM25/Semantic)
  1. Input: User submits a query (e.g., "Apple Watch").
  2. Retriever: Search engine finds items with exact or partial matches to the query.
  3. Result: Results are ranked based on basic relevance, often ignoring context or intent.
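
The lexical scoring in step 2 can be sketched with a toy BM25 implementation (a minimal sketch; the corpus, whitespace tokenizer, and parameter values k1/b are our own illustrative choices):

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Score each document in `corpus` against `query` with BM25."""
    docs = [doc.lower().split() for doc in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    # document frequency: in how many docs does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # term-frequency saturation, normalized by document length
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

corpus = [
    "red backpack for school",
    "red lipstick luxury beauty",
    "blue school backpack",
]
scores = bm25_scores("red backpack for school", corpus)
```

The document matching all query terms scores highest; note that the "red lipstick" document still gets a non-zero score from the single overlapping term, which is exactly the context-blindness discussed above.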


Lexical Search (BM25)

Query: “Red backpack for School”

User context: “teenager, girl/woman, likes: beauty, fashion, luxury”

Criteria: Create a diverse result set matching the query and user context.

Discussion:

Is this what we expected, or do we need to improve?


Semantic Search

Query Vector

[0.12, 5.04, 0.02, 0.93, …, 2.34]

Document Vectors

[0.11, 4.02, 0.00, 1.10, …, 2.54]

[0.0, 99.04, 0.01, 4.93, …, 1.30]

[0.12, 3.52, 0.65, 0.64, …, 9.23]

https://www.sbert.net/examples/applications/semantic-search/README.html

similarity(query_vector, doc_vector)
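
The similarity call above is typically cosine similarity. A minimal sketch using the toy vectors from this slide (truncated here to five dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vector = [0.12, 5.04, 0.02, 0.93, 2.34]
doc_vectors = [
    [0.11, 4.02, 0.00, 1.10, 2.54],
    [0.0, 99.04, 0.01, 4.93, 1.30],
    [0.12, 3.52, 0.65, 0.64, 9.23],
]
# rank document indices by similarity to the query, best first
ranked = sorted(range(len(doc_vectors)),
                key=lambda i: cosine_similarity(query_vector, doc_vectors[i]),
                reverse=True)
```

Because cosine similarity normalizes by vector length, the second document's large magnitude does not automatically make it the best match.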


Semantic Search

Query: “Red backpack for School”

User context: “teenager, girl/woman, likes: beauty, fashion, luxury”

Criteria: Create a diverse result set matching the query and user context.

Discussion:

Is this what we expected, or do we need to improve?


Hybrid search scores
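
Hybrid search combines the lexical and semantic scores into one ranking. A common recipe is a weighted sum of min-max-normalized scores (a sketch; the weight alpha, the normalization choice, and the toy score values are our own, not necessarily what this slide shows):

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25, semantic, alpha=0.5):
    """Convex combination of normalized BM25 and semantic scores."""
    b, s = min_max(bm25), min_max(semantic)
    return [alpha * x + (1 - alpha) * y for x, y in zip(b, s)]

# toy scores for three documents from the two retrievers
h = hybrid_scores([2.3, 0.4, 1.0], [0.99, 0.91, 0.72], alpha=0.5)
```

Normalizing first matters: raw BM25 scores and cosine similarities live on very different scales, so adding them directly would let one retriever dominate.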

Hybrid Search

Query: “Red backpack for School”

User context: “teenager, girl/woman, likes: beauty, fashion, luxury”

Criteria: Create a diverse result set matching the query and user context.

Discussion:

Is this what we expected, or do we need to improve?



How to improve search results using re-rankers

Overview of various reranking techniques

Part 2

  • Reranking

  1. Input: User submits a query (e.g., "Apple Watch").
  2. Retriever: Search engine finds items with exact or partial matches to the query.
  3. Result: Results are ranked based on basic relevance, often ignoring context or intent.
  4. Reranker: Reorders the retrieved results using richer signals such as context and intent.
  5. Enhanced results: The final ranking better matches what the user actually wants.

Re-ranker Strategies

  • Reciprocal Rank Fusion (RRF)
  • Maximal Marginal Relevance (MMR)
  • Learning To Rank (LTR)
  • Cross Encoders
  • External Rerank API (Cohere/JinaAI API)
  • Large Language Model



Reciprocal Rank Fusion

  • Combines the ranked lists of two retrievers
  • Uses only the rank position of each product in each retriever’s list

Doc | Retriever A | Retriever B | Score A | Score B | Total
----|-------------|-------------|---------|---------|------
 A  |      1      |      5      |   1/1   |   1/5   | 1.20
 B  |      2      |      4      |   1/2   |   1/4   | 0.75
 C  |      3      |      3      |   1/3   |   1/3   | 0.67
 D  |      4      |      1      |   1/4   |   1/1   | 1.25
 E  |      —      |      2      |    0    |   1/2   | 0.50

(E is returned only by Retriever B, so it contributes 0 from Retriever A.)
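
The table sums each document's reciprocal ranks across the two retrievers. A minimal sketch (the table uses 1/rank; the common formulation adds a constant, 1/(k + rank) with k = 60, which corresponds to a larger `k` below):

```python
def rrf(*rankings, k=0):
    """Reciprocal Rank Fusion over any number of ranked lists.

    Each ranking maps doc id -> 1-based rank. Documents missing
    from a list contribute 0 from that list.
    """
    docs = set().union(*rankings)
    return {doc: sum(1 / (k + r[doc]) for r in rankings if doc in r)
            for doc in docs}

retriever_a = {"A": 1, "B": 2, "C": 3, "D": 4}
retriever_b = {"D": 1, "E": 2, "C": 3, "B": 4, "A": 5}
scores = rrf(retriever_a, retriever_b)
# D: rank 4 in A, rank 1 in B -> 1/4 + 1/1 = 1.25, the winner
```

Because only rank positions are used, RRF needs no score normalization, which is why it is a popular default for fusing lexical and semantic result lists.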

Maximal Marginal Relevance (MMR)

  • Diversity Reranker
  • Iteratively add documents to the selected set
  • Penalize each candidate document by its maximum similarity to the already selected documents
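
The iteration above can be sketched as follows (a minimal sketch; the similarity values, the toy product names, and the lambda trade-off are our own illustrative choices):

```python
def mmr(query_sim, doc_sim, doc_ids, top_k, lam=0.5):
    """Maximal Marginal Relevance.

    query_sim: dict doc -> similarity to the query.
    doc_sim: dict (doc, doc) -> similarity between two documents.
    lam trades off relevance (1.0) against diversity (0.0).
    """
    selected = []
    candidates = list(doc_ids)
    while candidates and len(selected) < top_k:
        def score(d):
            # penalty: max similarity to anything already selected
            penalty = max((doc_sim[(d, s)] for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# toy example: two near-duplicate red backpacks and one distinct item
query_sim = {"bag1": 0.9, "bag2": 0.85, "lipstick": 0.4}
doc_sim = {("bag1", "bag2"): 0.95, ("bag2", "bag1"): 0.95,
           ("bag1", "lipstick"): 0.1, ("lipstick", "bag1"): 0.1,
           ("bag2", "lipstick"): 0.1, ("lipstick", "bag2"): 0.1}
picked = mmr(query_sim, doc_sim, ["bag1", "bag2", "lipstick"], top_k=2)
```

With this lambda the near-duplicate second backpack is penalized enough that the less relevant but more diverse item is picked second.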

RRF/MMR

Query: “Red backpack for School”

User context: “teenager, girl/woman, likes: beauty, fashion, luxury”

Criteria: Create a diverse result set matching the query and user context.

Discussion:

Is this what we expected, or do we need to improve?


Large Language Models

  • LLMs can do everything!?
  • Ask an LLM to rerank the results
  • Let’s add some context about Alice!
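
A sketch of the prompt-based approach (the prompt wording, the product titles, and parsing the reply as a comma-separated list of numbers are our own illustrative choices; the actual LLM call is deliberately left out):

```python
def build_rerank_prompt(query, user_context, products):
    """Build a prompt asking an LLM to reorder products by relevance."""
    lines = [
        "You are a product search reranker.",
        f"Query: {query}",
        f"User context: {user_context}",
        "Products:",
    ]
    lines += [f"{i}. {title}" for i, title in enumerate(products, 1)]
    lines.append("Return the product numbers, most relevant first, "
                 "as a comma-separated list.")
    return "\n".join(lines)

def parse_ranking(reply, products):
    """Map a '3, 1, 2' style reply back to product titles."""
    order = [int(tok) - 1 for tok in reply.replace(" ", "").split(",")]
    return [products[i] for i in order]

products = ["Red leather backpack", "Grey hiking backpack",
            "Red canvas school bag"]
prompt = build_rerank_prompt(
    "Red backpack for School",
    "teenager, girl/woman, likes: beauty, fashion, luxury",
    products,
)
# the prompt would be sent to an LLM; suppose it replies "1, 3, 2"
ranked = parse_ranking("1, 3, 2", products)
```

In practice the reply needs defensive parsing (LLMs do not always follow the output format), and the candidate list is kept small since each call is slow and costly.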


Large Language Model

Query: “Red backpack for School”

User context: “teenager, girl/woman, likes: beauty, fashion, luxury”

Criteria: Create a diverse result set matching the query and user context.

Discussion:

Is this what we expected, or do we need to improve?


Other Re-rankers

  • Learning To Rank (LTR)
    • Good at incorporating non-textual features (e.g., popularity, clicks)
    • You will need to build your own dataset and train your own model
  • Cross Encoders
    • Unlike bi-encoders, they score the query and document together, so the model sees the query context when ranking
  • External Reranking API
    • Cohere, JinaAI
    • Easy to use
    • Harder to tune

Re-rankers Spectrum (Simple → Expensive)

RRF* → MMR → Learning To Rank* → Cross Encoder* → Rerank API* → LLM

*Supported in Elasticsearch, e.g. through 8.16 retrievers

Conclusion

-> Search is difficult

-> Re-rankers are a good way to improve results

-> There is no magical AI solution or one-size-fits-all

-> Every situation needs a different solution

Thank you for listening!

Time for questions

We are hiring!

Coen Baars

Arian Stolwijk