1
CS 589 Lecture 5
Monday 6:30-9:00
Kidde 228
All zoom links in Canvas
Slides adapted from Stanford CS276
Learning to rank
photo: https://www.scubedstudios.com/information-retrieval/
Lecture 4 Inverted Index Recap
2
Pop quiz (Inverted Index)
1. Which of the following are prefix encodings, i.e., there does NOT exist a sequence that can be decoded in more than one way?
A. {a: 1, b: 11, c: 0}
B. {a: 0, b: 10, c: 110, d: 111}
C. {a: 0, b: 100, c: 101, d: 11}
D. {a: 0, b: 100, c: 10, d: 11}
3
Pop quiz (Inverted Index)
2. When we use the two-pointer algorithm to merge the following posting lists, select all documents that will be skipped. Can you implement the two-pointer algorithm in Python? (A sketch follows the lists.)
4
[Figure: two posting lists, List A and List B, containing the document IDs 2, 14, 18, 27, 30, 40, 47, 51; 11, 12, 13, 48, 56, 59, 60, 70, 80; and 52, 53, 100]
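A minimal Python sketch of the two-pointer merge the quiz refers to; the posting lists below are hypothetical stand-ins for the ones in the figure:

```python
def intersect(a, b):
    """Two-pointer merge: return doc IDs that appear in both sorted posting lists."""
    i, j, result = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1   # a[i] is skipped: it cannot match anything remaining in b
        else:
            j += 1   # b[j] is skipped
    return result

# Hypothetical example lists with some overlap:
print(intersect([2, 14, 18, 48, 51, 100], [11, 14, 48, 56, 100]))  # -> [14, 48, 100]
```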
A Brief History of IR
5
300 BC: Callimachus compiles the first library catalog
1950s: punch cards, searching at 600 cards/min
1958: Cranfield evaluation methodology; word-based indexing
1960s: building IR systems on computers; relevance feedback
1970s-1980s: TF-IDF; the probability ranking principle
1990 - now: TREC; latent semantic indexing; learning to rank; web search; supporting natural language queries
Lecture 5 Overview
6
Review of Lectures 1-3: Parameter estimation
7
[Figure: query = “best phone”; three results ($400, 20G, Nokia; $500, 30G, Nokia; $600, 40G, iPhone) with the clicks observed in session 1 and session 2]
Relevance feedback with the Rocchio model
8
$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d} - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d} \in D_{nr}} \vec{d}$, where $D_r$ = rel docs, $D_{nr}$ = non-rel docs; typically $\beta \gg \gamma$.
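A small NumPy sketch of the Rocchio update above; the vectors and weights are toy values:

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant docs and away from non-relevant docs (beta >> gamma)."""
    rel = rel_docs.mean(axis=0) if len(rel_docs) else np.zeros_like(q0)
    nonrel = nonrel_docs.mean(axis=0) if len(nonrel_docs) else np.zeros_like(q0)
    return alpha * q0 + beta * rel - gamma * nonrel

q0 = np.array([1.0, 0.0, 0.5])                       # toy query vector
rel = np.array([[0.9, 0.1, 0.4], [0.8, 0.0, 0.6]])   # toy relevant doc vectors
nonrel = np.array([[0.0, 1.0, 0.2]])                 # toy non-relevant doc vector
print(rocchio(q0, rel, nonrel))
```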
Background: machine learning for diabetes prediction
9
Input features: X
Output label: y ∈ {1, -1}
Background: linear regression
10
[Figure: example plots with BMI and blood pressure as input features and diabetes vs. non-diabetes as the outcome]
Background: gradient descent (optimization method)
11
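A minimal sketch of gradient descent on the linear-regression background above; the data, learning rate, and iteration count are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # toy features (e.g., BMI, blood pressure)
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=100)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y                              # prediction error
    w -= lr * (X.T @ err) / len(y)                   # gradient of mean squared error w.r.t. w
    b -= lr * err.mean()                             # gradient w.r.t. the bias
print(w, b)                                          # close to [2, -1] and 0.5
```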
Using machine learning to help improve IR
12
Machine learning for IR came in late
13
Learning to rank benefits from web search
14
Learning to rank as binary classification
15
Input features: X
Output label: y
Learning to rank as binary classification
16
[Figure: training examples plotted by term proximity ω (x-axis, 0 to 5) and cosine score α (y-axis, 0.025 to 0.05); R marks relevant and N non-relevant documents, and a linear decision surface separates the two classes]
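A hedged sketch of this setup: each (query, document) pair becomes one example with features (cosine score, term proximity) and a binary R/N label, and a linear classifier learns the decision surface. scikit-learn's LogisticRegression is used as a stand-in and the feature values are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy (cosine score, term proximity) features and R(1)/N(0) labels
X = np.array([[0.050, 3.0], [0.045, 4.0], [0.040, 5.0],   # relevant-looking
              [0.025, 1.0], [0.030, 2.0], [0.020, 0.5]])  # non-relevant-looking
y = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)                 # the weights define the linear decision surface
print(clf.predict_proba([[0.035, 2.5]])[:, 1])   # relevance probability for a new document
```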
Learning to rank as regression
17
[Figure: example documents with graded relevance labels (1, 2, 3, 3, 2, 1, 1)]
Learning to rank as regression
18
[Figure: relevance level plotted against cosine score α and term proximity ω, fit by a linear model]
Challenge in relevance judgment
19
[Figure: the same documents with graded relevance labels (1, 2, 3, 3, 2, 1, 1)]
Exact judgment vs Relative judgment
20
Exact judgment
Relative judgment
Cross Entropy Loss
21
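For reference, the standard binary cross-entropy loss on a predicted probability $\hat{p}$ and label $y \in \{0, 1\}$ (the slide's notation may differ):

$$\mathcal{L}(y, \hat{p}) = -\big[\, y \log \hat{p} + (1 - y) \log (1 - \hat{p}) \,\big]$$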
Pairwise learning to rank on relative labels
22
Binary label
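A small sketch of how relative judgments become binary-labeled pairs: every pair of documents with different grades for the same query yields one training example saying which document should rank higher. The judgments below are hypothetical:

```python
from itertools import combinations

# hypothetical graded judgments for one query: (doc_id, relevance grade)
judged = [("d1", 3), ("d2", 2), ("d3", 1), ("d4", 1)]

pairs = []
for (di, gi), (dj, gj) in combinations(judged, 2):
    if gi != gj:                                        # keep only pairs with a clear preference
        pairs.append((di, dj, 1 if gi > gj else 0))     # binary label: is di preferred over dj?
print(pairs)
```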
Yahoo! learning to rank challenge
Chapelle and Chang [2011]
23
Yahoo! learning to rank challenge
24
Regression tree
25
decision tree
regression tree
Regression tree
26
Training a regression tree
What is the algorithm for searching for the splitting criterion? Its time complexity is quadratic.
27
[Figure: a regression tree with splits such as x^1 < v1, x^1 > v3, and x^2 < v4, and leaf predictions such as 1.444 and 3.625]
Step 1: as in linear regression, search over the different feature dimensions for the best split (a brute-force sketch follows below).
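A brute-force sketch of that split search: for every feature dimension and every candidate threshold, compute the squared error of the two resulting groups; recomputing the group means for each threshold makes this naive version quadratic per dimension. The data is made up:

```python
import numpy as np

def best_split(X, y):
    """Return (feature, threshold, sse) of the best single split (naive, O(n^2) per feature)."""
    best = (None, None, np.inf)
    for d in range(X.shape[1]):                    # search the different dimensions
        for v in np.unique(X[:, d]):               # candidate thresholds
            left, right = y[X[:, d] < v], y[X[:, d] >= v]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (d, v, sse)
    return best

X = np.array([[1.0, 5.0], [2.0, 4.0], [6.0, 1.0], [7.0, 2.0]])
y = np.array([1.5, 2.0, 3.0, 4.0])
print(best_split(X, y))
```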
Towards gradient boosting
28
Towards gradient boosting
29
Gradient boosting: estimating the parameter
30
Gradient boosting: estimating the parameter
31
Least squares (the gradient equals the residual)
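A minimal boosting loop under the least-squares loss, where the negative gradient is exactly the current residual; each stage fits a small regression tree to those residuals. scikit-learn's DecisionTreeRegressor stands in for the tree learner and the data is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

pred = np.full_like(y, y.mean())              # f0(x): the mean of y
trees, lr = [], 0.1
for _ in range(50):
    residual = y - pred                       # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    pred += lr * tree.predict(X)              # add the new small tree to the ensemble
print(np.mean((y - pred) ** 2))               # training MSE shrinks as trees are added
```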
Each function is a small regression tree
32
[Figure: training targets y (values 6, 6, 6, 6, 6, 5, 5, 5, 2, 2, 2, ...); the first function f0(x) is the constant 3.588, the mean of y]
33
[Figure: residuals y − f0(x), e.g., 6 − 3.588 = 2.41, 5 − 3.588 = 1.41, 2 − 3.588 = −1.58, 1 − 3.588 = −2.58; the second function f1(x) is a one-split tree on x1 < 5.5 with leaf values 2.037 and −1.81, fit to these residuals]
34
[Figure: residuals after f0 + f1, e.g., 0.37, 0.22, −0.62, −0.77; the third function f2(x) is a one-split tree on x2 < 3.5 with leaf values 0.299 and −0.427, fit to these residuals]
Multiple additive regression tree [Friedman 1999]
35
[Figure: Prediction = f0(x) + f1(x) + f2(x) + …, where f0(x) = 3.588, f1(x) splits on x1 < 5.5 with leaf values 2.037 and −1.81, and f2(x) splits on x2 < 3.5 with leaf values 0.299 and −0.427]
Multiple additive regression tree [Friedman 1999]
36
[Figure: the same additive ensemble, f0(x) + f1(x) + f2(x) + …, with the yes/no branches of each split labeled]
RankNet [Burges et al. 2010]
37
RankNet [Burges et al. 2010]
38
RankNet [Burges et al. 2010]
39
A gradient w.r.t. document i
RankNet [Burges et al. 2010]
40
λ: the sum over the negative examples ranked above document 1, which pushes document 1 higher
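A NumPy sketch of the RankNet pairwise gradient and how the per-document lambdas are accumulated; the scores and labels are toy values, and sigma is taken as 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([2.0, 1.0, 0.5])   # model scores for three docs of one query
labels = np.array([1, 0, 0])         # doc 0 is relevant, docs 1 and 2 are not

lambdas = np.zeros_like(scores)
for i in range(len(scores)):
    for j in range(len(scores)):
        if labels[i] > labels[j]:                          # doc i should rank above doc j
            lam = -(1.0 - sigmoid(scores[i] - scores[j]))  # dC/ds_i of the pairwise cross-entropy
            lambdas[i] += lam                              # negative: the update s_i -= eta*lambda_i raises s_i
            lambdas[j] -= lam                              # positive: pushes s_j down
print(lambdas)   # the relevant doc accumulates a negative lambda, so gradient descent raises its score
```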
RankNet [Burges et al. 2010]
41
Which metric should we use to emphasize performance at higher positions?
From RankNet to LambdaRank
42
From RankNet to LambdaRank
43
LambdaMART [Burges et al. 2010]
44
Gradient boosting
Use lambda as the residual for computing the gradient
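A sketch of the LambdaRank/LambdaMART modification under common assumptions: each RankNet pairwise gradient is scaled by |ΔNDCG|, the change in NDCG if the two documents swapped ranks, and the resulting lambdas are what the next boosted tree fits. Toy scores and labels, base-2 log discount:

```python
import numpy as np

def dcg(gains, order):
    """DCG of a ranking given graded gains and the ranked order of doc indices."""
    ranks = np.arange(1, len(order) + 1)
    return np.sum((2.0 ** gains[order] - 1.0) / np.log2(ranks + 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([0.5, 2.0, 1.0])   # toy model scores for one query
labels = np.array([2, 0, 1])         # graded relevance judgments
order = np.argsort(-scores)          # current ranking: doc indices sorted by score
ideal = dcg(labels, np.argsort(-labels))

lambdas = np.zeros_like(scores)
for i in range(len(scores)):
    for j in range(len(scores)):
        if labels[i] > labels[j]:
            swapped = order.copy()                     # |delta NDCG| if docs i and j swapped ranks
            pi = int(np.where(order == i)[0][0])
            pj = int(np.where(order == j)[0][0])
            swapped[pi], swapped[pj] = swapped[pj], swapped[pi]
            delta = abs(dcg(labels, order) - dcg(labels, swapped)) / ideal
            lam = -(1.0 - sigmoid(scores[i] - scores[j])) * delta
            lambdas[i] += lam
            lambdas[j] -= lam
print(lambdas)   # these lambdas play the role of residuals for LambdaMART's next tree
```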
Learning to rank categorization
45
| Year | Algorithm | Category | Description |
| 1992 | Linear reg | Pointwise | Staged logistic regression |
| 1999 | MART | Pairwise | Multiple additive regression trees |
| 2005 | RankNet | Pairwise | Pairwise cross-entropy loss with a neural network |
| 2007 | SVMmap | Listwise | Optimizing MAP using SVM |
| 2010 | NDCGBoost | Listwise | Optimizing NDCG using boosting |
| 2016 | XGBoost | Pairwise | Supports various eval metrics |
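For reference, a hedged usage sketch of the xgboost package's ranking interface (assuming a recent version whose sklearn wrapper accepts qid=); the features, grades, and query groups are random placeholders:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))           # placeholder features
y = rng.integers(0, 4, size=300)         # graded relevance 0-3
qid = np.repeat(np.arange(30), 10)       # 30 queries with 10 docs each (rows grouped by query)

ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=100)
ranker.fit(X, y, qid=qid)
print(ranker.predict(X[:10]))            # ranking scores for the first query's documents
```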
Listwise learning to rank for MAP [Yue et al. 2007]
46
[Equation: minimize an SVM objective with the hinge loss [·]+, obtained by taking an upper bound on the MAP loss]
Listwise learning to rank for NDCG [Valizadegan et al. 2009]
47
Listwise learning to rank for NDCG [Valizadegan et al. 2009]
48
Summary
49
Resources for learning to rank
50
Deep learning based retrieval
51
Deep semantic matching [Pang et al. 2016]
52
[Figure: matching matrix between query q, with words w1, w2, …, wn, and document d, with words v1, v2, …, vm; each cell compares the distributed representations (word2vec) of the two words]
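A tiny sketch of the interaction-matrix idea: one cell per (query word, document word) pair, filled here with cosine similarity between word vectors; the embeddings are random stand-ins for word2vec vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(3, 50))    # stand-ins for word2vec vectors of w1..wn
doc_vecs = rng.normal(size=(6, 50))      # stand-ins for v1..vm

q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
match_matrix = q @ d.T                   # cosine similarity for every (query word, doc word) pair
print(match_matrix.shape)                # this grid is what the convolutional layers consume
```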
A Study of MatchPyramid Models on Ad-hoc Retrieval
aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model
53
Dense passage retrieval
54
[Figure: dense retrieval example matching the query “cs 589 stevens” to the document “CS 589 Text Mining and Information Retrieval”]
source: https://medium.com/@aikho/deep-learning-in-information-retrieval-part-ii-dense-retrieval-1f9fecb47de9
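A hedged sketch of the bi-encoder idea, assuming the sentence-transformers package and the publicly available 'all-MiniLM-L6-v2' model (any sentence encoder would work the same way):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")        # assumed example model

docs = [
    "CS 589 Text Mining and Information Retrieval",
    "CS 101 Introduction to Programming",
]
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode("cs 589 stevens", convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(query_emb, doc_emb)              # dense similarity, not exact term overlap
print(scores)                                          # the CS 589 page should score higher
```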
Contrastive learning
55
source: https://medium.com/@aikho/deep-learning-in-information-retrieval-part-ii-dense-retrieval-1f9fecb47de9
Contrastive learning
56
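A minimal PyTorch sketch of the contrastive objective with in-batch negatives: each query is pulled toward its paired positive passage and pushed away from the other passages in the same batch. The embeddings below are random placeholders for encoder outputs:

```python
import torch
import torch.nn.functional as F

batch, dim = 8, 128
q = F.normalize(torch.randn(batch, dim), dim=-1)   # query embeddings (placeholder)
p = F.normalize(torch.randn(batch, dim), dim=-1)   # positive passage embeddings (placeholder)

sim = q @ p.T / 0.05                               # similarity matrix with a temperature of 0.05
targets = torch.arange(batch)                      # the i-th passage is the i-th query's positive
loss = F.cross_entropy(sim, targets)               # off-diagonal entries act as in-batch negatives
print(loss.item())
```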
Making Deep IR Models Explainable (ongoing work)
57
Making Deep IR Models Explainable (ongoing work)
58
59
Lecture 5 Summary
60
Improving sentence embedding quality
61