1
CS 589 Lecture 2
All zoom links in Canvas
Probability ranking principle
Probabilistic retrieval models
photo: https://www.scubedstudios.com/information-retrieval/
2
Instructor: Susan Liu
CA: Ahmad Ibraheem
OH: Tue 3:30-5:30
Discord by default
aibrahee@stevens.edu
OH: Thu 1-3 to be added here
Discord by default, in person on HW due weeks
A Brief History of IR
3
300 BC: Callimachus compiles the first library catalog
1950s: punch cards, searching at 600 cards/min
1958: Cranfield evaluation methodology; word-based indexing
1960s: building IR systems on computers; relevance feedback
1970s: TF-IDF; probability ranking principle
1980s: latent semantic indexing
1990 - now: TREC; learning to rank; web search; supporting natural language queries
TF-IDF demo
4
Review of Lecture 1: Vector-Space model
5
Vector space model: represent q and d as term vectors and rank by cosine similarity
TF-IDF weighting: w(t, d) = tf(t, d) * log(N / df(t)), where N is the number of documents and df(t) the document frequency of t
How to choose N?
Document length normalization
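The review above can be sketched in a few lines: a minimal TF-IDF weighting with cosine similarity, assuming the standard w(t, d) = tf * log(N / df) form (function names are illustrative, not from the lecture code):

```python
import math

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors: w(t, d) = tf(t, d) * log(N / df(t))."""
    N = len(docs)
    df = {}
    for doc in docs:
        for t in set(doc):
            df[t] = df.get(t, 0) + 1
    vecs = []
    for doc in docs:
        tf = {}
        for t in doc:
            tf[t] = tf.get(t, 0) + 1
        vecs.append({t: c * math.log(N / df[t]) for t, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents sharing rare terms with the query score higher than documents sharing nothing.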
6
query = "news about presidential campaign"
Example 1 (over-penalization): d1 (100 words) contains "news of presidential campaign" and "presidential candidate"; d2 (5,000 words) contains the same terms (news, presidential, campaign) scattered sparsely through a much longer text. True relevance: d1 > d2.
Example 2 (under-penalization): d1 contains "news of presidential campaign" and "presidential candidate" followed by a long stretch of non-matching text; d2 contains "presidential campaign news" and "presidential candidate" and nothing else. True relevance: d2 > d1.
Need a normalization factor between 1 and |d|
Document length pivoting
7
Pivoted length normalization: norm(d) = 1 - b + b * (dl / avgdl), where dl is the document length and avgdl the average document length; the factor equals 1 when dl = avgdl
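A minimal sketch of pivoted length normalization, assuming the standard form 1 - b + b * dl / avgdl (b = 0.75 is a conventional default, not prescribed by the slide):

```python
def pivot_norm(dl, avgdl, b=0.75):
    """Pivoted length normalization: 1 - b + b * dl / avgdl.

    b = 0 gives no normalization (factor 1 for every document);
    b = 1 gives full normalization dl / avgdl; 0 < b < 1 interpolates,
    and the factor equals 1 exactly when dl = avgdl.
    """
    return 1.0 - b + b * dl / avgdl
```

Documents longer than average get a factor above 1 (stronger penalty), shorter ones a factor below 1.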
Document length pivoting
8
Pop quiz (Canvas)
9
Develop your solution from here: https://tinyurl.com/422tdd7n
Answer
10
Lecture 2
11
Random variables
12
sequence = 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0
Bernoulli distribution: p(x) = alpha^x (1 - alpha)^(1 - x), x in {0, 1}
Parameter: alpha
Observation: the sequence above
Random variables
13
sequence = 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0
Bernoulli distribution: likelihood of the observed sequence L(alpha) = prod_i alpha^{x_i} (1 - alpha)^{1 - x_i}
Observation: seq
Maximum likelihood estimation
Maximum likelihood estimation
14
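The maximum likelihood estimate for the Bernoulli parameter can be sketched in a couple of lines (a minimal example; here alpha is taken as P(X = 1), so the MLE is the empirical frequency of 1s):

```python
def bernoulli_mle(seq):
    """MLE of the Bernoulli parameter alpha = P(X = 1).

    Maximizing L(alpha) = prod_i alpha^{x_i} (1 - alpha)^{1 - x_i}
    gives alpha_hat = (number of 1s) / n, the empirical frequency.
    """
    return sum(seq) / len(seq)

# The sequence from the slide: five 1s out of twelve observations.
seq = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
alpha_hat = bernoulli_mle(seq)  # 5/12
```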
Bayes' rule
15
Chain rule: p(a, b) = p(a | b) p(b)
Bayes' rule: p(a | b) = p(b | a) p(a) / p(b)
posterior: p(a | b); likelihood: p(b | a); prior: p(a); joint distribution: p(a, b)
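A quick numerical check of the chain rule and Bayes' rule on a made-up joint distribution over two binary variables (the numbers are hypothetical, chosen only for illustration):

```python
# Hypothetical joint distribution p(a, b) over binary a, b.
p_joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def marginal_a(a):
    return sum(p for (a_, b_), p in p_joint.items() if a_ == a)

def marginal_b(b):
    return sum(p for (a_, b_), p in p_joint.items() if b_ == b)

def cond_a_given_b(a, b):
    # Chain rule rearranged: p(a | b) = p(a, b) / p(b)
    return p_joint[(a, b)] / marginal_b(b)

def cond_b_given_a(b, a):
    return p_joint[(a, b)] / marginal_a(a)

# Bayes' rule: posterior = likelihood * prior / evidence
a, b = 1, 1
posterior = cond_b_given_a(b, a) * marginal_a(a) / marginal_b(b)
```

The posterior computed via Bayes' rule matches p(a | b) computed directly from the joint.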
Random variables in information retrieval
16
Notation: in the following slides, q denotes the query, d the document, and rel the relevance judgment
Application | Coin toss | Information retrieval |
Observation | 0=head, 1=tail | 0=non-relevant, 1=relevant |
Model | alpha = probability of head | p(rel = 1|q, d) = probability of relevance given q, d |
Random variables in information retrieval
17
query = “artificial intelligence”
d = artificial, intelligence, machine, intelligence, information, retrieval
d = 56, 100, 120, 100, 50, 221
rel = 0, 1, 0, 0, 0, 0, 0, 0
q = artificial, intelligence
q = 56, 100
Observation
Probability ranking principle
18
PRP: rank documents by p(rel = 1 | q, d)
Estimating p(rel = 1 | q, d)
19
Scarcity of relevance observations (q, d, rel)
20
q:CS589
d:Susan Liu
q:sports
q:CS589
What is p(rel = 1 | q, d)?
Estimating p(rel = 1 | q, d)
21
Problems with this estimation?
Bayes' rule
Estimating p(rel = 1 | q, d)
22
Using odds to rerank: helps cancel terms to simplify the calculation (next slide)
Property.
Estimating p(rel = 1 | q, d)
23
Odds: O(rel | q, d) = p(rel = 1 | q, d) / p(rel = 0 | q, d), rank-equivalent to p(d | q, rel = 1) / p(d | q, rel = 0)
i.i.d. assumption: p(d | q, rel) = prod_i p(w_i | q, rel), with w_i = 0 or 1
Denote p_i = p(w_i = 1 | q, rel = 1) and u_i = p(w_i = 1 | q, rel = 0) (helps cancel out terms)
24
query = "the artificial intelligence"
bag-of-words representation
| intelligence | book | the | cat | artificial | dog | business |
Doc1 | 0 | 1 | 3 | 1 | 0 | 1 | 0 |
Doc2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
query | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
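The table's bag-of-words rows can be reproduced in a few lines; the document texts below are hypothetical reconstructions chosen only to match the counts in the table:

```python
vocab = ["intelligence", "book", "the", "cat", "artificial", "dog", "business"]

def bow_counts(tokens, vocab):
    """Bag-of-words representation: per-word counts; word order is discarded."""
    return [tokens.count(w) for w in vocab]

# Hypothetical texts consistent with the table's count rows.
doc1 = "the book the cat the dog".split()
doc2 = "intelligence business".split()
query = "the artificial intelligence".split()
```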
Model 1: RSJ model
25
Given q and d, rank d by (Robertson & Sparck Jones 76): sum over query terms w_i present in d of log [ p_i (1 - u_i) / (u_i (1 - p_i)) ]
p_i: probability for a word to appear in a relevant doc for q; u_i: the same probability for a non-relevant doc
RSJ model solves the data scarcity problem: it estimates per-term probabilities p_i and u_i, which are shared across documents, instead of estimating p(rel = 1 | q, d) separately for every (q, d) pair
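A sketch of the RSJ term weight under the common fallback used when no relevance judgments are available: p_i = 0.5 and u_i approximated from document frequency with 0.5 smoothing, which reduces the weight to an IDF-like quantity (function names are illustrative):

```python
import math

def rsj_weight(df, N):
    """RSJ term weight with no relevance information.

    With p_i = 0.5 and u_i ~ df / N (plus 0.5 smoothing),
    log[p_i(1-u_i) / (u_i(1-p_i))] reduces to this IDF-like weight.
    """
    return math.log((N - df + 0.5) / (df + 0.5))

def rsj_score(query_terms, doc_terms, df, N):
    """Score = sum of RSJ weights over terms shared by query and document."""
    return sum(rsj_weight(df[t], N) for t in set(query_terms) & set(doc_terms))
```

Rare terms (low df) get large weights; a term appearing in half the collection gets weight zero.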
RSJ model: Summary
26
How to improve RSJ?
Desiderata of retrieval models
27
How to improve RSJ based on these desiderata?
Informative words in long documents are diluted
28
d2 is not twice as relevant as d1
Model 2: Okapi/BM25
29
Okapi/BM25: Formulation
30
Source: https://en.wikipedia.org/wiki/Okapi_BM25
Main idea of BM25: reweighting TF with the saturation function tf / (tf + k1)
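A minimal sketch of BM25 scoring consistent with the Wikipedia formulation cited above, using the common variant with a (k1 + 1) numerator and a +1 inside the IDF log to keep weights positive; the defaults k1 = 1.2, b = 0.75 are conventional choices, not prescribed by the slide:

```python
import math

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: sum over query terms of IDF(w) * saturated, length-normalized TF."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for w in query:
        df = sum(1 for d in docs if w in d)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # +1 keeps IDF positive
        tf = doc.count(w)
        norm = 1 - b + b * len(doc) / avgdl              # pivoted length normalization
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score
```

A document matching both query terms outranks one matching a single term, which outranks one matching none.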
Okapi/BM25: Extending RSJ by Modeling TF
31
eliteness
Okapi/BM25: Extending RSJ by Modeling TF
32
eliteness
33
Example: a document titled "President Election" discusses the topic throughout ("news of presidential campaign", "presidential candidate", "presidential campaign news"), so "president" is an elite (topically central) term there; a document titled "SIT Ranking" mentions "president of SIT" only in passing, so "president" is not elite there.
Okapi/BM25: Extending RSJ by Modeling TF
34
RSJ model
extend
(eliteness score)
Okapi/BM25: Modeling TF using Mixture Models
35
Why does a single Poisson model not work?
Okapi/BM25
36
eliteness
(2 Poisson model)
RSJ model
extend
Okapi/BM25
37
Approximating the 2-Poisson model
38
slide credit: Stanford CS276 Information retrieval
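The key properties of the saturation function that approximates the 2-Poisson eliteness weight can be checked numerically (a minimal sketch; k1 = 1.2 is a conventional default):

```python
def tf_saturation(tf, k1=1.2):
    """BM25's TF saturation: tf / (tf + k1).

    Grows with tf but is bounded by 1, with diminishing returns:
    the 10th occurrence of a term adds far less than the 2nd.
    """
    return tf / (tf + k1)
```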
Analysis of BM25 formulation
39
IDF
Pivoted document length normalization
BM25: Summary
40
BM25: Multi-field retrieval
41
title
question
BM25F
42
alpha_f: hyperparameters (per-field weights)
Language model-based retrieval: A more flexible framework
43
RSJ: a generative model of the document, p(d | q, rel)
LM-based retrieval model: ranks d by the query likelihood p(q | d)
Model 3: Language model-based retrieval
44
corpus unigram LM
Model 3: Language model-based retrieval
45
...news of presidential campaign.....
...presidential candidate...
"presidential campaign news"
Language model-based retrieval
46
log p(q | d) = sum over query words seen in d of log [ p_seen(w_i | d) / (alpha_d p(w_i | C)) ] + n log alpha_d + sum_i log p(w_i | C), where the last term is constant across documents
Derivation: https://www.overleaf.com/read/jbztxtnbzwcx
How to estimate p_seen and p(w_i|C)?
Different senses of ‘model’ [Ponte and Croft, 98]
47
decouple
retrieval model
other problems (e.g., indexing)
+
Statistical language model
48
Estimating p(w | d)
49
(plug in Eq. 1)
Dirichlet smoothing: p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
p(w | C): corpus unigram LM
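A sketch of query-likelihood scoring with Dirichlet smoothing, assuming the standard form p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu); mu = 2000 is a typical default, and skipping corpus-unseen words is a simplification:

```python
import math

def dirichlet_lm_score(query, doc, corpus, mu=2000):
    """Query log-likelihood under a Dirichlet-smoothed document LM."""
    total = sum(len(d) for d in corpus)
    score = 0.0
    for w in query:
        cwd = doc.count(w)
        pwc = sum(d.count(w) for d in corpus) / total  # corpus unigram LM p(w|C)
        if pwc == 0:
            continue  # word unseen in the whole corpus: skip (simplification)
        score += math.log((cwd + mu * pwc) / (len(doc) + mu))
    return score
```

Smoothing keeps the score finite even when a query word is absent from the document, while still favoring documents that contain it.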
Estimating
50
Language model-based retrieval: Summary
51
Feedback language model [Zhai and Lafferty 01]
52
Model-based feedback in the language modeling approach to IR, Zhai & Lafferty, 2001
Feedback language model [Zhai and Lafferty 01]
53
sparsity
infer with the EM algorithm
retrieve
get document model
Model-based feedback in the language modeling approach to IR, Zhai & Lafferty, 2001
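A minimal EM sketch for the feedback model, assuming the mixture formulation in Zhai & Lafferty 01: each feedback-document word is drawn from (1 - lambda) * p(w | theta_F) + lambda * p(w | C), and EM infers theta_F (variable names are illustrative):

```python
def feedback_lm(fb_docs, p_wc, lam=0.5, iters=20):
    """EM for the feedback language model theta_F.

    fb_docs: list of tokenized feedback documents
    p_wc:    background (corpus) probability for each vocabulary word
    lam:     mixing weight of the background model
    """
    counts = {}
    for d in fb_docs:
        for w in d:
            counts[w] = counts.get(w, 0) + 1
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialization
    for _ in range(iters):
        # E-step: probability each occurrence of w was generated by theta_F
        t = {w: (1 - lam) * theta[w] /
                ((1 - lam) * theta[w] + lam * p_wc[w]) for w in vocab}
        # M-step: re-estimate theta_F from expected counts
        z = sum(counts[w] * t[w] for w in vocab)
        theta = {w: counts[w] * t[w] / z for w in vocab}
    return theta
```

Words that are frequent in the feedback documents but rare in the background end up dominating theta_F; common background words are explained away.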
Evaluation on smoothing methods [Zhai & Lafferty 02]
54
Two-stage language models for information retrieval
Evaluation on feedback LM [Zhai & Lafferty 01b]
55
Model-based feedback in the language modeling approach to IR, Zhai & Lafferty, 2001
Comparison between BM25 and LM [Bennett et al. 2008]
56
However, BM25 outperforms LM in other cases
A Comparative Study of Probabilistic and Language Models for Information Retrieval
Multi-field retrieval
57
Equivalence to KL-divergence retrieval model
58
-KL(theta_q || theta_d) = sum_w p(w | theta_q) log p(w | theta_d) - sum_w p(w | theta_q) log p(w | theta_q); the second term is constant across documents, and theta_d is the smoothed document model
why not the opposite, KL(theta_d || theta_q)?
Notes on the KL-divergence retrieval formula and Dirichlet prior smoothing
(Eq. 1)
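The equivalence can be checked numerically: with an empirical query model, the negative cross entropy sum_w p(w|theta_q) log p(w|theta_d) differs from the query log-likelihood only by a factor |q| and a document-independent constant, so both produce the same ranking (a small sketch; the distributions are hypothetical):

```python
import math

def neg_cross_entropy(q_model, d_model):
    """sum_w p(w|theta_q) log p(w|theta_d).

    Equals -KL(theta_q || theta_d) up to the query entropy,
    which is constant across documents.
    """
    return sum(p * math.log(d_model[w]) for w, p in q_model.items())

def query_loglik(query, d_model):
    """Query log-likelihood log p(q | theta_d) under a unigram model."""
    return sum(math.log(d_model[w]) for w in query)
```

With q_model set to the empirical distribution of the query terms, neg_cross_entropy is exactly query_loglik / |q|.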
Other smoothing methods
59
Tuning parameters in smoothing models [Zhai and Lafferty 02]
60
Two-stage language models for information retrieval
Tuning parameters in smoothing models [Zhai and Lafferty 02]
61
Two-stage language models for information retrieval
Summary of hyperparameter tuning
62
Translation-based language model [Xue et al. 2008]
63
Retrieval Models for Question and Answer Archives
Discussion on query length
64
Deep semantic matching [Pang et al. 2016]
65
Figure: MatchPyramid builds an m x n matching matrix between the query q = v1, v2, ..., vm and the document d = w1, w2, ..., wn; each cell holds the similarity between the distributed representations (word2vec) of a query word and a document word.
A Study of MatchPyramid Models on Ad-hoc Retrieval
Lecture 2: Summary
66
Homework 1