The Neural Search Frontier
The Relevance Engineer's Deep Learning Toolbelt
About us
Doug Turnbull, CTO - OpenSource Connections (http://o19s.com); author of "Relevant Search"
Tommaso Teofili
Computer Scientist at Adobe; ASF member
Discount Code:
ctwactivate18
Relevance: challenge of 'aboutness'
[Chart: Doc 1, Doc 2, and the query 'bitcoin regulation' plotted on two axes: 'bitcoin' aboutness vs. 'regulation' aboutness.]
Doc 2 is more 'about' the query terms, so it is ranked higher
One hypothesis: TF*IDF vector similarity (and BM25, ...)
[Chart: the same plot, but the axes are now 'bitcoin' TF*IDF vs. 'regulation' TF*IDF.]
Doc 2 is more 'about' the query because it has a higher concentration of the corpus's 'bitcoin' and 'regulation' terms
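A minimal sketch of this hypothesis (scikit-learn assumed available; the documents and query are toy stand-ins, not from the talk):

```python
# Toy TF*IDF similarity: Doc 2 should score higher for "bitcoin regulation".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "bitcoin hype and more hype",                      # Doc 1
    "bitcoin regulation: regulators discuss bitcoin",  # Doc 2
]
query = ["bitcoin regulation"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(query)

# Doc 2 concentrates more of the corpus's 'bitcoin' and 'regulation' weight.
print(cosine_similarity(query_vector, doc_vectors))
```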
Sadly 'Aboutness' is hard
[Chart: possible search quality vs. effort. Minutes of simple TF*IDF gets you part of the way; years of relevance / feature engineering work still falls short of the possible search quality. Why!?]
There's Something About Aboutness?
Neural Search Frontier
Hunt for the About October?
The Last Sharknado: It's *About* Time?
Embeddings: another vector sim
A vector storing information about the context a word appears in
“Have you heard the hype about bitcoin currency?”
“Bitcoin a currency for hackers?”
“Have you heard the hype about bitcoin currency?”
“Hacking on my Bitcoin server! It's not hyped”
Encode contexts
[0.6,-0.4] bitcoin
Docs
Easier to see as a graph
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
These two terms occur in similar contexts
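With the toy 2-d vectors above, 'similar context' shows up as a high cosine similarity (a minimal numpy sketch):

```python
# Toy 2-d embeddings from the slide; cosine similarity shows they sit close together.
import numpy as np

cryptocurrency = np.array([0.5, -0.5])
bitcoin = np.array([0.6, -0.4])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cryptocurrency, bitcoin))  # ~0.98: similar contexts, close vectors
```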
Document embeddings
Encode full text as context for document
Encode
[0.5,-0.4] doc1234
Have you heard of cryptocurrency? It's the cool currency being worked on by hackers! Bitcoin is one prominent example. Some say it's all hype
...
Is shared 'context' the same as 'aboutness'!?
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
[0.5,-0.4] doc1234
[0.4,-0.2] doc5678
Embeddings as ranking: docs whose embeddings are closer to the 'bitcoin' query embedding are ranked higher
Sadly no :(
I want to cancel my reservation
I want to confirm my reservation
Words that occur in similar contexts can have opposite meanings!!
'Doc about canceling' not relevant for 'confirm'
Fundamentally there’s still a mismatch
Fiat currencies have a global conspiracy against dogecoin
bit dollar?
Crypto expert creates content with nerd speak
Average Citizen searches in lay-speak
Embeddings are derived from the corpus, not from our searchers' language
Embedding Modeling
Can we improve on embeddings?
Dot Product ('similarity'), then Sigmoid (forces to 0..1, a 'probability'):
'Bitcoin' embedding [5.5, 8.1, ..., 0.6] · 'Energy' embedding [6.0, 1.7, ..., 8.4] = 5.5*6.0 + 8.1*1.7 + ... + 0.6*8.4 = 102.65
Sigmoid squashes that into 0..1: prediction 0.67 (0 = out of context, 1 = true context)
Term | Other Term | In Same Context? | Model Prediction |
bitcoin | energy | 1 | 0.67 |
The model is trying to predict the 'In Same Context?' column
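A sketch of that prediction, sigmoid of the dot product (random stand-in vectors; the slide's 102.65 and 0.67 are illustrative numbers, not reproduced here):

```python
# Prediction = sigmoid(dot product of two embeddings).
import numpy as np

rng = np.random.default_rng(0)
v_bitcoin = rng.normal(scale=0.1, size=50)  # stand-in 'bitcoin' embedding
v_energy = rng.normal(scale=0.1, size=50)   # stand-in 'energy' embedding

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

prediction = sigmoid(v_bitcoin @ v_energy)  # 'probability' the pair shares a context
print(prediction)
```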
A word2vec Model
Two Term embedding tables (one for each side of the dot product), initialized with random values.
Tweak…
Term | Other Term | True Context? | Prediction |
bitcoin | energy | 1 | 0.67 |
bitcoin | dongles | 0 | 0.78 |
bitcoin | relevance | 0 | 0.56 |
bitcoin | cables | 0 | 0.34 |
To get into the weeds on the loss function & gradient descent for word2vec: Stanford CS224D Deep Learning for NLP Lecture Notes (Chaubard, Mundra, Socher)
Goal: for terms sharing context (True Context? = 1), tweak 'bitcoin' to get the prediction closer to 1.
Goal: for random unrelated terms (True Context? = 0), tweak 'bitcoin' to get the prediction closer to 0.
(showing skipgram / negative sampling method)
Maximize Me!
F = sigmoid(v_bitcoin · v_energy) - Σ over random 'false' context terms sigmoid(v_bitcoin · v_random)
Maximize 'True' Contexts; Minimize 'False' Contexts.
Each piece is just the model (sigmoid of dot product), e.g. the 'bitcoin' embedding [5.5, 8.1, ..., 0.6] dotted with the 'energy' embedding [6.0, 1.7, ..., 8.4].
Did you just neural network?
Backpropagate tweaks to weights to reduce error
[Diagram: the Word1 and Word2 embeddings feed a single Neuron (sigmoid); the dot product is implicit.]
The partial derivatives dF/dv[0], dF/dv[1], dF/dv[2], ... of our fitness function F tell us how to tweak each embedding component to maximize fit.
More complex / 'deep' neural nets can propagate the error back to earlier layers to learn their weights.
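A minimal numpy sketch of one of those tweaks: a single skipgram negative-sampling update on toy tables (vocabulary, dimensions, and learning rate are made up; see the CS224D notes referenced above for the full derivation):

```python
# One skipgram negative-sampling update; real word2vec loops this over every
# (word, context) pair in the corpus.
import numpy as np

rng = np.random.default_rng(42)
vocab = ["bitcoin", "energy", "dongles", "relevance", "cables"]
dim = 25
word_vecs = {w: rng.normal(scale=0.1, size=dim) for w in vocab}  # focus-word table
ctx_vecs = {w: rng.normal(scale=0.1, size=dim) for w in vocab}   # context-word table

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(word, true_ctx, neg_ctxs, lr=0.05):
    v = word_vecs[word]
    grad_v = np.zeros(dim)
    # True context: push prediction toward 1
    u = ctx_vecs[true_ctx]
    g = 1.0 - sigmoid(v @ u)
    grad_v += g * u
    ctx_vecs[true_ctx] = u + lr * g * v
    # Random 'false' contexts: push predictions toward 0
    for neg in neg_ctxs:
        u = ctx_vecs[neg]
        g = -sigmoid(v @ u)
        grad_v += g * u
        ctx_vecs[neg] = u + lr * g * v
    word_vecs[word] = v + lr * grad_v

sgns_update("bitcoin", "energy", ["dongles", "relevance", "cables"])
```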
Dot Product ('similarity'), then Sigmoid (forces to 0..1), now with a paragraph vector in the mix:
'Bitcoin' embedding [5.5, 8.1, ..., 0.6] and 'Energy' embedding [6.0, 1.7, ..., 8.4], weighted component-wise by the paragraph embedding [2.0, 0.5, ..., 1.4]: 5.5*6.0*2.0 + 8.1*1.7*0.5 + ... + 0.6*8.4*1.4 = 102.65
Sigmoid squashes that into 0..1: prediction 0.67 (0 = out of context, 1 = true context)
Doc2Vec, train paragraph vector w/ term vectors
A Para. embedding matrix (the source of [2.0, 0.5, ..., 1.4]) is trained alongside the Term embedding matrices; backpropagate errors into all of them.
The same negative sampling can apply here.
Term | Para | True Context? | Prediction |
bitcoin | Doc 1234 Para 1 | 1 | 0.67 |
bitcoin | Doc 5678 Para 5 | 0 | 0.78 |
bitcoin | Doc 1234 Para 8 | 0 | 0.56 |
bitcoin | Doc 1537 Para 1 | 0 | 0.34 |
I’m continuing the thread of negative sampling, but doc2vec need not use negative sampling
Goal: for a term and paragraph sharing context, tweak embeddings to get the prediction closer to 1; for random unrelated paragraphs, tweak embeddings to get the prediction closer to 0.
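A toy Doc2Vec sketch with gensim (assumed installed, 4.x API); real corpora need far more text than this:

```python
# Train paragraph vectors alongside term vectors with gensim's Doc2Vec.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words="have you heard of cryptocurrency".split(), tags=["doc1234_p1"]),
    TaggedDocument(words="bitcoin is one prominent example".split(), tags=["doc1234_p2"]),
    TaggedDocument(words="my cat sat on the mat".split(), tags=["doc5678_p1"]),
]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, negative=5)
print(model.dv["doc1234_p1"])                       # the paragraph embedding
print(model.dv.most_similar("doc1234_p1", topn=2))  # nearest paragraphs
```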
Embedding Modeling:
[0.5,-0.4] doc1234
...For years you've been manipulating this...
Your new clay to mold...
Manipulating the construction of embeddings to measure something *domain specific*
You're *about* to see some fun embedding hacks...
Query-based embeddings?
Query Term | Other Q. Term | Same Session | Relevance Score |
bitcoin | regulation | 1 | 0.67 |
bitcoin | bananas | 0 | 0.78 |
bitcoin | headaches | 0 | 0.56 |
bitcoin | daycare | 0 | 0.34 |
Goal: for query terms sharing a session, tweak embeddings to get the prediction closer to 1; for random unrelated query terms, tweak embeddings to get the prediction closer to 0.
Embeddings from judgments?
Query Term | Doc | Relevant? | Relevance Score |
bitcoin | Doc 1234 | 1 | 0.67 |
bitcoin | Doc 5678 | 0 | 0.78 |
bitcoin | Doc 1234 | 0 | 0.56 |
bitcoin | Doc 1537 | 0 | 0.34 |
Goal: for a doc relevant to the term, tweak embeddings to get the query-doc score closer to 1; for random unrelated docs, tweak embeddings to get the prediction closer to 0.
(derived from clicks, etc)
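Sketch: the same sigmoid-of-dot-product update, but the label is a relevance judgment (or click) instead of corpus co-occurrence. Query/doc IDs, dimensions, and the learning rate are hypothetical:

```python
# Train query-term and doc embeddings directly from judgments.
import numpy as np

rng = np.random.default_rng(7)
dim = 25
query_vecs = {"bitcoin": rng.normal(scale=0.1, size=dim)}
doc_vecs = {d: rng.normal(scale=0.1, size=dim) for d in ["doc1234", "doc5678", "doc1537"]}

judgments = [  # (query term, doc, relevant?)
    ("bitcoin", "doc1234", 1),
    ("bitcoin", "doc5678", 0),
    ("bitcoin", "doc1537", 0),
]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(100):
    for q, d, rel in judgments:
        vq, vd = query_vecs[q], doc_vecs[d]
        g = rel - sigmoid(vq @ vd)        # push score toward 1 if relevant, else toward 0
        query_vecs[q] = vq + 0.05 * g * vd
        doc_vecs[d] = vd + 0.05 * g * vq

print(sigmoid(query_vecs["bitcoin"] @ doc_vecs["doc1234"]))  # should drift toward 1
```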
Pretrain word2vec with corpus/sessions?
Query Term | Doc | Corpus Model | True Relevance | TweakedScore |
bitcoin | Doc 1234 | 0 | 1 | 0.01 |
bitcoin | Doc 5678 | 1 | 0 | 0.99 |
bitcoin | Doc 1234 | 0 | 0 | 0.56 |
bitcoin | Doc 1537 | 0 | 0 | 0.34 |
It takes a lot to overcome the original, unified model, and it's a case-by-case basis
(a kind of prior)
An online / evolutionary approach to converge on improved embeddings?
Geometrically adjusted query embeddings?
Average the query vector with the vectors of some (possibly) relevant documents
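A Rocchio-flavored sketch of that averaging, using the toy vectors from earlier (the alpha/beta mixing weights are made up):

```python
# Nudge the query embedding toward the centroid of possibly-relevant docs.
import numpy as np

query_vec = np.array([0.6, -0.4])             # 'bitcoin'
relevant_doc_vecs = np.array([[0.5, -0.4],    # doc1234
                              [0.4, -0.2]])   # doc5678

alpha, beta = 0.7, 0.3                        # how much to trust the original query
adjusted = alpha * query_vec + beta * relevant_doc_vecs.mean(axis=0)
print(adjusted)
```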
Embeddings go beyond language
User | Item | Purchased? | Recommended? |
Tom | Jeans | 1 | 0.67 |
Tom | Khakis | 0 | 0.78 |
Tom | iPad | 0 | 0.56 |
Tom | Dress Shoes | 0 | 0.34 |
Tweak embeddings to get user-item score closer to 1
Tweak embeddings to get prediction closer to 0
(use to find similar items / users)
More features beyond just 'context'
Query Term | Doc | Sentiment | Same Context | TweakedScore |
cancel | Doc 1234 | Angry | 1 | 0.01 |
confirm | Doc 5678 | Angry | 0 | 0.99 |
cancel | Doc 1234 | Happy | 0 | 0.56 |
confirm | Doc 1537 | Happy | 0 | 0.34 |
See: https://arxiv.org/abs/1805.07966
Evolutionary/contextual bandit embeddings?
Query Term | Doc | Corpus Model | True Relevance | TweakedScore |
bitcoin | Doc 1234 | 0 | 1 | 0.01 |
bitcoin | Doc 5678 | 1 | 0 | 0.99 |
bitcoin | Doc 1234 | 0 | 0 | 0.56 |
bitcoin | Doc 1537 | 0 | 0 | 0.34 |
Pull the embeddings in the direction your KPIs suggest is successful
(a kind of prior)
An online / evolutionary approach to converge on improved embeddings?
Embedding Frontier...
There's a lot we could do
Neural Language Models
Translators grok 'aboutness'
Serĉa konferenco
What is in the head of this person that 'groks' Esperanto?
'a search conference?'
Language Model
Can seeing text, and guessing what would "come next" help us on our quest for 'aboutness'?
“eat __?_____”
cat: 0.001
...
chair: 0.001
pizza: 0.02  <- highest probability
...
nap: 0.001
Obvious applications: autosuggest, etc
Markov language model, vocab size V
| pizza | chair | nap | ... |
eat | 0.02 | 0.0002 | 0.0001 | ... |
cat | 0.0001 | 0.0001 | 0.02 | ... |
... | ... | ... | ... | |
Probability of ‘pizza’ following ‘eat’ (as in “eat pizza”)
Rows and columns each range over the V words of the vocabulary.
From the corpus we can build this transition matrix, reflecting the frequency of word transitions.
Transition Matrix Math
| pizza | chair | nap | ... |
eat | 0.02 | 0.0002 | 0.0001 | ... |
cat | 0.0001 | 0.0001 | 0.02 | ... |
... | ... | ... | ... | |
Let's say we don't know the 'current' word with 100% certainty. Can we still guess the 'next' word?
Pcurr x (transition matrix) = Pnext
Pcurr: P(eat) = 0.25, P(cat) = 0.40, ...
Pnext: P('pizza' next) = 0.25*0.02 + 0.40*0.0001 + ..., and likewise P(chair), P(nap), ... across all V words
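The same math in numpy (the matrix is a toy fragment, with unspecified entries left at zero):

```python
# Belief over the current word times the V x V transition matrix = distribution over the next word.
import numpy as np

vocab = ["eat", "cat", "pizza", "chair", "nap"]
T = np.array([  # rows: current word, columns: next word (toy probabilities)
    [0.00, 0.00, 0.02,   0.0002, 0.0001],  # eat ->
    [0.00, 0.00, 0.0001, 0.0001, 0.02],    # cat ->
    [0.00, 0.00, 0.00,   0.00,   0.00],
    [0.00, 0.00, 0.00,   0.00,   0.00],
    [0.00, 0.00, 0.00,   0.00,   0.00],
])
p_curr = np.array([0.25, 0.40, 0.0, 0.0, 0.0])  # not 100% sure of the current word

p_next = p_curr @ T
print(dict(zip(vocab, p_next)))  # P('pizza' next) = 0.25*0.02 + 0.40*0.0001 + ...
```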
Language model for embeddings?
The 'eat' embedding [5.5, 8.1, ..., 0.6] (a row of the term embedding matrix, h dimensions) multiplied by an h x h transition matrix (rows like [0.2, -0.4, 0.1, ...], [-0.1, 0.3, -0.24, ...], ...) gives the embedding of the next word, e.g. [0.5, 6.1, ..., -4.8] (which probably clusters near 'pizza', etc.).
A transition matrix whose weights are learned through backpropagation.
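As a sketch (random stand-in weights; in practice they are learned through backpropagation):

```python
# Predict the next word's embedding by multiplying the current word's embedding
# by an h x h transition matrix.
import numpy as np

rng = np.random.default_rng(3)
h = 50
W = rng.normal(scale=0.1, size=(h, h))   # transition matrix (stand-in weights)
eat = rng.normal(size=h)                 # 'eat' embedding (stand-in)

next_word_embedding = W @ eat            # hopefully lands near 'pizza', 'dust', ...
```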
Accuracy requires context...
'The race is on! ... eat ___' should continue with 'dust!'
But a plain Word => Next Word model (the 'embedding' Markov model from earlier) predicts 'pizza!'
What we want: Context + new word => New Context & next word
Old Context + new word => New Context, spelled out:
(Input -> New Context 'transition' matrix) x the 'eat' embedding [5.5, 8.1, ...] = an input contribution
(Prev Context -> New Context transition matrix, e.g. [0.5, 1.6, ...; 6.1, -4.8, ...; ...]) x the old context vector = a context contribution
Input contribution + context contribution = the New Context
(Context -> Output transition matrix) x New Context = the Output Embedding, the 'embedding' for the next word ('pizza'). Next...
Simpler view
[Diagram: an unrolled network reading 'race', 'is', 'on', ..., 'eat', one embedding at a time. Each input embedding enters through W_xh, the hidden state carries forward through W_hh, and W_hy maps the final hidden state to the predicted next word (or its embedding): 'dust'.]
Weights W_xh, W_hh, W_hy are learned through backpropagation (trained on true sentences in our corpus)
Psssst… this is a Recurrent Neural Network (RNN)!
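A bare-bones numpy sketch of one step of that RNN (random stand-in weights and embeddings; real training would backpropagate through time):

```python
# New context mixes the input embedding (W_xh) with the previous context (W_hh);
# W_hy maps context to a predicted next-word embedding.
import numpy as np

rng = np.random.default_rng(5)
e, h = 50, 64                                  # embedding size, hidden size
W_xh = rng.normal(scale=0.1, size=(h, e))
W_hh = rng.normal(scale=0.1, size=(h, h))
W_hy = rng.normal(scale=0.1, size=(e, h))

def rnn_step(x_emb, h_prev):
    h_new = np.tanh(W_xh @ x_emb + W_hh @ h_prev)   # new context
    y = W_hy @ h_new                                # predicted next-word embedding
    return h_new, y

# Scan a sentence ("the race is on ... eat"); the final prediction should lean toward "dust".
h_state = np.zeros(h)
for word_embedding in [rng.normal(size=e) for _ in range(5)]:  # stand-in embeddings
    h_state, y_pred = rnn_step(word_embedding, h_state)
```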
For search...
[Diagram: the same unrolled RNN (W_xh, W_hh, W_hy), now scanning a document word by word.]
Inject other contextually likely terms:
At any given point scanning our document, we can get a probability distribution of likely terms
A dog is loose! Please call the
animal
control
dog
catcher
phone
yodel
walk to
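A sketch of using that distribution for document expansion. `next_term_probs` is a hypothetical language-model interface, not a real library call:

```python
# While scanning a document with a language model, collect high-probability
# next terms at each position and index them in a separate "expansion" field.
def expand_document(tokens, next_term_probs, top_k=3, threshold=0.01):
    """next_term_probs(prefix_tokens) -> dict of {term: probability} (hypothetical)."""
    expansion = set()
    for i in range(1, len(tokens)):
        probs = next_term_probs(tokens[:i])
        likely = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        expansion.update(term for term, p in likely if p >= threshold)
    return expansion  # e.g. {"animal", "control", "dog", "catcher", ...}
```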
Not a silver bullet
[Diagram: the same unrolled RNN (W_xh, W_hh, W_hy) again.]
A dog is loose! Please call the
animal control (corpus language) vs. dog catcher (searcher language)
The model won't learn this ideal language model if the corpus never says 'dog catcher'
One Vector Space To Rule Them All: Seq2Seq
Translation: RNN encoder/decoder
[Diagram: an Encoder RNN reads the source sentence ('... race ... is ... on ...') word by word through W_xh and W_hh, ending in a final context vector. A Decoder RNN is seeded with that context and a <START> token and emits the translation one word at a time through W_hy: 'la', 'carrera', 'es', ... Backprop tweaks all the weights.]
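A skeletal encoder/decoder in PyTorch (assumed available); vocab sizes, dimensions, and the greedy decode loop are placeholders, and a real system would add teacher forcing, batching, and attention:

```python
# Minimal seq2seq: encode the source into a context vector, decode a target sequence.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID, START = 10_000, 10_000, 128, 256, 1

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)   # W_xh / W_hh rolled together
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)                 # the W_hy of the diagram

    def forward(self, src_ids, max_len=20):
        _, context = self.encoder(self.src_emb(src_ids))     # final hidden state = context
        token = torch.full((src_ids.size(0), 1), START, dtype=torch.long)
        outputs = []
        for _ in range(max_len):                              # greedy decode, one word at a time
            dec_out, context = self.decoder(self.tgt_emb(token), context)
            logits = self.out(dec_out[:, -1])
            token = logits.argmax(dim=-1, keepdim=True)
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq()
print(model(torch.randint(0, SRC_VOCAB, (1, 6))))  # e.g. "the race is on ..." -> "la carrera es ..."
```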
'Translate' documents to queries?
[Diagram: an Encoder RNN reads the document ('Animal Control Law: Herefore let it be … And therefore… with much to be blah blah blah … The End'); a Decoder RNN emits Predicted Queries.]
Doc | Query | Relevant? |
Animal Control Law | Dog catcher | 1 |
Littering Law | Dog Catcher | 0 |
Using judgments/clicks as training data:
Can we use graded judgments?
[Diagram: the same Encoder RNN / Decoder RNN over the 'Animal Control Law' document, again emitting Predicted Queries.]
Doc | Query | Relevant? |
Animal Control Law | Dog catcher | 4 |
Animal Leash Law | Dog Catcher | 2 |
Construct training data to reflect weighting
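One simple way to construct that weighting: repeat (doc, query) training pairs in proportion to their grade (this repetition scheme is an assumption, not the only option):

```python
# Higher-graded judgments become more training pairs, so they pull the model harder.
judgments = [
    ("Animal Control Law", "dog catcher", 4),
    ("Animal Leash Law", "dog catcher", 2),
]

training_pairs = [(doc, query) for doc, query, grade in judgments for _ in range(grade)]
# [('Animal Control Law', 'dog catcher') x4, ('Animal Leash Law', 'dog catcher') x2]
```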
Skip-Thought Vectors (a kind of ‘sentence2vec’)
Encoder RNN
Decoder RNN
(sentence before)
Decoder RNN
(sentence after)
Its fleece was white as snow
The encoder output becomes an embedding for the sentence, encoding its semantic meaning
Mary had a little lamb
And everywhere that Mary went, that lamb was sure to go
Skip-Thought Vectors for queries? (‘doc2queryVec’)
Encoder RNN
Decoder RNN
(relevant query)
Decoder RNN
(relevant query)
Becomes an embedding mapping docs into the query’s semantic space
[Diagram: the 'Animal Control Law' document ('Herefore let it be … blah blah blah … The End') is encoded; Decoder RNNs, one per relevant query, reproduce q=dog catcher, q=loose dog, q=dog fight, ...]
The Frontier
Anything2vec
Deep learning is especially good at learning representations of... anything.
If everything can be 'tokenized' into some kind of token space for retrieval, then everything can be 'vectorized' into some embedding space for retrieval / ranking.
The Lucene community needs better first-class vector support
Similarity API and vectors?
long computeNorm(FieldInvertState state)
SimWeight computeWeight(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
SimScorer simScorer(SimWeight weight, LeafReaderContext context)
Some progress
Check out Simon Hughes' talk: "Vectors in Search"
It’s not magic, it’s math!
Join the Search Relevancy Slack Community
(projects, chat, conferences, help, book authors, and more…)
Matching: vocab mismatch!
"Taxonomical Semantical Magical Search", Doug Turnbull, Lucene/Solr Revolution 2017
"Animal Control Law"
Dog catcher
Dog
Catcher
Animal Control
Concept 1234
Taxonomist / Librarian
Costly to manually maintain mapping between searcher & corpus vocabulary
No Match, despite relevant results!
legalese (the corpus) vs. lay-speak (the searcher)
Animal Control Law: 'Herefore let it be … And therefore… with much to be blah blah blah … The End'
Ranking optimization hard!
Dog catcher
Score doc
Ranked Results
LTR is only as good as your features: Solr/ES queries, which are only as good as your skill with Solr/ES queries
LTR?
Ranking can be a hard optimization problem:
Fine tuning heuristics: TF*IDF, Solr/ES queries, analyzers, etc...
How can deep learning help?
"Taxonomical Semantical Magical Search", Doug Turnbull, Lucene/Solr Revolution 2017
Can we learn, quickly at scale, the semantic relationships between concepts?
What other forms of every-day enrichment could deep-learning accelerate?
Dog catcher
Animal control
Sessions as documents
Embeddings built for just query terms
Dog catcher
Dog bit me
Lost dog
word2vec / doc2vec over the session 'documents' -> query-term embeddings like [5.5, 8.1, ..., 0.6]
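A sketch with gensim (assumed installed): treat each session's queries as one 'document' and train word2vec over them; the sessions here are toy:

```python
# Embeddings built for just query terms, from search sessions.
from gensim.models import Word2Vec

sessions = [
    ["dog", "catcher", "animal", "control"],   # one user's session, tokenized queries
    ["lost", "dog", "dog", "bit", "me"],
    ["animal", "control", "phone", "number"],
]

model = Word2Vec(sessions, vector_size=50, window=5, min_count=1, sg=1, negative=5)
print(model.wv.most_similar("dog", topn=3))
```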
Can we translate between searcher & corpus?
Search term embedding [5.5, 8.1, ..., 0.6] + Doc embedding [2.0, 0.5, ..., 1.4] -> [Insert Machine Learning Here] -> Ranking Prediction
(classic LTR could go here)
Embeddings
Words with similar context share close embeddings
Cryptocurrency is not hyped! It's the future
Hackers invent their own cryptocurrency
Cryptocurrency is a digital currency
Hyped cryptocurrency is for hackers!
Encode contexts
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin