1 of 105

Deep Learning Methods for Query Auto Completion

Manish Gupta

Meghana Joshi

Puneet Agrawal

{gmanish, mejoshi, punagr}@microsoft.com

2 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]



4 of 105

AutoSuggest Examples


(Screenshots of autosuggest examples, annotated with the typed prefix, the suggestion block, and a conversation example.)

5 of 105

Conceptual difficulty of the AS problem


(Example: the short prefix “apple” admits many possible intents, illustrating the intention gap.)

  • Intention gap is very high in AS
    • Guessing user intent from a very short prefix.
    • Guessing user language from a very short prefix.
    • Guessing incorrectly can lead to defects (misspellings, inappropriate suggestions) and to missed freshness, local, or tail intents.
  • Goal:
    • Suggest the user’s intended query after minimal input keystrokes
    • Rank the user’s intended query highly in completion suggestions

6 of 105

Important Components in a QAC system

  • Ranking suggestions
    • Most popular completion
    • Time sensitive suggestions
    • Location sensitive suggestions
    • Personalization
  • Ghosting, Session co-occurrences
  • Online spell correction, Defect handling
  • Non-prefix matches, Generating suggestions
  • Mobile QAC, Enterprise QAC


7 of 105

Ranking suggestions: Most Popular Completion (MPC)

  • “Wisdom of the crowds” MPC solution
    • A trie indexes historical queries along with popularity values.
    • Candidates = suggestions from the trie that match the prefix.
    • Rank candidates by a function of their past popularity (see the sketch below)
  • Language-specific popularity
  • Region-specific popularity
  • Vary the query-log aggregation period
    • For shorter prefix lengths, a shorter query-log aggregation period is optimal, and vice versa [Whiting 2013]
  • Can also rank by clicks
    • But click data is sparse

Whiting, Stewart, Andrew James McMinn, and Joemon M. Jose. "Exploring Real-Time Temporal Query Auto-Completion." In DIR, pp. 12-15. 2013.
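A minimal sketch of an MPC trie, assuming an in-memory dict-based trie and counts aggregated from a query log; a production system would store aggregates compactly and bound the subtree traversal.

```python
# Minimal MPC sketch: a trie maps each historical query to its popularity;
# completion collects the subtree under the prefix and ranks by popularity.
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.count = 0                      # popularity if a full query ends here

class MPCTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        node = self.root
        for ch in query:
            node = node.children[ch]
        node.count += count

    def complete(self, prefix, k=5):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, suffix):                 # collect all completions under prefix
            if n.count:
                results.append((prefix + suffix, n.count))
            for ch, child in n.children.items():
                dfs(child, suffix + ch)
        dfs(node, "")
        return sorted(results, key=lambda x: -x[1])[:k]

trie = MPCTrie()
for q, c in [("harry potter", 90), ("halloween", 70), ("hawaii", 40)]:
    trie.add(q, c)
print(trie.complete("ha"))   # [('harry potter', 90), ('halloween', 70), ('hawaii', 40)]
```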


8 of 105

Ranking suggestions: Time sensitive suggestions

  • Predictably vs. unpredictably popular
    • Predictably popular queries: temporally recurring (e.g., at Christmas, in January) or known/foreseeable events and phenomena (e.g., TV episodes, sporting events, expected weather).
      • Ranking of candidates must be adjusted over time: “halloween” might be the right suggestion after typing “ha” in October, while “harry potter” might be better at any other time.
    • Unpredictably popular queries: unforeseeable current events and phenomena (e.g., breaking news).
      • “sarah burke” became bursty in January 2012, overtaking the usually popular “sarah palin”
  • Instead of past popularity, can we rank candidates based on forecasted frequencies? (See the sketch below.)
    • Can use typical time-series forecasting methods like ARIMA, exponential smoothing, … [Shokouhi, 2012]
    • Model as a ranked multi-armed bandit problem [Wang, 2017]

Shokouhi, Milad, and Kira Radinsky. "Time-sensitive query auto-completion." In SIGIR, pp. 601-610. 2012.

Wang, Yingfei, Hua Ouyang, Hongbo Deng, and Yi Chang. "Learning online trends for interactive query auto-completion." TKDE 29, no. 11 (2017): 2442-2454.
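A minimal sketch of forecast-based ranking in the spirit of [Shokouhi, 2012], using simple exponential smoothing; the smoothing constant and the daily counts are illustrative.

```python
# Rank candidates by a one-step-ahead forecast of their daily frequencies
# rather than raw past popularity, so bursty queries can overtake stable ones.
def forecast(series, alpha=0.5):
    """One-step-ahead simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Daily counts over the last week for two candidates matching the prefix "sarah".
history = {
    "sarah palin": [900, 880, 870, 860, 850, 840, 830],   # slowly declining
    "sarah burke": [10, 12, 15, 400, 2000, 5000, 9000],   # bursty
}
ranked = sorted(history, key=lambda q: -forecast(history[q]))
print(ranked)   # ['sarah burke', 'sarah palin']
```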


9 of 105

Ranking suggestions: Location sensitive suggestions

  • A user in San Diego types “Uni”.
    • “Univ of California, San Diego” is OK.
    • “Univ of California, Los Angeles” is not OK at the top.
  • Location sensitivity of queries
    • Local-interest queries [Backstrom, 2008]
      • Queries only interesting to users at a particular location
        • e.g., the name of a local high school or newspaper
      • Find the center of geographic focus and the dispersion for a query (see the sketch below)
      • Determine whether the query is tightly concentrated or spread diffusely geographically
    • Localizable queries
      • Users at different locations may issue the same query, but referring to different things
        • e.g., pizza hut, house for rent.
      • A localizable query is likely to appear as a sub-query in other queries, each associated with a different location: “car rental california”, “car rental new york”, etc.

Backstrom, Lars, Jon Kleinberg, Ravi Kumar, and Jasmine Novak. "Spatial variation in search engine queries." In WWW, pp. 357-366. 2008.

Welch, Michael J., and Junghoo Cho. "Automatically identifying localizable queries." In SIGIR, pp. 507-514. 2008.
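A hedged sketch of the geographic focus/dispersion idea from [Backstrom, 2008]. The paper fits a probabilistic model; this toy version uses the mean location of the issuing users and the mean distance to it, with illustrative coordinates.

```python
# Estimate a query's geographic center and dispersion from the (lat, lon) of
# the users who issued it; tightly concentrated queries are "local interest".
import math

def center_and_dispersion(points):
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    # Mean distance (in degrees, small-area approximation) to the center.
    disp = sum(math.hypot(p[0] - lat, p[1] - lon) for p in points) / len(points)
    return (lat, lon), disp

local = [(32.7, -117.2), (32.8, -117.1), (32.7, -117.1)]      # all near San Diego
national = [(32.7, -117.2), (40.7, -74.0), (47.6, -122.3)]    # spread across the US
print(center_and_dispersion(local)[1] < center_and_dispersion(national)[1])  # True
```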


10 of 105

Ranking Suggestions: Personalization


Using short-term/long-term user history, location, and other signals.

(Figure: completions for the prefix “the great” for three users. One user’s search/click history contains “tender is the night”, “f. scott fitzgerald”, “this side of paradise”, and “the silent patient”; the suggestions are led by “the great gatsby fitzgerald” and “the great gatsby book”. Another user’s history contains “the wolf of wall street”, “the revenant”, “inception”, “gangs of new york”, and “pain & gain”; the suggestions include “the great gatsby film 2013”, “the great gatsby trailer”, “the great gatsby film 1974”, “the greatest showman”, and “the great wall film 2016”. With the same long-term history plus the session history “beijing travel advisory” and “forbidden city”, the top suggestion becomes “the great wall of china”.)

11 of 105

Ghosting, Session co-occurrences

  • Ghosting: auto-completing a search recommendation by highlighting the suggested text inline, i.e., within the search box.
  • Session-context ghosting increased acceptance of offered suggestions by 6.18% and reduced misspelled searches by 4.42% [Ramachandran et al., 2019]

Ramachandran, Lakshmi, and Uma Murthy. "Ghosting: contextualized query auto-completion on Amazon search." In SIGIR, pp. 1377-1378. 2019.

Bar-Yossef, Ziv, and Naama Kraus. "Context-sensitive query auto-completion." In WWW, pp. 107-116. 2011.

  • Context-sensitive AS [Bar-Yossef et al. 2011] (see the sketch below)
    • After “richard nixon”, the most popular successive query starting with “am” is “american presidents”.
    • Handle sparsity of co-occurrences
      • by clustering similar query sequences together
      • similarity may be syntactic (e.g., american airlines → american airlines flight status) or only semantic (e.g., american airlines → continental).
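A minimal sketch of context-sensitive ranking by session co-occurrence, with illustrative counts; a real system would smooth these counts, cluster similar query sequences, and back off to plain MPC.

```python
# Score each prefix-matching candidate by how often it followed the previous
# query in historical sessions.
from collections import Counter, defaultdict

succ = defaultdict(Counter)   # succ[prev_query][next_query] = co-occurrence count
sessions = [
    ["richard nixon", "american presidents"],
    ["richard nixon", "american presidents"],
    ["richard nixon", "amazon"],
]
for s in sessions:
    for prev, nxt in zip(s, s[1:]):
        succ[prev][nxt] += 1

def rank(prev_query, prefix, k=5):
    cands = [(q, c) for q, c in succ[prev_query].items() if q.startswith(prefix)]
    return sorted(cands, key=lambda x: -x[1])[:k]

print(rank("richard nixon", "am"))  # [('american presidents', 2), ('amazon', 1)]
```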


12 of 105

Online spell correction, Defect handling

Small portions of the prefix can be corrected at trie exploration time by paying a penalty cost, e.g., “cbo” → “ceboo”.

  • More flexible than an offline speller because small portions of the prefix can be changed
  • More coverage
  • Key idea: it is possible to jump to a different node in the search trie by paying a cost dictated by the conversion table

Duan, Huizhong, and Bo-June Hsu. "Online spelling correction for query completion." In Proceedings of the 20th international conference on World wide web, pp. 117-126. 2011.

  • Defects
    • Spelling mistakes
    • Offensive suggestions
    • Partial suggestions
    • Rare intents
    • Non-sensical suggestions/hallucinations
    • Gibberish
    • Bad URL


13 of 105

Non-prefix matches, Generating suggestions

  • Non-prefix matches (see the sketch below)
    • If Q is “shrimp dip rec”, then a plausible completion found by prefix search could be “shrimp dip recipes”.
    • A multiterm prefix search could instead return “shrimp bienville dip recipe” or “recipe for appetizer shrimp chipotle dip”.
    • Use an inverted index.

Gog, Simon, Giulio Ermanno Pibiri, and Rossano Venturini. "Efficient and effective query auto-completion." In SIGIR, pp. 2271-2280. 2020.
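A minimal sketch of multiterm prefix search over an inverted index: the complete terms use exact posting-list lookups, and the last (possibly partial) term is prefix-matched against the vocabulary.

```python
# Non-prefix matching: every full term of the input must occur in the query,
# and the last term only needs to be a prefix of some term.
queries = ["shrimp dip recipes", "shrimp bienville dip recipe",
           "recipe for appetizer shrimp chipotle dip"]
index = {}
for qid, q in enumerate(queries):
    for term in q.split():
        index.setdefault(term, set()).add(qid)

def multiterm_complete(user_input):
    *full, last = user_input.split()
    cand = set(range(len(queries)))
    for t in full:                              # exact posting-list intersections
        cand &= index.get(t, set())
    last_match = set()
    for term, postings in index.items():        # prefix match for the last term
        if term.startswith(last):
            last_match |= postings
    return [queries[i] for i in sorted(cand & last_match)]

print(multiterm_complete("shrimp dip rec"))
# ['shrimp dip recipes', 'shrimp bienville dip recipe',
#  'recipe for appetizer shrimp chipotle dip']
```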

  • A significant proportion of queries issued daily have never been seen previously.
  • Generate suggestions
    • Improves recall in tail.
    • Deep learning NLG
    • FST: Finite state transducers
    • N-gram models
  • Issues
    • Latency
    • Partial suggestions, offensive suggestions, hallucinations, grammatically-incorrect suggestions
    • Personalization


14 of 105

Mobile QAC, Enterprise QAC

Zhang, Aston, Amit Goyal, Ricardo Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, and Hongbo Deng. "Towards mobile query auto-completion: An efficient mobile application-aware approach." In WWW, pp. 579-590. 2016.

Hawking, David, and Kathy Griffiths. "An enterprise search paradigm based on extended query auto-completion: do we still need search and navigation?." In Proceedings of the 18th Australasian Document Computing Symposium, pp. 18-25. 2013.


15 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]



17 of 105

Agenda

  • Traditional Machine Learning methods for ranking suggestions
  • Convolutional Latent Semantic Model
  • LSTM encoder



19 of 105

Prefix and Suggestion Features

  • Prefix features
    • Does it end with a space character
    • Prefix length
  • Suggestion features
    • Suggestion length (characters and words)
    • Frequency in the background set.
      • Time scales: 1/2/3/4 weeks, 1 month, 1 year.
    • Is it a navigational query?
    • Overall impressions of homologous queries: (1) queries with the same terms as the candidate query but in a different order and (2) queries that extend the candidate query.
    • Number of queries with this suggestion as prefix.

Sordoni, Alessandro, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. "A hierarchical recurrent encoder-decoder for generative context-aware query suggestion." In CIKM, pp. 553-562. 2015.

Cai, Fei, and Maarten de Rijke. "Learning from homologous queries and semantically related terms for query auto completion." Information Processing & Management 52, no. 4 (2016): 628-643.


20 of 105

Pairwise and Contextual Features

  • Pairwise features (using the anchor query from session data)
    • For each candidate suggestion, count how many times it follows the anchor query in the background data.
    • Frequency of the anchor query in the background data.
    • Levenshtein distance between the anchor and the suggestion.
  • Contextual features (see the sketch below)
    • 10 features corresponding to the character n-gram similarity between the suggestion and the 10 most recent queries in the context.
    • Average Levenshtein distance between the suggestion and each query in the context.

Sordoni, Alessandro, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. "A hierarchical recurrent encoder-decoder for generative context-aware query suggestion." In CIKM, pp. 553-562. 2015.

Cai, Fei, and Maarten de Rijke. "Learning from homologous queries and semantically related terms for query auto completion." Information Processing & Management 52, no. 4 (2016): 628-643.
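Minimal sketches of two of these features: Levenshtein distance and character 3-gram similarity (Jaccard over 3-gram sets here; the exact similarity variant used by the papers is an assumption).

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def char_ngram_sim(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    A = {a[i:i + n] for i in range(len(a) - n + 1)}
    B = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(A & B) / len(A | B) if A | B else 0.0

print(levenshtein("facebok", "facebook"))                    # 1
print(round(char_ngram_sim("facebook login", "facebook sign in"), 2))
```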


21 of 105

Reformulation Features


22 of 105

User Features

  • Typing speed at this keystroke
  • Number of times the suggestion was issued by the user in the past
  • Suggestion frequency over queries submitted by users of the same gender
  • Suggestion frequency over queries submitted by users of the same age group
  • Average length of queries the user clicked in the past
  • Average number of words in queries the user clicked in the past
  • Similarity between suggestion words and words in previous queries in the same session
    • Cosine similarity, Jaro-Winkler edit distance, WordNet similarity, n-gram similarity, SERP similarity

Li, Yanen, Anlei Dong, Hongning Wang, Hongbo Deng, Yi Chang, and ChengXiang Zhai. "A two-dimensional click model for query auto-completion." In SIGIR, pp. 455-464. 2014.

Di Santo, Giovanni, Richard McCreadie, Craig Macdonald, and Iadh Ounis. "Comparing approaches for query autocompletion." In SIGIR, pp. 775-778. 2015.


23 of 105

Implicit Negative Feedback from Previous Prefixes in same conversation

  • The user wants “facetime”.
  • With prefix “fac”, “facebook” is ranked at the top.
  • The user dwells for a long time to examine “facebook” but does not select it.
  • After the next keystroke, “e”, popularity-based QAC still ranks “facebook” at the top of the list (see the sketch below).

Zhang, Aston, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, and Jiawei Han. "adaqac: Adaptive query auto-completion via implicit negative feedback." In SIGIR, pp. 143-152. 2015.
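A hedged sketch of the intuition only: the paper learns the adjustment adaptively, whereas the multiplicative penalty below is an illustrative stand-in.

```python
# Demote suggestions the user examined (dwelled on) but skipped at earlier
# keystrokes; the penalty factor is illustrative.
def rerank(candidates, base_score, skipped, penalty=0.25):
    """skipped: suggestions examined at earlier keystrokes but not selected."""
    return sorted(candidates,
                  key=lambda q: -base_score[q] * (penalty if q in skipped else 1.0))

base_score = {"facebook": 1000, "facetime": 400, "facebook login": 300}
skipped = {"facebook"}   # dwelled on "facebook" at prefix "fac", then typed "e"
print(rerank(list(base_score), base_score, skipped))
# ['facetime', 'facebook login', 'facebook']
```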


24 of 105

Agenda

  • Traditional Machine Learning methods for ranking suggestions
  • Convolutional Latent Semantic Model
  • LSTM encoder


25 of 105

Convolutional latent semantic model for rare prefixes

  • For rare prefixes the trie has few or no completions; synthetic suggestion candidates can be generated by appending popular query suffixes to the prefix, and ranked by CLSM similarity between the prefix and the suffix [Mitra & Craswell, 2015].


26 of 105

CLSM architecture


27 of 105

Agenda

  • Traditional Machine Learning methods for ranking suggestions
  • Convolutional Latent Semantic Model
  • LSTM encoder


28 of 105

Efficient Generation and Ranking for Neural QAC

  • Candidate generation
    • Aim: increase the recall of candidates with more context utilization.
    • 3 ways
      • MPC
      • Maximum Context Generation (MCG), i.e., n-gram word tries: use as much context as possible.
      • Last Word Generation (LWG): selects from a list of the 100K most frequent suffixes.
  • Candidate ranking
    • Two major components
      • The unnormalized language model layer efficiently computes “the sequence probability as the query scores” for a query candidate.
      • Then pairwise learning-to-rank (LTR) objective functions are applied on the scores of clicked and non-clicked query pairs.
    • These two components are trained together in an end-to-end fashion (see the sketch below).


Our neural ranking model architecture. On top of it is a learning-to-rank layer that takes in multiple candidate scores. The input query has a special token “<sos>”; the probability of “research scientist <eos>” is computed based on the LSTM hidden states.
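A hedged PyTorch sketch of the two components: an unnormalized LSTM language model sums per-token logits into a sequence score, and a pairwise hinge loss separates clicked from non-clicked candidates. Sizes follow the next slide's timing setup (hidden/embedding 100, one LSTM layer); the margin and data are illustrative.

```python
import torch
import torch.nn as nn

class LMScorer(nn.Module):
    def __init__(self, vocab=10000, dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def score(self, tokens):               # tokens: (batch, seq_len) token ids
        h, _ = self.lstm(self.emb(tokens[:, :-1]))
        logits = self.out(h)               # predict each next token
        tgt = tokens[:, 1:]
        tok_scores = logits.gather(-1, tgt.unsqueeze(-1)).squeeze(-1)
        return tok_scores.sum(dim=1)       # unnormalized sequence score

model = LMScorer()
clicked = torch.randint(0, 10000, (4, 6))      # 4 clicked candidates
skipped = torch.randint(0, 10000, (4, 6))      # 4 non-clicked candidates
s_pos, s_neg = model.score(clicked), model.score(skipped)
loss = torch.clamp(1.0 - (s_pos - s_neg), min=0).mean()   # pairwise hinge loss
loss.backward()                                # both parts trained end-to-end
```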

29 of 105

Efficient Generation and Ranking for Neural QAC


The average time cost of ranking a candidate list with 10 candidates is measured for each model. The average number of words in candidates is 3.20. The hidden vector size and embedding size of the LM are 100, and the number of LSTM layers is 1.

LSTMEmbed: The final hidden state vector from LSTM is used as the semantic representation of the sequence.

30 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]


31 of 105

Personalization for QAC


Using short-term/long-term user history, location, and other signals (same illustration as the earlier “Ranking Suggestions: Personalization” slide).

32 of 105

Agenda

  • Traditional Machine Learning methods
  • Hierarchical RNN Encoder-decoder
  • GRUs with user and time representations
  • Transformer-based hierarchical encoder



34 of 105

Motivation for personalization

  • Prefix = “i”
    • Young (age < 25) female users: suggestion = instagram
    • Male users (25 < age < 44): suggestion = imdb
  • Demographic and history features can be used for personalizing auto-completion rankings
    • Age {below 20, 21-30, 31-40, 41-50, and above 50}
    • Gender
    • Location (zip codes)
    • Short- and long-term history: n-gram similarity


(Top) The likelihood of instagram and imdb in queries submitted by different demographics according to Yahoo! Clues. (Bottom) The likelihood of instagram and imdb in queries submitted by the logged-in users of Bing.

35 of 105

Features for personalizing auto-completion


36 of 105

LTR framework for personalization

  • Supervised framework for personalizing auto-completion ranking (see the sketch below).
  • Labeled data
    • The query which was eventually submitted by the user is considered the only right (relevant) candidate and is assigned a positive label.
    • All other candidates are regarded as non-relevant and get zero labels.
  • The user’s long-term search history and location are the most effective signals for personalizing auto-completion rankers.
  • Supervised rankers enhanced by personalization features can significantly outperform existing popularity-based baselines, improving MRR by up to 9%.


The biggest movers in personalized auto-completion rankings when the ranker is trained with age features. Each column includes the candidates that were boosted most frequently in the personalized auto-completion rankings for users of the specified age group.
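A minimal sketch of this labeled-data construction, assuming a popularity-ordered candidate list; the feature names (mpc_rank, cand_len, age_bucket) are illustrative stand-ins for the paper's features.

```python
# For each observed (prefix, submitted query) pair, the submitted query gets
# label 1 and every other candidate gets label 0.
def build_instances(prefix, candidates, submitted, user_features):
    rows = []
    for rank, cand in enumerate(candidates):
        rows.append({
            "label": int(cand == submitted),
            "mpc_rank": rank,                  # popularity-based position
            "cand_len": len(cand),
            **user_features,                   # e.g., age bucket, region
        })
    return rows

cands = ["instagram", "imdb", "ikea"]
for row in build_instances("i", cands, "imdb", {"age_bucket": "31-40"}):
    print(row)
```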

37 of 105

Location for personalization


The top movers in each region: queries whose average positions in rankings with and without personalization differ the most in that region. The regions are specified by collapsing the first zip-code digits, and the users in each region are grouped accordingly. Each map shows the distribution of query popularity across US states according to Google Trends; the colors range from light blue (rare) to dark blue (popular).

38 of 105

Agenda

  • Traditional Machine Learning methods
  • Hierarchical RNN Encoder-decoder
  • GRUs with user and time representations
  • Transformer-based hierarchical encoder


39 of 105

RNNs for personalization



40 of 105

Attend, Copy, Generate (ACG)

  • Co-occurrence-based models
    • Suffer from data sparsity and lack of coverage for rare or unseen queries.
    • Dealing with highly diverse sessions makes using co-occurrence-based models almost impossible.
  • ACG
    • Query-aware attention mechanism to capture the structure of the session context.
    • Automatically detects session boundaries.
  • Within a single session, a large portion of query terms is retained from previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary.
    • ~62% of the terms in a query are retained from their preceding queries
    • >39% of users repeat at least one term from their previous query
    • Based on statistics from the AOL query log, >67% of the retained terms in sessions are from the bottom 10% of terms ordered by frequency.


41 of 105

Attend, Copy, Generate (ACG)


Example of generating a suggestion query given the previous queries in the session. The suggestion query is generated over three time steps. The heatmap indicates attention: red for query-level attention and blue for word-level attention. The pie chart shows whether the network decides to copy or to generate.
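A hedged sketch of the copy-vs-generate mixture that pointer/copy decoders like ACG use: the output distribution mixes the generator's vocabulary distribution with the attention distribution scattered onto source-token ids. All tensors are toy values.

```python
import torch

vocab_size, src_len = 10, 4
p_vocab = torch.softmax(torch.randn(vocab_size), dim=0)  # generator distribution
attn = torch.softmax(torch.randn(src_len), dim=0)        # attention over source
src_ids = torch.tensor([2, 7, 7, 3])                     # source tokens' vocab ids
p_gen = torch.tensor(0.6)                                # "generate" gate

p_copy = torch.zeros(vocab_size).index_add_(0, src_ids, attn)  # scatter attention
p_final = p_gen * p_vocab + (1 - p_gen) * p_copy
print(p_final.sum())   # ~1.0, still a valid distribution over the vocabulary
```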

43 of 105

Evaluation


44 of 105

Comparisons

  • Methods:
    • Most Popular Suggestions (MPS): rank by historical co-occurrence count with the last query in the current session.
    • BaseRanker: the 17-feature learning-to-rank method of Sordoni et al.
    • seq2seq model, and a variant with query-aware attention only (seq2seq + QaA)
    • seq2seq model with the copy mechanism only (seq2seq + CM)


45 of 105

Agenda

  • Traditional Machine Learning methods
  • Hierarchical RNN Encoder-decoder
  • GRUs with user and time representations
  • Transformer-based hierarchical encoder


46 of 105

Personalized neural Language Model

  • A neural query language model whose input combines character and word embeddings with user vectors (personalization) and time features [Fiorini & Lu, 2018].



48 of 105

Diverse beam search

  • Decodes diverse lists by dividing the beam budget into groups and enforcing diversity between groups of beams.
  • Optimizes an objective with two terms: the sequence likelihood under the model, and a dissimilarity term that encourages beams across groups to differ.
  • This diversity-augmented model score is optimized in a doubly greedy manner, greedily optimizing along both time and groups (see the sketch below).
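A hedged, heavily simplified sketch of one step of diverse beam search with two groups and a Hamming-style dissimilarity term; `step_scores` stands in for the model's per-step log-probabilities, and the penalty weight is illustrative.

```python
from collections import Counter

def diverse_step(step_scores, beams_per_group=2, n_groups=2, lam=1.0):
    """step_scores: token -> log-prob at the current step (simplified to be
    shared across beams). Returns the tokens each group extends with."""
    chosen, used = [], Counter()
    for g in range(n_groups):
        # Penalize tokens already selected by earlier groups at this step.
        adjusted = {t: s - lam * used[t] for t, s in step_scores.items()}
        top = sorted(adjusted, key=adjusted.get, reverse=True)[:beams_per_group]
        chosen.append(top)
        used.update(top)
    return chosen

scores = {"gatsby": -0.1, "wall": -0.3, "influenza": -0.9, "alone": -1.2}
print(diverse_step(scores))
# [['gatsby', 'wall'], ['influenza', 'gatsby']] - group 2 is pushed to differ
```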


49 of 105

  • MPC: Most Popular Completion
  • NQAC: word embeddings and the one-hot encoding of characters only.
  • Subscript U: language model is enriched with user vectors
  • Subscript T: language model integrates time features.
  • +D: use of the diverse beam search
  • Also study the impact of adding MPC and LambdaMART (+MPC, +λMART).
  • NQLM(L): Neural Query LM with LSTMs.


50 of 105

Agenda

  • Traditional Machine Learning methods
  • Hierarchical RNN Encoder-decoder
  • GRUs with user and time representations
  • Transformer-based hierarchical encoder


51 of 105

Multi-view Multi-task Attentive framework for Personalized QAC

  • Input: a query prefix and two behavior sequences; each behavior is a token sequence, which can be a searched query or a browsed item title.
  • Transformer-based hierarchical encoder to model the different kinds of sequential behaviors, which can be seen as multiple distinct views of the user’s search history.
    • Obtains context-free behavior-level and context-aware context-level representations of each behavior.
  • A prefix-to-history attention mechanism selects the most relevant information from the two views, with the prefix representation (from the middle module) as the key (see the sketch below).
  • The prefix representation is combined with the weighted view representations into the final intention representation, which feeds two tasks: CTR prediction and query generation.
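A minimal sketch of prefix-to-history attention under these assumptions: the prefix representation acts as the attention query over the two view representations, and the final intention representation concatenates the prefix with the weighted views.

```python
import torch
import torch.nn.functional as F

d = 64
prefix = torch.randn(1, d)            # prefix representation
views = torch.randn(2, d)             # two views: searched queries, browsed items
weights = F.softmax(prefix @ views.T / d ** 0.5, dim=-1)   # (1, 2) view weights
intention = torch.cat([prefix, weights @ views], dim=-1)   # fed to CTR + generation
print(weights, intention.shape)       # torch.Size([1, 128])
```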


52 of 105

Examples of query candidates recommended by different models for the same prefix and history behavior.

Ablation study of query generation models.

53 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]


54 of 105

Agenda

  • LSTMs for inappropriate query suggestion detection
  • Offline Spell Correction
  • Online Spell Correction


55 of 105

Inappropriate query suggestion detection

  • Inappropriate: a suggestion that may cause anger or annoyance to certain users, exhibits lack of respect, rudeness, or discourteousness towards certain individuals or communities, or may be capable of inflicting harm to oneself or others.

Yenala, Harish, Manoj Chinnakotla, and Jay Goyal. "Convolutional Bi-directional LSTM for detecting inappropriate query suggestions in web search." In PAKDD, pp. 3-16. Springer, Cham, 2017.


56 of 105

CONV+BiLSTMs for Inappropriate query suggestion detection

  • Randomly initialize the word vectors for padded/unknown words from the uniform distribution [-0.25, 0.25].
  • Use three 3 × 25 filters (see the sketch below).

Yenala, Harish, Manoj Chinnakotla, and Jay Goyal. "Convolutional Bi-directional LSTM for detecting inappropriate query suggestions in web search." In PAKDD, pp. 3-16. Springer, Cham, 2017.
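A hedged PyTorch sketch of the CONV + BiLSTM idea; only the width-3 convolution with 25 feature maps follows the slide, while the vocabulary, embedding, and hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class ConvBiLSTM(nn.Module):
    def __init__(self, vocab=20000, emb=50):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, 25, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(25, 32, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 32, 1)

    def forward(self, ids):                       # ids: (batch, seq_len)
        x = self.emb(ids).transpose(1, 2)         # (batch, emb, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)
        _, (h, _) = self.lstm(x)                  # h: (2, batch, 32)
        h = torch.cat([h[0], h[1]], dim=-1)       # concat both directions
        return torch.sigmoid(self.head(h))        # P(inappropriate)

model = ConvBiLSTM()
print(model(torch.randint(0, 20000, (4, 12))).shape)   # torch.Size([4, 1])
```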


57 of 105

Results

  • Pattern and Keyword based Filtering (PKF)
  • Support Vector Machine (SVM)
  • Gradient Boosted Decision Trees (BDT)

Yenala, Harish, Manoj Chinnakotla, and Jay Goyal. "Convolutional Bi-directional LSTM for detecting inappropriate query suggestions in web search." In PAKDD, pp. 3-16. Springer, Cham, 2017.


58 of 105

Agenda

  • LSTMs for inappropriate query suggestion detection
  • Offline Spell Correction
  • Online Spell Correction


59 of 105

Spell correction using soft-masked BERT

  • A detection network predicts, for each character, the probability that it is an error; a BERT-based correction network predicts the correct character. The corrector’s input embedding is “soft-masked” towards the [MASK] embedding in proportion to the detected error probability [Zhang et al., 2020].


60 of 105

Soft-masked BERT

  • The detection loss is the cross-entropy of the per-character error probabilities; the correction loss is the cross-entropy of the predicted corrections. Training minimizes their linear combination, L = λ·L_correction + (1 − λ)·L_detection (soft-masking sketch below).
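A toy sketch of the soft-masking step with illustrative tensor sizes.

```python
import torch

seq_len, dim = 5, 16
tok_emb = torch.randn(seq_len, dim)           # input token embeddings
mask_emb = torch.randn(dim)                   # embedding of the [MASK] token
p_err = torch.sigmoid(torch.randn(seq_len))   # detector's per-token error prob

# Convex combination: the more likely a token is an error, the closer its
# embedding moves towards [MASK] before entering the BERT corrector.
soft_masked = p_err.unsqueeze(-1) * mask_emb + (1 - p_err).unsqueeze(-1) * tok_emb
print(soft_masked.shape)   # torch.Size([5, 16])
```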

61 of 105

Agenda

  • LSTMs for inappropriate query suggestion detection
  • Offline Spell Correction
  • Online Spell Correction


62 of 105

Need for online spell correction

  • Offline spell correction applies only very confident corrections.
  • Issue: low coverage.
  • Online spell correction
    • Small portions of the prefix can be corrected at trie exploration time by paying a penalty cost, e.g., change “cbo” to “ceboo”.
    • More flexible than offline spell correction because small portions of the prefix can be changed
    • More coverage
  • Key idea: it is possible to jump to a different node in the search trie by paying a cost dictated by the conversion table


63 of 105

Trie with online spell correction


Conversion table (cost of a correction):

  From  To  Cost
  a     b   5000
  a     c   5000
  a     d   6000

  • Each node stores the minimum score in its subtree
  • The conversion table stores the cost of a correction

Exploration: paths are explored with exact matching or by paying some conversion cost.

Example for prefix “a” (the explored trie is highlighted in yellow in the slide); the resulting priority queue, ordered by trie score + conversion cost:

  a: 3000
  b: 1000 + 5000
  d: 1000 + 6000
  c: 5000 + 5000
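A minimal sketch matching the toy example above: exact matches cost nothing, jumping to a different first character pays the conversion-table cost, and a priority queue orders the explored completions (lower total cost is better; the per-node scores stand in for the minimum subtree scores).

```python
import heapq

trie = {"a": 3000, "b": 1000, "c": 5000, "d": 1000}      # min score in each subtree
conversion = {("a", "b"): 5000, ("a", "c"): 5000, ("a", "d"): 6000}

def explore(prefix_char):
    heap = []
    for node, score in trie.items():
        if node == prefix_char:                        # exact match, no penalty
            heapq.heappush(heap, (score, node))
        elif (prefix_char, node) in conversion:        # jump, pay conversion cost
            heapq.heappush(heap, (score + conversion[(prefix_char, node)], node))
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(explore("a"))
# [(3000, 'a'), (6000, 'b'), (7000, 'd'), (10000, 'c')]
```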

64 of 105

Learning conversion rules

  • Utilizing spelling-correction pairs, train a Markov n-gram transformation model that captures user spelling behavior in an unsupervised fashion.
  • Joint-sequence modeling framework to define the probability of transforming the intended query into the observed character sequence.
  • Treat the desired and realized queries as a sequence of substring transformation units, or transfemes.
    • E.g., the transformation Britney → britny can be segmented into the transfeme sequence {br→br, i→i, t→t, ney→ny}, where only the last transfeme involves a correction.
  • The probability of the overall transformation sequence decomposes as a product of the transfeme probabilities, each conditioned on the previous transfemes (see the sketch below).
  • By applying the Markov assumption and experimenting with the length of the transfeme units, transformation models of varying complexity can be built.
  • To find the top spell-corrected completion suggestions in real time, the A* search algorithm is adapted with various pruning heuristics to dynamically expand the search space efficiently.
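A minimal sketch of the transfeme decomposition with an order-0 Markov assumption; the probabilities are illustrative.

```python
import math

transfeme_logp = {          # log P(realized | intended) per transfeme
    ("br", "br"): math.log(0.95),
    ("i", "i"): math.log(0.98),
    ("t", "t"): math.log(0.97),
    ("ney", "ny"): math.log(0.02),   # an actual correction: rare but possible
}

def transformation_logp(transfemes):
    """transfemes: list of (intended, realized) substring pairs."""
    return sum(transfeme_logp[t] for t in transfemes)

# log P(britny | britney) under the segmentation {br->br, i->i, t->t, ney->ny}
print(transformation_logp([("br", "br"), ("i", "i"), ("t", "t"), ("ney", "ny")]))
```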


65 of 105

Error correction with char RNNs

  • Char-RNN model
    • The completion must be error-correcting: able to handle small errors in the user’s initial input and provide completions for the most likely “correct” input.
    • The completion must be real-time.
  • Proposes a modified beam search that integrates a completion-distance-based error correction model, combining the error correction process (as a potential function) with the language model.
  • Efficiently performs modified beam search to complete queries with error correction in real time, by exploiting the heavily overlapping forward-propagation computations and conducting amortized dynamic programming on the search tree.


66 of 105

Completion vs Error Correction

Manish Gupta (gmanish@microsoft.com)

66

DL for QAC, ECIR 23

(Figure panels: general beam search; edit distance vs. completion distance.)

  • We should not count the edit distance for adding words after the last character (of terms) of the user input.
  • So the penalty is changed to an indicator when dealing with the “add” operation in the edit-distance algorithm (see the sketch below).
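A hedged sketch of completion distance as described above: standard edit-distance dynamic programming, except that the "add" penalty becomes an indicator, so characters appended after the entire input has been consumed are free.

```python
def completion_distance(user_input, candidate):
    m, n = len(user_input), len(candidate)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i                 # dropping input characters costs 1 each
    for j in range(1, n + 1):
        dp[0][j] = j                 # insertions before the input's end cost 1 each
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            add = dp[i][j - 1] + (0 if i == m else 1)   # free once input consumed
            delete = dp[i - 1][j] + 1
            subst = dp[i - 1][j - 1] + (user_input[i - 1] != candidate[j - 1])
            dp[i][j] = min(add, delete, subst)
    return dp[m][n]

print(completion_distance("face", "facebook"))            # 0 - pure completion
print(completion_distance("facebok", "facebook login"))   # 1 - one correction
```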

67 of 105

Beam Search with Edit Distance


68 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]


69 of 105

NLG for QAC

  • Technical challenges
    • Handling partial words in the input
    • Optimization of computation requirements and throughput
    • Model compression/distillation
    • Beam search vs greedy decoding
  • Considerations
    • Multi-language support
    • Inappropriate leakage
    • Suggestion quality
    • Coverage
    • Latency


70 of 105

Query Blazer: NLG without deep learning

  • An n-gram language model at the subword level
  • Exploits the n-gram model’s inherent data structure to precompute completions prior to runtime.


71 of 105

Agenda

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


72 of 105

Character-level Neural Language Model.

  • Char-LM
    • Can handle OOV words
    • Can use the last incomplete word in prefix.


Top suggested queries by the char-LM. Phrases such as “afraid of the dead” and “afraid of the dog”, as well as all the prefixes, do not exist in the data. Note that there is also a low-quality suggestion, “when is a good time to buy a lyrics”.

Architecture of our language model for an example query where ‘\n’ indicates the end of the query. Green cells contain one-hot encoded vectors of characters, blue cells contain character-embedded vectors, and red cells contain word-embedded vectors. <INC> means incomplete word token.

73 of 105

Character-level Neural Language Model.

  • Mitra10K+MPC+λMART and Mitra100K+MPC+λMART: use 10K and 100K synthetic candidates generated from suffixes.
  • NQLM: LM not using the word-embedded character space
  • NQLM+WE: uses word embeddings.
  • NQLM(S): small network with 512 hidden LSTM units
  • NQLM(L): large network with 1,536 units
  • +MPC: append the LM-generated candidates to the end of the MPC candidates, if there are any.
  • +λMART: employ LambdaMART with the same features as Mitra et al., except that CLSM scores are replaced by NQLM scores.
  • New metric: partial-matching MRR (PMRR) (see the sketch below)
    • The partial-match rank is the rank of the first candidate that is the same as the original query or that extends the prefix by one or more complete words.
    • Partial-match rank ≤ full-match rank, so PMRR ≥ MRR.
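A hedged sketch of PMRR under the definition above; the word-boundary test below is an assumption about what "extends the prefix by complete words" means.

```python
def is_partial_match(prefix, target, cand):
    if cand == target:
        return True
    # cand must be a prefix of the target, go beyond the typed prefix,
    # and stop at a word boundary of the target.
    return (target.startswith(cand) and len(cand) > len(prefix)
            and target[len(cand):].startswith(" "))

def pmrr(examples):
    """examples: iterable of (prefix, target_query, ranked_candidates)."""
    rr = []
    for prefix, target, cands in examples:
        score = 0.0
        for rank, cand in enumerate(cands, 1):
            if is_partial_match(prefix, target, cand):
                score = 1.0 / rank
                break
        rr.append(score)
    return sum(rr) / len(rr)

ex = [("the gr", "the great gatsby book",
       ["the grinch", "the great gatsby", "the great gatsby book"])]
print(pmrr(ex))   # 0.5 - "the great gatsby" partially matches at rank 2
```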


74 of 105

Agenda

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


75 of 105

Subword Language Model for QAC

  • Representing queries with subwords shortens the decoding length significantly compared to a char-LM.
  • Problem with subword LMs
    • If we segment the prefix as given and encode it, the prefix’s segmentation may not match that of the ground-truth query, because the prefix is an incomplete substring of the desired query.
    • This enforced segmentation is less likely to appear in training.
    • A model starting from this segmentation is unlikely to generate the ground-truth query.
  • Two ways of segmenting the prefix
    • BPE: the algorithm is deterministic because it segments greedily from left to right.
    • Subword regularization (SR): stochastically samples multiple segmentations by utilizing a unigram LM.


76 of 105

Subword Language Model for QAC

  • For SR, due to the stochasticity of segmentation, we should marginalize over all possible segmentations to calculate the likelihood of a query.
  • The number of possible segmentations is exponentially large; marginalization over all segmentations of very long sequences is intractable.
  • Hence, decode for the best token sequence.
  • Since finding the best token sequence is also intractable, beam search decoding is used, but it only yields suboptimal predictions.
  • Solution: to consider every possible segmentation of the target completion, the retrace algorithm goes a few characters back from the end and generates candidates under the restriction that they match the retraced characters (see the sketch below).
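A hedged control-flow sketch of the retrace idea; `decode` is a hypothetical constrained decoder standing in for beam search with the subword LM, and the toy version just filters a candidate list.

```python
# Back off r characters from the end of the prefix and decode under the
# constraint that the generation reproduces the retraced characters, so the
# prefix's forced segmentation no longer limits which completions are reachable.
def retrace_candidates(prefix, decode, max_retrace=3):
    candidates = set()
    for r in range(max_retrace + 1):
        stem, retraced = prefix[:len(prefix) - r], prefix[len(prefix) - r:]
        for completion in decode(stem, forced=retraced):
            if completion.startswith(prefix):   # constraint: match retraced chars
                candidates.add(completion)
    return sorted(candidates)

def toy_decode(stem, forced):
    vocab = ["restaurants near me", "restart iphone", "rest api tutorial"]
    return [q for q in vocab if q.startswith(stem + forced)]

print(retrace_candidates("restau", toy_decode))   # ['restaurants near me']
```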


77 of 105

Subword Language Model for QAC

  • The subword LM is ~2.5x faster while maintaining a similar quality of generated results compared to the character-level LM.
  • New evaluation metric: mean recoverable length (MRL)
    • Measures how many upcoming characters the model can complete correctly.
    • Useful for additive QAC, which suggests one word at a time instead of a whole query completion.
    • Does not care about the order of candidates; only checks whether they contain the target query.


78 of 105

Agenda

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


79 of 105

Hierarchical recurrent encoder-decoder (HRED)


The user types “cleveland gallery → lake erie art”. During training, the model encodes “cleveland gallery”, updates the session-level recurrent state and maximizes the probability of seeing “lake erie art”. The process is repeated for all queries in the session. During testing, a contextual suggestion is generated by encoding the previous queries, by updating the session-level recurrent states accordingly and by sampling a new query from the last obtained session-level recurrent state. Here, the generated contextual suggestion is “cleveland indian art”.

80 of 105

HRED with LambdaMART

  • Test Scenario 1: Next-Query Prediction
    • For each session, extract the 20 candidate queries that most likely follow the anchor query in the background data, i.e., those with the highest ADJ score.
    • Take instances where the target is in the top-20 candidate set.
  • Test Scenario 2: Robust Prediction (see the sketch below)
    • Label the 100 most frequent queries in the background set as noisy.
    • For each entry in the next-query prediction task, corrupt its context by inserting a noisy query at a random position.
    • The candidates and the target are unchanged.
    • The probability of sampling a noisy query is proportional to its frequency in the background set.
    • E.g., given the context “airlines → united airlines” and target “delta airlines”, the noisy sample “google” is inserted at a random position, giving the corrupted context “airlines → united airlines → google”.
  • Test Scenario 3: Long-Tail Prediction
    • Retain sessions whose anchor query has not been seen in the background set, i.e., it is a long-tail query.
    • For each session, iteratively shorten the anchor query by dropping terms until we have a query that appears in the background data.
    • If a match is found, proceed as in the next-query prediction setting, i.e., ensure that the target is in the top-20 candidate set.
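A minimal sketch of the Robust Prediction corruption, with an illustrative noisy-query pool.

```python
# Insert a noisy query, sampled proportionally to background frequency,
# at a random position in the context; candidates and target stay unchanged.
import random

noisy_pool = {"google": 1000, "facebook": 800, "youtube": 600}  # 100 most frequent

def corrupt(context, rng=random):
    queries, weights = zip(*noisy_pool.items())
    noisy = rng.choices(queries, weights=weights, k=1)[0]
    pos = rng.randrange(len(context) + 1)
    return context[:pos] + [noisy] + context[pos:]

print(corrupt(["airlines", "united airlines"]))
# e.g. ['airlines', 'united airlines', 'google']
```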


81 of 105

Comparison of HRED with BaselineRanker and ADJ


82 of 105

RIN: Reformulation Inference Network for Context-Aware Query Suggestion


83 of 105

RIN: Reformulation Inference Network for Context-Aware Query Suggestion


  • Query session encoder
    • To capture the structure of the session context, an RNN with an attention mechanism (self-attention) encodes the search session by reading the homomorphic query and reformulation embeddings.
    • This enables the model to explicitly capture the preceding reformulation for each query in the search session and directly learn user reformulation behaviors.

84 of 105

RIN: Reformulation Inference Network for Context-Aware Query Suggestion


  • Query Generator
    • Without any candidate query, the query generator aims to produce a sequence of terms as the generated query.

85 of 105

RIN: Reformulation Inference Network for Context-Aware Query Suggestion

  • Optimization: task-specific losses, where the task can be D (discrimination/ranking) or G (generation).

  • Most Popular Suggestion (MPS): ranks queries by co-occurrence with the last query in the context.
  • Query-based Variable Markov Model (QVMM): learns the probability of query transitions over sessions with a variable-memory Markov model implemented as a suffix tree.
  • Hybrid Suggestion (Hybrid): ranks candidate queries by a linear combination of popularity (i.e., MPS) and similarity to recent queries.
  • Personalized Completion (PC): personalized LambdaMART ranking model using MPS as well as long-term user history signals.
  • Reformulation-based Completion (RC): LambdaMART with 43 reformulation-based features.


86 of 105

Agenda

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


87 of 105

Next Phrase Prediction for QAC for Emails/Academic Writings

  • Next Phrase Prediction (NPP)
    • Encourages a language model to complete the partial query with enriched phrases.
    • 2 steps
      • Phrase extraction
        • Extract qualitative phrases via constituency parsing.
      • Generative question answering (see the sketch below)
        • Start with a pre-trained T5 model.
        • The pre-trained LM is guided to choose the correct next phrase among other phrases of the same type (e.g., NP, VP) in the sentence.
      • Fine-tune on the QAC task.


GPT-2 can generate syntactically sound and semantically general sentences from a partial query. However, it still needs substantial fine-tuning to generate sentences focused on an expert domain (e.g., Computer Science).
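A hedged sketch of the generative step with a pre-trained T5 via Hugging Face transformers; the "complete:" prompt format and the example pair are illustrative assumptions, not the paper's exact templates.

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tok = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One training pair: partial text -> its next phrase (as found by the parser).
src = tok("complete: We evaluate our model on", return_tensors="pt")
tgt = tok("two benchmark datasets", return_tensors="pt").input_ids
loss = model(input_ids=src.input_ids,
             attention_mask=src.attention_mask, labels=tgt).loss
loss.backward()   # one fine-tuning step (optimizer omitted)

# Inference: propose the next phrase for a partial query.
out = model.generate(src.input_ids, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```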

88 of 105

NPP Results


89 of 105

Agenda

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


90 of 105

When Are Search Completion Suggestions Problematic?

  • 15% to 47% of problematic suggestions were flagged as problematic due to the query prefix they were surfaced for.
  • Search voids: rare query prefixes were found to be up to 3 times more likely to be linked to problematic suffixes.
  • Problematic suggestions can also affect those issuing the search query, e.g., if their dignity is compromised.
    • Suggesting ... [bed bugs by yourself] when a user typed how to kill [...] may imply the system deduced the user has a bed-bug infestation issue.
    • Suggesting ... [my arrest records] when a user typed how to find [...] may hint that the system assumed such records may exist.
  • Racist or sexist
  • Containing profanity or violence
  • Subtle ways
    • A certain phrase may be bothersome for some group of users, but not for others.
    • A suggestion such as “… is evil” or “… is huge” may be acceptable if the user input was “heinous crime …” or “the universe …”, but likely problematic if it was a person’s name.


91 of 105

When Are Search Completion Suggestions Problematic?


Problem category: Harmful speech

  • Hate speech: suggestions that could be perceived as hateful or that intend to intimidate or promote violence, against a group or its members.
  • Intimidates & promotes violence: suggestions that may steer users towards acting violently or that aim to intimidate certain individuals.
  • Offensive speech: suggestions that dehumanize, insult, or ridicule, actively seeking to embarrass or harm reputation.
  • Discriminatory speech: suggestions showing known or existing bias, prejudice, or intolerance, perpetuating or employing negative stereotypes, or encouraging feelings of fear or disgust towards a group or individual.
  • Defamation & derogatory speech: suggestions that defame someone by suggesting negative associations, including suggestions of dishonesty or involvement in illicit activities.
  • Profane language: suggestions including any sort of slurs, expletives, swear or curse words.

  Keywords (p/s: query):
  • punch (p: should i punch [my mother])
  • hit (s: should women be [hit by men])
  • deported (s: arabs should be [deported])
  • poison (p: which poison can kill [an adult fast])

Problem category: Potentially illicit

  • Facilitates illicit activities: suggestions condoning or constituting illicit speech, infringing on intellectual property, copyright, or trademark agreements, or that facilitate or nudge users towards illicit activities.
  • Privacy breaching: suggestions revealing unwanted details from someone’s past or anything that may be construed as sensitive or personal information.
  • Terrorist or extremist propaganda: suggestions that may steer or help users find extremist content related to terrorist or extremist activities like recruiting or sponsoring.
  • Defamation & derogatory speech: see above.
  • Child abuse & pornography: suggestions related to child abuse or child pornography.

  Keywords (p/s: query):
  • heroin (s: trustworthy website to [buy heroin])
  • fake passports (s: how to get [fake passports])
  • beat child (s: how to [beat my child])

92 of 105

When Are Search Completion Suggestions Problematic?


Problem category: Controversy, Misinformation, and Manipulation

  • Controversial topics: suggestions that seem to endorse one side of a known controversial debate.
  • Misinfo., disinfo., or misleading content: suggestions that promote information that is factually incorrect, or that reinforce or nudge users towards conspiracy theories.
  • Coordinated attacks & suggestions manipulation: suggestions that occur as a result of attempts to manipulate the search or suggestion results, such as by promoting certain businesses or by trying to affect someone’s reputation.

  Keywords (p/s: query):
  • hoax (s: climate change is [a hoax])
  • staged (s: 911 was [staged])
  • vaccines (p: vaccines are [dangerous])
  • divorce lawyer (p: divorce lawyer [nashville LAW_FIRM_NAME])

Problem category: Stereotypes & Bias

  • Ideological bias: suggestions that validate or endorse views that belong to certain ideological groups, or that promote stereotypical beliefs about an ideological group.
  • Systemically biased suggestions: suggestions about certain topics that are systematically biased towards a group, reinforcing sensitive associations between the group and negative attributes or stereotypical beliefs.
  • Discriminatory speech, defamation & derogatory speech, offensive speech: see above.

  Keywords (p/s: query):
  • refugees (p: refugees are [taking jobs])
  • women (p: women need [to dress modestly])
  • girl (s: running like [a girl])
  • black men (p: black men [are lazy])

Problem category: Adult queries

  • Adult content: suggestions that contain pornography-related terms or steer users towards pornographic/obscene content.
  • Child abuse: see above.

  Keywords (p/s: query):
  • naked (p: naked girls [videos])

Problem category: Other types

  • Animal cruelty: suggestions that may steer users towards info about how to harm animals.
  • Self-harm and suicidal content: suggestions that may steer someone towards hurting themselves.
  • Sensitive topics: suggestions that may trigger memories of traumatic events or be considered sensitive or emotionally charged by certain groups due to historic or cultural reasons.

  Keywords (p/s: query):
  • strangle dog (p&s: how to strangle [a dog])
  • hitler (p: hitler is [my god])
  • hurt myself (s: I want to [hurt myself])

93 of 105

When Are Search Completion Suggestions Problematic?


Target category: Individuals

  • References to a public or private person, who may or may not be explicitly named.

  Keywords (p/s: query):
  • ruth ginsburg (p: ruth ginsburg [dead yet])
  • my dad (s: should I kill [my dad])

Target category: Groups

  • References to a group of individuals that share at least one common characteristic, such as race, gender, age, occupation, appearance, disability, or country of origin.

  Keywords (p/s: query):
  • muslims (p: muslims try to [conquer through numbers])
  • children with adhd (s: how to punish [adhd child])

Target category: Businesses

  • References to a specific business.

  Keywords (p/s: query):
  • Macy’s (p: macy’s is [scamming shoppers])
  • CNN (s: should we punish [cnn])
  • Starbucks (s: should I boycott [starbucks])

Target category: Organizations

  • References to an organization, institution, or agency, which can be governmental or non-governmental (but not a business); or a group of for-profit organizations if they are not specifically identified (e.g., news media instead of CNN, social media instead of Twitter).

  Keywords (p/s: query):
  • mainstream media (p: mainstream media is [destroying america])
  • UNICEF (p: UNICEF is running [a scam])
  • travel companies (s: don’t waste money on [travel companies])

94 of 105

When Are Search Completion Suggestions Problematic?


Target category: Animals & objects

  • References to an animal, a group of animals, or anything that may be construed as an object or a group of objects.

  Keywords (p/s: query):
  • cat (s: how to poison [a cat])
  • knife (p: how to use a knife [to kill])

Target category: Activities & ideas

  • References to a specific activity, action, or idea.

  Keywords (p/s: query):
  • cutting yourself (p: cutting yourself is [stupid])
  • crying (p: crying is [emotional blackmail])

Target category: Other targets

  • References to concepts like ideologies, religions, programs, health issues, a situation someone may find themselves in, or other types that do not fit other categories.

  Keywords (p/s: query):
  • bipolar disorder (p: bipolar disorder is [fraud])
  • vaccination (p: vaccination [herd mentality])
  • science (p: science should [stay out of faith])

Target category: Generic, no target

  • There is no identifiable target or subject.

  Keywords (p/s: query):
  • (what [the heck])
  • (damn damn [damn])

95 of 105

Agenda

  • Components in Query Auto Completion systems [40 min]
  • Ranking [30 min]
  • Personalization [30 min]
  • Handling defective suggestions and prefixes [30 min]
  • Natural Language Generation [30 min]
  • Summary and Future Trends [10 min]


96 of 105

Components in Query Auto Completion systems

  • Ranking suggestions
    • Most popular completion
    • Time sensitive suggestions
    • Location sensitive suggestions
    • Personalization
  • Ghosting, Session co-occurrences
  • Online spell correction, Defect handling
  • Non-prefix matches, Generating suggestions
  • Mobile QAC, Enterprise QAC


97 of 105

Ranking

  • Traditional Machine Learning methods for ranking suggestions
  • Convolutional Latent Semantic Model
  • LSTM encoder


98 of 105

Personalization

  • Traditional Machine Learning methods
  • Hierarchical RNN Encoder-decoder
  • GRUs with user and time representations
  • Transformer-based hierarchical encoder


99 of 105

Handling defective suggestions and prefixes

  • LSTMs for inappropriate query suggestion detection
  • Offline Spell Correction
  • Online Spell Correction


100 of 105

Natural Language Generation

  • RNNs with character and word embeddings
  • LSTMs with subword embeddings
  • Hierarchical RNN Encoder-decoder
  • Next Phrase Prediction with T5
  • Problems with NLG


101 of 105

Extreme Multi-label Classification (XC/XMR) for QAC

  • Given a data point, tag it with the most relevant subset of labels from a very large set of L labels.
  • Aspects
    • The set of labels is very large (L can be 1 billion+). For any data point, only a few, e.g., O(log L), labels are relevant.
    • A few labels are “head” labels with lots of training points; most labels are “tail” labels with very few (often < 5) training points.
    • Main challenge: predict tail labels (where the diversity lies) accurately.
  • Session-aware QAC can be framed as a multi-label ranking task where the input is the user’s prefix and previous queries, and the observed next query is the ground-truth label (see the sketch below).
  • Multiple methods exist: Parabel, Renee, NGAME, DeepXML, etc.


Yadav, Nishant, Rajat Sen, Daniel N. Hill, Arya Mazumdar, and Inderjit S. Dhillon. "Session-aware query auto-completion using extreme multi-label ranking." In PKDD, pp. 3835-3844. 2021.
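A minimal sketch of the data construction implied by this framing; the sessions, the simulated prefix length, and the label indexing are illustrative.

```python
# Each training point maps (previous queries + current prefix) to the observed
# next query, drawn from a very large label set of historical queries.
sessions = [
    ["cheap flights", "flights to tokyo", "tokyo hotels"],
    ["fitzgerald books", "the great gatsby book"],
]
label_set = sorted({q for s in sessions for q in s})   # the L labels

def make_examples(session, prefix_len=2):
    examples = []
    for i in range(1, len(session)):
        context, target = session[:i], session[i]
        prefix = target[:prefix_len]                   # simulate typed keystrokes
        examples.append({"input": (tuple(context), prefix),
                         "label": label_set.index(target)})
    return examples

for s in sessions:
    print(make_examples(s))
```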

102 of 105

Personalized NLG

  • Obtaining clean training data is difficult.
    • Not all sessions are personalizable
  • Better ways of using session embeddings or user context signals as input.
  • Multi-lingual support
  • Lookup + generation: How to leverage trie signals to improve generation?


103 of 105

References

[1] Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, and Pascal Fleury. Learning to attend, copy, and generate for session-based query suggestion. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1747–1756, 2017.

[2] Giovanni Di Santo, Richard McCreadie, Craig Macdonald, and Iadh Ounis. Comparing approaches for query autocompletion. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 775–778, 2015.

[3] Huizhong Duan and Bo-June Hsu. Online spelling correction for query completion. In Proceedings of the 20th international conference on World wide web, pages 117–126, 2011.

[4] Nicolas Fiorini and Zhiyong Lu. Personalized neural language models for real-world query auto completion. In Proceedings of NAACL-HLT, pages 208–215, 2018.

[5] Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, and Pu-Jen Cheng. Learning user reformulation behavior for query auto-completion. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pages 445–454, 2014.

[6] Jyun-Yu Jiang and Wei Wang. RIN: Reformulation inference network for context-aware query suggestion. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 197–206, 2018.

[7] Gyuwan Kim. Subword language model for query autocompletion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5022–5032, 2019.

[8] Dong-Ho Lee, Zhiqiang Hu, and Roy Ka-Wei Lee. Improving text auto-completion with next phrase prediction. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4434–4438, 2021.

[9] Bhaskar Mitra and Nick Craswell. Query autocompletion for rare prefixes. In Proceedings of the 24th ACM international on conference on information and knowledge management, pages 1755–1758, 2015.

[10] Agnes Mustar, Sylvain Lamprier, and Benjamin Piwowarski. Using BERT and BART for query suggestion. In Joint Conference of the Information Retrieval Communities in Europe, volume 2621. CEUR-WS.org, 2020.


104 of 105

References

[11] Alexandra Olteanu, Fernando Diaz, and Gabriella Kazai. When are search completion suggestions problematic? Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2):1–25, 2020.

[12] Dae Hoon Park and Rikio Chiba. A neural language model for query auto-completion. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1189–1192, 2017.

[13] Milad Shokouhi. Learning to personalize query autocompletion. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pages 103–112, 2013.

[14] Jun Song, Jun Xiao, Fei Wu, Haishan Wu, Tong Zhang, Zhongfei Mark Zhang, and Wenwu Zhu. Hierarchical contextual attention recurrent neural network for map query suggestion. IEEE Transactions on Knowledge and Data Engineering, 29(9):1888–1901, 2017.

[15] Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In proceedings of the 24th ACM international on conference on information and knowledge management, pages 553–562, 2015.

[16] Po-Wei Wang, Huan Zhang, Vijai Mohan, Inderjit S Dhillon, and J Zico Kolter. Realtime query completion via deep language models. In eCOM@SIGIR, 2018.

[17] Sida Wang, Weiwei Guo, Huiji Gao, and Bo Long. Efficient neural query auto completion. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 2797–2804, 2020.

[18] Harish Yenala, Manoj Chinnakotla, and Jay Goyal. Convolutional bi-directional LSTM for detecting inappropriate query suggestions in web search. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 3–16. Springer, 2017.

[19] Di Yin, Jiwei Tan, Zhe Zhang, Hongbo Deng, Shujian Huang, and Jiajun Chen. Learning to generate personalized query auto-completions via a multi-view multitask attentive approach. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2998–3007, 2020.

[20] Shaohua Zhang, Haoran Huang, Jicong Liu, and Hang Li. Spelling error correction with soft-masked BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 882–890, 2020.

Manish Gupta (gmanish@microsoft.com)

104

DL for QAC, ECIR 23

105 of 105

Thanks!

Manish Gupta (gmanish@microsoft.com)
