Expand, Highlight, Generate:
RL-driven Document Generation for Passage Reranking
Arian Askari, Mohammad Aliannejadi, Chuan Meng, Evangelos Kanoulas, Suzan Verberne
Arian Askari
Arian Askari 2023
Generative LLMs are not immune to mistakes or hallucinations
Search engines are important because:
They do not hallucinate!
relatively large amount of training data
Employing document generator
A set of queries
Generating a synthetic document per query with the document generator
Filtering out the query and synthetic document pairs that are unlikely to be relevant
Training a MonoT5 re-ranker on the filtered data
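The pipeline on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate_document`, `relevance_score`, and the `threshold` value are hypothetical placeholders, not the generator or filter used in this work, and the toy stand-ins exist only to make the sketch runnable.

```python
from typing import Callable, List, Tuple


def build_training_pairs(
    queries: List[str],
    generate_document: Callable[[str], str],       # hypothetical document generator
    relevance_score: Callable[[str, str], float],  # hypothetical consistency filter
    threshold: float = 0.5,                        # assumed cut-off, not from the slides
) -> List[Tuple[str, str]]:
    """Generate one synthetic document per query and keep only likely-relevant pairs."""
    kept = []
    for query in queries:
        document = generate_document(query)
        if relevance_score(query, document) >= threshold:
            kept.append((query, document))
    return kept


# Toy stand-ins so the sketch runs without any model.
def toy_generator(query: str) -> str:
    # Returns a noisy output for one query to mimic a failed generation.
    if "military" in query:
        return "\\begin{itemize} \\item"
    return "A passage about " + query


def toy_score(query: str, document: str) -> float:
    # Fraction of query terms that also appear in the document.
    query_terms = query.lower().split()
    document_terms = set(document.lower().split())
    return sum(term in document_terms for term in query_terms) / len(query_terms)


pairs = build_training_pairs(
    ["shepherd's pie base", "canadian military size"], toy_generator, toy_score
)
print(pairs)
```

The surviving pairs would then serve as training data for the re-ranker; the noisy generation is dropped by the filter.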
Example of noisy text by LLaMA v1
Example 4:
Query: what goes on the bottom of shepherd's pie
Relevant document: \begin{itemize}
\item 50g butter
\begin {itemize}\item melted
\end{ulis}t
\n\item flour
\uline{15g} \
\linebreak
\lipsum[1]
\hfill
\vspace{2pt}
\textbf{
\textit{ }
\textbullet
\par
\noindent
\bigskip
\newline
\hline
*
**\end {enumerate}*\n *\end* \\\\\\ \\
&&&\n\\\\ \ \\ \ \ \\ \\ & \ & & \\ & &\n \\\\ &&&& \& \&\&\\& &\\\n \ &&&&\&\
&&& \\& &\\ \\&&&&
We propose two document generators: DocGen and DocGen-RL
[1] Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. 2023. "According to..." prompting language models improves quoting from pre-training data. arXiv preprint arXiv:2305.13252.
Query expansion prompt
Example 1:
Query: Is a little caffeine ok during pregnancy?
Query Expanded: What is the recommended amount of caffeine intake during pregnancy, and are there any potential risks associated with consuming small amounts of caffeine while pregnant?
Example 2:
Query: What fruit is native to Australia?
Query Expanded: Which fruit is exclusive to Australia and provide some additional details about it?
Example 3:
Query: How large is the canadian military?
Query Expanded: What is the size of the canadian military ahd what is the number of active personnel and reserve members?
Example 4:
Query: {query_text}
Query Expanded:
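A few-shot prompt in the layout above could be assembled as in the sketch below. The helper name `build_few_shot_prompt` and its parameters are assumptions made for illustration; only the first two demonstrations from the slide are included, so the open slot is numbered 3 here rather than 4.

```python
from typing import List, Tuple

# Two of the demonstrations shown on the slide.
EXPANSION_EXAMPLES: List[Tuple[str, str]] = [
    (
        "Is a little caffeine ok during pregnancy?",
        "What is the recommended amount of caffeine intake during pregnancy, "
        "and are there any potential risks associated with consuming small "
        "amounts of caffeine while pregnant?",
    ),
    (
        "What fruit is native to Australia?",
        "Which fruit is exclusive to Australia and provide some additional "
        "details about it?",
    ),
]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], query_text: str, answer_label: str
) -> str:
    """Format numbered demonstrations followed by an open slot for the new query."""
    lines = []
    for number, (query, answer) in enumerate(examples, start=1):
        lines += [f"Example {number}:", f"Query: {query}", f"{answer_label}: {answer}"]
    # The final example leaves the answer empty for the language model to complete.
    lines += [f"Example {len(examples) + 1}:", f"Query: {query_text}", f"{answer_label}:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    EXPANSION_EXAMPLES, "How large is the canadian military?", "Query Expanded"
)
print(prompt)
```

The same helper would fit the other prompts in this deck by changing `answer_label` and the demonstrations.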
Query highlighting prompt
Example 1:
Query: What is the recommended amount of caffeine intake during pregnancy, and are there any potential risks associated with consuming small amounts of caffeine while pregnant?
Query Highlighted: What is the recommended amount of [caffeine] intake during [pregnancy], and are there any potential risks associated with consuming small amounts of [caffeine] while [pregnant]?
Example 2:
Query: Which fruit is exclusive to Australia and provide some additional details about it?
Query Highlighted: Which [fruit] is exclusive to [Australia] and provide some additional details about it?
Example 3:
Query: What is the size of the canadian military ahd what is the number of active personnel and reserve members?
Query Highlighted: What is the size of the [canadian military] ahd what is the number of active personnel and reserve members?
Example 4:
Query: {query_text}
Query Highlighted:
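Since the highlighted terms are marked with square brackets, they can be recovered from the model output with a small regular expression. The sketch below is illustrative only; the function names are hypothetical and it assumes brackets are never nested.

```python
import re
from typing import List

# Matches one bracketed span; nested brackets are assumed not to occur.
HIGHLIGHT_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def highlighted_terms(query: str) -> List[str]:
    """Return the terms wrapped in square brackets, in order of appearance."""
    return HIGHLIGHT_PATTERN.findall(query)


def strip_highlighting(query: str) -> str:
    """Remove the brackets while keeping the highlighted terms in place."""
    return HIGHLIGHT_PATTERN.sub(r"\1", query)


highlighted = (
    "Which [fruit] is exclusive to [Australia] and provide some additional "
    "details about it?"
)
print(highlighted_terms(highlighted))
print(strip_highlighting(highlighted))
```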
Document generation prompt
Example1:
Query: What is the recommended amount of [caffeine] intake during [pregnancy], and are there any potential risks associated with consuming small amounts of [caffeine] while [pregnant]?
Relevant Document: We don't know a lot about the effects of caffeine during pregnancy on you and your baby. So it's best to limit the amount you get each day. If you are pregnant, limit caffeine to 200 milligrams each day. This is about the amount in 1½ 8-ounce cups of coffee or one 12-ounce cup of coffee.
Example 2:
Query: Which [fruit] is exclusive to [Australia] and provide some additional details about it?
Relevant Document: Passiflora herbertiana. A rare passion fruit native to Australia. Fruits are green-skinned, white fleshed, with an unknown edible rating. Some sources list the fruit as edible, sweet and tasty, while others list the fruits as being bitter and inedible.assiflora herbertiana. A rare passion fruit native to Australia. Fruits are green-skinned, white fleshed, with an unknown edible rating. Some sources list the fruit as edible, sweet and tasty, while others list the fruits as being bitter and inedible.
Example 3:
Query: What is the size of the [canadian military] ahd what is the number of active personnel and reserve members?
Relevant Document: The Canadian Armed Forces. 1 The first large-scale Canadian peacekeeping mission started in Egypt on November 24, 1956. 2 There are approximately 65,000 Regular Force and 25,000 reservist members in the Canadian military. 3 In Canada, August 9 is designated as National Peacekeepers' Day.
Example 4:
Query: {query_text}
Relevant Document:
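The three prompts form a chain: the expanded query feeds the highlighting prompt, and the highlighted query feeds the document generation prompt. A minimal sketch of that chaining follows, assuming a hypothetical `complete` callable that wraps whatever language model is used. The templates are shortened to the final slot only (the few-shot demonstrations are omitted), and `fake_llm` is a stand-in that returns canned text.

```python
from typing import Callable, List

# Shortened templates: only the open slot of each prompt is shown here.
TEMPLATES: List[str] = [
    "Query: {query_text}\nQuery Expanded:",
    "Query: {query_text}\nQuery Highlighted:",
    "Query: {query_text}\nRelevant Document:",
]


def run_chain(query: str, templates: List[str], complete: Callable[[str], str]) -> str:
    """Feed the output of each stage into the {query_text} slot of the next."""
    text = query
    for template in templates:
        text = complete(template.replace("{query_text}", text)).strip()
    return text


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; records prompts and returns canned text.
    calls.append(prompt)
    if prompt.endswith("Query Expanded:"):
        return " Which fruit is native to Australia, and what is it like?"
    if prompt.endswith("Query Highlighted:"):
        return " Which [fruit] is native to [Australia], and what is it like?"
    return " (synthetic passage)"


document = run_chain("What fruit is native to Australia?", TEMPLATES, fake_llm)
print(document)
```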
MonoT5: outputs "true" or "false"
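MonoT5 is commonly used by turning the two output tokens into a relevance score: a softmax over the logits of "true" and "false", keeping the probability of "true". The sketch below shows only that arithmetic on two given logits; how the logits are obtained from the model is left out, and the function name is an assumption for illustration.

```python
import math


def monot5_relevance(true_logit: float, false_logit: float) -> float:
    """Probability of "true" under a softmax over the two token logits."""
    # Subtract the maximum first for numerical stability.
    highest = max(true_logit, false_logit)
    exp_true = math.exp(true_logit - highest)
    exp_false = math.exp(false_logit - highest)
    return exp_true / (exp_true + exp_false)


# Equal logits give no preference; a higher "true" logit raises the score.
print(monot5_relevance(0.0, 0.0))
print(monot5_relevance(2.0, 0.0))
```

Candidate passages can then be sorted by this score in descending order to produce the re-ranked list.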
Training a consistency filtering model
RL-training
Results
Baselines: InPars, GenRead, Q2D
Retriever | Data Augmentor | NQ nDCG | MS MARCO MRR | DL’20 MAP | DL’20 nDCG |
First stage: BM25 | — | .329 | .187 | .286 | .480 |
Rerankers: MonoT5 | InPars (Bonifacio et al., 2020) | .335 | .259 | .360 | .576 |
Rerankers: MonoT5 | InPars (replicated) | .337 | .223 | .357 | .569 |
Rerankers: MonoT5 | GenRead (replicated) | .368 | .230 | .354 | .570 |
Rerankers: MonoT5 | Q2D (replicated) | .309 | .158 | .252 | .437 |
Rerankers w/DocGen: MonoT5 | DocGen (Ours) | .467 | .275 | .398 | .580 |
Rerankers w/DocGen: MonoT5 | DocGen-RL (Ours) | .517 | .332 | .421 | .618 |
Human Annotation: MonoT5 | — | .567 | .381 | .491 | .714 |
Results on the NQ, MS MARCO, DL’20, HotpotQA, and FEVER datasets in terms of their official metrics
We perform two different analyses on DocGen and DocGen-RL:
We use nDCG@10 for evaluation.
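For reference, nDCG@10 can be computed as in the sketch below. It assumes the linear-gain form of DCG (gain equal to the relevance label) and normalises against the ideal ordering of the same list, which presumes every relevant document for the query appears in that list; official evaluation tools may differ in these details.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k labels, in ranked order."""
    return sum(
        relevance / math.log2(rank + 1)
        for rank, relevance in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the returned passages, listed in ranked order.
print(ndcg_at_k([1, 0, 0]))  # relevant passage ranked first
print(ndcg_at_k([0, 1]))     # relevant passage ranked second
```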
Ablation study on DocGen.
Configuration | NQ-test |
DocGen w/o expanding | .370 |
DocGen w/o highlighting | .363 |
DocGen w/o expanding and highlighting | .351 |
DocGen | .467 |
RL-training analysis on DocGen-RL:
Configuration | NQ-test |
DocGen + only RL on highlighting (= DocGen-RL) | .517 |
DocGen + only RL on expanding | .473 |
DocGen + only RL on doc generation | .448 |
DocGen | .467 |
Analyzing gap between synthetic and realistic data
Scaling analysis: impact of scaling on DocGen, evaluated on NQ-test in terms of nDCG@10
Model | NQ-test |
BLOOM 560M and T5-base (220M) | .467 |
BLOOM-3B | .482 |
T5-large (770M) | .495 |
Analysis on highlighting character
Limitations:
Takeaways
Scan the QR Code to check out the dataset
Thank you!
Appendix
Further training the model already trained on MS MARCO with our synthetic data.