1 of 12

DQAC: Detoxifying Query Auto-Completion with Adapters

Aishwarya M¹, Kaushal Maurya¹, Manish Gupta², Maunendra Sankar Desarkar¹

¹IIT Hyderabad, India   ²Microsoft, India

manishg.iitb@gmail.com


07-Mar-24

2 of 12

Toxicity in Query Auto-Completion

  • Ways to mitigate toxicity
    • Blocklist of toxic words
      • Needs to be constantly updated
      • Context-blind errors: “deepfake daughter s*x” is toxic despite containing no blocked word, while “f*ck you knowledge lyrics” (a song lookup) is non-toxic despite containing one.
    • Detoxify text generated by PLMs
      • Controlled text generation (CTG) via fine-tuning on clean data (which is hard to obtain)
      • Decoding-time algorithms for CTG
        • Increased latency
      • Reinforcement learning (RL) techniques to unlearn toxic content


3 of 12

QDetoxify: Toxicity Classifier for Search Queries

  • Existing models: Perspective API, Detoxify, and ToxiGen.
    • Not trained on QAC (query) datasets.
    • An offline tool is needed (Perspective API is a hosted web service).
  • QDetoxify
    • Initialized with Detoxify (RoBERTa fine-tuned on the Jigsaw dataset).
    • Trained on labeled query logs from Bing.
    • ∼7.59M training, 100K validation, and 100K test examples.

  • Performance on the test set
    • QDetoxify: 95.96%
    • Detoxify: 82.82%
    • 0.797 correlation between QDetoxify and Detoxify scores.
    • ‘m.i.c.r.o.s.o.f.t.’ is rated toxic by Detoxify (score = 0.58), whereas QDetoxify correctly classifies it as non-toxic (score = 0.23).
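The agreement and correlation numbers above are simple aggregates over per-query scores. A minimal sketch of how they could be computed, using made-up scores for five queries (the real figures come from a 100K-example Bing test set):

```python
import numpy as np

# Hypothetical toxicity scores from the two classifiers on the same queries.
qdetoxify = np.array([0.23, 0.91, 0.05, 0.88, 0.12])
detoxify = np.array([0.58, 0.85, 0.10, 0.79, 0.20])
labels = np.array([0, 1, 0, 1, 0])  # 1 = toxic ground truth

def accuracy(scores, labels, thresh=0.5):
    """Fraction of queries whose thresholded score matches the label."""
    return float(np.mean((scores >= thresh) == (labels == 1)))

# Pearson correlation between the two classifiers' scores.
corr = float(np.corrcoef(qdetoxify, detoxify)[0, 1])

print(accuracy(qdetoxify, labels))  # 1.0: all five thresholded correctly
print(accuracy(detoxify, labels))   # 0.8: one 'm.i.c.r.o.s.o.f.t.'-style false positive
```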


4 of 12

DQAC Model Architecture

  • Personalized pre-trained LM: PrsGPT2
    • Obtained by fine-tuning the base PLM on a large personalized QAC dataset.
  • Two trainable adapters: non-toxic (A+) and toxic (A−).

  • α controls the amount of steering applied on top of the base LM.
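The architecture above can be sketched in a few lines of numpy. This is an illustration only: the bottleneck-adapter shape is the standard one, but the combination rule `h + α·(A⁺(h) − A⁻(h))` is an assumption for exposition, not necessarily DQAC's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 64, 8  # bottleneck dim d = 8, as on the results slide

def make_adapter(rng):
    """A standard bottleneck adapter: down-project, ReLU, up-project, residual."""
    w_down = rng.normal(0.0, 0.02, size=(d_model, d_bottleneck))
    w_up = rng.normal(0.0, 0.02, size=(d_bottleneck, d_model))
    return lambda h: h + np.maximum(h @ w_down, 0.0) @ w_up

adapter_nontoxic = make_adapter(rng)  # A+
adapter_toxic = make_adapter(rng)     # A-

def steer(h, alpha=2.6):
    """Push the hidden state toward A+'s output and away from A-'s output;
    alpha controls the amount of steering over the frozen base LM."""
    return h + alpha * (adapter_nontoxic(h) - adapter_toxic(h))

h = rng.normal(size=d_model)  # stand-in for a hidden state from frozen PrsGPT2
h_steered = steer(h)
```

Only the two small adapters are trained; the base PrsGPT2 stays frozen, which is what keeps the parameter and memory overhead small.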


5 of 12

DQAC Model Training

  • DExperts
    • Trains three models: base, expert, and anti-expert.
    • DExperts requires ∼3x the RAM of DQAC (∼3x the number of model parameters).
    • DQAC adds no latency overhead.
    • DQAC operates in the latent-representation space rather than in the output (logit) space.
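For contrast, the DExperts-style output-space combination can be sketched as below, with a toy 6-token vocabulary and illustrative logits; note that producing the three logit vectors requires running three full LMs, which is where the ∼3x cost comes from.

```python
import numpy as np

def dexperts_logits(z_base, z_expert, z_anti, alpha=2.0):
    """DExperts combines three full models in the output (logit) space:
    z = z_base + alpha * (z_expert - z_anti)."""
    return z_base + alpha * (z_expert - z_anti)

# Toy vocabulary where index 5 is a toxic token.
z_base = np.zeros(6)
z_expert = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -2.0])  # expert avoids token 5
z_anti = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 2.0])     # anti-expert prefers it

z = dexperts_logits(z_base, z_expert, z_anti)
p = np.exp(z) / np.exp(z).sum()  # softmax over the combined logits
# z[5] = 0 + 2 * (-2 - 2) = -8, so the toxic token is strongly suppressed.
```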


6 of 12

Dataset Details



7 of 12

Baselines

  • Personalized GPT2 (PrsGPT2)
  • DAPT: continue fine-tuning PrsGPT2 on ∼4M non-toxic queries with QDetoxify score < 0.5.
  • PPLM: train a discriminator that classifies the hidden representations of the base PrsGPT2 as toxic or non-toxic, using the 80K Adapter-Data.
  • DExperts: base model = PrsGPT2; train the expert and anti-expert models on the 80K Adapter-Data.
  • Quark: use the QDetoxify score as the reward, with PrsGPT2 as the base PLM.
  • T-Adapter and NT-Adapter: enable only one adapter (toxic or non-toxic, respectively).
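The DAPT baseline's training set reduces to a threshold filter over QDetoxify scores. A minimal sketch, with made-up queries and scores:

```python
# Hypothetical (query, QDetoxify score) pairs.
scored_queries = [
    ("weather today", 0.02),
    ("f*ck you knowledge lyrics", 0.31),
    ("deepfake daughter s*x", 0.97),
]

def dapt_corpus(scored_queries, thresh=0.5):
    """Keep queries scored below the threshold, i.e. the non-toxic
    fine-tuning set used by the DAPT baseline."""
    return [q for q, score in scored_queries if score < thresh]

print(dapt_corpus(scored_queries))
# ['weather today', 'f*ck you knowledge lyrics']
```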


8 of 12

Metrics

  • Mean Reciprocal Rank (MRR)
  • BLEU Reciprocal Rank (RR-BLEU)
    • A reciprocal-rank-weighted average, where the weights are BLEU scores.
  • Average Max Toxicity (AmaxT)
    • The average, over test examples, of the maximum toxicity score among the 10 generations.
  • Empirical Toxicity Probability (Prob)
    • The probability that at least one of the 10 generations is toxic (toxicity score ≥ 0.5).
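The two toxicity metrics are simple aggregates over per-generation toxicity scores. A sketch, where a nested list of scores stands in for the 10 generations per test example:

```python
def amax_t(score_lists):
    """Average Max Toxicity: mean over examples of the maximum
    toxicity score among that example's generations."""
    return sum(max(scores) for scores in score_lists) / len(score_lists)

def tox_prob(score_lists, thresh=0.5):
    """Empirical Toxicity Probability: fraction of examples with at
    least one toxic generation (score >= thresh)."""
    return sum(any(s >= thresh for s in scores)
               for scores in score_lists) / len(score_lists)

# Two test examples with three generations each (10 in the paper).
scores = [[0.1, 0.2, 0.9], [0.1, 0.1, 0.2]]
print(amax_t(scores))    # (0.9 + 0.2) / 2 ≈ 0.55
print(tox_prob(scores))  # one of two examples has a toxic generation → 0.5
```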


9 of 12

Overall results

  • α = 2.6, bottleneck dimension d = 8.
  • Beam size 10; 10 generations per prefix; max generation length = 80.
  • Detoxification ⇒ toxic tokens are avoided ⇒ no match with the ground truth ⇒ low MRR.
  • Trade-off between performance and safe generation.


10 of 12

Results on NPTQ and NPNQ test sets for Bing

DQAC generates non-toxic completions while preserving the quality and relevance of the completions.


11 of 12

Analysis and human evaluation

  • Prefix: “piece of a”
    • Ground truth: “piece of a*s”
    • DQAC generates “piece of analysis”
    • Low MRR and SBMRR scores are therefore expected, especially on NPTQ.
  • Human evaluation
    • Quantifies semantic difference and contextual alignment.
    • 47 of 50 (94%) examples showed semantic differences from the reference.
    • 42 of 50 (84%) examples maintained contextual alignment (lexical overlap) with the prefix and session.
  • DQAC generations are non-toxic but semantically different from the ground truth.


12 of 12

Summary

  • DQAC (Detoxifying Query Auto-Completion)
    • Mitigates toxicity in QAC.
    • Controlled text generation (CTG) with adapters.
  • QDetoxify: a toxicity evaluation model for search queries.
  • Comprehensive evaluation on two real-world, large-scale datasets.
