1 of 12

DQAC: Detoxifying Query Auto-Completion with Adapters

Aishwarya M¹, Kaushal Maurya¹, Manish Gupta², Maunendra Sankar Desarkar¹

¹IIT Hyderabad, India   ²Microsoft, India

manishg.iitb@gmail.com


07-Mar-24

2 of 12

Toxicity in Query Auto-Completion

  • Ways to mitigate toxicity
    • Blocklist of toxic words
      • Needs to be constantly updated
      • Context-blind errors: “deepfake daughter s*x” is toxic despite containing no blocked word, while “f*ck you knowledge lyrics” (a song lookup) is non-toxic despite containing one.
    • Detoxify text generated by PLMs
      • Controlled text generation (CTG) via fine-tuning on clean data (which is hard to obtain)
      • Decoding-time algorithms for CTG
        • Increased latency
      • Reinforcement learning (RL) techniques to unlearn toxic content


3 of 12

QDetoxify: Toxicity Classifier for Search Queries

  • Existing models: Perspective API, Detoxify, and ToxiGen.
    • Not trained on QAC (query) datasets.
    • An offline tool is needed (Perspective API is a hosted web service).
  • QDetoxify
    • Initialized with Detoxify (RoBERTa fine-tuned on the Jigsaw dataset).
    • Trained on labeled query logs from Bing.
    • ∼7.59M training, 100K validation, and 100K test examples.

  • Performance on the test set
    • QDetoxify: 95.96%
    • Detoxify: 82.82%
    • 0.797 correlation between QDetoxify and Detoxify scores.
    • ‘m.i.c.r.o.s.o.f.t.’ is rated toxic by Detoxify (score = 0.58), whereas QDetoxify correctly classifies it as non-toxic (score = 0.23).
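The agreement and correlation numbers above are simple aggregates over per-query scores. A minimal sketch of how they could be computed, using made-up scores for five queries (the real figures come from a 100K-example Bing test set):

```python
import numpy as np

# Hypothetical toxicity scores from the two classifiers on the same queries.
qdetoxify = np.array([0.23, 0.91, 0.05, 0.88, 0.12])
detoxify = np.array([0.58, 0.85, 0.10, 0.79, 0.20])
labels = np.array([0, 1, 0, 1, 0])  # 1 = toxic ground truth

def accuracy(scores, labels, thresh=0.5):
    """Fraction of queries whose thresholded score matches the label."""
    return float(np.mean((scores >= thresh) == (labels == 1)))

# Pearson correlation between the two classifiers' scores.
corr = float(np.corrcoef(qdetoxify, detoxify)[0, 1])

print(accuracy(qdetoxify, labels))  # 1.0: all five thresholded correctly
print(accuracy(detoxify, labels))   # 0.8: one 'm.i.c.r.o.s.o.f.t.'-style false positive
```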


4 of 12

DQAC Model Architecture

  • Personalized pre-trained LM: PrsGPT2
    • Obtained by fine-tuning the base PLM on a large personalized QAC dataset.
  • Two trainable adapters: non-toxic (A+) and toxic (A−).

  • α controls the amount of steering applied on top of the base LM.
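The architecture above can be sketched in a few lines of numpy. This is an illustration only: the bottleneck-adapter shape is the standard one, but the combination rule `h + α·(A⁺(h) − A⁻(h))` is an assumption for exposition, not necessarily DQAC's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 64, 8  # bottleneck dim d = 8, as on the results slide

def make_adapter(rng):
    """A standard bottleneck adapter: down-project, ReLU, up-project, residual."""
    w_down = rng.normal(0.0, 0.02, size=(d_model, d_bottleneck))
    w_up = rng.normal(0.0, 0.02, size=(d_bottleneck, d_model))
    return lambda h: h + np.maximum(h @ w_down, 0.0) @ w_up

adapter_nontoxic = make_adapter(rng)  # A+
adapter_toxic = make_adapter(rng)     # A-

def steer(h, alpha=2.6):
    """Push the hidden state toward A+'s output and away from A-'s output;
    alpha controls the amount of steering over the frozen base LM."""
    return h + alpha * (adapter_nontoxic(h) - adapter_toxic(h))

h = rng.normal(size=d_model)  # stand-in for a hidden state from frozen PrsGPT2
h_steered = steer(h)
```

Only the two small adapters are trained; the base PrsGPT2 stays frozen, which is what keeps the parameter and memory overhead small.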


5 of 12

DQAC Model Training

  • DExperts
    • Trains three models: base, expert, and anti-expert.
    • DExperts requires ∼3x the RAM of DQAC (∼3x the number of model parameters).
    • DQAC adds no latency overhead.
    • DQAC operates in the latent-representation space rather than in the output (logit) space.
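For contrast, the DExperts-style output-space combination can be sketched as below, with a toy 6-token vocabulary and illustrative logits; note that producing the three logit vectors requires running three full LMs, which is where the ∼3x cost comes from.

```python
import numpy as np

def dexperts_logits(z_base, z_expert, z_anti, alpha=2.0):
    """DExperts combines three full models in the output (logit) space:
    z = z_base + alpha * (z_expert - z_anti)."""
    return z_base + alpha * (z_expert - z_anti)

# Toy vocabulary where index 5 is a toxic token.
z_base = np.zeros(6)
z_expert = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -2.0])  # expert avoids token 5
z_anti = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 2.0])     # anti-expert prefers it

z = dexperts_logits(z_base, z_expert, z_anti)
p = np.exp(z) / np.exp(z).sum()  # softmax over the combined logits
# z[5] = 0 + 2 * (-2 - 2) = -8, so the toxic token is strongly suppressed.
```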


6 of 12

Dataset Details



7 of 12

Baselines

  • Personalized GPT2 (PrsGPT2)
  • DAPT: continue fine-tuning PrsGPT2 on ∼4M non-toxic queries with QDetoxify score < 0.5.
  • PPLM: train a discriminator that classifies the hidden representations of the base PrsGPT2 as toxic or non-toxic, using the 80K Adapter-Data.
  • DExperts: base model = PrsGPT2; train the expert and anti-expert models on the 80K Adapter-Data.
  • Quark: use the QDetoxify score as the reward, with PrsGPT2 as the base PLM.
  • T-Adapter and NT-Adapter: enable only one adapter (toxic or non-toxic, respectively).
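The DAPT baseline's training set reduces to a threshold filter over QDetoxify scores. A minimal sketch, with made-up queries and scores:

```python
# Hypothetical (query, QDetoxify score) pairs.
scored_queries = [
    ("weather today", 0.02),
    ("f*ck you knowledge lyrics", 0.31),
    ("deepfake daughter s*x", 0.97),
]

def dapt_corpus(scored_queries, thresh=0.5):
    """Keep queries scored below the threshold, i.e. the non-toxic
    fine-tuning set used by the DAPT baseline."""
    return [q for q, score in scored_queries if score < thresh]

print(dapt_corpus(scored_queries))
# ['weather today', 'f*ck you knowledge lyrics']
```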


8 of 12

Metrics

  • Mean Reciprocal Rank (MRR)
  • BLEU Reciprocal Rank (RR-BLEU)
    • A reciprocal-rank-weighted average, where the weights are BLEU scores.
  • Average Max Toxicity (AmaxT)
    • The average, over test examples, of the maximum toxicity score among the 10 generations.
  • Empirical Toxicity Probability (Prob)
    • The probability that at least one of the 10 generations is toxic (toxicity score ≥ 0.5).
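The two toxicity metrics are simple aggregates over per-generation toxicity scores. A sketch, where a nested list of scores stands in for the 10 generations per test example:

```python
def amax_t(score_lists):
    """Average Max Toxicity: mean over examples of the maximum
    toxicity score among that example's generations."""
    return sum(max(scores) for scores in score_lists) / len(score_lists)

def tox_prob(score_lists, thresh=0.5):
    """Empirical Toxicity Probability: fraction of examples with at
    least one toxic generation (score >= thresh)."""
    return sum(any(s >= thresh for s in scores)
               for scores in score_lists) / len(score_lists)

# Two test examples with three generations each (10 in the paper).
scores = [[0.1, 0.2, 0.9], [0.1, 0.1, 0.2]]
print(amax_t(scores))    # (0.9 + 0.2) / 2 ≈ 0.55
print(tox_prob(scores))  # one of two examples has a toxic generation → 0.5
```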


9 of 12

Overall results

  • α = 2.6, bottleneck dimension d = 8.
  • Beam size 10; 10 generations per prefix; max generation length = 80.
  • Detoxification ⇒ toxic tokens are avoided ⇒ no match with the ground truth ⇒ low MRR.
  • Trade-off between performance and safe generation.


10 of 12

Results on NPTQ and NPNQ test sets for Bing

DQAC generates non-toxic completions while preserving the quality and relevance of the completions.


11 of 12

Analysis and human evaluation

  • Prefix: “piece of a”
    • Ground truth: “piece of a*s”
    • DQAC generates “piece of analysis”
    • Low MRR and SBMRR scores are therefore expected, especially on NPTQ.
  • Human evaluation
    • Quantifies semantic difference and contextual alignment.
    • 47 of 50 (94%) examples showed semantic differences from the reference.
    • 42 of 50 (84%) examples maintained contextual alignment (lexical overlap) with the prefix and session.
  • DQAC generations are non-toxic but semantically different from the ground truth.


12 of 12

Summary

  • DQAC (Detoxifying Query Auto-Completion)
    • Mitigates toxicity in QAC.
    • Controlled text generation (CTG) with adapters.
  • QDetoxify: a toxicity evaluation model for search queries.
  • Comprehensive evaluation on two real-world, large-scale datasets.
