1 of 13

DAC: Detoxifying Query Auto-Completions

Aishwarya M¹, Kaushal Maurya¹, Manish Gupta², Maunendra Sankar Desarkar¹

¹IIT Hyderabad, India; ²Microsoft, India


2 of 13

Toxicity in NLG outputs for the QAC task

  • Query Auto-Completion (QAC)
    • Recommends completions for prefixes.
    • Faster query formulation.
  • NLG-based QAC models can generate biased and toxic content.
    • LLMs are pretrained on unfiltered corpora.
    • QAC datasets for finetuning may contain toxic queries.

  • Detoxification for QAC
    • Controlled text generation
    • Given: ⟨session s, prefix p⟩
    • Generate 𝑚 (=10) completions
      • Close to the actual complete query q
      • Relevant to the session
      • Non-toxic

manishg.iitb@gmail.com


20-Apr-24

3 of 13

Why is detoxification in QAC challenging?

  • Structure of queries
    • Short
    • Contain spelling errors
    • Disregard grammatical rules
    • Allow for flexible word order

  • Nature of queries
    • Toxic: “a big d*ck”
    • Non-toxic: “how to calculate interest”
    • Other categories (addressable toxic)
      • Implicitly toxic: “how to become a perfect liar”
      • Subjectively toxic: “black representation in the media”
      • Non-toxic but include toxic words: “what are sexual diseases”


4 of 13

Possible approaches for detoxification in QAC

  • Blocklist of offensive terms
    • Red teaming
    • Users report objectionable completions
    • Monitoring and maintenance are a challenge
  • Maintain query templates
    • Coverage remains limited
  • ML approaches
    • Query embeddings
    • N-strike rules
    • Limited accuracy and coverage
    • Block the completions without proposing safe alternatives
  • Pre-train LMs using clean and unbiased (non-toxic) labeled examples
    • Creating such a clean dataset is difficult
  • Controllable Text Generation (CTG)
    • Tested on scenarios where the input prompt and completions are well-formed and longer than search queries.
    • Word filtering during generation
    • Finetune models with desired attribute datasets
    • Steer the generation while decoding (PPLM): compute-intensive.
    • RL-based CTG (Quark, FGRL) uses toxicity reward signals to guide the generation
      • Queries have diverse structures and nature.
      • Unreliable rewards
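
The limits of the blocklist approach listed above can be seen in a minimal sketch. The one-word blocklist and the helper names `is_blocked` and `filter_completions` are hypothetical and chosen only for illustration; the candidate queries are the examples from the earlier slide on the nature of queries.

```python
import re

# Hypothetical one-word blocklist; real lists are far larger and need
# continuous maintenance.
BLOCKLIST = {"sexual"}


def is_blocked(completion: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Return True if any token of the completion is on the blocklist."""
    tokens = re.findall(r"[a-z0-9*]+", completion.lower())
    return any(token in blocklist for token in tokens)


def filter_completions(completions: list[str]) -> list[str]:
    """Drop blocked completions; nothing is proposed in their place."""
    return [c for c in completions if not is_blocked(c)]


candidates = [
    "how to calculate interest",     # non-toxic
    "what are sexual diseases",      # non-toxic, but contains a listed word
    "how to become a perfect liar",  # implicitly toxic, no listed word
]
kept = filter_completions(candidates)
print(kept)
```

In this sketch the harmless health query is dropped while the implicitly toxic one passes, and the user is left with fewer completions rather than a safe alternative.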


5 of 13

Detoxifying Query Auto-Completion (DAC)

  • PrsLM: GPT2
  • TClassify: RoBERTa


6 of 13

TClassify: Toxicity Classifier for Queries

  • Existing models: Perspective API, Detoxify, and ToxiGen.
    • Not trained using QAC data
    • Not available offline
    • Binary classification rather than ternary
  • TClassify
    • Detoxify finetuned on a ternary classification dataset
      • ∼900K training, ∼60K validation, ∼90K test examples
    • Detoxify is RoBERTa finetuned on Jigsaw
  • High correlation of 0.8 between TClassify and Detoxify on the T and NT sets


7 of 13

DAC: Detoxification with RL


8 of 13

DAC: Detoxification with RL

  • Quantized Optimal Transport (OT)
    • Three rewards
      • quantized OT score
      • normalized toxicity score
        • $r_t(q) = q_{\text{base}} + q_{\text{intensity}}/3$
        • $q_{\text{base}}$ is 0, 1/3, or 2/3 if the predicted class is NT, AT, or T, respectively
        • $q_{\text{intensity}}$ is the probability of the predicted toxicity class
      • length penalty
    • Normalization of Toxicity Scores
      • Helps create quantiles for the quantized OT reward.
      • Enhances training stability.
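
The normalized toxicity score above can be sketched in a few lines. By construction, NT scores fall in [0, 1/3], AT scores in [1/3, 2/3], and T scores in [2/3, 1], so the three classes never interleave when sorted. The function names, the string class labels, and the rank-based binning in `quantile_bins` are assumptions made for illustration; the slides do not specify how DAC forms its quantiles.

```python
# Base offsets from the slide: NT -> 0, AT -> 1/3, T -> 2/3.
Q_BASE = {"NT": 0.0, "AT": 1.0 / 3.0, "T": 2.0 / 3.0}


def normalized_toxicity(pred_class: str, pred_prob: float) -> float:
    """r_t(q) = q_base + q_intensity / 3, where q_intensity is the
    classifier probability of the predicted class."""
    if not 0.0 <= pred_prob <= 1.0:
        raise ValueError("pred_prob must be a probability in [0, 1]")
    return Q_BASE[pred_class] + pred_prob / 3.0


def quantile_bins(scores: list[float], num_bins: int) -> list[int]:
    """Assign each score a bin index by rank (0 = lowest scores).
    Rank-based binning is an assumption, not taken from the slides."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        bins[idx] = rank * num_bins // len(scores)
    return bins


# Hypothetical classifier outputs: (predicted class, probability).
scores = [
    normalized_toxicity("T", 0.9),
    normalized_toxicity("NT", 0.9),
    normalized_toxicity("AT", 0.6),
]
bins = quantile_bins(scores, num_bins=3)
print(bins)  # [2, 0, 1]
```

Because the class offset dominates the score, a confident NT prediction always ranks below any AT or T prediction, which is what makes the scores straightforward to split into ordered quantiles.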

9 of 13

Datasets, metrics and baselines

  • Baselines
    • DAPT: continues fine-tuning PrsLM on ∼4M non-toxic examples.
    • PPLM: iteratively updates the hidden representations during generation to steer it toward non-toxic completions.
    • Quark, PPO, FineGrained RL (FGRL): RL with the TClassify toxicity score as the reward.


10 of 13

Results on toxic, addressable toxic and non-toxic test sets


11 of 13

Interpreting detoxification process with DAC


12 of 13

Effect of OT distance weighting

  • The effect of the optimal transport (OT) reward should be higher for non-toxic cases.
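
One hypothetical way to realize this weighting is to scale an OT-based score by one minus the toxicity score. This is an illustrative assumption, not the weighting scheme used in DAC; the function name `weighted_ot_reward` and the convention that both inputs lie in [0, 1] with a higher OT score being better are also assumed.

```python
def weighted_ot_reward(ot_score: float, toxicity: float) -> float:
    """Scale an OT-based score by (1 - toxicity).

    The OT term counts fully for non-toxic completions (toxicity = 0)
    and vanishes for maximally toxic ones (toxicity = 1).
    """
    if not (0.0 <= ot_score <= 1.0 and 0.0 <= toxicity <= 1.0):
        raise ValueError("ot_score and toxicity must lie in [0, 1]")
    return (1.0 - toxicity) * ot_score


# Same OT score, increasing toxicity: the OT contribution shrinks.
for toxicity in (0.0, 0.5, 1.0):
    print(toxicity, weighted_ot_reward(0.8, toxicity))
```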


13 of 13

Summary

  • Ternary TClassify model: Toxic, addressable toxic, non-toxic
  • DAC: Exploration, Quantized Optimal Transport, RL.
  • Datasets: Bing and AOL
