1 of 11

Representation Learning for Conversational Data using Discourse Mutual Information Maximization

Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal

CNeRG Lab, IIT Kharagpur & Microsoft, India

NAACL 2022

2 of 11

NLU and Pretrained Representations

  • Natural Language Understanding (NLU) is a core component for building modern text-based systems.
  • Pretraining allows us to capture rich NLU features in vectorized representations of text.
  • The NLU capabilities of a pretrained representation vary widely depending on
    • the pretraining corpus/dataset, the loss function, and the downstream NLU task
  • However, standard pretraining objectives are not aware of the structure and properties specific to conversational data
  • We believe that dialog NLU would benefit from learning discourse-level features

3 of 11

Discourse Mutual Information

  • Mutual information I(X;Y) measures the expected reduction in uncertainty about X from knowing the value of Y (see the definition after this list).
  • Discourse Mutual Information: Defined as the MI between two subcomponents of a task-specific discourse
    • Break down a discourse into two segments, based on its structure
    • DMI: Mutual information between these segments
    • Example: Dialog
      • Segments: Context, Response
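
For reference, a minimal LaTeX statement of the quantity DMI targets, writing C for the context segment and R for the response segment (symbols here are illustrative; in practice the paper estimates this quantity with a neural lower-bound estimator rather than in closed form):

```latex
I(C;R) \;=\; \mathbb{E}_{p(c,r)}\!\left[\log \frac{p(c,r)}{p(c)\,p(r)}\right]
       \;=\; H(R) \,-\, H(R \mid C)
```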


4 of 11

DMI Objective

  • InfoNCE-S loss: estimates (a lower bound on) the MI between continuous random variables
    • We propose a symmetric version of the InfoNCE loss (Oord et al., 2018); a sketch follows the figure below
  • Transformer encoders with shared parameters for both context and response


[Figure: InfoNCE-S objective]
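
A minimal PyTorch sketch of a symmetric InfoNCE-style loss with in-batch negatives and a scaled dot-product critic; the paper's exact critic, negative sampling, and temperature are not spelled out on this slide, so treat the details below as illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def infonce_s(ctx_emb: torch.Tensor, rsp_emb: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of (context, response) embedding pairs.

    ctx_emb, rsp_emb: [batch, dim] outputs of the (shared-parameter) context
    and response encoders. Matched pairs share a row index; every other row
    in the batch serves as a negative (in-batch negatives).
    """
    # scores[i, j] = similarity between context i and response j
    scores = ctx_emb @ rsp_emb.t() / temperature            # [batch, batch]
    labels = torch.arange(scores.size(0), device=scores.device)
    # Context -> response and response -> context directions, then average
    loss_c2r = F.cross_entropy(scores, labels)
    loss_r2c = F.cross_entropy(scores.t(), labels)
    return 0.5 * (loss_c2r + loss_r2c)
```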

5 of 11

Experimental Setup

  • Scales
    • DMI-Small: 6 Layers in Encoder, 12 attention heads
    • DMI-Medium: 8 Layers in Encoder, 12 attention heads
    • DMI-Base: 12 Layers in Encoder, 12 attention heads
    • Initialization: BERT (Small, Medium) and RoBERTa (Base)
  • Pretraining Datasets
    • Subset of Reddit-727M conversational dataset
  • Downstream Tasks
    • Dialog Understanding (Classification) - Accuracy
      • Banking77 (Intent) - 77 classes, SWDA (Dialog-act) - 41 classes, Empathetic-Intent - 44 classes
    • Dialog Reasoning (in the form of response selection) - Recall@k (see the sketch after this list)
      • MuTual, MuTual-Plus
    • Dialog Evaluation - Accuracy
      • DailyDialog++
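
For concreteness, a small sketch of the Recall@k metric used for the response-selection tasks; the function name and argument layout are illustrative, not taken from the paper's evaluation code.

```python
def recall_at_k(ranked_candidates, gold_responses, k):
    """Recall@k for response selection: the fraction of examples whose gold
    response appears among the top-k ranked candidates.

    ranked_candidates: one ranked list of candidates (best first) per example.
    gold_responses: the gold response for each example, in the same order.
    """
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_candidates, gold_responses))
    return hits / len(gold_responses)
```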


6 of 11

Baselines

  • Two types of baselines: representations from generic LMs and from dialog-specific models

7 of 11

Results – Probing

  • Probing is important for analyzing what the pretrained model understands out of the box (protocol sketched after this list).
  • Our model outperforms baselines significantly.
  • Performance is consistent across all tasks.
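
A minimal sketch of a probing setup, assuming frozen encoder embeddings fed to a scikit-learn logistic-regression classifier; the specific probe used in the paper may differ.

```python
from sklearn.linear_model import LogisticRegression

def probe(train_emb, train_labels, test_emb, test_labels):
    """Fit a light-weight classifier on frozen pretrained embeddings
    (the encoder itself is never updated) and report test accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_emb, train_labels)
    return clf.score(test_emb, test_labels)
```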

8 of 11

Results – Finetuning

  • By also updating the pretrained model's weights, finetuning allows for better accuracy on downstream tasks (see the sketch after this list).
  • DMI outperforms all baselines under both settings.
  • Discourse-level features are highly suitable for pretraining effective representations.
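
As a contrast with probing, a small sketch of the finetuning setup, where the optimizer also receives the pretrained encoder's parameters; the optimizer choice and learning rate below are illustrative assumptions.

```python
import torch

def build_finetuning_optimizer(encoder: torch.nn.Module,
                               head: torch.nn.Module,
                               lr: float = 2e-5) -> torch.optim.Optimizer:
    """Unlike probing, finetuning also updates the pretrained encoder:
    both the encoder and the task head are handed to the optimizer."""
    params = list(encoder.parameters()) + list(head.parameters())
    return torch.optim.AdamW(params, lr=lr)
```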

9 of 11

Case Studies

  • Select appropriate responses from a very large pool (~7000) of utterances
    • Response Candidates: Encode and cache all utterances in the pool
    • Encode the context and rank all candidates in the pool using the DMI score function (sketched below)
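
A sketch of this retrieval loop, assuming hypothetical ctx_encoder / rsp_encoder callables and a dot-product scoring head standing in for the DMI score function; none of these names come from the released code.

```python
import torch

@torch.no_grad()
def rank_pool(context, candidate_pool, ctx_encoder, rsp_encoder, top_k=5):
    """Rank a cached pool of candidate utterances for a single context.

    ctx_encoder / rsp_encoder are placeholders returning [n, dim] embeddings;
    a dot product stands in for the DMI score function.
    """
    cand_emb = rsp_encoder(candidate_pool)            # encode once, reuse across contexts
    ctx_emb = ctx_encoder([context])                  # [1, dim]
    scores = (ctx_emb @ cand_emb.t()).squeeze(0)      # one score per candidate
    best = torch.topk(scores, k=min(top_k, len(candidate_pool))).indices.tolist()
    return [candidate_pool[i] for i in best]
```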


Context:

User 1: Are you busy tomorrow morning?

Response:

By User 2 (Human): I'm free. What's up?

By DMI Model: No, not this weekend. I have too much work to do.

Context:

User 1: Hi, Dan. What's new?

User 2: Where have you been? I've been trying to get in touch with you for two days.

User 1: I went to Salt Lake City on business. What's up?

Response:

By User 2 (Human): I got fired.

By DMI Model: Not much. I had to pay an unexpected bill, so I needed the money back.

  • [Figure: Example predictions from the E-Intent task]

10 of 11

Conclusions

  • Proposed a novel pretraining objective for better modeling of dialogs that turns the discourse-level organizational structure of text into a learnable signal.
  • We experimentally showed that representations learned with the DMI objective are more effective than baseline representations and consistent across different downstream tasks.
  • We release pretrained DMI model checkpoints of various sizes.

Links

  1. https://bsantraigi.github.io/DMI
  2. https://arxiv.org/pdf/2112.05787.pdf

11 of 11

Thank you for your attention!