1 of 21

Learning through Auxiliary Supervision: Practical Advancements and Applications in Natural Language Processing

Md Rizwan Parvez

University of California, Los Angeles (UCLA)

2 of 21

Correct = True

Correct = 10

Fox, I don’t like it.

Language is ambiguous and hence needs background knowledge

Why is auxiliary supervision important?

3 of 21

Find the median of an array

Concept

Slide idea: Graham Neubig

Code

Summary

Why is auxiliary supervision important?

Diverse token seq

Challenging to generate w/o additional info

4 of 21

All available data

Labeled training data

Reference: Barbara Plank

Other labeled data

Why is auxiliary supervision important?

5 of 21

Challenges in processing auxiliary data?

Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President, Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia

Algorithm 2 is shown to perform better (Berg-Kirkpatrick, ACL 2010). It can also be expected to converge faster -- anyway, the E-step changes the auxiliary function by changing the expected counts, so there's no point in finding a local maximum of the auxiliary function in each iteration

a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy.

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to

Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

Heterogeneous, unstructured, and noisy

Large amount of data

Wikipedia:

- 4.7 million English articles
- 35 million in total

Tweets:

- 500 million per day

- 200 billion per year

No direct supervision

[Figure: dependency parse of "They operate ships and banks ." with a Root node and word labels Pronoun, Verb, Noun, And, Noun]

6 of 21

My Research Contributions

Frameworks w/ Auxiliary Supervision

[ACL 18; EMNLP 19, 21; NAACL 21; LREC 18; ICTD 16]

  • Design tractable, principled models
  • Leverage open-source resources
  • Enhance multiple aspects
    • (accuracy, speed, interpretability)

7 of 21

Retrieval Augmented Code Generation and Summarization

EMNLP-Findings 2021

8 of 21

Motivation

Find the median of an array

Concept

Slide idea: Graham Neubig

Code

Summary

9 of 21

Motivation

Sort my_tensor in descending order

Concept

Search API guidelines

Python sorted in descending order

my_tensor.sort(descending=True)

Browse through the top few results

Adapt the results

Slide idea: Graham Neubig

10 of 21

Retrieved -> target code

Retrieved code for sorted array

Find the median of an array
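
As a rough illustration of the idea on this slide, the sketch below treats a sorting routine as the "retrieved" code and adapts it into the target median function. The function names are hypothetical and chosen only for this example; they are not taken from the talk.

```python
def sort_array(values):
    """Stand-in for the retrieved code: return the values in ascending order."""
    return sorted(values)


def median(values):
    """Target code: reuse the retrieved sorting step, then pick the middle."""
    if not values:
        raise ValueError("median of an empty array is undefined")
    ordered = sort_array(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

The retrieved code is not the answer itself, but most of the target can be written by reusing it.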

11 of 21

REDCODER

Fig: Retrieval augmentED CODe gEneration and summaRization framework (REDCODER)

Summary and CODE Retriever (SCODE-R)

Summary and CODE Generator (SCODE-G)

PLBART, Ahmad et al., 2021

12 of 21

  • Must be fast

  • Needs understanding of both natural and programming languages

Sparse vs. Dense SCODE-R

Similarity

SCODE-R is based on

DPR (Karpukhin et al., 2020)

Input summary (i.e., query)

Candidate code (i.e., docs)
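
A minimal sketch of dense retrieval in the DPR style, where the query and each candidate are embedded separately and scored by inner product. The vectors below are hand-written toy values, and `retrieve_top_k` is a hypothetical helper name; a real system would obtain the embeddings from trained encoders.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec, candidate_vecs, k=2):
    """Rank candidates by inner product with the query; return top-k indices."""
    ranked = sorted(
        range(len(candidate_vecs)),
        key=lambda i: dot(query_vec, candidate_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]          # embedding of the input summary
candidates = [
    [0.9, 0.1, 0.8],             # close to the query
    [0.0, 1.0, 0.0],             # unrelated
    [0.5, 0.0, 0.4],             # somewhat related
]
top = retrieve_top_k(query, candidates, k=2)
```

Because candidate embeddings do not depend on the query, they can be computed once and indexed ahead of time, which is one way to meet the "must be fast" requirement.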

13 of 21

Example: a relevant but not identical retrieved code

SCODE-R Training

  • As a binary classification problem

  • Using the same <summary, code> training set in our final gen/sum task

  • No hard-negatives

[Figure: training minibatch with queries Q1-Q4 and their paired documents D1-D4; pair labels shown: positive, negative, negative, negative. Also labeled: Hard Negative HD2, Weak Retriever.]

Slide idea: facebookresearch
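
The training setup on this slide, where each query's paired document is the positive and the other documents in the minibatch act as negatives, can be sketched as below. The slide describes training as classification; the softmax negative log-likelihood used here follows the DPR formulation and is an assumption about the exact objective. `in_batch_nll` is a hypothetical name.

```python
import math


def in_batch_nll(sim):
    """Mean negative log-likelihood over a minibatch.

    sim[i][j] is the similarity of query i to document j. The paired document
    sits on the diagonal (positive); every other document in the minibatch
    serves as a negative for that query.
    """
    losses = []
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        losses.append(log_z - row[i])
    return sum(losses) / len(losses)
```

With no hard negatives, the only negatives are the other pairs already present in the batch, so no extra mining step is needed.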

14 of 21

SCODE-G

  • SCODE-G in REDCODER uses retrieved candidate code only

  • REDCODER-ext additionally uses the paired summaries, where available
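
One plausible way to form the generator input from the two bullets above is to concatenate the query with the top retrieved items. This is a sketch under assumptions: the separator string, the `top_k` default, and the function name are hypothetical and not given on the slide.

```python
SEP = " [SEP] "  # hypothetical separator; the actual token is not stated


def build_generator_input(query, retrieved, use_summaries=False, top_k=2):
    """Concatenate the query with the top-k retrieved items.

    `retrieved` is a list of (code, summary) pairs ordered by retrieval rank.
    With use_summaries=True (the REDCODER-ext setting), the paired summary is
    appended after each retrieved code when one is available.
    """
    parts = [query]
    for code, summary in retrieved[:top_k]:
        parts.append(code)
        if use_summaries and summary:
            parts.append(summary)
    return SEP.join(parts)
```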

15 of 21

CodeXGlue: Lu et al. (2021)

Monolingual:

Code

Metrics

  • BLEU
  • CodeBLEU
  • EM
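
Of the metrics listed, exact match (EM) is the simplest to state. The sketch below reports the percentage of predictions identical to their reference; normalizing whitespace before comparison is an assumption made for this example, not a detail from the slide.

```python
def exact_match(predictions, references):
    """Percentage of predictions equal to their reference.

    Whitespace runs are collapsed before comparing (an assumed normalization).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        " ".join(p.split()) == " ".join(r.split())
        for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(references)
```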

Baselines

Benchmark

Retrieval DB

Bilingual:

(Code, Summary)

By default, target output is removed

CSNET: Husain et al. (2019)

Evaluation Settings
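
The default setting in which the target output is removed can be pictured as a filter over the retrieved candidates, so the generator never sees the answer verbatim. This is only a sketch: `drop_target` is a hypothetical helper, and matching on whitespace-normalized text is an assumption.

```python
def _normalize(text):
    """Collapse whitespace so formatting differences do not hide a match."""
    return " ".join(text.split())


def drop_target(retrieved, target):
    """Return the retrieved candidates with any copy of the target removed."""
    target_norm = _normalize(target)
    return [c for c in retrieved if _normalize(c) != target_norm]
```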

16 of 21

Evaluation

Retrieval based

Generative

Retrieval Augmented Generative

+18%

+4%

Table: Code generation performance

BLEU scores

REDCODER

REDCODER-ext

17 of 21

REDCODER-ext Prediction

BLEU: 80.6

PLBART fails to predict the diverse identifiers (in red color) whereas REDCODER succeeds

Qualitative Example

18 of 21

Questions?

19 of 21

Active research direction/contribution

  • Landscape of methods in my area of contribution
    • Deep learning/hidden representation, e.g., seq2seq, pretrained models, text classification, bio-NLP
    • Efficient/interpretable inference, e.g., speedup, computation, memory, green AI
    • Programming language processing, e.g., code generation, summarization, translation, search

20 of 21

Active teaching direction/contribution

  • Undergraduate and graduate courses in my area of contribution
    • Natural Language Processing
    • Machine Learning
    • Deep learning
    • Artificial Intelligence
    • Pattern Recognition
    • Special Topics in AI/NLP

21 of 21

Thank You!

Questions?