1 of 21

Learning through Auxiliary Supervision: Practical Advancements and Applications in Natural Language Processing

Md Rizwan Parvez

University of California, Los Angeles (UCLA)

2 of 21

Correct = True

Correct = 10

Fox, I don’t like it.

Language is ambiguous, so understanding it requires background knowledge

Why is auxiliary supervision important?

3 of 21

Find the median of an array

Concept

Slide idea: Graham Neubig

Code

Summary

Why is auxiliary supervision important?

Diverse token sequences

Challenging to generate without additional information

4 of 21

All available data

Labeled training data

Reference: Barbara Plank

Other labeled data

Why is auxiliary supervision important?

5 of 21

Challenges in processing auxiliary data?

Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President, Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia.

Algorithm 2 is shown to perform better Berg-Kirkpatrick, ACL 2010. It can also be expected to converge faster -- anyway, the E-step changes the auxiliary function by changing the expected counts, so there's no point in finding a local maximum of the auxiliary function in each iteration

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to

Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy.

a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

Heterogeneous, unstructured, and noisy

Large amount of data

Wikipedia:

- 4.7 million English articles

- 35 million in total

Tweets:

- 500 million per day

- 200 billion per year

No direct supervision

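[Figure: part-of-speech tags and dependency parse (Root) for "They operate ships and banks ." with tags They/Pronoun, operate/Verb, ships/Noun, and/And, banks/Noun]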

6 of 21

My Research Contributions

Frameworks with Auxiliary Supervision

[ACL 18; EMNLP 19, 21; NAACL 21; LREC 18; ICTD 16]

Design tractable, principled models
Leverage open-source resources
Improve multiple aspects (accuracy, speed, interpretability)

7 of 21

Retrieval Augmented Code Generation and Summarization

EMNLP-Findings 2021

8 of 21

Motivation

Find the median of an array

Concept

Slide idea: Graham Neubig

Code

Summary

9 of 21

Motivation

Sort my_tensor in descending order

Concept

Search API guidelines

Python sorted in descending order

my_tensor.sort(descending=True)

Browse through the top few results

Adapt the results

Slide idea: Graham Neubig

10 of 21

Retrieved -> target code

Retrieved code for sorted array

Find the median of an array
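
The idea of turning retrieved code into target code can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and `retrieved_sort_array` stands in for whatever snippet a retriever might return for a sorting query.

```python
def retrieved_sort_array(values):
    """Stand-in for a retrieved snippet: return a sorted copy of the input."""
    return sorted(values)


def median(values):
    """Target code: reuse the retrieved sorting logic, then pick the middle."""
    if not values:
        raise ValueError("median of an empty array is undefined")
    ordered = retrieved_sort_array(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

The retrieved code is relevant but not identical to the target; the generator still has to add the middle-element logic on top of it.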

11 of 21

REDCODER

Fig: Retrieval augmentED CODe gEneration and summaRization framework (REDCODER)

Summary and CODE Retriever (SCODE-R)

Summary and CODE Generator (SCODE-G)

PLBART, Ahmad et al., 2021

12 of 21

Must be fast

Needs understanding of both natural and programming languages

Sparse vs. Dense SCODE-R

Similarity

SCODE-R is based on

DPR (Karpukhin et al., 2020)

Input summary (i.e., query)

Candidate code (i.e., docs)
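
A minimal sketch of the scoring step in a DPR-style dense retriever: the query and each candidate are embedded as vectors, and candidates are ranked by inner product with the query. The embeddings below are made-up numbers rather than the output of a trained encoder, and `retrieve_top_k` is a hypothetical helper name.

```python
def inner_product(u, v):
    """Similarity between two embeddings of equal length."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec, candidate_vecs, k=2):
    """Rank candidate embeddings by inner product with the query embedding."""
    scored = [(inner_product(query_vec, vec), idx)
              for idx, vec in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Made-up 3-dimensional embeddings (assumption: real encoders output
# vectors with hundreds of dimensions).
query_vec = [1.0, 0.0, 1.0]        # embedding of the input summary
candidate_vecs = [
    [0.9, 0.1, 0.8],               # closely related code
    [0.0, 1.0, 0.0],               # unrelated code
    [0.5, 0.5, 0.5],               # loosely related code
]
top = retrieve_top_k(query_vec, candidate_vecs, k=2)
top_indices = [idx for idx, _ in top]
```

Because candidate embeddings do not depend on the query, they can be computed once and indexed ahead of time, which is one way to meet the speed requirement.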

13 of 21

Example: a retrieved code snippet that is relevant but not identical to the target

SCODE-R Training

As a binary classification problem

Uses the same <summary, code> training set as our final generation/summarization task

No hard negatives

Q₁

Q₂

Q₃

Q₄

Paired D₁

Paired D₂

Paired D₃

Paired D₄

positive

negative

Training minibatch

Hard Negative HD₂

Weak Retriever

Slide idea: facebookresearch
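
The minibatch picture can be sketched as a loss over a similarity matrix in which each query's paired document is the positive and every other document in the batch is a negative. The softmax negative log-likelihood below follows the objective described for DPR; treating it as the exact SCODE-R objective is an assumption, and the function name is hypothetical.

```python
import math


def in_batch_negative_loss(sim):
    """Mean negative log-likelihood of the paired document for each query.

    sim[i][j] is the similarity between query i and document j. The pair
    (i, i) is the positive; all other documents in the row are negatives.
    """
    total = 0.0
    for i, row in enumerate(sim):
        row_max = max(row)  # subtract the max for numerical stability
        log_norm = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        total += log_norm - row[i]
    return total / len(sim)


# Two queries, two paired documents (made-up similarity scores).
well_separated = [[5.0, 0.0], [0.0, 5.0]]   # positives score highest
confused = [[0.0, 5.0], [5.0, 0.0]]         # negatives score highest
```

With no hard negatives, the only negatives a query sees are the documents paired with the other queries in the same minibatch.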

14 of 21

SCODE-G

SCODE-G in REDCODER uses retrieved candidate code only

Available paired summaries are additionally used in REDCODER-ext
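
One way to picture the difference between the two settings is as input construction for the generator. The sketch below assumes the query and retrieved items are simply concatenated with a separator; the separator string and the function name are hypothetical, and the real token would depend on the generator's tokenizer.

```python
SEP = " <sep> "  # hypothetical separator, not taken from the source


def build_generator_input(query, retrieved, use_paired=False):
    """Concatenate the query with retrieved candidates.

    retrieved: list of (candidate, paired) tuples, e.g. (code, summary)
    when generating code. `paired` may be None when no pair is available.
    use_paired=False appends candidates only (the REDCODER setting);
    use_paired=True also appends available pairs (the REDCODER-ext setting).
    """
    parts = [query]
    for candidate, paired in retrieved:
        parts.append(candidate)
        if use_paired and paired:
            parts.append(paired)
    return SEP.join(parts)


retrieved = [("def f(a): return sorted(a)", "sort an array"),
             ("def g(a): return max(a)", None)]
plain = build_generator_input("find the median", retrieved)
extended = build_generator_input("find the median", retrieved, use_paired=True)
```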

15 of 21

CodeXGLUE: Lu et al. (2021)

Monolingual:

Code

Metrics

BLEU
CodeBLEU
EM

Baselines

Benchmark

Retrieval DB

Bilingual:

(Code, Summary)

By default, target output is removed

CSNET: Husain et al. (2019)

Evaluation Settings
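
Of the listed metrics, exact match (EM) is the simplest to sketch. The version below assumes predictions and references are compared after collapsing whitespace; that normalization and the function name are assumptions, not details from the source.

```python
def exact_match(predictions, references):
    """Percentage of predictions identical to their reference.

    Strings are compared after collapsing runs of whitespace (assumption).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        " ".join(pred.split()) == " ".join(ref.split())
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)
```

EM gives no credit for near misses, which is why it is typically reported alongside overlap-based scores such as BLEU and CodeBLEU.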

16 of 21

Evaluation

Retrieval based

Generative

Retrieval Augmented Generative

+18%

+4%

Table: Code generation performance

BLEU scores

REDCODER

REDCODER-ext

17 of 21

REDCODER-ext Prediction

BLEU: 80.6

PLBART fails to predict the diverse identifiers (in red color) whereas REDCODER succeeds

Qualitative Example

18 of 21

Questions?

https://github.com/rizwan09/REDCODER

19 of 21

Active research direction/contribution

Landscape of methods in my area of contribution

Deep learning/hidden representation
e.g., seq2seq, pretrained models, text classification, bio-NLP

Efficient/interpretable inference
e.g., speedup, computation, memory, green AI

Programming language processing
e.g., code generation, summarization, translation, search

20 of 21

Active teaching direction/contribution

Undergraduate and graduate courses in my area of contribution

Natural Language Processing
Machine Learning
Deep Learning
Artificial Intelligence
Pattern Recognition
Special Topics in AI/NLP

21 of 21

Thank You!

Questions?

https://rizwan09.github.io/