1 of 14

Assignment 4

Machine Translation

&

Natural Language Generation

TAs

adl2016ta@gmail.com

2 of 14

Outline

  • Machine Translation
    • Introduction
    • Dataset
  • Natural Language Generation
    • Introduction
    • Dataset
  • Grading & Format
  • FAQ
  • GitHub
  • CodaLab

3 of 14

Assignment

Release: 2016/12/1 09:00

Deadline: 2016/12/15 09:00

4 of 14

Machine Translation

  • Sequence-to-Sequence Model

  • Task
    • Input : No more homework please
    • Output : 請別再出作業了

Ref: Sequence to Sequence Learning with Neural Networks , arXiv:1409.3215v3

5 of 14

Machine Translation

  • Dataset
    • English to Spanish translation
    • About 10,000,000 sentence pairs
    • Including training set and validation set

6 of 14

Natural Language Generation

  • Task
    • Input : act( slot_name1=’value1’, slot_name2=’value2’, ... )
    • Output : sentence corresponding to the act and slots.
    • e.g. : given "inform(food=japanese,type=restaurant,pricerange=moderate)", the output should be like "i would like a japanese restaurant moderately priced"
  • Simplified Task
    • Input : "inform(food=japanese,type=restaurant,pricerange=moderate)"
    • Output : "i would like a food type pricerange priced"
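
The input format and the simplified (delexicalized) target above can be illustrated with a minimal sketch. The function names `parse_dialogue_act` and `delexicalize` are hypothetical, and the prefix-matching rule (so that "moderately" matches the value "moderate") is only an assumption made to reproduce the example on this slide; it is not the TAs' preprocessing.

```python
import re


def parse_dialogue_act(text):
    """Split 'act(slot=value,...)' into the act name and a slot dict.

    Assumes values contain no commas (an assumption, not a dataset guarantee).
    """
    match = re.match(r"^\s*(\w+)\((.*)\)\s*$", text)
    if match is None:
        raise ValueError("not a dialogue act: %r" % text)
    act, body = match.groups()
    slots = {}
    for pair in body.split(","):
        if not pair.strip():
            continue
        name, _, value = pair.partition("=")
        # Drop optional quotes around the value.
        slots[name.strip()] = value.strip().strip("'\"")
    return act, slots


def delexicalize(sentence, slots):
    """Replace each token that starts with a slot value by the slot name.

    Heuristic only: single-word values, prefix match.
    """
    out = []
    for token in sentence.split():
        for name, value in slots.items():
            if value and token.startswith(value):
                token = name
                break
        out.append(token)
    return " ".join(out)


act, slots = parse_dialogue_act(
    "inform(food=japanese,type=restaurant,pricerange=moderate)"
)
print(act, slots)
print(delexicalize("i would like a japanese restaurant moderately priced", slots))
```

Multi-word slot values would need a phrase-level replacement instead of the per-token loop used here.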

7 of 14

Natural Language Generation

  • Dataset
    • Dialogue State Tracking Challenge 2 (DSTC2)
    • Dialogue of finding restaurants
    • Don't worry, the data is already parsed!
    • JSON format

8 of 14

Grading

&

Format

9 of 14

Grading Policy

  • Machine Translation (6+1%)
    • Achieve weak baseline : BLEU = 0.22
    • Achieve strong baseline : BLEU = 0.27
  • Natural Language Generation (4+1%)
    • Achieve weak baseline : BLEU = 0.32
    • Achieve strong baseline : BLEU = 0.60
  • Report (3%)
    • Describe your model in detail
    • Describe what you learned and how you improved the performance
    • Roughly comment on your code

Know more about BLEU : https://en.wikipedia.org/wiki/BLEU
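
For intuition about the baselines above, here is a minimal sketch of sentence-level BLEU against a single reference: the geometric mean of clipped n-gram precisions times a brevity penalty. It is a simplified illustration with no smoothing; the scorer used on CodaLab may differ (for example corpus-level counts or a different tokenization), so treat the function names and defaults as assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1.0 - len(reference) / len(candidate))
    return penalty * math.exp(log_mean)


reference = "i would like a japanese restaurant moderately priced".split()
candidate = "i would like a japanese restaurant".split()
print(bleu(candidate, reference))
```

Here every candidate n-gram appears in the reference, so the score is determined entirely by the brevity penalty.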

10 of 14

Submission format on CodaLab

  • One sentence per line
  • BOS, EOS not included
  • EX:
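
A minimal sketch of writing predictions in this format, one sentence per line with sentence-boundary markers removed. The marker strings `<BOS>` and `<EOS>` and the helper names are assumptions; substitute whatever symbols your own vocabulary uses.

```python
def strip_markers(tokens, bos="<BOS>", eos="<EOS>"):
    """Drop a leading BOS token and everything from the first EOS onward."""
    if tokens and tokens[0] == bos:
        tokens = tokens[1:]
    if eos in tokens:
        tokens = tokens[:tokens.index(eos)]
    return tokens


def write_answers(predictions, path):
    """Write one space-joined sentence per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for tokens in predictions:
            handle.write(" ".join(strip_markers(tokens)) + "\n")


print(strip_markers(["<BOS>", "i", "would", "like", "food", "<EOS>", "<EOS>"]))
```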

11 of 14

Submission format on GitHub

  • Only Python with TensorFlow (TAs will run your code in a TensorFlow-only environment)
  • Deadline: 2016/12/15 23:59:59
  • ADL2016/hw4 should contain everything you use, run_translation.sh , run_generation.sh and report ...
  • Usage:
    • bash run_translation.sh [testing data] [answer file]
    • bash run_generation.sh [testing data] [answer file]
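
One way a shell script could satisfy this two-argument interface is to forward both arguments to a Python entry point. The sketch below is only an illustration under assumptions: the function names are hypothetical, and `translate_line` is a placeholder where a real submission would run its trained model.

```python
import sys


def translate_line(line):
    """Placeholder: a real submission would decode with the trained model."""
    return line.strip()


def main(argv):
    """Read the testing data path and answer file path from argv."""
    if len(argv) != 3:
        print("usage: [script] [testing data] [answer file]", file=sys.stderr)
        return 1
    test_path, answer_path = argv[1], argv[2]
    with open(test_path, encoding="utf-8") as src, \
            open(answer_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(translate_line(line) + "\n")
    return 0
```

In an actual script file, the last line would call `sys.exit(main(sys.argv))` so that the exit status reflects whether the arguments were valid.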

12 of 14

Other policy

  • Incompatible format will not be graded.
  • Late policy: 25% deducted for each day after the deadline.
  • If you want to use free days, please email the TAs and state how many free days you want to use on this assignment.
  • You can check the free days here.
  • Do not PM TAs through Facebook. Thanks!
  • Please ask through e-mail or post in FB group.

13 of 14

one more thing...

14 of 14

FAQ