1 of 35

MLDS HW2-1

TAs

ntu.mldsta@gmail.com

2 of 35

HW2-1 Update (4/19)

  1. Baseline lowered to BLEU@1 = 0.6 (Captions Avg.)
  2. Data format adjusted
  3. Revised the usage of hw2_seq2seq.sh

3 of 35

Data & format

  • Dataset:
    • MSVD
      - 1450 videos for training
      - 100 videos for testing
  • Format:
    • Download MLDS_hw2_1_data.tar.gz (4/19 update)

Updated dataset:

Because the TAs need to additionally generate output for the TA review, and to keep the script arguments simpler, the testing id file was moved into testing_data/ and renamed id.txt, and the training id file was moved into training_data/ and renamed id.txt. You therefore only need to pass a folder name to generate captions for the videos in that folder.


4 of 35

Submission & Rules

  • For HW2-1:
    • Please write a shell script "hw2_seq2seq.sh" to run your code, following the usage below:
      - ./hw2_seq2seq.sh $1 $2
      - $1: the data directory
      - $2: test data output filename (format: .txt)
      - Example 1: ./hw2_seq2seq.sh testing_data testset_output.txt
      - Example 2: ./hw2_seq2seq.sh ta_review_data tareviewset_output.txt

Besides the testing_data folder, the TAs will also use the ta_review_data folder to generate the TA-review sentences. Your script should finish within 10 minutes, excluding model downloading.

5 of 35

Outline

  • Timeline
  • Task Descriptions
  • Q&A

6 of 35

Timeline

7 of 35

Two Parts in HW2

  • (2-1) Video caption generation
    • Sequence-to-sequence model
    • Training Tips
  • (2-2) Chat-bot

8 of 35

Schedule

  • 3/30:
    • Release HW2-1
  • 4/13:
    • Release HW2-2
  • 4/27:
    • Midterm
    • HW1 in-class presentations
  • 5/4:
    • All HW2 due (including HW2-1, HW2-2)

9 of 35

Task Descriptions

10 of 35

HW2-1: Video caption generation

  • Introduction
  • Sequence-to-sequence model
  • Training Tips
    • Attention
    • Schedule Sampling
    • Beam search
  • How to reach the baseline?

12 of 35

HW2-1 Introduction

  • Video Caption Generation
    1. Input: a short video
    2. Output: the corresponding caption that depicts the video

  • There are several difficulties including:
    • Different attributes of video (object, action)
    • Variable length of I/O

( In this task, video features will be provided )

(Figure: input video → output caption "a man is playing a song on the piano")

13 of 35

HW2-1 Sequence-to-sequence 1/5

  • Two recurrent neural networks (RNNs):
    - an encoder that processes the input
    - a decoder that generates the output

(Diagram: the encoder reads the input sequence; the decoder generates the output sequence between <BOS> and <EOS>)
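
For reference, a minimal encoder-decoder sketch in PyTorch (one of the allowed packages); the dimensions, names, and single-layer wiring are illustrative assumptions, not the required architecture:

  import torch
  import torch.nn as nn

  class Seq2Seq(nn.Module):
      def __init__(self, feat_dim=4096, hidden=256, vocab_size=3000):
          super().__init__()
          self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
          self.embed = nn.Embedding(vocab_size, hidden)
          self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
          self.out = nn.Linear(hidden, vocab_size)  # project to vocabulary size

      def forward(self, video_feats, captions):
          _, state = self.encoder(video_feats)      # encode the video features
          dec_in = self.embed(captions)             # caption starts with <BOS>
          dec_out, _ = self.decoder(dec_in, state)  # decode from the encoder state
          return self.out(dec_out)                  # logits over the vocabulary

  feats = torch.randn(2, 80, 4096)                  # 2 videos, 80 frames, 4096-dim
  caps = torch.randint(0, 3000, (2, 12))            # 2 token sequences of length 12
  logits = Seq2Seq()(feats, caps)                   # shape (2, 12, 3000)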

14 of 35

HW2-1 Sequence-to-sequence 2/5

  • Data preprocess:
    • Dictionary: the most frequent words, or words above a minimum count
    • Other tokens: <PAD>, <BOS>, <EOS>, <UNK>
      - <PAD>: pad the sentences to the same length
      - <BOS>: beginning of sentence, a signal to start generating the output sentence
      - <EOS>: end of sentence, a signal that the output sentence has ended
      - <UNK>: used when a word is not in the dictionary, or simply ignore the unknown word
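
A minimal sketch of how such a dictionary could be built; the token order and the threshold are illustrative assumptions:

  from collections import Counter

  def build_vocab(captions, min_count=3):
      counts = Counter(w for cap in captions for w in cap.lower().split())
      vocab = ['<PAD>', '<BOS>', '<EOS>', '<UNK>']          # special tokens first
      vocab += [w for w, c in counts.items() if c > min_count]
      return {w: i for i, w in enumerate(vocab)}

  word2idx = build_vocab(["a man is playing a song on the piano",
                          "a man is mowing a lawn"], min_count=0)
  # unknown words fall back to <UNK> when encoding a sentence
  encode = lambda cap: [word2idx.get(w, word2idx['<UNK>']) for w in cap.split()]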

15 of 35

HW2-1 Sequence-to-sequence 3/5

  • Text input: the reference caption
    • One-hot vector encoding
      (1-of-N coding, where N is the size of the vocabulary in the dictionary)
    • e.g.
      - neural = [0, 0, 0, …, 1, 0, 0, …, 0, 0, 0]
      - network = [0, 0, 0, …, 0, 0, 1, …, 0, 0, 0]
  • LSTM unit: the cell output is then projected to a vocabulary-size vector
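
A small sketch of 1-of-N encoding and of the projection from a cell output to a vocabulary-size vector; the toy vocabulary and dimensions are assumptions:

  import numpy as np

  vocab = ['<PAD>', '<BOS>', '<EOS>', '<UNK>', 'neural', 'network']

  def one_hot(word):
      v = np.zeros(len(vocab))          # N-dimensional zero vector
      v[vocab.index(word)] = 1.0        # single 1 at the word's index
      return v

  one_hot('neural')                     # [0, 0, 0, 0, 1, 0]

  # project a 256-dim LSTM cell output to a vocabulary-size vector
  W = np.random.randn(len(vocab), 256) * 0.01
  cell_output = np.random.randn(256)
  logits = W @ cell_output              # one score per dictionary word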

16 of 35

HW2-1 Sequence-to-sequence - S2VT 4/5

  • Sequence-to-Sequence Based Model: S2VT

Refer to the following paper for detailed info:

http://www.cs.utexas.edu/users/ml/papers/venugopalan.iccv15.pdf

17 of 35

HW2-1 Sequence-to-sequence - S2VT 5/5

  • Sequence-to-Sequence Based Model: S2VT
    - Two-layer LSTM structure

18 of 35

HW2-1 Training Tips - Attention 1/3

  • Attention on encoder hidden states:
    • Allows the model to peek at different sections of the input at each decoding time step
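
A minimal sketch of one attention variant (dot-product scores with a softmax); the homework does not mandate a particular score function:

  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max())
      return e / e.sum()

  def attend(decoder_state, encoder_states):
      scores = encoder_states @ decoder_state   # one score per input position
      weights = softmax(scores)                 # attention weights sum to 1
      context = weights @ encoder_states        # weighted sum of encoder states
      return context, weights

  enc = np.random.randn(80, 256)                # 80 time steps of 256-dim states
  context, weights = attend(np.random.randn(256), enc)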

19 of 35

HW2-1 Training Tips - Schedule Sampling 2/3

  • Schedule Sampling:
    • To address the "exposure bias" problem: during training, feed either the ground truth or the previous time step's output as the decoder input, chosen at random (see the sketch below)
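
A sketch of the sampling decision at one decoding step; the probability p and its decay schedule are design choices, not given in the homework:

  import random

  def choose_input(ground_truth, prediction, p):
      # with probability p feed the ground truth (teacher forcing),
      # otherwise feed back the model's own previous output
      return ground_truth if random.random() < p else prediction

  # p is typically decayed toward 0 over training, e.g. linearly per epoch
  schedule = [max(0.0, 1.0 - epoch / 100) for epoch in range(200)]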

20 of 35

HW2-1 Training Tips - Beam search 3/3

  • Beam search:
    • Keep a fixed number of the best partial paths at each decoding step

Demo: http://dbs.cloudcv.org/captioning
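
A pure-Python sketch of one beam-search expansion step; the toy next-word distribution is illustrative only:

  import heapq
  import math

  def beam_step(beams, step_log_probs, k):
      # beams: list of (log_prob, tokens); expand each beam by every word,
      # then keep only the k highest-scoring partial sentences
      candidates = []
      for lp, tokens in beams:
          for word, wlp in step_log_probs(tokens).items():
              candidates.append((lp + wlp, tokens + [word]))
      return heapq.nlargest(k, candidates)

  # toy next-word distribution, independent of the prefix
  dist = lambda tokens: {'a': math.log(0.6), 'the': math.log(0.3), '<EOS>': math.log(0.1)}
  beams = [(0.0, ['<BOS>'])]
  for _ in range(3):
      beams = beam_step(beams, dist, k=2)   # a fixed number of paths survives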

21 of 35

HW2-1 How to reach the baseline? 1/2

  • Evaluation: BLEU@1
    • Precision = correct words / candidate length
    • BLEU@1 = BP * Precision
    • e.g.
      Ground truth: "a man is mowing a lawn"
      Prediction: "a man is riding a man on a woman is riding a motorcycle"
      BLEU@1 = 1 * 4/13 = 0.308
    • Reference: the BLEU paper (Papineni et al., 2002)

  BP = 1 if c > r, otherwise BP = exp(1 - r/c), where c = candidate length, r = reference length
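
A sketch of BLEU@1 with clipped unigram counts that reproduces the example above (single-reference case; bleu_eval.py is the official scorer):

  from collections import Counter
  from math import exp

  def bleu1(candidate, reference):
      cand, ref = candidate.split(), reference.split()
      ref_counts = Counter(ref)
      # clip each word's count by its count in the reference
      correct = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
      precision = correct / len(cand)
      # brevity penalty: only candidates shorter than the reference are punished
      bp = 1.0 if len(cand) > len(ref) else exp(1 - len(ref) / len(cand))
      return bp * precision

  bleu1("a man is riding a man on a woman is riding a motorcycle",
        "a man is mowing a lawn")           # 1 * 4/13 = 0.308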

22 of 35

HW2-1 How to reach the baseline? 2/2

  • Baseline: BLEU@1 = 0.6 (Captions Avg.; lowered from 0.65 in the 4/19 update)
  • Baseline model:
    • Training epochs = 200
    • Optimizer = Adam
    • LSTM dimension = 256
    • Learning rate = 0.001
    • Vocabulary = words with min count > 3
    • Training time = 72 mins on a GTX 960
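
Translated into code, the hyperparameters above correspond roughly to the following PyTorch setup (a sketch; the model here is a bare LSTM placeholder, not a full seq2seq):

  import torch
  import torch.nn as nn

  model = nn.LSTM(input_size=4096, hidden_size=256, batch_first=True)  # LSTM dim 256
  optimizer = torch.optim.Adam(model.parameters(), lr=0.001)           # Adam, lr 0.001
  for epoch in range(200):                                             # 200 epochs
      ...                                                              # one pass over the 1450 training videos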

23 of 35

Data & format

  • Dataset:
    • MSVD
      - 1450 videos for training
      - 100 videos for testing
  • Format:
    • Download MLDS_hw2_1_data.tar.gz (4/19 update)

更新 dataset:�

因為助教需要另外生成TAreview的output,�為了讓script的argument較為簡單,所以

把testing_data 移動到testing_data內�並改名為id.txt�把training_data移動到training_data內

並改名為id.txt�


24 of 35

Submission & Rules

  • Please implement one sequence-to-sequence model (or a variant of it) to fulfill the task
  • External datasets are allowed.
  • Allowed packages:
    • python 3.6
    • TensorFlow r1.6 ONLY (CUDA 9.0)
    • PyTorch 0.3 / torchvision
    • Keras 2.0.7 (TensorFlow backend only)
    • MXNet 1.1.0, CNTK 2.4
    • matplotlib, Python Standard Library
    • If you want to use other packages, please ask TAs for permission first!
    • Newly allowed packages: pandas, tensorlayer, gensim, nltk

25 of 35

Submission & Rules

  • Deadline: 2018/5/4 23:59 (GMT+8)
  • Upload the code and report for HW2-1 and HW2-2 to GitHub in separate directories.
  • For HW2-1:
    • Your GitHub repo must contain the directory hw2/hw2_1/ with: (1) report.pdf (2) your_seq2seq_model (3) hw2_seq2seq.sh (4) model_seq2seq.py (the training code should be included)
    • If your model is too large for GitHub, upload it to a cloud space and make your script download it.
    • Please do not upload any dataset to GitHub (including external datasets).

26 of 35

Submission & Rules

  • For HW2-1:
    • Please write a shell script "hw2_seq2seq.sh" to run your code, following the usage below:
      - ./hw2_seq2seq.sh $1 $2
      - $1: the data directory
      - $2: test data output filename (format: .txt)
      - Example 1: ./hw2_seq2seq.sh testing_data testset_output.txt
      - Example 2: ./hw2_seq2seq.sh ta_review_data tareviewset_output.txt
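
A sketch of how the Python entry point behind hw2_seq2seq.sh might consume the two arguments; the id.txt name follows the updated data format above, while the comma-separated output format is an assumption:

  import os
  import sys

  data_dir, output_file = sys.argv[1], sys.argv[2]

  # the updated data format puts the video id list at <data_dir>/id.txt
  with open(os.path.join(data_dir, 'id.txt')) as f:
      video_ids = [line.strip() for line in f]

  with open(output_file, 'w') as f:
      for vid in video_ids:
          caption = "a man is playing a song on the piano"  # placeholder prediction
          f.write("{},{}\n".format(vid, caption))           # assumed "id,caption" lines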

Besides the testing_data folder, the TAs will also use the ta_review_data folder to generate the TA-review sentences. Your script should finish within 10 minutes, excluding model downloading.

27 of 35

Grading Policy

  • HW2-1: 15%
    • Baseline (4%):
      - BLEU@1 = 0.6 (Captions Avg.) (revised 4/16)
    • TAs review (4%):
      - Grammar score (2%)
      - Relative score (2%)
    • Report (7%)
  • HW2-2: 10%
  • Work division sheet: 0.5%
  • In-class presentation: 1%
  • Top three presentations: 1%

28 of 35

Grading Policy - Report (7%)

  • Do not exceed 4 pages; the report must be written in Chinese.
  • Model description (3%)
    • Describe your seq2seq model
  • How to improve your performance (3%) (e.g. attention, schedule sampling, beam search...)
    • Write down the method that makes you stand out (1%)
    • Why you use it (1%)
    • Analyze and compare your model with and without the method (1%)
  • Experimental results and settings (1%)
    • parameter tuning, schedule sampling, etc.
  • README: please specify the libraries and their corresponding versions in the README

29 of 35

Grading Policy - NOTICE

  • Late submission (link)
    • Fill in the late-submission form only if you will submit the homework late.
    • Push your code before you fill in the form.
    • There is a 25% penalty per day for late submission, so you get 0% after four days.
  • Bugs:
    • You will get 0% on the Baseline and TAs review parts if the required script has bugs.
    • If the error is due to a format issue, come fix the bug at the announced time, or you will get a 10% penalty afterwards.

30 of 35

Q&A

ntu.mldsta@gmail.com

31 of 35

Q1: Will the TAs run the training code?

A: No. The required 10 minutes covers testing only. We will only ask you to run your training code if we consider it necessary.

32 of 35

Q2: Is there a recommended platform for uploading models?

A: Dropbox and Google Drive are both commonly used. We also recommend GitLab: it works much like GitHub but allows large files.

p.s. GitHub's per-file upload limit is 100 MB; files over 50 MB trigger a warning but can still be uploaded. You can also refer to tutorials online (ref).

33 of 35

Q3: Why are the test set answers given out?

A: Since there is no Kaggle competition this time, we provide the test set answers so that everyone can conveniently run validation and measure accuracy.

34 of 35

Q4: How were the features in the data extracted?

A: With a VGG19 pretrained on ILSVRC.

The 80*4096-dimensional features mean that 80 frames are sampled per video, and each frame has a 4096-dimensional feature.
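
A sketch of loading one video's features with NumPy; the feat/ subfolder and the .npy file naming are assumptions about the archive layout:

  import numpy as np

  # hypothetical path: one .npy file of features per video id
  feats = np.load('training_data/feat/some_video_id.npy')
  assert feats.shape == (80, 4096)   # 80 frames, 4096-dim VGG19 feature each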

35 of 35

Q5: How is the average BLEU score computed?

A: For each video, your answer is scored with BLEU against all of that video's captions. The average of the scores over all videos is your overall BLEU score.

p.s. See bleu_eval.py for the exact algorithm.