
MLDS HW2-2

TAs

ntu.mldsta@gmail.com


HW2-2 Update (4/27)

  1. Baseline code released (link)
  2. Testing data released (link)
  3. Perplexity baseline released: <= 100
  4. Correlation baseline released: >= 0.45
  5. Whole dataset download: (link)

Important: the model evaluation results here are for reference only. Please do not spend too much effort tuning for them; they are just meant to give you a quantitative basis when writing your report.


Outline

  • Timeline
  • Task Descriptions
  • Q&A


Timeline


Two Parts in HW2

  • (2-1) Video caption generation
    • Sequence-to-sequence model
    • Training Tips
  • (2-2) Chatbot


Schedule

  • 3/30:
    • Release HW2-1
  • 4/13:
    • Release HW2-2
  • 4/27:
    • Midterm
    • HW1 in-class presentations
  • 5/4:
    • All HW2 due (including HW2-1, HW2-2)


Task Descriptions


HW2-2: Chinese Chatbot

  • Introduction
  • Sequence-to-sequence model
  • Training Tips
    • Attention
    • Scheduled Sampling
    • Beam search
  • How to reach the baseline?


HW2-2 Introduction

  • Chatbot
    1. Input: a sentence
    2. Output: the corresponding reply

  • There are several difficulties, including:
    • Variable length of input/output

Example: Input "我覺得可以" ("I think it's fine") → Output "我覺得不行" ("I don't think so")


HW2-2 Sequence-to-sequence 1/5

  • Two recurrent neural networks (RNNs): an encoder that processes the input, and a decoder that generates the output.

(Diagram: the encoder reads the input sentence; the decoder starts from <BOS> and generates words one at a time until it emits <EOS>.)
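As a rough illustration, here is a minimal PyTorch sketch of such an encoder-decoder pair; the class names and the GRU/embedding sizes are illustrative choices, not the required architecture:

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=2, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len) word ids
        outputs, hidden = self.gru(self.embed(src))
        return outputs, hidden                    # final hidden state seeds the decoder

class DecoderRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)   # project to vocab-size logits

    def forward(self, prev_word, hidden):         # prev_word: (batch, 1), first step is <BOS>
        output, hidden = self.gru(self.embed(prev_word), hidden)
        return self.out(output), hidden           # keep generating until <EOS>
```

At test time the decoder is fed <BOS> first and then its own previous output, step by step, until it emits <EOS>.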


HW2-2 Sequence-to-sequence 2/5

  • Data preprocessing:
    • Dictionary: keep the most frequent words, or all words above a minimum count.
    • Other tokens: <PAD>, <BOS>, <EOS>, <UNK>
      - <PAD>: pad the sentences to the same length
      - <BOS>: beginning of sentence, the signal to start generating the output sentence
      - <EOS>: end of sentence, the signal that the output sentence has ended
      - <UNK>: use this token when a word is not in the dictionary, or simply ignore the unknown word
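A minimal plain-Python sketch of this preprocessing, assuming pre-segmented (space-separated) text; the min_count and max_len values are illustrative, not required settings:

```python
from collections import Counter

PAD, BOS, EOS, UNK = 0, 1, 2, 3  # reserved ids for the special tokens

def build_dict(sentences, min_count=5):
    """Keep words that appear at least min_count times; everything else maps to <UNK>."""
    counts = Counter(w for s in sentences for w in s.split())
    word2id = {'<PAD>': PAD, '<BOS>': BOS, '<EOS>': EOS, '<UNK>': UNK}
    for word, c in counts.most_common():
        if c < min_count:
            break
        word2id[word] = len(word2id)
    return word2id

def encode(sentence, word2id, max_len=20):
    """Turn a sentence into ids, append <EOS>, and pad to a fixed length."""
    ids = [word2id.get(w, UNK) for w in sentence.split()][:max_len - 1]
    ids.append(EOS)
    return ids + [PAD] * (max_len - len(ids))
```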


HW2-2 Sequence-to-sequence 3/5

  • Text input: reference
    • One-hot vector encoding (1-of-N coding, where N is the vocabulary size of the dictionary)
    • e.g.
      - neural = [0, 0, 0, …, 1, 0, 0, …, 0, 0, 0]
      - network = [0, 0, 0, …, 0, 0, 1, …, 0, 0, 0]
  • LSTM unit: the cell output is then projected to a vocabulary-size vector
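In code the explicit one-hot multiply is usually replaced by an embedding lookup, with a final linear layer projecting the recurrent output back to vocabulary size. A sketch; the toy vocabulary and the 256-dim size are made up for illustration:

```python
import torch
import torch.nn as nn

vocab = {'<PAD>': 0, '<BOS>': 1, '<EOS>': 2, '<UNK>': 3, 'neural': 4, 'network': 5}
N = len(vocab)  # vocabulary size

# Explicit 1-of-N coding: a length-N vector with a single 1 at the word's index.
one_hot = torch.zeros(N)
one_hot[vocab['neural']] = 1.0

# In practice an embedding lookup replaces the one-hot multiply, and a linear
# layer projects the recurrent cell output back to a vocabulary-size vector.
embed = nn.Embedding(N, 256)
proj = nn.Linear(256, N)
logits = proj(embed(torch.tensor([vocab['neural']])))  # shape: (1, N)
```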


HW2-2 Training Tips - Attention 1/3

  • Attention on encoder hidden states:
    • Allows the model to peek at different sections of the input at each decoding time step.
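One common realization is dot-product attention over the encoder outputs; the sketch below is a generic variant, not necessarily what the baseline code uses:

```python
import torch
import torch.nn.functional as F

def dot_attention(dec_hidden, enc_outputs):
    """dec_hidden: (batch, hidden) current decoder state;
    enc_outputs: (batch, src_len, hidden) all encoder hidden states."""
    # Score each encoder time step against the current decoder state.
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)            # where to "peek" at this decoding step
    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)    # (batch, hidden)
    return context, weights
```

The resulting context vector is typically concatenated with the decoder input or state at that time step.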


HW2-2 Training Tips - Scheduled Sampling 2/3

  • Scheduled Sampling:
    • To address the "exposure bias" problem: during training, feed either the ground truth or the previous time step's output as the decoder input, chosen at random with some probability.
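A minimal sketch of that random choice; decaying teacher_ratio over training (the "schedule") is a typical setting, not a required one:

```python
import random

def next_decoder_input(ground_truth, last_logits, teacher_ratio):
    """With probability teacher_ratio feed the ground-truth word; otherwise
    feed the word the decoder itself produced at the last time step."""
    if random.random() < teacher_ratio:
        return ground_truth                 # (batch, 1) gold word ids
    return last_logits.argmax(dim=-1)       # (batch, 1) model's own prediction

# teacher_ratio is typically decayed over training (e.g. from 1.0 toward 0.5),
# which is where the "schedule" in scheduled sampling comes from.
```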


HW2-2 Training Tips - Beam search 3/3

  • Beam search:
    • Keep a fixed number of candidate paths (the beam width) at each decoding step.

Demo: http://dbs.cloudcv.org/captioning
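A framework-agnostic sketch of the idea; step_fn is a hypothetical callback that returns the top next-word candidates with their log probabilities:

```python
def beam_search(step_fn, bos, eos, beam_width=5, max_len=20):
    """step_fn(prefix) -> list of (word_id, log_prob) candidates for the next word.
    Only beam_width partial paths are kept alive at every step."""
    beams = [([bos], 0.0)]                          # (path, cumulative log probability)
    for _ in range(max_len):
        candidates = []
        for path, score in beams:
            if path[-1] == eos:                     # finished paths are carried over as-is
                candidates.append((path, score))
                continue
            for word, logp in step_fn(path):
                candidates.append((path + [word], score + logp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]                              # highest-scoring path
```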


HW2-2 How to reach the baseline? 1/3

  • Baseline: Perplexity < 100, Correlation Score > 0.45

  • Baseline model vocab
  • Baseline code release (link)

Important: the model evaluation results here are for reference only. Please do not spend too much effort tuning for them; they are just meant to give you a quantitative basis when writing your report.

  • Baseline model:
    • Training iterations = 750,000
    • Batch size = 100
    • GRU: dimension 256, 2 layers
    • Learning rate = 0.001
    • SGD optimizer
    • Training time = 8 hrs on a GTX 1060


HW2-2 How to reach the baseline? 2/3

  • Evaluation: Perplexity

    • e.g.: "I love NLP."

    • The language model will be released soon.
    • See 數位語音處理概論 (Digital Speech Processing), Lesson 6.

PP = 2^H, where H is the entropy (the average negative log2 probability per word) that the language model assigns to the sentence, and PP is the perplexity.
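A minimal sketch of the computation, assuming the language model provides per-word log2 probabilities:

```python
def perplexity(log2_probs):
    """log2_probs: the log2 probability the language model assigns to each word,
    e.g. for "I love NLP ." -> [log2 P(I), log2 P(love | I), ...].
    H is the average negative log2 probability; PP = 2 ** H."""
    H = -sum(log2_probs) / len(log2_probs)
    return 2 ** H
```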


HW2-2 How to reach the baseline? 3/3

  • Evaluation: Correlation Score
    • Decided by a model.
    • The model is trained on the given dataset.
    • A kind of discriminator.

  • Model details:
    • Correct replies are scored 1, incorrect replies 0
    • Sigmoid activation function
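The TAs' actual scoring model is not released; purely to illustrate "a kind of discriminator with a sigmoid output", a sketch might look like:

```python
import torch
import torch.nn as nn

class ReplyScorer(nn.Module):
    """Discriminator-style scorer: ~1 for a good (input, reply) pair, ~0 otherwise."""
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, pair_ids):                  # input and reply concatenated: (batch, len)
        _, hidden = self.gru(self.embed(pair_ids))
        return torch.sigmoid(self.out(hidden[-1]))   # sigmoid score in (0, 1)
```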


Data & format

  • Dataset:
    • Movie subtitles from the Speech Lab: 5 million sentences of dialogue

  • Format:
    • One sentence per line
    • Conversations are separated by +++$+++
    • Download clr_conversation.txt (a parsing sketch follows this list)

  • Extra data:
    • The following are uncleaned data and do not follow the format above:
    • TV-series data
    • Movie data (full version)
    • Simplified Chinese corpus (the baseline language model does not recognize Simplified Chinese; please convert it yourself)
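A minimal sketch for reading clr_conversation.txt; pairing adjacent lines within a conversation into (input, reply) examples is an assumption about how training pairs could be built, not a required scheme:

```python
def load_pairs(path='clr_conversation.txt'):
    """One sentence per line; conversations separated by '+++$+++'.
    Adjacent lines inside a conversation form (input, reply) training pairs."""
    pairs, conv = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line == '+++$+++':                # conversation boundary
                pairs += list(zip(conv, conv[1:]))
                conv = []
            elif line:
                conv.append(line)
    pairs += list(zip(conv, conv[1:]))           # don't drop the last conversation
    return pairs
```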


I/O Format

  • Input:
    • One sentence per line

  • Output:
    • One sentence per line


Submission & Rules

  • Please implement one sequence-to-sequence model (or a variant of it) to fulfill the task.
  • You may use extra datasets.
  • Allowed packages:
    • python 3.6
    • TensorFlow r1.6 ONLY (CUDA 9.0)
    • PyTorch 0.3 / torchvision
    • Keras 2.0.7 (TensorFlow backend only)
    • MXNet 1.1.0, CNTK 2.4
    • matplotlib, Python Standard Library
    • If you want to use other packages, please ask TAs for permission first!
    • Newly allowed packages: Gensim, pandas, tqdm


Submission & Rules

  • Deadline: 2018/5/4 23:59 (GMT+8)
  • Upload the code and reports for HW2-1 and HW2-2 to GitHub in different directories.
  • For HW2-2:
    • Your GitHub repo must have the directory hw2/hw2_2/, containing: (1) report.pdf, (2) your_seq2seq_model, (3) hw2_seq2seq.sh, (4) model_seq2seq.py (training code should be included)
    • If your model is too big for GitHub, upload it to a cloud space and have your script download it.
    • Please write a shell script "hw2_seq2seq.sh" to run your code, following the usage below (a sketch of the entry point it could call follows this list):
      - bash hw2_seq2seq.sh $1 $2
      - $1: input filename (format: .txt), $2: output filename (format: .txt)
      - Example: bash hw2_seq2seq.sh input.txt output.txt
      - Your script should finish within 10 minutes, excluding model downloading.
    • Please do not upload any dataset to GitHub (including external datasets).
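A minimal sketch of a Python entry point that hw2_seq2seq.sh could invoke; generate_reply stands in for your trained model's inference and is hypothetical:

```python
# model_seq2seq.py -- hypothetical sketch of the inference entry point
# that hw2_seq2seq.sh could call as: python3 model_seq2seq.py $1 $2
import sys

def generate_reply(question):
    # Placeholder: run your trained seq2seq model's decoder here.
    return question  # echo stub so the sketch runs end to end

def main():
    in_path, out_path = sys.argv[1], sys.argv[2]   # $1: input.txt, $2: output.txt
    with open(in_path, encoding='utf-8') as f:
        questions = [line.strip() for line in f if line.strip()]
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(generate_reply(q) for q in questions) + '\n')

if __name__ == '__main__':
    main()
```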


Grading Policy

  • HW2-1 : 15%
  • HW2-2 : 10%
    • Baseline (2%):
      - Perplexity (1%)
      - Correlation Score (1%)
    • TAs' review (2%):
      - Grammar score (1%)
      - Relative score (1%)
    • Report (6%)
  • Work-division form: 0.5%
  • In-class presentation: 1%
  • Top three presentations: 1%


Grading Policy - Report (6%)

  • Do not exceed 4 pages; write the report in Chinese.
  • Model description (2%)
    • Describe your seq2seq model.
  • How you improved your performance (3%) (please use a method different from the one in HW2-1, e.g. attention, scheduled sampling, beam search, ...)
    • Write down the method that makes you stand out (1%)
    • Why you used it (1%)
    • Analyze and compare against your model without the method (1%)
  • Experimental results and settings (1%)
    • Parameter tuning, scheduled sampling, etc.
  • README: please specify the libraries and their versions in the README.


Grading Policy - NOTICE

  • Late submission (link)
    • Please fill in the late submission form first, but only if you will submit the HW late.
    • Please push your code before you fill in the form.
    • There is a 25% penalty per day for late submission, so you get 0% after four days.
  • Bugs:
    • You will get 0% on Baseline and TAs' review if the required script has a bug.
    • If the error is due to a format issue, please come fix the bug at the announced time, or you will get a 10% penalty afterwards.


Q&A

ntu.mldsta@gmail.com