Machine Learning HW4
Recurrent Neural Networks
MLTAs
ntueemlta2021fall@gmail.com
Outline
Task introduction
(Text Sentiment Classification)
Task - Text Sentiment Classification
Text Sentiment Classification
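This assignment uses tweets collected from Twitter. Each tweet is labeled as either positive or negative:
1: positive
0: negative
In addition to the labeled data, we also provide about 1.2 million unlabeled examples.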
Task and Dataset
Kaggle Info & Deadline
Preprocessing the sentences
example:
“I have a pen.” -> [1, 2, 3, 4]
“I have an apple.” -> [1, 2, 5, 6]
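The mapping above can be sketched as follows. This is a minimal illustration, not the assignment's reference preprocessing: the helper names (`tokenize`, `build_vocab`, `encode`), the lowercasing, the dropping of punctuation, and the use of 0 for unknown words are all assumptions made for the example.

```python
import re

def tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation (assumed behavior)."""
    return re.findall(r"\w+", sentence.lower())

def build_vocab(sentences):
    """Give each previously unseen token the next integer id, starting from 1."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def encode(sentence, vocab, unknown_id=0):
    """Convert a sentence into a list of token ids; unknown tokens map to unknown_id (assumption)."""
    return [vocab.get(token, unknown_id) for token in tokenize(sentence)]

sentences = ["I have a pen.", "I have an apple."]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]
print(encoded)  # [[1, 2, 3, 4], [1, 2, 5, 6]]
```

Words shared between sentences ("I", "have") receive the same id, which is what lets a later embedding layer reuse one vector per word.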
What is Word Embedding?
1-of-N encoding
We can represent each word with a distinct one-hot vector:
apple -> [1,0,0,0,0]
bag -> [0,1,0,0,0]
cat -> [0,0,1,0,0]
dog -> [0,0,0,1,0]
elephant -> [0,0,0,0,1]
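The five vectors above can be reproduced with a few lines of code. This is a minimal sketch; the function name `one_hot` and the vocabulary ordering are chosen here only to match the listing.

```python
vocab = ["apple", "bag", "cat", "dog", "elephant"]
word_to_index = {word: index for index, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's vocabulary position."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

for word in vocab:
    print(word, "->", one_hot(word))
```

Each vector is as long as the vocabulary, so the representation grows linearly with the number of distinct words.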
200,000 (data) × 30 (length) × 20,000 (vocab size) × 4 (bytes) = 4.8 × 10^11 bytes = 480 GB
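The memory estimate can be checked with a short calculation. The variable names are illustrative. The comparison with storing integer indices instead of one-hot vectors is an addition for context and assumes 4-byte integers.

```python
num_sentences = 200_000   # number of sentences in the data
sentence_length = 30      # tokens per sentence
vocab_size = 20_000       # length of each one-hot vector
bytes_per_value = 4       # 4 bytes per stored value

# Every token becomes a full vocab_size-long vector.
total_bytes = num_sentences * sentence_length * vocab_size * bytes_per_value
total_gb = total_bytes / 10**9
print(total_bytes, total_gb)

# Assumption for comparison: store one 4-byte integer index per token instead.
index_bytes = num_sentences * sentence_length * bytes_per_value
index_mb = index_bytes / 10**6
print(index_bytes, index_mb)
```

Under these assumptions the index representation is smaller by a factor equal to the vocabulary size (20,000).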
Word Embedding
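In contrast to 1-of-N encoding, a word embedding maps each word index to a short dense vector through a lookup table. The sketch below is only an illustration: the embedding dimension of 4, the vocabulary size of 7, and the random initialization are hypothetical. In practice the table would be learned, for example with a word2vec-style model or as a trainable layer of the network.

```python
import random

EMBEDDING_DIM = 4  # hypothetical dimension, far smaller than the vocabulary size
VOCAB_SIZE = 7     # ids 0..6, with 0 reserved for unknown words (assumption)

rng = random.Random(0)  # fixed seed so the table is reproducible

# One dense vector per word id; random values stand in for learned weights.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(indices):
    """Look up the dense vector for each word id in a sentence."""
    return [embedding_table[index] for index in indices]

pen_sentence = embed([1, 2, 3, 4])    # "I have a pen."
apple_sentence = embed([1, 2, 5, 6])  # "I have an apple."
print(len(pen_sentence), len(pen_sentence[0]))
```

Because both sentences start with the ids 1 and 2, their first two vectors are identical, and each token now costs `EMBEDDING_DIM` values rather than one value per vocabulary entry.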
Data Format
Data Format (labeled data)
label +++$+++ text
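A labeled line in this format can be split on the `+++$+++` separator. The following is a minimal sketch. The function name `parse_labeled_line`, the single spaces around the separator, and the sample line are assumptions for illustration, not taken from the actual dataset.

```python
SEPARATOR = " +++$+++ "  # assumes one space on each side of the marker

def parse_labeled_line(line):
    """Split one labeled line into (label, text); the label is 0 or 1."""
    label, text = line.rstrip("\n").split(SEPARATOR, 1)
    return int(label), text

# Hypothetical sample line, not from the real training file.
sample = "1 +++$+++ what a great day\n"
label, text = parse_labeled_line(sample)
print(label, text)
```

Splitting with a maximum of one split keeps the text intact even if the separator string happens to appear inside a tweet.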
Data Format (unlabeled data)
text
Kaggle
Kaggle submission format
Kaggle link: https://www.kaggle.com/c/ml-2021fall-hw4/leaderboard
Predict the labels for the 10,000 examples in the testing set and upload the results to Kaggle.
Rules, Deadline, Policy, Score
Ceiba Submissions
Your Ceiba submission must include at least:
Please do NOT upload the dataset.
Report Format
https://docs.google.com/document/d/1mjawi2jtHhBrnxluXZ-Q2pNbh8YWuAm4HK3H6Khgoc8/edit?usp=sharing
Other Policy
Score - Report.pdf
Requirements
Assignment Regulation
Grading Criteria - Kaggle (2%)
Grading Criteria - Report (8%)
FAQ
TA Hour