MLDS HW0

TAs

ntu.mldsta@gmail.com

Outline

- Task Description

- Course Add (Enrollment) Rules

- Kaggle Rules

- Q&A

Task Description - Text Sentiment Classification

There is no restriction on the method you use to solve the task, as long as it is ML-based.

Task Description - Dataset

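The dataset for this assignment consists of tweets collected from Twitter. Each tweet is labeled as positive or negative.

In addition to the labeled data, we also provide about 1.2 million unlabeled examples.

  • labeled training data: 200,000
  • unlabeled training data: 1,200,000
  • testing data: 200,000 (no private testing set)

1: positive

0: negative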

Task Description - Data format

Three files are provided on Kaggle:

training_label.csv

labeled training data with each line being <sentence label> +++$+++ <sentence>

training_nolabel.csv

unlabeled training data with each line being <sentence>

testing_data.csv

testing data for submission, with the header id,text and each following line being <sentence id>,<sentence>
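The line formats above can be read with a few small helpers. This is only a minimal sketch, not an official loader: the function names are hypothetical, and it assumes the files are plain UTF-8 text, that sentences are not quoted, and that a sentence may itself contain commas (so only the first comma on a testing line is treated as the delimiter).

```python
# Hypothetical helpers for the three line formats described above.
# Assumptions (not confirmed by the slides): unquoted text, one record per line.

SEP = "+++$+++"  # separator between label and sentence in the labeled file


def parse_labeled(lines):
    """Yield (label, sentence) from lines of '<sentence label> +++$+++ <sentence>'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        label, sentence = line.split(SEP, 1)
        yield int(label.strip()), sentence.strip()


def parse_unlabeled(lines):
    """Yield one sentence per non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line


def parse_testing(lines):
    """Yield (sentence_id, sentence), skipping the 'id,text' header line."""
    it = iter(lines)
    next(it, None)  # drop the header
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first comma only, since the sentence may contain commas.
        sentence_id, sentence = line.split(",", 1)
        yield sentence_id, sentence
```

Each helper accepts any iterable of lines, so an open file handle can be passed directly, for example `list(parse_labeled(open("training_label.csv", encoding="utf-8")))`.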

Task Description - Submission format

Submissions should follow the format in sampleSubmission.csv

Course Add (Enrollment) Rules

To join the class, you will have to...

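Participate in the Kaggle competition and pass the required threshold.

Undergraduate students: pass the Simple Baseline. Graduate students and above: pass the Strong Baseline.

If too many students qualify, students will be added in the order in which they passed the baseline. The final leaderboard ranking does not affect this order.

The TAs will send authorization codes to the school email addresses (student ID@ntu.edu.tw) of qualifying students before 2018/3/6 23:59:59.

Students who are not added will also receive an email. If you have any questions, reply directly to the TAs.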

Kaggle Rules

  1. Kaggle link : https://www.kaggle.com/t/73e2d5edd30f4608a17a84130cb91969
  2. One person per team. The team name must be [your student ID] in lowercase letters, e.g. b03902034
  3. Deadline : 2018/3/4 23:59:59 (GMT+8)
  4. 10 submissions allowed per day
  5. Please do NOT upload predictions again once you’ve passed the threshold.
  6. Code submission is not required.
  7. Only the public leaderboard will be used for hw0.

Q&A

Email the TAs if you have any questions.

ntu.mldsta@gmail.com