
Advanced AI

Final Report

First-year master's student, Computer Science

R09922188 曾泓硯


Outline

  1. Project Proposal
    • Team Members
    • Motivations
    • What is the state of the art?
  2. Introduction
    • Look into the data
    • Evaluation
    • Problem Definition
  3. Classification
    • Data Preprocessing
    • Experiment setting
    • Results
  4. Regression
    • Data Preprocessing
    • Experiment setting
    • Results
  5. Ensemble
    • How to do it?
    • Results
  6. Failure Cases
    • Methods & Results
  7. Conclusion


Project Proposal


Team Members

ID/Name: R09922188 曾泓硯

Education: NTHU EE -> NTU CS

Research Topic: Deep Learning on HDR tone mapping

Project Experience related to AI:

Adaptive Learning in Education, Facial Super-Resolution,

Lensless device & Blind-deblurring, Style Transfer on Android Devices


Motivation/Background

  1. Advertisements are now everywhere, which is annoying.
  2. I hope to receive only useful information.


What is the state of the art?

  • The setting is quite different from the original learning-to-rank task, where labels are generally binary.
  • Yahoo held a learning-to-rank challenge in 2011, and the best team achieved an NDCG of about 0.8. However, the definition of NDCG in this contest differs from the original one.
  • There are two differences: the numerator of the NDCG is the true amount of consumption, and only the top 3 categories are used to calculate the NDCG.


Introduction


Look into the data

Column Name                                        Description
dt                                                 Consumption month
chid                                               Customer ID
shop_tag                                           Category type
txn_cnt                                            Consumption count
txn_amt                                            Consumption amount
location_cnt (domestic/overseas, offline/online)   Number of locations where the consumption happens
location_amt (domestic/overseas, offline/online)   Ratio of the total amount by consumption location
card_txn (1~14, other)                             Number of consumptions with each card
card_amt (1~14, other)                             Ratio of the total amount with each card
Other private data                                 Marital status, education, nationality, …


Evaluation
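The evaluation formula shown on this slide did not survive extraction. A plausible reconstruction of the contest metric, based on the description in "What is the state of the art?" (true consumption amounts in the numerator, top 3 categories only), is:

\[
\mathrm{DCG@3} = \sum_{i=1}^{3} \frac{\mathrm{amt}_{\pi(i)}}{\log_2(i+1)},
\qquad
\mathrm{NDCG@3} = \frac{\mathrm{DCG@3}}{\mathrm{IDCG@3}}
\]

where \(\pi(i)\) is the shop tag predicted at rank \(i\), \(\mathrm{amt}_t\) is the true consumption amount of tag \(t\) in the target month, and \(\mathrm{IDCG@3}\) is the DCG of the ideal (descending-amount) ordering. The exact contest definition may differ; this is an assumption inferred from the slides.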


Problem Definition

  1. Multiclass classification
    1. Given the data of one customer, predict the probability of each shop tag the customer is most likely to spend on in the next month.
    2. Given the data of one customer, predict the 2nd and 3rd most likely shop tags for the next month.
  2. Regression
    • Given the data of one shop tag, predict the total amount of consumption in the next month.
  3. Ranking
    • Given a set of data for each target shop tag of a customer, rank the data for the next month.


Classification


Data Preprocessing

  • Choose an appropriate constant for imputing missing data.
  • Some data values exceed the maximum value of float32, so I apply a logarithm to compress the range.
  • Merge all of the tags within one month for a customer into one row.
  • Some columns are related; to normalize the data into the range 0~1, I divide one column by another (e.g., tag1_amt / txn_amt).
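A minimal pandas sketch of these steps; the mini-schema (chid, txn_amt, tag1_amt) and the imputation constant 0 are illustrative assumptions, not the project's exact choices:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mirroring the dataset's schema.
df = pd.DataFrame({
    "chid": [123456, 123457],
    "txn_amt": [1.0e10, 2.0e5],
    "tag1_amt": [5.0e9, np.nan],
})

# 1) Impute missing values with a chosen constant (0 here).
df = df.fillna(0)

# 2) Log-scale large amounts so they fit comfortably in float32.
df["txn_amt_log"] = np.log1p(df["txn_amt"])

# 3) Normalize a related column into [0, 1] by dividing by its parent total.
df["tag1_ratio"] = df["tag1_amt"] / df["txn_amt"]
```
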


Data Preprocessing

Long format (one row per customer, tag, and month):

chid     shop_tag   total
123456   0          20
123456   1          60
123456   2          50
123456   4          5

Merged wide format (one row per customer per month, stacked for dt_1 … dt_5; absent tags become 0):

chid     tag_0   tag_1   tag_2   tag_3   tag_4   other
123456   20      60      50      0       5
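The long-to-wide merge illustrated on this slide can be reproduced with a pandas pivot; the frame below mirrors the slide's toy example (tags 0~4 for customer 123456):

```python
import pandas as pd

# Long-format frame matching the slide's example.
long_df = pd.DataFrame({
    "chid": [123456] * 4,
    "shop_tag": [0, 1, 2, 4],
    "total": [20, 60, 50, 5],
})

# Pivot to one row per customer, one column per tag; absent tags become 0.
wide = (long_df.pivot(index="chid", columns="shop_tag", values="total")
        .reindex(columns=range(5), fill_value=0)
        .add_prefix("tag_")
        .reset_index())
```
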

Experiment setting

  1. Assume the amount of consumption is proportional to the probability from the classification results.
    1. Merge each customer's data from month 1 to month 12 as input. Sum the amount of consumption of each tag from month 13 to month 24 and take the tag with the maximum amount as the ground-truth label.
  2. View each rank as an individual multiclass classification problem and train 3 models, one per rank.
  3. Assume the time when the data was collected matters.
    • Train 23 models to predict the result of the next month and merge their probabilities.
    • I expect the importance of the data to depend on when it was collected, so I ensemble these models with a weighted sum according to time.
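A minimal numpy sketch of the weighted-probability ensemble in step 3; the matrices and weights below are illustrative assumptions (the real setup used 23 month-specific models):

```python
import numpy as np

# Hypothetical per-month model outputs: each entry is a
# (customers x tags) probability matrix.
probs = [
    np.array([[0.2, 0.8], [0.6, 0.4]]),
    np.array([[0.1, 0.9], [0.5, 0.5]]),
    np.array([[0.3, 0.7], [0.7, 0.3]]),
]

# Weight more recent months more heavily (weights are an assumption).
weights = np.array([1.0, 2.0, 3.0])
weights /= weights.sum()

# Weighted sum of probabilities, then rank tags per customer.
ensembled = sum(w * p for w, p in zip(weights, probs))
top_tag = ensembled.argmax(axis=1)
```
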


Results

Model with some descriptions                                              NDCG Result
#1 Random Forest (max_depth=10)                                           0.4834
#1 Random Forest (max_depth=100)                                          0.6538
#2 Random Forest, ensembled with different ranks as ground truth
   (max_depth=100)                                                        0.6539
#3 Random Forest, ensembled over different month models (max_depth=100)   0.6656
#3 XGBoost Classifier, ensembled over different month models
   (n_estimators=100, learning_rate=0.3)                                  0.6744

#N: setting N from the previous slide

Feature Importance


Regression


Data Preprocessing

  • Choose an appropriate constant for imputing missing data.
  • Use scikit-learn's scaler transforms instead of dividing by a related column.
  • Merge all of the monthly data within a tag for a customer into one row.
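The scikit-learn scaling step might look like the following; the feature matrix is an illustrative stand-in, and MinMaxScaler is one plausible choice of scaler (the slide does not name the exact transform used):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: rows are (customer, tag) pairs,
# columns are monthly amounts.
X = np.array([[20.0, 60.0,  0.0],
              [ 5.0, 50.0, 10.0]])

# Scale each column into [0, 1], replacing manual division by a related column.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```
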


Data Preprocessing

Long format (one row per customer, tag, and month):

chid     dt   total
123456   1    20
123456   2    60
123456   5    50
123456   4    5

Merged wide format (one row per customer per tag, stacked for tag_0, tag_1, …; absent months become 0):

chid     dt_1   dt_2   dt_3   dt_4   dt_5   other
123456   20     60     0      5      50

Experiment setting

  1. Assume the consumption trend of each shop tag is consistent with the sum over all shop tags.
    1. Merge all of the data together and predict the total amount of consumption in the next month.
  2. Assume the consumption trend of each shop tag differs from the sum over all shop tags.
    • Train on the whole data of each shop tag and predict the amount of consumption of each tag in the next month.


Results

Model with some descriptions                                              NDCG Result
#1 XGBoost Regressor (n_estimators=100, max_depth=15, eta=0.1)            0.651
#2 XGBoost Regressor (n_estimators=100, max_depth=15, eta=0.1)            0.702
#2 XGBoost Regressor, selected features & only the related shop tag
   (n_estimators=100, max_depth=15, eta=0.1)                              0.705
#2 CatBoost Regressor (depth=10, iterations=1000, learning_rate=0.1)      0.708

Feature Importance (XGBoost)


Ensemble


How to Do it?

  1. During training (regression as an example)
    1. Separate the training data into two parts: data of the predicted tag and data of the other tags.
    2. Train two models respectively, and sum the results from both models.
  2. Ensemble the results
    • Ensemble the previous results, including classification, regression, and ranking.
    • Weight the top-3 labels of each previous result with [3, 2, 1] and sum up to get the final prediction.
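The [3, 2, 1] weighting scheme can be sketched as follows; the per-model top-3 tags below are made-up examples:

```python
from collections import Counter

# Hypothetical top-3 shop-tag predictions from each previous model.
model_top3 = {
    "classification": [2, 7, 5],
    "regression":     [7, 2, 9],
    "ranking":        [5, 7, 1],
}

# Score ranks 1st/2nd/3rd with weights [3, 2, 1], summed across models.
scores = Counter()
for top3 in model_top3.values():
    for weight, tag in zip([3, 2, 1], top3):
        scores[tag] += weight

# Final prediction: the three highest-scoring tags.
final_top3 = [tag for tag, _ in scores.most_common(3)]
```
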


Results

Model with some descriptions                                              NDCG Result
#1 CatBoost Regressor (predicted-tag data / other-tag data)               0.706
#2 Classification (0.67) + Regression (0.70) + Ranking (0.51)             0.6927
#2 Classification (0.67) + Regression (0.70)                              0.7027

Failure Cases


Methods & Results

  1. Deep learning (linear model, for classification)
  2. Learning to rank (LambdaMART, based on the regression setup)

Methods            NDCG
Deep Learning      X (accuracy 0.11 during training)
Learn to Rank      0.5169


Conclusions


Classification

  1. Potential problems:
    1. Merging all of the data together during training may hurt model performance.
    2. Combining the month information could be done more systematically by the models themselves.
    3. Classification results aren't exactly what we want.
  2. Observations:
    • Using the probabilities to rank has the same effect as training 3 separate models.
    • Weighting results from different months may improve performance.
    • Features related to the predicted tag, such as whether purchases are offline/online or domestic/overseas, may affect the prediction results.


Regression

  • Potential problems:
    • Some of the regression ground truths are 0; whether we should remove such data is questionable.
    • Regression results aren't exactly what we want, but they are better than classification results.
  • Observations:
    • Merging all of the data together during training may hurt model performance.
    • Training with data from the predicted tag alone is enough to get decent results.
    • The XGBoost model cares most about data from the most recent 11 months, whether purchases are offline/online or domestic/overseas, and personal information.


Ensemble & Failure Cases

  1. The ensemble methods don't perform better than the original models.
  2. Deep learning might still be able to solve the classification problem, but my naive attempts failed.
  3. The ranking model fails to outperform the other models, which is surprising.
  4. Decent results may still be obtainable from sparse data.


Next Step?

  1. Learn to downsample labels to deal with imbalanced data
  2. Apply deep learning methods to regression


Reference

[1] Chapelle, O., & Chang, Y. (2011). Yahoo! Learning to Rank Challenge Overview. JMLR Workshop and Conference Proceedings, 14, 1–24.

[2] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785

[3] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

[4] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc. Retrieved from http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf


Q&A

  1. I later also tried a version that uses only the data from months 13~23, but the results were about the same.