
Advanced AI

Final Report

First-year master's student, Computer Science

R09922188 曾泓硯


Outline

  1. Project Proposal
    • Team Members
    • Motivations
    • What is the state of the art?
  2. Introduction
    • Look into the data
    • Evaluation
    • Problem Definition
  3. Classification
    • Data Preprocessing
    • Experiment setting
    • Results
  4. Regression
    • Data Preprocessing
    • Experiment setting
    • Results
  5. Ensemble
    • How to do it?
    • Results
  6. Failure Cases
    • Methods & Results
  7. Conclusion


Project Proposal


Team Members

ID/Name: R09922188 曾泓硯

Education: NTHU EE -> NTU CS

Research Topic: Deep Learning on HDR tone mapping

Project Experience related to AI:

Adaptive Learning in Education, Facial Super-Resolution,

Lensless device & Blind-deblurring, Style Transfer on Android Devices


Motivation/Background

  1. Advertisements are now everywhere, which is annoying.
  2. I hope to receive only useful information.


What is the state of the art?

  • The setting is quite different from the original learning-to-rank task, where labels are generally binary.
  • Yahoo held a learning-to-rank challenge in 2011, and the best team achieved an NDCG of about 0.8. However, the definition of NDCG in this contest differs from the original one.
  • There are two differences: the numerator of the NDCG is the true amount of consumption, and only the top 3 categories are used to calculate the NDCG.


Introduction


Look into the data

Column Name                                        Description
dt                                                 Consumption month
chid                                               Customer ID
shop_tag                                           Category type
txn_cnt                                            Consumption count
txn_amt                                            Consumption amount
location_cnt (domestic/overseas, offline/online)   Number of locations where the consumption happens
location_amt (domestic/overseas, offline/online)   Ratio of the total amount by consumption location
card_txn (1~14, other)                             Number of consumptions with each card
card_amt (1~14, other)                             Ratio of the total amount with each card
Other private data                                 Marital status, education, nationality, …


Evaluation
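The evaluation formula shown on this slide did not survive extraction. A plausible reconstruction of the contest metric, based on the description in "What is the state of the art?" (true consumption amounts in the numerator, top 3 categories only), is:

\[
\mathrm{DCG@3} = \sum_{i=1}^{3} \frac{\mathrm{amt}_{\pi(i)}}{\log_2(i+1)},
\qquad
\mathrm{NDCG@3} = \frac{\mathrm{DCG@3}}{\mathrm{IDCG@3}}
\]

where \(\pi(i)\) is the shop tag predicted at rank \(i\), \(\mathrm{amt}_t\) is the true consumption amount of tag \(t\) in the target month, and \(\mathrm{IDCG@3}\) is the DCG of the ideal (descending-amount) ordering. The exact contest definition may differ; this is an assumption inferred from the slides.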


Problem Definition

  1. Multiclass classification
    1. Given the data of one customer, predict the probability of each shop tag the customer is most likely to spend on in the next month.
    2. Given the data of one customer, predict the 2nd and 3rd most likely shop tags for the next month.
  2. Regression
    • Given the data of one shop tag, predict the total amount of consumption in the next month.
  3. Ranking
    • Given a set of data for each target shop tag of a customer, rank the data for the next month.


Classification


Data Preprocessing

  • Choose an appropriate constant for imputing missing data.
  • Some data values exceed the maximum value of float32, so I apply a logarithm to compress the range.
  • Merge all of the tags within one month for a customer into one row.
  • Some columns are related; to normalize the data into the range 0~1, I divide one column by another (e.g., tag1_amt / txn_amt).
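A minimal pandas sketch of these steps; the mini-schema (chid, txn_amt, tag1_amt) and the imputation constant 0 are illustrative assumptions, not the project's exact choices:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mirroring the dataset's schema.
df = pd.DataFrame({
    "chid": [123456, 123457],
    "txn_amt": [1.0e10, 2.0e5],
    "tag1_amt": [5.0e9, np.nan],
})

# 1) Impute missing values with a chosen constant (0 here).
df = df.fillna(0)

# 2) Log-scale large amounts so they fit comfortably in float32.
df["txn_amt_log"] = np.log1p(df["txn_amt"])

# 3) Normalize a related column into [0, 1] by dividing by its parent total.
df["tag1_ratio"] = df["tag1_amt"] / df["txn_amt"]
```
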


Data Preprocessing

Long format (one row per customer, tag, and month):

chid     shop_tag   total
123456   0          20
123456   1          60
123456   2          50
123456   4          5

Merged wide format (one row per customer per month, stacked for dt_1 … dt_5; absent tags become 0):

chid     tag_0   tag_1   tag_2   tag_3   tag_4   other
123456   20      60      50      0       5
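The long-to-wide merge illustrated on this slide can be reproduced with a pandas pivot; the frame below mirrors the slide's toy example (tags 0~4 for customer 123456):

```python
import pandas as pd

# Long-format frame matching the slide's example.
long_df = pd.DataFrame({
    "chid": [123456] * 4,
    "shop_tag": [0, 1, 2, 4],
    "total": [20, 60, 50, 5],
})

# Pivot to one row per customer, one column per tag; absent tags become 0.
wide = (long_df.pivot(index="chid", columns="shop_tag", values="total")
        .reindex(columns=range(5), fill_value=0)
        .add_prefix("tag_")
        .reset_index())
```
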

Experiment setting

  1. Assume the amount of consumption is proportional to the probability from the classification results.
    1. Merge each customer's data from month 1 to month 12 as input. Sum the amount of consumption of each tag from month 13 to month 24 and take the tag with the maximum amount as the ground-truth label.
  2. View each rank as an individual multiclass classification problem and train 3 models, one per rank.
  3. Assume the time when the data was collected matters.
    • Train 23 models to predict the result of the next month and merge their probabilities.
    • I expect the importance of the data to depend on when it was collected, so I ensemble these models with a weighted sum according to time.
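A minimal numpy sketch of the weighted-probability ensemble in step 3; the matrices and weights below are illustrative assumptions (the real setup used 23 month-specific models):

```python
import numpy as np

# Hypothetical per-month model outputs: each entry is a
# (customers x tags) probability matrix.
probs = [
    np.array([[0.2, 0.8], [0.6, 0.4]]),
    np.array([[0.1, 0.9], [0.5, 0.5]]),
    np.array([[0.3, 0.7], [0.7, 0.3]]),
]

# Weight more recent months more heavily (weights are an assumption).
weights = np.array([1.0, 2.0, 3.0])
weights /= weights.sum()

# Weighted sum of probabilities, then rank tags per customer.
ensembled = sum(w * p for w, p in zip(weights, probs))
top_tag = ensembled.argmax(axis=1)
```
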


Results

Model with some descriptions                                              NDCG Result
#1 Random Forest (max_depth=10)                                           0.4834
#1 Random Forest (max_depth=100)                                          0.6538
#2 Random Forest, ensembled with different ranks as ground truth
   (max_depth=100)                                                        0.6539
#3 Random Forest, ensembled over different month models (max_depth=100)   0.6656
#3 XGBoost Classifier, ensembled over different month models
   (n_estimators=100, learning_rate=0.3)                                  0.6744

#N: setting N from the previous slide

Feature Importance


Regression


Data Preprocessing

  • Choose an appropriate constant for imputing missing data.
  • Use scikit-learn's scaler transforms instead of dividing by a related column.
  • Merge all of the monthly data within a tag for a customer into one row.
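The scikit-learn scaling step might look like the following; the feature matrix is an illustrative stand-in, and MinMaxScaler is one plausible choice of scaler (the slide does not name the exact transform used):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: rows are (customer, tag) pairs,
# columns are monthly amounts.
X = np.array([[20.0, 60.0,  0.0],
              [ 5.0, 50.0, 10.0]])

# Scale each column into [0, 1], replacing manual division by a related column.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```
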


Data Preprocessing

Long format (one row per customer, tag, and month):

chid     dt   total
123456   1    20
123456   2    60
123456   5    50
123456   4    5

Merged wide format (one row per customer per tag, stacked for tag_0, tag_1, …; absent months become 0):

chid     dt_1   dt_2   dt_3   dt_4   dt_5   other
123456   20     60     0      5      50

Experiment setting

  1. Assume the consumption trend of each shop tag is consistent with the sum over all shop tags.
    1. Merge all of the data together and predict the total amount of consumption in the next month.
  2. Assume the consumption trend of each shop tag differs from the sum over all shop tags.
    • Train on the whole data of each shop tag and predict the amount of consumption of each tag in the next month.


Results

Model with some descriptions                                              NDCG Result
#1 XGBoost Regressor (n_estimators=100, max_depth=15, eta=0.1)            0.651
#2 XGBoost Regressor (n_estimators=100, max_depth=15, eta=0.1)            0.702
#2 XGBoost Regressor, selected features & only the related shop tag
   (n_estimators=100, max_depth=15, eta=0.1)                              0.705
#2 CatBoost Regressor (depth=10, iterations=1000, learning_rate=0.1)      0.708

Feature Importance (XGBoost)


Ensemble


How to Do it?

  1. During training (regression as an example)
    1. Separate the training data into two parts: data of the predicted tag and data of the other tags.
    2. Train two models respectively, and sum the results from both models.
  2. Ensemble the results
    • Ensemble the previous results, including classification, regression, and ranking.
    • Weight the top-3 labels of each previous result with [3, 2, 1] and sum up to get the final prediction.
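The [3, 2, 1] weighting scheme can be sketched as follows; the per-model top-3 tags below are made-up examples:

```python
from collections import Counter

# Hypothetical top-3 shop-tag predictions from each previous model.
model_top3 = {
    "classification": [2, 7, 5],
    "regression":     [7, 2, 9],
    "ranking":        [5, 7, 1],
}

# Score ranks 1st/2nd/3rd with weights [3, 2, 1], summed across models.
scores = Counter()
for top3 in model_top3.values():
    for weight, tag in zip([3, 2, 1], top3):
        scores[tag] += weight

# Final prediction: the three highest-scoring tags.
final_top3 = [tag for tag, _ in scores.most_common(3)]
```
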


Results

Model with some descriptions                                              NDCG Result
#1 CatBoost Regressor (predicted-tag data / other-tag data)               0.706
#2 Classification (0.67) + Regression (0.70) + Ranking (0.51)             0.6927
#2 Classification (0.67) + Regression (0.70)                              0.7027

Failure Cases


Methods & Results

  1. Deep learning (linear model, for classification)
  2. Learning to rank (LambdaMART, based on the regression setup)

Methods            NDCG
Deep Learning      X (accuracy 0.11 during training)
Learn to Rank      0.5169


Conclusions


Classification

  1. Potential problems:
    1. Merging all of the data together during training may hurt model performance.
    2. Combining the month information could be done more systematically by the models themselves.
    3. Classification results aren't exactly what we want.
  2. Observations:
    • Using the probabilities to rank has the same effect as training 3 separate models.
    • Weighting results from different months may improve performance.
    • Features related to the predicted tag, such as whether purchases are offline/online or domestic/overseas, may affect the prediction results.


Regression

  • Potential problems:
    • Some of the regression ground truths are 0; whether we should remove such data is questionable.
    • Regression results aren't exactly what we want, but they are better than classification results.
  • Observations:
    • Merging all of the data together during training may hurt model performance.
    • Training with data from the predicted tag alone is enough to get decent results.
    • The XGBoost model cares most about data from the most recent 11 months, whether purchases are offline/online or domestic/overseas, and personal information.


Ensemble & Failure Cases

  1. The ensemble methods don't perform better than the original models.
  2. Deep learning might still be able to solve the classification problem, but my naive attempts failed.
  3. The ranking model fails to outperform the other models, which is surprising.
  4. Decent results may still be obtainable from sparse data.


Next Step?

  1. Learn to downsample labels to deal with imbalanced data
  2. Apply deep learning methods to regression


Reference

[1] Chapelle, O., & Chang, Y. (2011). Yahoo! Learning to Rank Challenge Overview. JMLR Workshop and Conference Proceedings, 14, 1–24.

[2] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785

[3] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

[4] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc. Retrieved from http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf


Q&A

  1. I later also tried a version that uses only the data from months 13~23, but the results were about the same.