1 of 45

Music Recommendation System

Team Name: APT

강한이, 이민희, 이서일

2 of 45

Contents

  1. Why Music?
  2. Project Overview
    1. Problem Definition
    2. Dataset
    3. Evaluation Metric
  3. Methodology
    • Model-based Approach
    • Additional Approach
  4. Project Plans
    1. Schedule
    2. Team Roles
  5. Q&A


3 of 45

Why Music?

Music recommendation systems are in high demand for music streaming services.


4 of 45

Why Music?

Music consumption does not necessarily correlate with user preference.


5 of 45

Why Music?

Personalization is not always the best approach in music recommendations.


6 of 45

Project Goal

KKBox's Music Recommendation Challenge [1]

  • The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018)


[1] [Deleted User], Howard, A., Chiu, A., McDonald, M., msla, Kan, W., & Yianchen. (2017). WSDM - KKBox's Music Recommendation Challenge. Kaggle. https://kaggle.com/competitions/kkbox-music-recommendation-challenge

7 of 45

Dataset

Use the dataset provided by the challenge.

  • User-song play data
  • User info. & Song info.

The train/test split is provided:

  • Training data: 7,377,419 entries
  • Test data: 2,556,791 entries


8 of 45

Dataset

Play information for each song per user is provided.

  • Users: 34,404, Songs: 2,296,834
  • Whether a user has chosen a specific song is represented by 0 or 1 (the target column)


* msno: user_id

9 of 45

Dataset

Additional information about users and songs is provided.

  • Users: region, age, gender, registered_via, etc.
  • Songs: length, genre, artist_name, composer_name, etc.


10 of 45

Evaluation Metric & Our Goal

  • AUROC
    • The metric used in the challenge.
    • The model predicts the probability that each item will be labeled "1," i.e., chosen by the user.
    • AUROC is computed from these predicted probabilities.
  • Our goal is to exceed the given benchmark: 0.61337.
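AUROC can be read as the probability that a randomly chosen positive (chosen) example receives a higher predicted score than a randomly chosen negative one. A minimal pure-Python sketch of the metric (an illustrative helper, not the competition's evaluation code):

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking scores 1.0; a random one hovers around 0.5.
print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # -> 1.0
```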


11 of 45

Methodology:

Model-Based Recommendation System


12 of 45

LightGBM

  • Tree-based Gradient Boosting Machine
  • Builds a sequence of decision trees, where each tree corrects the errors of the previous ones.
  • Assigns prediction scores based on the value of the leaf node where each example lands.


13 of 45

LightGBM

Node Functionality:

  • Non-Leaf Nodes: Serve as decision points that guide the example through the tree based on feature values, routing it to a specific leaf node.
  • Leaf Nodes: Contain the final score that contributes to the overall prediction.


14 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)
1       0.5        1.2        1
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

15 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)  Predicted
1       0.5        1.2        1
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Leaf node values represent adjustments to the log-odds of the prediction.

16 of 45

LightGBM

e.g.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

Leaf node values represent adjustments to the log-odds of the prediction.

17 of 45

LightGBM

e.g.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0
3       1.5        2.1        1          1
4       0.9        1.5        0

Leaf node values represent adjustments to the log-odds of the prediction.

18 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0          -0.3
3       1.5        2.1        1          1
4       0.9        1.5        0          1

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Leaf node values represent adjustments to the log-odds of the prediction.

19 of 45

LightGBM

In each boosting round, LightGBM grows a tree by recursively splitting nodes based on feature thresholds to maximize information gain, assigning leaf values as adjustments to predictions.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1
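The routing logic of Tree 1 above can be written as a plain function; running the four samples through it reproduces the Leaf column from the example table (a sketch of the idea, not LightGBM's implementation):

```python
def tree1(feature1, feature2):
    # Tree 1 from the example: threshold splits route each sample to a leaf.
    if feature1 <= 0.8:
        if feature2 <= 1.0:
            return -0.3   # feature1 <= 0.8 and feature2 <= 1.0
        return 0.5        # feature1 <= 0.8 and feature2 > 1.0
    return 1.0            # feature1 > 0.8

samples = [(0.5, 1.2), (0.7, 0.8), (1.5, 2.1), (0.9, 1.5)]
leaves = [tree1(f1, f2) for f1, f2 in samples]
print(leaves)  # -> [0.5, -0.3, 1.0, 1.0]
```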

20 of 45

LightGBM

Tree 1 -> ... -> Tree k

LightGBM trains multiple trees iteratively, with each tree learning to correct the residual errors from the ensemble of all previously trained trees, minimizing the overall loss.
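For binary classification, the ensemble's predicted probability is the sigmoid of the summed leaf adjustments across all trees. A sketch assuming a base score of 0 and only the example Tree 1 (leaf value 0.5 for sample 1):

```python
import math

def sigmoid(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

# Each boosting round appends one more leaf adjustment for the sample.
leaf_adjustments = [0.5]   # Tree 1's leaf for sample 1; Trees 2..k would append here
prob = sigmoid(sum(leaf_adjustments))
print(round(prob, 3))  # -> 0.622
```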

21 of 45

Why LightGBM?

  • Excellent for tabular data
    • Can process categorical features directly without requiring one-hot encoding or manual transformation.
  • Supports feature selection
    • Prioritizes important features during training by evaluating feature importance based on split gain or frequency
  • Plain, simple, and out-of-the-box model
  • Able to process large-scale data with limited resources


22 of 45

Features to Use

  • This dataset is much richer than the ones we have worked with in class.
  • It provides information about users and items, in addition to the user-item matrix.


23 of 45

Features to Use

  1. user, song metadata
    • information provided by the dataset, as other competitors have used
  2. user-song Collaborative Filtering score
  3. song audio embedding (not used)


24 of 45

  1. User, Song Metadata
  • Songs: length, genre, artist_name, composer_name, lyricist, language
  • Users: region, age, gender, registered_via, registration_init_time, expiration_date


25 of 45

  • User, Song Metadata


26 of 45

  • User, Song Metadata


27 of 45

2. Collaborative Filtering Score

From User-Song Matrix

  1. Compute the reduced representations (via SVD)
  2. Compute the Similarity between Users or Songs
  3. Construct Peer groups: Top 100 neighborhood
  4. Generate Top-100 CF score for each User-Song pair

* For Similarity: Cosine similarity or NN-based similarity
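Steps 2–3 above (similarity and peer groups) could be sketched as follows, using cosine similarity over the reduced user vectors; the user IDs and the small k are illustrative (the plan uses the top 100):

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_neighbors(target_vec, other_vecs, k):
    # Rank the other users' reduced vectors by cosine similarity to the target.
    ranked = sorted(other_vecs.items(),
                    key=lambda item: cosine(target_vec, item[1]),
                    reverse=True)
    return [user_id for user_id, _ in ranked[:k]]

others = {"u2": [1, 0, 1], "u3": [0, 1, 0], "u4": [1, 1, 1]}
print(top_k_neighbors([1, 0, 1], others, k=2))  # -> ['u2', 'u4']
```

The CF score for a user-song pair would then aggregate the play records of the selected peer group.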


28 of 45

3. Song Audio Embedding (not used)

Get audio embedding by a pretrained model

How to obtain audio

  • Download a 30-second audio clip from YouTube

Pre-trained Model: MusicFM-MSD [2]

  • Output: Embeddings with dimensions (4, 1024)


[2] M. Won, Y.-N. Hung, and D. Le, “A foundation model for music informatics,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024.

29 of 45

3. Song Audio Embedding (not used)

  • Crawling audio data within the project period is challenging.
  • While using music embeddings as features could enhance performance, we decided not to include them in this project.


30 of 45

Project Plans


31 of 45

Project Timeline

11/13 ~ 11/19: Data Preprocessing & Studying Reference

11/20 ~ 11/26: Training the Model

11/27 ~ 12/3: Evaluating & Modifying the Model

12/4 ~ 12/11: Buffer Period

12/11: Final Presentation


32 of 45

Each Member’s Roles

All Members: Data Processing, Making the Model, Studying Reference

강한이: Explore LightGBM applications

이민희: Tune LightGBM Hyperparameters

이서일: Set up the Dataset


33 of 45

Q & A

34 of 45

Additional Approach:

Neural-Network-Based Recommendation System


35 of 45

Motivation

Unlike other types of content, music consumption does not necessarily correlate with preference.

Content-based item (song) embeddings might give more reliable results.


36 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  1       0       1       1
User 2  1       1       0       0

37 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  1       0       1       1
User 2  1       1       0       0

A pretrained model produces embeddings for Item 1, Item 3, and Item 4 (the items User 1 has chosen); averaging these item embeddings yields the User 1 embedding.

38 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  U1·Ia   U1·Ib   U1·Ic   U1·Id
User 2  U2·Ia   U2·Ib   U2·Ic   U2·Id

Calculate the probability of being chosen by computing the cosine similarity between user and item embeddings (Un·Ix denotes the similarity score between the User n embedding and the Item x embedding).

39 of 45

Proposed Method

  • Obtain item embeddings using a pre-trained model.
  • Calculate user embeddings by averaging the embeddings of items the user has chosen.
  • For test items, calculate the probability of being chosen by computing the cosine similarity between user and item embeddings.
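The three steps above can be sketched with tiny hypothetical 2-dimensional embeddings (User 1 chose Items 1, 3, and 4, as in the earlier matrix; the vectors are made up for illustration):

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def user_embedding(chosen_item_embeddings):
    # Average the embeddings of the items the user has chosen.
    n = len(chosen_item_embeddings)
    dim = len(chosen_item_embeddings[0])
    return [sum(e[i] for e in chosen_item_embeddings) / n for i in range(dim)]

# Hypothetical embeddings for Items 1, 3, and 4.
u1 = user_embedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Score an unseen test item by cosine similarity to the user embedding.
score = cosine(u1, [0.0, 1.0])
```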


40 of 45

Details

How to Obtain Audio

  • Download a 30-second audio clip from each song, specifically from the 30- to 60-second mark.
  • Use the ‘yt-dlp’ Python library.
  • Query YouTube with ‘{artist} {title}’ and select the most suitable video.


41 of 45

Details

Pre-trained Model: MusicFM-MSD [2]

  • Input: 30 seconds, 24kHz audio
  • Output: Embeddings with dimensions (4, 1024)
    • Take the mean across the time axis
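The time-axis pooling can be sketched with plain nested lists standing in for the model's (4, 1024) output tensor (toy dimensions here):

```python
def time_mean(embedding):
    # embedding: a (T, D) nested list; MusicFM-MSD would give T=4, D=1024.
    t = len(embedding)
    dim = len(embedding[0])
    return [sum(frame[i] for frame in embedding) / t for i in range(dim)]

# Toy (4, 3) example in place of the real (4, 1024) output.
pooled = time_mean([[1.0, 2.0, 3.0],
                    [3.0, 2.0, 1.0],
                    [1.0, 2.0, 3.0],
                    [3.0, 2.0, 1.0]])
print(pooled)  # -> [2.0, 2.0, 2.0]
```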


[2] M. Won, Y.-N. Hung, and D. Le, “A foundation model for music informatics,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024.

42 of 45

Is it Possible?

  • Time required to crawl audio from YouTube


43 of 45

GBM Model Explanation (content dump)


44 of 45

GBM

  • Tree-based Gradient Boosting Machine (GBM)
    • So what exactly is it? (We need to study it well enough to explain it.)
    • Variants include classifiers, rankers, etc.
  • It classifies using features
    • And the features can be anything:
    • user, item, user-item
  • If everyone ends up using this approach, then how we choose the features is what matters
    • We will do that well, haha.


45 of 45

LightGBM + Feature Engineering

  • Tree-based Gradient Boosting Machine
  • Features: User info. & Song info. & User-Song info.
  • Target: User-Song Play Probability (Binary Classification)

  • Collaborative Filtering used in Feature Engineering
