1 of 45

Music Recommendation System

Team Name: APT

강한이, 이민희, 이서일

2 of 45

Contents

  1. Why Music?
  2. Project Overview
    1. Problem Definition
    2. Dataset
    3. Evaluation Metric
  3. Methodology
    • Model-based Approach
    • Additional Approach
  4. Project Plans
    1. Schedule
    2. Team Roles
  5. Q&A


3 of 45

Why Music?

Music recommendation systems are in high demand for music streaming services.


4 of 45

Why Music?

Music consumption does not necessarily correlate with user preference.


5 of 45

Why Music?

Personalization is not always the best approach in music recommendations.


6 of 45

Project Goal

KKBox's Music Recommendation Challenge [1]

  • The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018)


[1] [Deleted User], Howard, A., Chiu, A., McDonald, M., msla, Kan, W., & Yianchen. (2017). WSDM - KKBox's Music Recommendation Challenge. Kaggle. https://kaggle.com/competitions/kkbox-music-recommendation-challenge

7 of 45

Dataset

Use the dataset provided by the challenge.

  • User-song play data
  • User info. & Song info.

The train/test split is provided:

  • Training data: 7,377,419 entries
  • Test data: 2,556,791 entries


8 of 45

Dataset

Play information for each song per user is provided.

  • Users: 34,404, Songs: 2,296,834
  • Whether a user has chosen a specific song is represented by 0 or 1 (the target column)


* msno: user_id

9 of 45

Dataset

Additional information about users and songs is provided.

  • Users: region, age, gender, registered_via, etc.
  • Songs: length, genre, artist_name, composer_name, etc.


10 of 45

Evaluation Metric & Our Goal

  • AUROC
    • The metric used in the challenge.
    • The model predicts the probability that each item will be labeled "1," i.e., chosen by the user.
    • AUROC is computed from these predicted probabilities.
  • Our goal is to exceed the given benchmark: 0.61337.
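AUROC can be read as the probability that a randomly chosen positive (chosen) example receives a higher predicted score than a randomly chosen negative one. A minimal pure-Python sketch of the metric (an illustrative helper, not the competition's evaluation code):

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking scores 1.0; a random one hovers around 0.5.
print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # -> 1.0
```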


11 of 45

Methodology:

Model-Based Recommendation System


12 of 45

LightGBM

  • Tree-based Gradient Boosting Machine
  • Builds a sequence of decision trees, where each tree corrects the errors of the previous ones.
  • Assigns prediction scores based on the value of the leaf node where each example lands.


13 of 45

LightGBM

Node Functionality:

  • Non-Leaf Nodes: Serve as decision points that guide the example through the tree based on feature values, routing it to a specific leaf node.
  • Leaf Nodes: Contain the final score that contributes to the overall prediction.


14 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)
1       0.5        1.2        1
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

15 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)  Predicted
1       0.5        1.2        1
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Leaf node values represent adjustments to the log-odds of the prediction.

16 of 45

LightGBM

e.g.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0
3       1.5        2.1        1
4       0.9        1.5        0

Leaf node values represent adjustments to the log-odds of the prediction.

17 of 45

LightGBM

e.g.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0
3       1.5        2.1        1          1
4       0.9        1.5        0

Leaf node values represent adjustments to the log-odds of the prediction.

18 of 45

LightGBM

e.g.

Sample  Feature 1  Feature 2  Label (y)  Leaf
1       0.5        1.2        1          0.5
2       0.7        0.8        0          -0.3
3       1.5        2.1        1          1
4       0.9        1.5        0          1

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1

Leaf node values represent adjustments to the log-odds of the prediction.

19 of 45

LightGBM

In each boosting round, LightGBM grows a tree by recursively splitting nodes based on feature thresholds to maximize information gain, assigning leaf values as adjustments to predictions.

Tree 1:
  Feature 1 <= 0.8?
    yes -> Feature 2 <= 1.0?
      yes -> leaf: -0.3
      no  -> leaf: 0.5
    no  -> leaf: 1
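The routing logic of Tree 1 above can be written as a plain function; running the four samples through it reproduces the Leaf column from the example table (a sketch of the idea, not LightGBM's implementation):

```python
def tree1(feature1, feature2):
    # Tree 1 from the example: threshold splits route each sample to a leaf.
    if feature1 <= 0.8:
        if feature2 <= 1.0:
            return -0.3   # feature1 <= 0.8 and feature2 <= 1.0
        return 0.5        # feature1 <= 0.8 and feature2 > 1.0
    return 1.0            # feature1 > 0.8

samples = [(0.5, 1.2), (0.7, 0.8), (1.5, 2.1), (0.9, 1.5)]
leaves = [tree1(f1, f2) for f1, f2 in samples]
print(leaves)  # -> [0.5, -0.3, 1.0, 1.0]
```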

20 of 45

LightGBM

Tree 1 -> ... -> Tree k

LightGBM trains multiple trees iteratively, with each tree learning to correct the residual errors from the ensemble of all previously trained trees, minimizing the overall loss.
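For binary classification, the ensemble's predicted probability is the sigmoid of the summed leaf adjustments across all trees. A sketch assuming a base score of 0 and only the example Tree 1 (leaf value 0.5 for sample 1):

```python
import math

def sigmoid(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

# Each boosting round appends one more leaf adjustment for the sample.
leaf_adjustments = [0.5]   # Tree 1's leaf for sample 1; Trees 2..k would append here
prob = sigmoid(sum(leaf_adjustments))
print(round(prob, 3))  # -> 0.622
```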

21 of 45

Why LightGBM?

  • Excellent for tabular data
    • Can process categorical features directly without requiring one-hot encoding or manual transformation.
  • Supports feature selection
    • Prioritizes important features during training by evaluating feature importance based on split gain or frequency
  • Plain, simple, and out-of-the-box model
  • Able to process large-scale data with limited resources


22 of 45

Features to Use

  • This dataset is much richer than the ones we have worked with in class.
  • It provides information about users and items, in addition to the user-item matrix.


23 of 45

Features to Use

  1. user, song metadata
    • information provided by the dataset, as other competitors have used
  2. user-song Collaborative Filtering score
  3. song audio embedding (not used)


24 of 45

  1. User, Song Metadata
  • Songs: length, genre, artist_name, composer_name, lyricist, language
  • Users: region, age, gender, registered_via, registration_init_time, expiration_date


25 of 45

  • User, Song Metadata


26 of 45

  • User, Song Metadata


27 of 45

2. Collaborative Filtering Score

From User-Song Matrix

  1. Compute the reduced representations (via SVD)
  2. Compute the Similarity between Users or Songs
  3. Construct Peer groups: Top 100 neighborhood
  4. Generate Top-100 CF score for each User-Song pair

* For Similarity: Cosine similarity or NN-based similarity
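Steps 2–3 above (similarity and peer groups) could be sketched as follows, using cosine similarity over the reduced user vectors; the user IDs and the small k are illustrative (the plan uses the top 100):

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_neighbors(target_vec, other_vecs, k):
    # Rank the other users' reduced vectors by cosine similarity to the target.
    ranked = sorted(other_vecs.items(),
                    key=lambda item: cosine(target_vec, item[1]),
                    reverse=True)
    return [user_id for user_id, _ in ranked[:k]]

others = {"u2": [1, 0, 1], "u3": [0, 1, 0], "u4": [1, 1, 1]}
print(top_k_neighbors([1, 0, 1], others, k=2))  # -> ['u2', 'u4']
```

The CF score for a user-song pair would then aggregate the play records of the selected peer group.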


28 of 45

3. Song Audio Embedding (not used)

Get audio embedding by a pretrained model

How to obtain audio

  • Download a 30-second audio clip from YouTube

Pre-trained Model: MusicFM-MSD [2]

  • Output: Embeddings with dimensions (4, 1024)


[2] M. Won, Y.-N. Hung, and D. Le, “A foundation model for music informatics,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024.

29 of 45

3. Song Audio Embedding (not used)

  • Crawling audio data within the project period is challenging.
  • While using music embeddings as features could enhance performance, we decided not to include them in this project.


30 of 45

Project Plans


31 of 45

Project Timeline

11/13 ~ 11/19: Data Preprocessing & Studying Reference

11/20 ~ 11/26: Training the Model

11/27 ~ 12/3: Evaluating & Modifying the Model

12/4 ~ 12/11: Buffer Period

12/11: Final Presentation


32 of 45

Each Member’s Roles

All Members: Data Processing, Making the Model, Studying Reference

강한이: Explore LightGBM applications

이민희: Tune LightGBM Hyperparameters

이서일: Set up the Dataset


33 of 45

Q & A

34 of 45

Additional Approach:

Neural-Network-Based Recommendation System


35 of 45

Motivation

Unlike other types of content, music consumption does not necessarily correlate with preference.

Content-based item (song) embeddings might give more reliable results.


36 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  1       0       1       1
User 2  1       1       0       0

37 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  1       0       1       1
User 2  1       1       0       0

A pretrained model produces embeddings for Item 1, Item 3, and Item 4 (the items User 1 has chosen); averaging these item embeddings yields the User 1 embedding.

38 of 45

Proposed Method

        Item 1  Item 2  Item 3  Item 4
User 1  U1·Ia   U1·Ib   U1·Ic   U1·Id
User 2  U2·Ia   U2·Ib   U2·Ic   U2·Id

Calculate the probability of being chosen by computing the cosine similarity between user and item embeddings (Un·Ix denotes the similarity score between the User n embedding and the Item x embedding).

39 of 45

Proposed Method

  • Obtain item embeddings using a pre-trained model.
  • Calculate user embeddings by averaging the embeddings of items the user has chosen.
  • For test items, calculate the probability of being chosen by computing the cosine similarity between user and item embeddings.
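The three steps above can be sketched with tiny hypothetical 2-dimensional embeddings (User 1 chose Items 1, 3, and 4, as in the earlier matrix; the vectors are made up for illustration):

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def user_embedding(chosen_item_embeddings):
    # Average the embeddings of the items the user has chosen.
    n = len(chosen_item_embeddings)
    dim = len(chosen_item_embeddings[0])
    return [sum(e[i] for e in chosen_item_embeddings) / n for i in range(dim)]

# Hypothetical embeddings for Items 1, 3, and 4.
u1 = user_embedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Score an unseen test item by cosine similarity to the user embedding.
score = cosine(u1, [0.0, 1.0])
```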


40 of 45

Details

How to Obtain Audio

  • Download a 30-second audio clip from each song, specifically from the 30- to 60-second mark.
  • Use the ‘yt-dlp’ Python library.
  • Query YouTube with ‘{artist} {title}’ and select the most suitable video.


41 of 45

Details

Pre-trained Model: MusicFM-MSD [2]

  • Input: 30 seconds, 24kHz audio
  • Output: Embeddings with dimensions (4, 1024)
    • Take the mean across the time axis
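The time-axis pooling can be sketched with plain nested lists standing in for the model's (4, 1024) output tensor (toy dimensions here):

```python
def time_mean(embedding):
    # embedding: a (T, D) nested list; MusicFM-MSD would give T=4, D=1024.
    t = len(embedding)
    dim = len(embedding[0])
    return [sum(frame[i] for frame in embedding) / t for i in range(dim)]

# Toy (4, 3) example in place of the real (4, 1024) output.
pooled = time_mean([[1.0, 2.0, 3.0],
                    [3.0, 2.0, 1.0],
                    [1.0, 2.0, 3.0],
                    [3.0, 2.0, 1.0]])
print(pooled)  # -> [2.0, 2.0, 2.0]
```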


[2] M. Won, Y.-N. Hung, and D. Le, “A foundation model for music informatics,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024.

42 of 45

Is it Possible?

  • Time required to crawl audio from YouTube


43 of 45

GBM Model Explanation (content dump)


44 of 45

GBM

  • Tree-based Gradient Boosting Machine (GBM)
    • So what exactly is it? (We need to study it well enough to explain it.)
    • Variants include classifiers, rankers, etc.
  • It classifies using features
    • And the features can be anything:
    • user, item, user-item
  • If everyone ends up using this approach, then how we choose the features is what matters
    • We will do that well, haha.


45 of 45

LightGBM + Feature Engineering

  • Tree-based Gradient Boosting Machine
  • Features: User info. & Song info. & User-Song info.
  • Target: User-Song Play Probability (Binary Classification)

  • Collaborative Filtering used in Feature Engineering
