1 of 27

Building a Book Recommendation System

Natasha Lamperti, Ru Sanjeev, Yurii Hanley, Julianne Itliong

2 of 27

Storyboard and Workflow Process

Collecting the Data

After exploring many datasets, we chose a book dataset from Kaggle that included attributes such as book title, ISBN, average rating, and rating count. We wanted to add more features to the dataset so that our ML models and recommendation system would have enough to work with.

Exploring the Data

Once our dataset was finalized, we summarized it with visualizations in Tableau. This lets our audience digest the data in a simple, visual way.

Building and Testing ML Models

For our machine learning models, we tested whether our dataset could support close-to-accurate predictions. We used RandomOverSampler (a resampling step that balances the classes before training), RandomForestClassifier, and KMeans, and collected accuracy, precision, sensitivity, and F1 scores.

Creating the Webpage

Our goal is to deploy a GitHub page that displays our results and provides a user-friendly interface for our recommendation systems.

Building the Recommendation System

After adding dummy data to create four new features: genre, most votes by gender, average age, and age groups, we built a popularity-based recommendation system and a content-based filtering recommendation system.

3 of 27

4 of 27

Why a recommendation system?

Our goal was to create a book recommendation system.

We chose a recommendation system because popular platforms such as Netflix, Amazon, and Spotify use them to keep their users interested and to take the guesswork out of choosing the next show to watch, item to purchase, or song to listen to.

A recommendation system can draw on many different inputs about the user or the subject. It seemed like a broad and relevant topic in today’s data science realm and a good opportunity to practice a number of the skills that we learned over the past several months. As an added bonus, we all enjoy reading!

5 of 27

Dataset

Data Source: Goodreads Books.csv sourced by Nilim-Kaggle

Columns:

Software: Python 3.7.6, SQLite, Jupyter Notebook

6 of 27

The code for scraping each book's genre ran successfully, but we ran into HTTP Error 429: Too Many Requests and in the end moved forward with dummy data.

Example: http://google.com/search?q=The+Known+World+book
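One common way to cope with HTTP 429 is to retry with an increasing delay between attempts. The sketch below is not the scraper we ran; it is a minimal illustration using only the standard library, and the function name `fetch_with_backoff`, its `fetch` hook, and the delay values are assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, fetch=None, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Fetch a URL, waiting longer after each HTTP 429 response.

    `fetch` is a hypothetical hook (any callable that takes a URL and
    returns the page body). By default it uses urllib.request.urlopen.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Retry only on 429; re-raise other errors, or give up
            # once the retry budget is spent.
            if err.code != 429 or attempt == max_retries:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

Even with backoff, a site may keep refusing automated requests, so a fallback such as the dummy data used here can still be needed.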

Dataset - Adding features with web scraping

7 of 27

Database using SQLite

We chose SQLite for our database because it suited the size of our dataset and our needs for this project.
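As a rough illustration of how little setup SQLite needs, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `books`, the column types, and the two rows are placeholders for the example, not our actual schema or data.

```python
import sqlite3

# Placeholder rows for illustration only; not taken from the real dataset.
rows = [
    ("Book A", "0000000001", 4.2, 1500),
    ("Book B", "0000000002", 3.7, 300),
]

# ":memory:" keeps the example self-contained; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books (
        title TEXT,
        isbn TEXT PRIMARY KEY,
        average_rating REAL,
        rating_count INTEGER
    )
    """
)
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Query the table back, highest-rated first.
top = conn.execute(
    "SELECT title, average_rating FROM books ORDER BY average_rating DESC"
).fetchall()
```

Because the database is a single file (or lives in memory), there is no server to install or configure, which fits a dataset of this size.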

8 of 27

Summary of Dataset using Tableau visualizations

  1. Non-Fiction
  2. Thriller
  3. Adventure
  4. Romance
  5. Fiction

9 of 27

Summary of Dataset using Tableau visualizations

User Age

10 of 27

Summary of Dataset using Tableau visualizations

User Location

Most Votes By Gender

11 of 27

Machine Learning Models - Setting Up

Setting up our target column: No Recommend = 0, Yes Recommend = 1.
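A binary target like this can be derived from an existing column. The sketch below assumes, purely for illustration, that a book counts as "recommend" when its average rating meets a cutoff; the 4.0 threshold and the `add_recommend_label` helper are hypothetical, not the rule used in our notebook.

```python
def add_recommend_label(books, threshold=4.0):
    """Return copies of the book records with a binary `recommend` field.

    The threshold is an assumed cutoff chosen for this example.
    """
    labeled = []
    for book in books:
        record = dict(book)  # copy so the input records are left untouched
        record["recommend"] = 1 if book["average_rating"] >= threshold else 0
        labeled.append(record)
    return labeled


# Placeholder records for illustration only.
sample = [
    {"title": "Book A", "average_rating": 4.3},
    {"title": "Book B", "average_rating": 3.6},
]
labeled = add_recommend_label(sample)
```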

12 of 27

Machine Learning Models - Setting Up

Creating our features

Setting up the y values

13 of 27

Machine Learning Models - Results

For our Machine Learning Models, we used RandomOverSampler, RandomForestClassifier, and KMeans clustering.

Results for RandomOverSampler
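For readers unfamiliar with random oversampling, the idea is to duplicate randomly chosen rows of the smaller class until both classes are the same size. The sketch below shows that idea in plain Python; it is not the imbalanced-learn implementation, and the function name and sample rows are made up for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows belonging to this class in the original data.
        pool = [row for row, lab in zip(X, y) if lab == label]
        # Add random duplicates until this class reaches the target size.
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Placeholder features (average rating, rating count) and labels.
X = [[4.5, 900], [3.1, 40], [3.4, 75], [2.9, 12]]
y = [1, 0, 0, 0]
X_res, y_res = random_oversample(X, y)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for scoring.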

Results for RandomForestClassifier
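The scores we collected (accuracy, precision, sensitivity, and F1) all come from counting true and false positives and negatives. The sketch below computes them by hand for a small set of made-up labels; the helper name `classification_metrics` and the label lists are illustrative, not our actual results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity (recall), and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Made-up labels for illustration: 1 = recommend, 0 = do not recommend.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With imbalanced classes, accuracy alone can look good even when few positives are found, which is why precision and sensitivity are reported alongside it.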

14 of 27

Machine Learning Models - Results

Results for KMeans Clustering

15 of 27

Results

These were the books recommended by the clustering model.

16 of 27

About our Book Recommendation System

For our recommendation system, we built a simple (popularity-based) recommender and a content-based recommender.

Simple Recommender:

Content-Based:

17 of 27

Simple Recommender Approach (Popularity-based)

The dataset included these features/attributes necessary to build our recommender:

  • Title
  • Average_rating
  • Rating_count

To recommend book titles based on popularity, we needed to compute a weighted rating for each title, so that a high average from only a handful of ratings does not outrank a well-rated book with many ratings.
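One widely used weighted rating is the IMDB-style formula, WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is the title's average rating, v its rating count, m a minimum-votes parameter, and C the mean rating across all titles. The slide does not state which formula we used, so treat the sketch below as an assumption; the `min_votes` default and the sample records are placeholders.

```python
def weighted_rating(avg_rating, rating_count, min_votes, global_mean):
    """IMDB-style weighted rating: pulls low-count titles toward the global mean."""
    v, m = rating_count, min_votes
    return (v / (v + m)) * avg_rating + (m / (v + m)) * global_mean


def top_titles(books, n=25, min_votes=100):
    """Return the n titles with the highest weighted rating."""
    global_mean = sum(b["average_rating"] for b in books) / len(books)
    scored = [
        (weighted_rating(b["average_rating"], b["rating_count"], min_votes, global_mean),
         b["title"])
        for b in books
    ]
    scored.sort(reverse=True)  # highest weighted rating first
    return [title for _, title in scored[:n]]


# Placeholder records for illustration only.
sample = [
    {"title": "Book A", "average_rating": 5.0, "rating_count": 2},
    {"title": "Book B", "average_rating": 4.4, "rating_count": 5000},
    {"title": "Book C", "average_rating": 3.0, "rating_count": 800},
]
ranking = top_titles(sample, n=2)
```

In this sample, Book A has a perfect average but only two ratings, so its weighted score is pulled toward the overall mean and it ranks below Book B.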

18 of 27

Simple Recommender Approach (Popularity-based)

Once the weighted ratings were computed, we could sort our list of books and take the top 25 most popular titles.

19 of 27

Results

20 of 27

Content-Based Approach
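A content-based recommender scores books by how similar their attributes are to those of a chosen title. The sketch below is a simplified stand-in, not the code on these slides: it treats each book's attributes (such as genre and age group) as a set of tokens and ranks other books by cosine similarity. The attribute values, catalog, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sets treated as binary feature vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_similar(title, catalog, n=3):
    """Return the n titles most similar to `title` (raises KeyError if unknown)."""
    target = catalog[title]
    scores = [
        (cosine_similarity(target, features), other)
        for other, features in catalog.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:n]]


# Placeholder catalog; attribute values are made up for illustration.
catalog = {
    "Book A": {"genre:Thriller", "age_group:Adult", "gender:F"},
    "Book B": {"genre:Thriller", "age_group:Adult", "gender:M"},
    "Book C": {"genre:Romance", "age_group:Teen", "gender:F"},
    "Book D": {"genre:Thriller", "age_group:Teen", "gender:M"},
}
similar = recommend_similar("Book A", catalog, n=2)
```

Because the ranking depends entirely on the attributes, randomly generated attributes produce rankings that carry no real meaning.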

21 of 27

Content-Based Approach

22 of 27

Content-Based Approach

23 of 27

Content-Based Approach

24 of 27

Results

25 of 27

Limitations

One limitation of this content-based model is that it relies on randomly generated attributes, so its output will not be accurate.

When typing in a book title, the input must exactly match the title as it appears in the list. This can be solved by using a drop-down menu of book titles.
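Besides a drop-down menu, another option is to map a mistyped title to the closest known one. The sketch below uses `difflib.get_close_matches` from the Python standard library; it is an alternative we did not implement, and the `resolve_title` helper, the cutoff, and the short title list are assumptions for the example.

```python
import difflib


def resolve_title(query, titles, cutoff=0.6):
    """Return the closest known title to `query`, or None if nothing is close."""
    # Compare in lowercase so capitalization differences do not matter.
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Short illustrative list; the real list would come from the dataset.
titles = ["The Known World", "The Great Gatsby", "Pride and Prejudice"]
match = resolve_title("the knwon world", titles)
```

Returning None for a query with no close match lets the page show a "title not found" message instead of failing.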

26 of 27

What would we have done differently?

  1. Test more machine learning models.
  2. Web scraping: get the code to successfully collect the genre data we needed.
  3. Find more data to test with.
  4. Fix bugs in the HTML.
  5. Connect our clustering recommender to our genre-based recommender to see which books it would select.
  6. Include a section on our webpage where a user can type a book title and get recommendations for similar titles.

27 of 27

Dashboard