Building a Book Recommendation System
Natasha Lamperti, Ru Sanjeev, Yurii Hanley, Julianne Itliong
Storyboard and Workflow Process
Collecting the Data
After exploring many datasets, we chose a book dataset from Kaggle that included attributes such as book title, ISBN, average rating, and rating count. We wanted to add more features to the dataset so that our ML models and recommendation system would have enough information to work with.
Exploring the Data
Once our dataset was finalized, we summarized it with visualizations in Tableau so that our audience could digest the data in a simple, visual way.
Building and Testing ML Models
For our machine learning models, we tested three approaches to see whether our dataset could support reasonably accurate predictions: RandomOverSampler, RandomForestClassifier, and KMeans. We compared them using accuracy, precision, sensitivity (recall), and F1 scores.
Creating the Webpage
Our goal is to deploy a GitHub Pages site to display our results and provide a user-friendly interface for our recommendation systems.
Building the Recommendation System
After generating dummy data for four new features (genre, most votes by gender, average age, and age group), we built a popularity-based recommendation system and a content-based filtering recommendation system.
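The slides do not show how the dummy values were generated. A minimal sketch, assuming NumPy's random generator and placeholder category lists (the helper name `add_dummy_features`, the genre list, the 13 to 70 age range, and the age-group bins are all hypothetical), could look like this:

```python
import numpy as np
import pandas as pd


def add_dummy_features(books: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Return a copy of `books` with four randomly generated columns (hypothetical)."""
    rng = np.random.default_rng(seed)  # seeded so the dummy data is reproducible
    n = len(books)
    out = books.copy()
    # Placeholder category lists; these are assumptions, not the project's actual values.
    genres = ["Fantasy", "Mystery", "Romance", "Science Fiction", "Nonfiction"]
    out["genre"] = rng.choice(genres, size=n)
    out["most_votes_by_gender"] = rng.choice(["Female", "Male"], size=n)
    out["average_age"] = rng.integers(13, 71, size=n)  # integers 13..70 inclusive
    # Bin the random ages into labeled age groups: (12,17], (17,29], (29,49], (49,70].
    out["age_group"] = pd.cut(
        out["average_age"],
        bins=[12, 17, 29, 49, 70],
        labels=["13-17", "18-29", "30-49", "50-70"],
    )
    return out


books = pd.DataFrame({"title": ["Book A", "Book B", "Book C"]})
print(add_dummy_features(books))
```

Because every value is random, any model trained on these columns can only learn noise, which is why results built on them should be read as a demonstration of the pipeline rather than real recommendations.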
Why a recommendation system?
Our goal was to create a book recommendation system.
We chose a recommendation system because popular platforms such as Netflix, Amazon, and Spotify use them to keep their users engaged and to take the guesswork out of choosing the next show to watch, item to purchase, or song to listen to.
A recommendation system can draw on many different inputs about the user or the item being recommended. It seemed like a broad and relevant topic in today’s data science realm and a good opportunity to practice a number of the skills we learned over the past several months. As an added bonus, we all enjoy reading!
Dataset
Data Source: Goodreads Books.csv sourced by Nilim-Kaggle
Columns:
Software: Python 3.7.6, SQLite, Jupyter Notebook
The code for scraping each book's genre ran successfully; unfortunately, we ran into HTTP Error 429: Too Many Requests, so in the end we moved forward with dummy data.
Dataset - Adding features with web scraping
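HTTP 429 means the server is rate-limiting the client. One common mitigation is to retry with exponential backoff. The sketch below is not our scraper: `fetch_with_backoff` is a hypothetical helper, and the `fetch` callable (returning a status code and body) is an assumption standing in for whatever HTTP library a real scraper would use. Injecting `fetch` and `sleep` keeps the example runnable without network access.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: takes a URL, returns (HTTP status code, response body).
Fetch = Callable[[str], Tuple[int, str]]


def fetch_with_backoff(
    url: str,
    fetch: Fetch,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry on HTTP 429, doubling the wait each time; return the body or None."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        # Give up on non-429 errors, or once the retry budget is used up.
        if status != 429 or attempt == max_retries:
            return None
        sleep(delay)  # back off before the next attempt
        delay *= 2    # exponential growth: 1s, 2s, 4s, ...
    return None


# Usage with a fake fetcher: two 429 responses, then success.
responses = iter([(429, ""), (429, ""), (200, "<html>Fantasy</html>")])
waits = []
body = fetch_with_backoff("https://example.com/book/1", lambda u: next(responses), sleep=waits.append)
print(body, waits)  # waits == [1.0, 2.0]
```

Servers that return 429 may also send a `Retry-After` header; this sketch ignores it, but honoring that value when present is usually the politer choice.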
Database using SQLite
We chose SQLite for our database because it suited the size of our dataset and the needs of this project.
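A minimal sketch of loading a books table into SQLite with Python's built-in `sqlite3` module and pandas. The three rows and the column names (`title`, `isbn`, `average_rating`, `ratings_count`) are hypothetical stand-ins for the Kaggle CSV, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for the Kaggle books.csv; column names are assumptions.
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "isbn": ["0000000001", "0000000002", "0000000003"],
    "average_rating": [4.5, 3.9, 4.1],
    "ratings_count": [1200, 300, 5000],
})

# ":memory:" keeps the example self-contained; a file path would persist the database.
conn = sqlite3.connect(":memory:")
books.to_sql("books", conn, index=False)

# Read back the highest-rated titles with plain SQL.
top = pd.read_sql_query(
    "SELECT title, average_rating FROM books ORDER BY average_rating DESC LIMIT 2",
    conn,
)
print(top)
conn.close()
```

SQLite stores the whole database in a single file and needs no server, which is a reasonable fit for a dataset of this size.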
Summary of Dataset using Tableau visualizations
User Age
User Location
Most Votes By Gender
Machine Learning Models - Setting Up
Setting up our target column: not recommended = 0, recommended = 1.
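The slides do not state the rule used to label a book as recommended. As a hypothetical illustration only, the sketch below marks a book as recommended when its average rating is at least 4.0; the cutoff, the column names, and the toy rows are all assumptions.

```python
import pandas as pd

books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [4.5, 3.2, 4.0, 3.99],
})

# Hypothetical rule: 1 (recommend) when average_rating >= 4.0, else 0 (no recommend).
RECOMMEND_THRESHOLD = 4.0
books["recommend"] = (books["average_rating"] >= RECOMMEND_THRESHOLD).astype(int)
print(books)
```

If the target is derived from `average_rating`, that column should then be left out of the features; otherwise the model can simply read the answer.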
Creating our features
Setting up the y values
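A minimal sketch of splitting a table into features (X) and target (y), assuming pandas and scikit-learn. The toy rows and the columns `genre`, `age_group`, `ratings_count`, and `recommend` are hypothetical; categorical columns are one-hot encoded with `pd.get_dummies` so the models receive numbers.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the project's table; column names and values are assumptions.
books = pd.DataFrame({
    "genre": ["Fantasy", "Mystery", "Fantasy", "Romance", "Mystery", "Romance", "Fantasy", "Mystery"],
    "age_group": ["18-29", "30-49", "13-17", "18-29", "50-70", "30-49", "18-29", "13-17"],
    "ratings_count": [350, 220, 410, 300, 180, 260, 500, 240],
    "recommend": [1, 0, 1, 0, 0, 1, 1, 0],
})

# Features: everything except the target, with categorical columns one-hot encoded.
X = pd.get_dummies(books.drop(columns="recommend"), columns=["genre", "age_group"])
# y values: the 0/1 recommend column.
y = books["recommend"]

# stratify=y keeps the 0/1 ratio similar in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```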
Machine Learning Models - Results
For our machine learning models, we used RandomOverSampler (which balances the classes by resampling the minority class), RandomForestClassifier, and KMeans clustering.
Results for RandomOverSampler
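RandomOverSampler from imbalanced-learn is a resampling step rather than a classifier: its `fit_resample(X, y)` method duplicates minority-class rows so the classes are balanced before a model is trained. The slides do not show which classifier followed it. To make the idea concrete without that dependency, here is a from-scratch sketch in NumPy; `random_oversample` is a hypothetical helper and not imbalanced-learn's implementation.

```python
import numpy as np


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen minority-class rows until every class matches
    the largest class (a from-scratch sketch of random oversampling)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with every original row
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            # Sample with replacement from this class to fill the gap.
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]


X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 0, 1, 1])
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # → [4 4]
```

Oversampling should be applied to the training split only; resampling before the train/test split lets duplicated rows leak into the test set and inflates the scores.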
Results for RandomForestClassifier
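The RandomForestClassifier results were shown as images. The sketch below shows how the four reported scores can be computed with scikit-learn; it uses synthetic, imbalanced data from `make_classification` as an assumed stand-in for our features, so its numbers say nothing about our actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data stands in for the project's features (an assumption).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The four scores named on the slides; sensitivity is the same quantity as recall.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "sensitivity": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, accuracy alone can look good while the minority class is mostly missed, which is why precision, sensitivity, and F1 are reported alongside it.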
Results for KMeans Clustering
Results
These were the books recommended by the clustering model.
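A minimal sketch of turning KMeans clusters into recommendations: fit KMeans on scaled numeric features, then suggest the other books in the same cluster as a chosen title. The toy titles, the two feature columns, `n_clusters=2`, and the helper `recommend_from_cluster` are assumptions; the slides do not show which features or how many clusters we used.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy numeric features per book; titles, columns, and values are assumptions.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "average_rating": [4.6, 4.5, 4.4, 3.1, 3.0, 3.2],
    "ratings_count": [9000, 8500, 9200, 150, 120, 200],
})

# Scale the features so ratings_count does not dominate the distance calculation.
features = StandardScaler().fit_transform(books[["average_rating", "ratings_count"]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1)
books["cluster"] = kmeans.fit_predict(features)


def recommend_from_cluster(df, title):
    """Return the other titles that share a cluster with `title` (hypothetical helper)."""
    cluster = df.loc[df["title"] == title, "cluster"].iloc[0]
    same = df[(df["cluster"] == cluster) & (df["title"] != title)]
    return same["title"].tolist()


print(recommend_from_cluster(books, "A"))  # → ['B', 'C']
```

Cluster labels themselves are arbitrary integers, so the sketch compares titles by shared membership rather than relying on a particular label value.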
About our Book Recommendation System
For our recommendation system, we built a simple (popularity-based) recommender and a content-based recommender.
Simple Recommender: Content-Based:
Simple Recommender Approach (Popularity-based)
The dataset included these features/attributes necessary to build our recommender:
In order to recommend book titles based on popularity, we needed to compute the weighted rating for each title.
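The slides do not show which weighted-rating formula we used. A common choice for popularity-based recommenders is the formula popularized by IMDb's Top 250 chart, shown here as an assumption. It shrinks a book's average rating toward the overall mean when the book has few ratings, where $v$ is the book's number of ratings, $R$ its average rating, $m$ the minimum number of ratings required to be listed, and $C$ the mean rating across all books:

```latex
\mathrm{WR} = \frac{v}{v + m}\,R + \frac{m}{v + m}\,C
```

For example, with $v = 1000$, $m = 500$, $R = 4.5$, and $C = 3.9$: $\mathrm{WR} = \tfrac{2}{3}(4.5) + \tfrac{1}{3}(3.9) = 3.0 + 1.3 = 4.3$.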
Once the weighted ratings were computed, we could sort our list of books and keep the top 25 most popular titles.
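A minimal pandas sketch of the ranking step, assuming the IMDb-style formula WR = v/(v+m)·R + m/(v+m)·C, a vote cutoff m taken from a quantile of `ratings_count`, and the column names `average_rating` and `ratings_count`. The helper `top_popular`, its default quantile of 0.90, and the toy rows are hypothetical.

```python
import pandas as pd


def top_popular(books: pd.DataFrame, n: int = 25, quantile: float = 0.90) -> pd.DataFrame:
    """Rank books by weighted rating WR = v/(v+m)*R + m/(v+m)*C (assumed formula)."""
    C = books["average_rating"].mean()             # mean rating across all books
    m = books["ratings_count"].quantile(quantile)  # minimum ratings needed to qualify
    qualified = books[books["ratings_count"] >= m].copy()
    v = qualified["ratings_count"]
    R = qualified["average_rating"]
    qualified["weighted_rating"] = v / (v + m) * R + m / (v + m) * C
    return qualified.sort_values("weighted_rating", ascending=False).head(n)


books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [4.8, 4.2, 3.9, 4.9],
    "ratings_count": [100, 5000, 8000, 10],
})
print(top_popular(books, quantile=0.5)[["title", "weighted_rating"]])
```

With these toy numbers the median cutoff is 2,550 ratings, so only Book B and Book C qualify; Book D's 4.9 average rests on just 10 ratings and is filtered out, which is the behavior the weighting is meant to produce.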
Results
Content-Based Approach
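The content-based code appeared only as screenshots. A common way to build this kind of recommender, sketched here under assumptions, is to combine each book's attributes into one string, vectorize the strings with scikit-learn's `CountVectorizer`, and rank other books by cosine similarity. The toy rows, the helper names `make_soup` and `get_recommendations`, and the choice of bag-of-words vectors are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy rows using the dummy feature names; values are assumptions.
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "genre": ["Fantasy", "Fantasy", "Mystery", "Mystery"],
    "most_votes_by_gender": ["Female", "Female", "Male", "Female"],
    "age_group": ["18-29", "18-29", "30-49", "30-49"],
})


def make_soup(row):
    # Join the attributes into one string; spaces inside a value are removed
    # so a multi-word genre such as "Science Fiction" stays a single token.
    cols = ["genre", "most_votes_by_gender", "age_group"]
    return " ".join(str(row[c]).replace(" ", "") for c in cols)


books["soup"] = books.apply(make_soup, axis=1)
matrix = CountVectorizer().fit_transform(books["soup"])  # bag-of-words counts
similarity = cosine_similarity(matrix)                   # book-by-book similarity
index = pd.Series(books.index, index=books["title"])     # title -> row position


def get_recommendations(title, n=2):
    """Return the n titles most similar to `title` (exact-match lookup)."""
    i = index[title]
    scores = sorted(enumerate(similarity[i]), key=lambda pair: pair[1], reverse=True)
    top = [j for j, _ in scores if j != i][:n]
    return books["title"].iloc[top].tolist()


print(get_recommendations("Book A"))
```

Book B shares every attribute with Book A and ranks first; Book D shares only the gender token and ranks second. Because the lookup uses `index[title]`, a title that does not match exactly raises a `KeyError`.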
Results
Limitations
The main limitation of this content-based model is that it relies on randomly generated attributes, so its recommendations will not be accurate.
When typing in a book title, the user must enter it exactly as it appears in the list. This could be solved with a drop-down menu of book titles.
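A drop-down is the fix we proposed. For free-text input, a complementary option is to normalize case and fall back to fuzzy matching with Python's standard-library `difflib.get_close_matches`. This is a sketch, not part of our project: `resolve_title`, the 0.6 cutoff, and the example titles are hypothetical.

```python
import difflib


def resolve_title(query, titles, cutoff=0.6):
    """Map user input to a known title: exact match first, then case-insensitive,
    then the closest fuzzy match. Returns None when nothing is close enough."""
    if query in titles:
        return query
    lowered = {t.lower(): t for t in titles}  # lowercase key -> original title
    key = query.strip().lower()
    if key in lowered:
        return lowered[key]
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


titles = ["The Hobbit", "The Hunger Games", "Pride and Prejudice"]
print(resolve_title("the hobbit", titles))         # case-insensitive match
print(resolve_title("Pride & Prejudice", titles))  # fuzzy match
print(resolve_title("Dune", titles))               # no close match, prints None
```

The resolved title can then be passed to an exact-match lookup, so the recommender itself does not need to change.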
What would we have done differently?
Dashboard
Link to webpage: https://yurii151.github.io/final_project/