1 of 7

AI Mini-Project: ReelRatings

Group 16: Jeff Suliga, Tanya Acharya, Rishi Patel, Parth Mittal

2 of 7

Problem Statement & Analysis

In a world flooded with content, it's hard to figure out what's worth your time.
Reviews are crucial for deciding whether a movie or show matches your preferences, but review sites such as Letterboxd can be confusing because of "joke" reviews.
ReelRatings analyzes the text of user reviews to predict the rating each review actually implies, helping users tell whether the sentiment toward a movie or show is positive or negative and navigate the platform more effectively.

3 of 7

Use-Case Scenarios

Help users decide which movies/shows to watch
Filter out “joke” reviews by analyzing sentiment, so users see reviews based on a more accurate reading of their meaning
Support companies that want to improve their recommendation algorithms
Help filmmakers and producers understand how the public perceives their work

4 of 7

AI Algorithm & Model

Support Vector Regression (SVR) Model:

Implemented a traditional machine learning approach: a Support Vector Regression (SVR) model trained on 50,000 IMDb reviews.
Designed for numerical sentiment analysis, providing a baseline for performance comparison.
Preprocessing steps included text normalization, tokenization, and TF-IDF vectorization for feature extraction.
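The preprocessing and regression steps above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the team's code: the slides do not say which SVR implementation was used, and this sketch swaps in a linear SVR trained by plain full-batch subgradient descent on the epsilon-insensitive loss. The IDF formula mirrors the smoothed default of scikit-learn's `TfidfVectorizer`. The toy reviews, the 0-to-1 score scale, the hyperparameters, and every function name (`tokenize`, `fit_idf`, `tfidf`, `predict`, `train_linear_svr`) are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Normalize (lowercase, replace punctuation with spaces) and split on whitespace."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def predict(w: dict[str, float], b: float, x: dict[str, float]) -> float:
    """Linear model output w.x + b over sparse vectors."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear_svr(
    X: list[dict[str, float]], y: list[float],
    epsilon: float = 0.1, lam: float = 1e-3, lr: float = 0.05, epochs: int = 1000,
) -> tuple[dict[str, float], float]:
    """Subgradient descent on lam/2 * ||w||^2 + mean(max(0, |y - (w.x + b)| - epsilon))."""
    w: dict[str, float] = {}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {t: lam * v for t, v in w.items()}  # gradient of the L2 penalty
        grad_b = 0.0
        for x, target in zip(X, y):
            residual = target - predict(w, b, x)
            if abs(residual) > epsilon:  # only points outside the epsilon-tube contribute
                sign = 1.0 if residual > 0 else -1.0
                for t, v in x.items():
                    grad_w[t] = grad_w.get(t, 0.0) - sign * v / n
                grad_b -= sign / n
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: review text paired with a score on an assumed 0-to-1 scale.
train = [
    ("Absolutely terrible, awful acting.", 0.0),
    ("Terrible and boring.", 0.1),
    ("It was okay, fairly average.", 0.5),
    ("Great movie, loved it!", 0.9),
    ("Loved it, brilliant and great.", 1.0),
]
docs = [tokenize(text) for text, _ in train]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
w, b = train_linear_svr(X, [score for _, score in train])
scores = [predict(w, b, x) for x in X]
```

On real data, a library implementation would replace the hand-rolled training loop; the point here is only to show how TF-IDF features feed an epsilon-insensitive regression.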

DistilBERT Model:

Leveraged DistilBERT, a transformer-based model optimized for efficiency.
Despite being about 40% smaller and 60% faster than BERT, DistilBERT retains 97% of its predecessor's language understanding capabilities.
Fine-tuned on 15,000 IMDb reviews, optimizing for sentiment classification within our computational limits.

"Let's talk about the AI algorithms and models that power our sentiment analysis tool.

Regression Model

We implemented a traditional machine learning approach: a Support Vector Regression, or SVR, model trained on 50,000 IMDb reviews. The SVR serves as our baseline for numerical sentiment analysis and gives us a point of comparison for the other model. Before training, the review text goes through normalization, tokenization, and TF-IDF vectorization to extract meaningful features.

DistilBERT Model

Next, we have the DistilBERT model, a leaner, more efficient variant of the well-known BERT architecture. Despite being optimized for efficiency, DistilBERT retains a remarkable 97% of its predecessor's language understanding capabilities. We fine-tuned DistilBERT on 15,000 IMDb reviews, optimizing for sentiment classification within our computational constraints.

Together, these models form the backbone of ReelRatings, providing us with both granular and categorical insights into the sentiment of movie reviews."

5 of 7

Results and Demonstration

SVR Model Performance

The SVR model achieved an RMSE of 0.3085 when predicting sentiment scores, suggesting that it can effectively discern subtleties within the sentiment spectrum.
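RMSE is the square root of the mean squared difference between predicted and true scores, so it is expressed in the same units as the scores themselves; whether 0.3085 is small therefore depends on the rating scale, which the slides do not state. A minimal helper, with a hypothetical name:

```python
from __future__ import annotations

import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# One exact prediction and one that is off by 0.5: sqrt((0 + 0.25) / 2) ~= 0.354
print(rmse([0.0, 1.0], [0.0, 0.5]))
```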

DistilBERT Model

More than threefold reduction in training loss from the first to the third epoch
Comparing the RMSE of the SVR (0.3085) and DistilBERT (0.3271) models highlights a close contest in predictive accuracy.
Accuracy, precision, recall, and F1 scores indicate high reliability for the classification task.

Combined Approach Analysis

Leveraging the strengths of both models, our combined approach offers a more balanced sentiment rating.
DistilBERT first classifies the sentiment (positive/negative), which then guides the SVR model to adjust its rating within a specific range, providing a more context-aware sentiment score.
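The slides do not specify the score ranges or exactly how the SVR adjusts its rating, so the following is only one possible reading, offered as a sketch: an assumed 0-to-1 score scale split at 0.5, with the SVR's raw output clamped into the range that matches DistilBERT's label. `RANGES` and `combined_score` are hypothetical names.

```python
# Hypothetical ranges on an assumed 0-to-1 score scale; the slides do not state the real ranges.
RANGES = {"negative": (0.0, 0.5), "positive": (0.5, 1.0)}


def combined_score(label: str, svr_score: float) -> float:
    """Clamp the SVR's raw score into the range matching the classifier's label."""
    low, high = RANGES[label]
    return min(max(svr_score, low), high)


# DistilBERT says "positive" but the SVR alone predicted 0.3,
# so the combined score is pulled up to the bottom of the positive range.
print(combined_score("positive", 0.3))  # 0.5
```

Clamping is the simplest choice. Linearly rescaling the SVR output into the selected range would be an alternative that keeps the relative ordering of out-of-range predictions instead of collapsing them onto the boundary.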

Turning to results and demonstration, let's look at the performance of our sentiment analysis models.

SVR Model Performance

First, the SVR model. It achieved an RMSE of 0.3085, which suggests that it can effectively discern nuances within the sentiment spectrum and provide a reliable sentiment score.

DistilBERT Model

The DistilBERT model reduced its training loss by more than threefold from the first to the third epoch, which illustrates how quickly it adapted during fine-tuning. Comparing RMSE, the SVR (0.3085) and DistilBERT (0.3271) are in a tight race, indicating that both models are effective at sentiment analysis. DistilBERT's slightly higher RMSE may be due to its categorical outputs, compared with the SVR's continuous numeric predictions.

DistilBERT's accuracy, precision, recall, and F1 scores collectively suggest a high level of reliability for classification, making it an excellent choice for applications that require binary sentiment decisions.
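For reference, the four classification metrics named above follow directly from counts of true positives, false positives, and false negatives. A minimal sketch, assuming labels encoded as 1 (positive) and 0 (negative); the function name `binary_metrics` is hypothetical.

```python
from __future__ import annotations


def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 1, 0], [1, 1, 0, 0]))
```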

Combined Approach Analysis

Lastly, our combined approach takes advantage of both models' strengths, offering a more balanced sentiment rating. DistilBERT first classifies the sentiment into positive or negative categories, which then informs the SVR model to adjust its rating within a specific range. This results in a context-aware sentiment score that reflects a deeper understanding of the review's sentiment.

Bad: This movie was absolutely terrible. Terrible acting, dialogue, etc.

Mid: This movie was so average that I forgot what happened halfway through.

Good: Nolan has come out with his greatest movie ever. I would go see it again tomorrow if I could.

6 of 7

Lessons Learned

Increased Epochs:

Training the model for more epochs could further improve fine-tuning.

Expanded Training Dataset:

Using a larger, more diverse collection of reviews can help the model understand a wider range of opinions and become more accurate.

Enhanced Hardware Resources:

Faster, more powerful hardware can speed up model training, making it practical to use more complex models and larger datasets.

Ensemble Methods:

Combining multiple models or techniques can improve overall prediction accuracy by leveraging diverse approaches.


7 of 7

Q & A