Analysis of Starbucks reviews
Exploratory data analysis
Key findings:
- imbalanced target variable
- a significant number of missing values
- some useless features
🡪 Drop NaN values and useless features
Text preprocessing
Before: "Amber and LaDonna at the Starbucks on Southwest Parkway are always so warm and welcoming. There is always a smile in their voice when they greet you at the drive-thru"
After: "amber ladonna starbuck southwest parkway alway warm welcom alway smile voic greet drivethru"
To make the text more machine-interpretable:
- convert characters to lowercase
- remove punctuation
- remove line breaks '\n'
- remove stopwords
- reduce words to their stems
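The steps above can be sketched with the standard library alone. The stopword set here is an illustrative subset, and `crude_stem` is a toy suffix-stripper standing in for a real stemmer (in practice NLTK's stopword list and PorterStemmer would produce the stems shown in the example above):

```python
import string

# Illustrative subset of English stopwords; the real pipeline would
# use NLTK's full stopword list.
STOPWORDS = {"and", "the", "at", "on", "are", "so", "is", "a", "in",
             "their", "there", "they", "you", "when"}

def crude_stem(word):
    # Toy suffix-stripping stemmer; a stand-in for NLTK's PorterStemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                         # characters to lowercase
    text = text.replace("\n", " ")              # remove line breaks
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]          # remove stopwords
    return " ".join(crude_stem(t) for t in tokens)                    # stem

print(preprocess("Amber and LaDonna are always so welcoming!\n"))
# amber ladonna alway welcom
```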
Vectorization – Word2Vec
🡪 a shallow neural network that learns word embeddings from a large corpus.
Parameters used:
- vector_size=25
- window=5
- min_count=3
- sg=0 (continuous bag-of-words, CBOW)
Resulting embedding has 1100 features 🡪 apply PCA for dimensionality reduction
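The dimensionality-reduction step can be sketched with scikit-learn's PCA; the matrix here is random placeholder data with the slide's 1100 features, and the number of retained components (50) is an assumption that would in practice be chosen from the explained-variance ratio:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder for the real review-embedding matrix: 200 reviews,
# 1100 features each (as on the slide).
X = rng.normal(size=(200, 1100))

# Assumed component count; inspect pca.explained_variance_ratio_
# to pick it in practice.
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (200, 50)
```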
Vectorization – TF-IDF
Short for Term Frequency–Inverse Document Frequency.
Take two sentences:
d1: “my new model is better than your model”
d2: “my new model is good”
For word “model” we have:
d1: Tf = 2
d2: Tf= 1
To get the tf-idf scores, compute:
d1: 2 * log(2/2) = 0
d2: 1 * log(2/2) = 0
Since "model" appears in both documents, its idf is log(2/2) = 0, so the word carries no discriminative weight.
The formula to compute it is:
tf-idf = tf * log(N / df)
Where:
- tf = term frequency
- N = # of documents
- df = document frequency
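The computation above fits in a few lines of plain Python; the two documents are the d1/d2 example from the slide:

```python
import math

docs = [
    "my new model is better than your model".split(),  # d1
    "my new model is good".split(),                    # d2
]

def tf_idf(term, doc, docs):
    tf = doc.count(term)                  # term frequency in this document
    df = sum(term in d for d in docs)     # number of documents containing the term
    return tf * math.log(len(docs) / df)  # tf * log(N / df)

print(tf_idf("model", docs[0], docs))   # 2 * log(2/2) = 0.0
print(tf_idf("better", docs[0], docs))  # 1 * log(2/1) ≈ 0.693
```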
Sentiment Analysis
Map labels to positive vs negative reviews 🡪 binary classification
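The label mapping can be sketched as below; the cutoff (ratings of 4 and above count as positive) is an assumption, since the slide does not state where the boundary was drawn:

```python
# Assumed mapping from 1-5 star ratings to binary sentiment;
# the actual cutoff used in the analysis may differ.
def to_sentiment(rating):
    return 1 if rating >= 4 else 0  # 1 = positive, 0 = negative

ratings = [5, 1, 4, 2, 3]
labels = [to_sentiment(r) for r in ratings]
print(labels)  # [1, 0, 1, 0, 0]
```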
RandomForestClassifier(
    class_weight='balanced',
    criterion='entropy',
    max_depth=8,
    max_features='sqrt',
    min_samples_leaf=5)
Best model
Cross validation ‘balanced accuracy score’: 0.80
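Scoring this model under cross-validation could look like the sketch below; the data here is random placeholder material standing in for the real embeddings and labels, so the printed score is meaningless, but the scoring string matches the slide's metric:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 25))    # placeholder review embeddings
y = rng.integers(0, 2, size=200)  # placeholder binary sentiment labels

clf = RandomForestClassifier(
    class_weight='balanced',
    criterion='entropy',
    max_depth=8,
    max_features='sqrt',
    min_samples_leaf=5,
    random_state=0)

# 'balanced_accuracy' matches the cross-validation metric on the slide.
scores = cross_val_score(clf, X, y, cv=5, scoring='balanced_accuracy')
print(scores.mean())
```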
Classification report (on test)
Rating prediction
- Multiclass classification
- Regression
- One-vs-Rest Classifier
- Output Code Classifier
- Neural Networks
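The two scikit-learn multiclass strategies can be sketched as below; the base estimator (logistic regression) and the placeholder data are assumptions, since the slide only names the wrappers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OutputCodeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 25))    # placeholder review embeddings
y = rng.integers(1, 6, size=150)  # placeholder 1-5 star ratings

# One-vs-Rest: fits one binary classifier per rating class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Output codes: each class is encoded as a binary codeword.
occ = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                           code_size=2, random_state=0).fit(X, y)

print(ovr.predict(X[:3]))
print(occ.predict(X[:3]))
```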
Rating prediction – NN setup
Setup:
- two layers with ReLU activation
- two dropout layers
- softmax layer for prediction
Then train!
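A minimal sketch of this architecture in PyTorch (the slide does not name a framework); the layer widths, dropout rates, and 25-dim input are assumptions, while the layer types follow the setup list above:

```python
import torch
from torch import nn

# Layer widths, dropout rates, and input size are assumptions;
# the slide only specifies the layer types.
model = nn.Sequential(
    nn.Linear(25, 64), nn.ReLU(),         # first dense layer with ReLU
    nn.Dropout(0.3),                      # first dropout layer
    nn.Linear(64, 32), nn.ReLU(),         # second dense layer with ReLU
    nn.Dropout(0.3),                      # second dropout layer
    nn.Linear(32, 5), nn.Softmax(dim=1),  # softmax over the 5 rating classes
)

x = torch.randn(4, 25)  # a batch of 4 review embeddings
probs = model(x)
print(probs.shape)      # torch.Size([4, 5])
```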
Rating prediction – NN results
Classification report (on test)
Thank you for your attention :)