Luther: Emerging Film Markets
29 January, 2016
Emily Hough-Kovacs
Metis Data Science
“Keep it 8 more than 92 with me, 100”
-Drake,
rapper and budding data scientist
Problem Statement:
Can we predict the percentage of foreign gross total that comes from Hungary?
Understanding the problem
Wrangling the Data
Ridge Cross Validation
Regression and Results
Wrangling the Data
http://www.boxofficemojo.com/movies/?page=intl&id=easya.htm
0.7%*
*Percent of “Easy A” Foreign Gross that came from Hungary
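As a minimal sketch of how the target variable can be computed, the percentage is the country's gross divided by the film's total foreign gross. The function name and the dollar figures below are assumptions made up for illustration; they are not the actual "Easy A" numbers.

```python
def country_share_pct(country_gross: float, foreign_total: float) -> float:
    """Return one country's gross as a percentage of total foreign gross."""
    if foreign_total <= 0:
        raise ValueError("foreign_total must be positive")
    return 100.0 * country_gross / foreign_total


# Hypothetical figures chosen for illustration only (not scraped data).
share = country_share_pct(country_gross=7_000, foreign_total=1_000_000)
print(f"{share:.1f}%")
```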
But how do we choose our features?
Feature Selection
Given:
Categorical:
Created:
Time deltas:
Identifying Our Lambda: Ridge Regression
First Pass: Ridge Regularization with all Features
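One way to identify lambda by cross-validation is scikit-learn's `RidgeCV`, which calls the regularization strength `alpha`. The sketch below is not necessarily the setup used for these slides: the data is synthetic, and the candidate grid and fold count are assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 films, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 0.7 + 0.05 * X[:, 0] + rng.normal(scale=0.2, size=200)

# Candidate regularization strengths on a log scale (assumed grid).
alphas = np.logspace(-3, 3, 13)

# Scale features first so the penalty treats every coefficient equally.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5))
model.fit(X, y)

best_alpha = model.named_steps["ridgecv"].alpha_
print("selected alpha:", best_alpha)
```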
Second Pass: Remove MPAA Rating and Genre
Third Pass: Feature Selection
Given:
Categorical:
Created:
Time deltas:
Third Pass: Only Runtime and Budget, with NaNs removed
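A minimal sketch of this pass, assuming the films sit in a pandas DataFrame: drop rows where runtime or budget is missing, then fit a ridge model on those two columns. The column names, the toy values, and the fixed `alpha` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Toy rows with hypothetical column names and made-up values.
films = pd.DataFrame({
    "runtime_min": [92, 110, np.nan, 127, 101, 95],
    "budget_musd": [8.0, 40.0, 15.0, np.nan, 60.0, 20.0],
    "hungary_pct": [0.7, 0.9, 0.6, 0.8, 1.0, 0.5],
})

# Keep only rows where both remaining features are present.
clean = films.dropna(subset=["runtime_min", "budget_musd"])

X = clean[["runtime_min", "budget_musd"]]
y = clean["hungary_pct"]

# alpha is fixed here for brevity; in practice it would come from cross-validation.
model = Ridge(alpha=1.0).fit(X, y)
print(len(films), "rows before,", len(clean), "rows after dropping NaNs")
```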
Let’s give it a go...
Small variance in y + limited (quality) features = nearly horizontal regression
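The effect can be reproduced on synthetic data: when the target barely varies and the feature carries little information about it, the least-squares slope comes out close to zero and every prediction lands near the mean. The numbers below are invented for illustration and are not the film data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature with a wide range (e.g. a budget-like quantity), assumed values.
x = rng.uniform(0, 100, size=200)

# Target with small variance that does not depend on x at all.
y = 0.7 + rng.normal(scale=0.1, size=200)

# Ordinary least-squares line: y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Spread of predictions across the whole feature range.
pred_range = abs(slope) * (x.max() - x.min())
print(f"slope={slope:.5f}, intercept={intercept:.3f}, prediction range={pred_range:.3f}")
```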
Better Features
Further Consideration
Not the question
Thank you
Blog: http://emily-hk.com
Github: emilyhoughkovacs
Email: emilyhoughkovacs@gmail
Further Consideration
Choose data more carefully beforehand (check whether the data is actually usable)
Better features (fewer categorical variables)
Pare down features before using regularization
Predict alpha
Try lasso and elastic net
Try quadratic (parabolic) combinations of features
**Consider how to identify films that are not American but are still foreign to a given country
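As a rough sketch of the lasso, elastic net, and parabolic-combination ideas in the list above, scikit-learn can expand the features to degree 2 and fit both penalized models with cross-validated regularization. The data is synthetic, and the `l1_ratio` grid and fold count are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): the target depends on the square of feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = 0.7 + 0.1 * X[:, 0] ** 2 + rng.normal(scale=0.05, size=150)


def quadratic_model(estimator):
    """Expand to degree-2 terms, scale them, then fit the given estimator."""
    return make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        estimator,
    )


lasso = quadratic_model(LassoCV(cv=5)).fit(X, y)
enet = quadratic_model(ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5)).fit(X, y)

# Expanded feature order: x0, x1, x0^2, x0*x1, x1^2.
lasso_coefs = lasso.named_steps["lassocv"].coef_
enet_coefs = enet.named_steps["elasticnetcv"].coef_
print("lasso:", np.round(lasso_coefs, 3))
print("elastic net:", np.round(enet_coefs, 3))
```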