1 of 24

Luther: Emerging Film Markets

29 January, 2016

Emily Hough-Kovacs

Metis Data Science

2 of 24

“Keep it 8 more than 92 with me, 100”

-Drake,

rapper and budding data scientist

3 of 24

Problem Statement:

Can we predict the percentage of foreign gross total that comes from Hungary?

4 of 24

Understanding the problem

Wrangling the Data

  • BoxOfficeMojo
  • Identify features
  • Identify y value

Ridge Cross Validation

  • All features
  • Remove some categorical
  • Remove all categorical
  • Interpret lambda results

Regression and Results

  • Ridge Regularization
  • Interpret Results
  • Identify points of improvement
  • Iterate

5 of 24

Wrangling the Data

6 of 24

http://www.boxofficemojo.com/movies/?page=intl&id=easya.htm

7 of 24

8 of 24

9 of 24

0.7%*

*Percent of “Easy A” Foreign Gross that came from Hungary
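A minimal sketch of how this target value could be computed: Hungary's gross divided by the total foreign gross, as a percentage. The function name and the dollar figures are hypothetical placeholders chosen to give a 0.7% share; they are not the actual "Easy A" numbers.

```python
def hungary_share(hungary_gross, foreign_total):
    """Return Hungary's share of the total foreign gross, as a percentage."""
    if foreign_total <= 0:
        raise ValueError("foreign_total must be positive")
    return 100.0 * hungary_gross / foreign_total


# Placeholder figures for illustration only (not real box-office data).
share = hungary_share(hungary_gross=105_000, foreign_total=15_000_000)
print(f"{share:.1f}%")
```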

10 of 24

11 of 24

But how do we choose our features?

12 of 24

Feature Selection

  • Runtime
  • Budget
  • Rating
  • Genre
  • Hungarian Distributor
  • (American Distributor)
  • Hungarian Release
  • Foreign Release
  • (American Release)

Given:

  • Runtime
  • Budget

Categorical:

  • Rating
  • Genre (reduced number of categories)
  • Hungarian Distributor

Created:

  • same Hungarian and American distributor?

Time deltas:

  • Hungarian delta
  • Foreign delta
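The created and time-delta features above could be built roughly as follows. This is a sketch under assumptions: the dictionary keys are hypothetical, and the deltas are assumed to be measured in days from the American release, which the slides do not state explicitly.

```python
from datetime import date


def build_features(film):
    """Turn one raw film record (a dict) into numeric model features."""
    us_release = film["american_release"]
    return {
        "runtime": film["runtime"],
        "budget": film["budget"],
        # 1 if the same company distributed the film in both countries, else 0
        "same_distributor": int(
            film["hungarian_distributor"] == film["american_distributor"]
        ),
        # Assumption: deltas are days elapsed since the American release
        "hungarian_delta": (film["hungarian_release"] - us_release).days,
        "foreign_delta": (film["foreign_release"] - us_release).days,
    }


# Hypothetical record, for illustration only.
example = {
    "runtime": 92,
    "budget": 8_000_000,
    "hungarian_distributor": "Distributor A",
    "american_distributor": "Distributor B",
    "american_release": date(2015, 1, 1),
    "foreign_release": date(2015, 1, 15),
    "hungarian_release": date(2015, 3, 1),
}
features = build_features(example)
print(features)
```

The American distributor and release date appear only as inputs to the derived features, which matches their parenthesized status in the list above.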

13 of 24

Identifying Our Lambda: Ridge Regression
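One way to identify lambda is k-fold cross-validation over a grid of candidate penalties. The sketch below uses a closed-form ridge fit in NumPy on synthetic data; the function names, the lambda grid, and the data are all assumptions for illustration, and the original analysis may well have used a library routine instead. In practice the features would usually be standardized first so the penalty treats them equally.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge fit on centred data; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared error of ridge with penalty lam, averaged over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every row not in the held-out fold
        coef, intercept = ridge_fit(X[train], y[train], lam)
        pred = X[fold] @ coef + intercept
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


def choose_lambda(X, y, lambdas):
    """Return the lambda with the lowest cross-validated error, plus all scores."""
    scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
    return min(scores, key=scores.get), scores


# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.3]) + rng.normal(scale=0.5, size=200)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = choose_lambda(X, y, lambdas)
print(best_lam, scores)
```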

14 of 24

First Pass: Ridge Regularization with all Features

15 of 24

Second Pass: Remove MPAA Rating and Genre

16 of 24

Third Pass: Feature Selection

Given:

  • Runtime
  • Budget

Categorical:

  • Hungarian Distributor

Created:

  • same distributor

Time deltas:

  • Hungarian delta
  • Foreign delta

17 of 24

Third Pass: Only Runtime and Budget, remove NaNs
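Removing NaNs here means dropping any film with a missing runtime, budget, or target value before fitting. A minimal NumPy sketch, with a hypothetical helper name and made-up values:

```python
import numpy as np


def drop_nan_rows(X, y):
    """Keep only rows where every feature and the target are present."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[keep], y[keep]


# Made-up values: columns are runtime (minutes) and budget (dollars).
X = np.array([
    [92.0, 8e6],
    [110.0, np.nan],   # missing budget
    [np.nan, 2e7],     # missing runtime
    [105.0, 5e7],
    [120.0, 3e7],
])
y = np.array([0.7, 0.5, 0.9, 1.1, np.nan])  # last target is missing

X_clean, y_clean = drop_nan_rows(X, y)
print(X_clean.shape, y_clean)
```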

18 of 24

Let’s give it a go...

19 of 24

20 of 24

Small variance in y + limited (quality) features = nearly horizontal regression
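To illustrate why this happens, the sketch below fits a line to synthetic data in which the target varies only a little and is unrelated to the feature. The fitted slope comes out close to zero and R^2 close to zero, so the line is essentially the mean of y. All values here are simulated assumptions, not the film data.

```python
import numpy as np

rng = np.random.default_rng(0)
runtime = rng.uniform(80, 180, size=500)    # synthetic feature
y = 1.0 + rng.normal(scale=0.3, size=500)   # synthetic target, unrelated to runtime

# Ordinary least-squares line: y is approximately slope * runtime + intercept
slope, intercept = np.polyfit(runtime, y, 1)
pred = slope * runtime + intercept

# R^2 compares the fit against simply predicting the mean of y
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"slope={slope:.5f}, R^2={r2:.4f}")
```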

21 of 24

Better Features

  • Foreign film but not from target country
  • Actors
  • Location Filmed
  • Nationality of director
    • American
    • Hungarian
    • other
  • Funding: backed by foreign or Hungarian filmmakers?

22 of 24

Further Consideration

Not the question

  • Try Lasso, Elastic Net
  • Use larger lambda
  • Predict lambda
  • Polynomial Combinations of Features
  • Other countries
  • Which genres do best in a given country
  • Predict if a movie is a foreign film
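As a sketch of what "polynomial combinations of features" could mean at degree 2: each row is extended with every square and pairwise product of its original features. The helper name is hypothetical, and this is only one possible expansion.

```python
from itertools import combinations_with_replacement


def degree2_features(row):
    """Append all squares and pairwise products to the original features."""
    products = [a * b for a, b in combinations_with_replacement(row, 2)]
    return list(row) + products


# Two features a=2, b=3 become [a, b, a*a, a*b, b*b].
expanded = degree2_features([2.0, 3.0])
print(expanded)
```

The number of columns grows quickly with the number of features, so this is best paired with regularization.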

23 of 24

Thank you

Blog: http://emily-hk.com

Github: emilyhoughkovacs

Email: emilyhoughkovacs@gmail

24 of 24

Further Consideration

Choose data more carefully beforehand (check whether the data can actually answer the question)

Better features (fewer categorical variables)

Pare down features before using regularization

Predict alpha

Try Lasso, Elastic Net

Try quadratic (parabolic) combinations of features

Consider how to identify films that are not American but are still foreign to the given country