1 of 10

Predicting Restaurant Traffic

Art or Science?

Manu Lohiya

August 2017

Data Science Bootcamp, General Assembly

2 of 10

The Problem Statement

I am an independent restaurant owner with a very small marketing budget. How can I bring in more customers?

3 of 10

Hypothesis

The number of “check-ins” at a restaurant has a relationship with various attributes of the restaurant’s profile

4 of 10

The Data: DataSF + Foursquare

1. Health Inspections (DataSF, 2014-17)
  • Inspection ID
  • Business ID
  • Business Name
  • Lat/Long
  • Inspection Score

2. DataSF + Foursquare Mapping (Foursquare API Search)

  • Business ID
  • Business Name
  • Foursquare ID
  • Foursquare Name

3. Foursquare Attributes (Foursquare API Venues)

  • Foursquare ID
  • Foursquare Name (string)
  • checkinsCount (int)
  • createdAt (timestamp)
  • hasMenu (boolean)
  • isVerified (boolean)
  • Photos (continuous)
  • Price (discrete)
  • Rating (discrete)
  • ratingSignals (continuous)
  • tipCount (continuous)

5,568 Rows
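The three sources above link through Business ID and Foursquare ID. The following is a minimal sketch of that join in plain Python, not the code used in the project. The lower-case field names, the sample values, and the choice to average inspection scores per business are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: field names follow the slide, values are invented.
inspections = [
    {"business_id": 10, "business_name": "Cafe A", "inspection_score": 92},
    {"business_id": 10, "business_name": "Cafe A", "inspection_score": 88},
    {"business_id": 20, "business_name": "Diner B", "inspection_score": 75},
]
mapping = [
    {"business_id": 10, "foursquare_id": "fsq_a"},
    {"business_id": 20, "foursquare_id": "fsq_b"},
    {"business_id": 30, "foursquare_id": "fsq_c"},  # no venue record: dropped
]
venues = [
    {"foursquare_id": "fsq_a", "checkinsCount": 1200, "ratingSignals": 150},
    {"foursquare_id": "fsq_b", "checkinsCount": 300, "ratingSignals": 20},
]


def join_datasets(inspections, mapping, venues):
    """Return one row per business: mean health score plus venue attributes."""
    # Collect every inspection score for each business.
    scores = defaultdict(list)
    for row in inspections:
        scores[row["business_id"]].append(row["inspection_score"])

    venue_by_id = {v["foursquare_id"]: v for v in venues}

    joined = []
    for link in mapping:
        venue = venue_by_id.get(link["foursquare_id"])
        business_scores = scores.get(link["business_id"])
        if venue is None or not business_scores:
            continue  # inner join: skip businesses missing on either side
        joined.append({
            "business_id": link["business_id"],
            "avg_health_score": mean(business_scores),
            **venue,
        })
    return joined


rows = join_datasets(inspections, mapping, venues)
```

An inner join is used here so that every remaining row has both a health score and Foursquare attributes; whether the original analysis dropped unmatched businesses the same way is not stated in the slides.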

5 of 10

EDA - Health Scores

HIGH RISK Restaurants (Lowest Quartile Scores)

LOW RISK Restaurants (Highest Quartile Scores)
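One way to produce the two groups above is to cut the inspection scores at the first and third quartiles. The sketch below assumes that approach; the function name, the "MID" label, and the sample scores are hypothetical.

```python
from statistics import quantiles


def risk_labels(scores):
    """Label each score by quartile: lowest quartile is HIGH RISK."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score <= q1:
            labels.append("HIGH RISK")  # lowest-quartile health scores
        elif score >= q3:
            labels.append("LOW RISK")   # highest-quartile health scores
        else:
            labels.append("MID")
    return labels


# Invented inspection scores, for illustration only.
sample_scores = [70, 75, 80, 85, 90, 92, 94, 96, 100]
labels = risk_labels(sample_scores)
```

Because boundary values are included on both ends, tied scores can push either group slightly above 25% of the rows.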

6 of 10

EDA - Collinearity
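A collinearity check of this kind usually starts from pairwise Pearson correlations between the candidate features. The sketch below shows one way to compute them and flag highly correlated pairs; the 0.8 threshold, the helper names, and the sample values are assumptions, not figures from the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of feature columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def collinear_pairs(matrix, threshold=0.8):
    """Feature pairs whose absolute correlation exceeds the threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) > threshold
    ]


# Invented feature values, for illustration only.
features = {
    "ratingSignals": [10, 20, 30, 40, 50],
    "tipCount": [5, 9, 16, 21, 24],
    "price": [2, 1, 3, 1, 2],
}
matrix = correlation_matrix(features)
pairs = collinear_pairs(matrix)
```

Pairs flagged this way are candidates for dropping or combining before fitting a multi-feature regression.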

7 of 10

Training - Linear Regression (checkinsPerDay ~ X)

Feature             Adj. R^2   t score   Result
Number of Ratings   0.655      54        Significant
Avg Rating Score    0.21       21        Significant
Avg Health Score    0.01       3         Significant
Has Menu            0.02       5         Significant
Price               0.05       9         Significant
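The table reads as a series of single-feature regressions, each reporting an adjusted R^2 and a t score for the slope; that reading is an assumption. The sketch below computes both quantities for one feature in plain Python. The definition of checkinsPerDay (total check-ins divided by venue age in days, with createdAt taken as a Unix timestamp in seconds) is also an assumption, since the slides do not define it, and the sample data is invented.

```python
from math import sqrt


def checkins_per_day(checkins_count, created_at, as_of):
    """Assumed definition: total check-ins divided by venue age in days.

    created_at and as_of are Unix timestamps in seconds (an assumption).
    """
    age_days = max((as_of - created_at) / 86400, 1.0)  # avoid dividing by ~0
    return checkins_count / age_days


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return fit statistics for the slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)

    r2 = 1 - sse / sst
    # Adjusted R^2 with one predictor: penalize by (n - 1) / (n - p - 1), p = 1.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    # Standard error of the slope and its t score.
    se_slope = sqrt(sse / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "adj_r2": adj_r2,
        "t_score": slope / se_slope,
    }


# Invented values, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_ols(x, y)
rate = checkins_per_day(1000, 0, 86400 * 100)
```

A slope is conventionally called significant at the 5% level when the absolute t score exceeds roughly 2 for a large sample, which is consistent with every row of the table being marked Significant even where the adjusted R^2 is close to zero.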

8 of 10

Testing - Sklearn and Cross-Validation (checkinsPerDay ~ ratingSignals)

Takeaways: A restaurant owner should focus on collecting as many ratings as possible. The number of ratings alone explains about 65% of the variance in check-ins per day.

Interestingly, the average rating score matters less (adjusted R^2 of 0.21, versus 0.655 for the number of ratings). In other words, a restaurant with many ratings and a lower score tends to draw more check-ins than a restaurant with a high score and few ratings.

Method                   Score
Sklearn on df_test       0.65
Cross-validation on df   0.67
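The two scores above come from a held-out test set and from cross-validation. The sketch below illustrates both evaluation schemes for a single-feature linear model in plain Python, so it runs without dependencies; it is not the project's Sklearn code. It assumes the reported score is R^2, and the 80/20 split, the five folds, and the synthetic data are all choices made for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(x, y, slope, intercept):
    """R^2 of the fitted line on the given data."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def holdout_r2(x, y, test_fraction=0.2, seed=0):
    """Fit on a random training split and score on the held-out rows."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
    return r2_score([x[i] for i in test], [y[i] for i in test], slope, intercept)


def k_fold_r2(x, y, k=5, seed=0):
    """Return one held-out R^2 per fold."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(
            r2_score([x[i] for i in fold], [y[i] for i in fold], slope, intercept)
        )
    return scores


# Synthetic data standing in for ratingSignals (x) and checkinsPerDay (y).
rng = random.Random(0)
x = [float(i) for i in range(100)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

holdout_score = holdout_r2(x, y)
cv_scores = k_fold_r2(x, y)
cv_mean = sum(cv_scores) / len(cv_scores)
```

Similar scores from the two schemes, as in the table, suggest the result does not depend on one particular train/test split.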

9 of 10

Next Steps

  • NLP Analysis on individual reviews
  • Time series analysis on each check-in and rating to address seasonality and promotion trends
  • Compare against other data sources such as Yelp, Google, and MenuPages
  • Compare against other cities
  • Consider other outcome variables (such as rating)

10 of 10

Thank You!