Region of Boom
Finding growth predictors for major US metropolitan housing markets
by: Alec Hartman, Daniel Guerrero,
Noah Melngailis and Nick Joseph
Video Goes Here
Team
Data & Urban Development
Alec Hartman - Labeling Data for Modeling
Noah Melngailis - Tools, Modeling Issues
Daniel Guerrero - Model, Results, and Takeaways
Nick Joseph - Project Overview
Agenda
Executive Summary
Project Introduction
The Problem / Our Solution
Our Findings
Takeaways
Executive Summary
The Problem
Our stakeholder is TestFit, a software company that designs building diagrams for architects and developers.
Our Solution
Create a machine learning model that can help us recommend markets TestFit should target in 2021, using historical construction data from cities all over the US.
Case Study
Should TestFit invest in targeting San Antonio, TX in 2021 in order to maximize their gains in 2023?
Skills and tools used throughout the data science pipeline
Acquisition: Web Scraping, Pandas
Preparation: Pandas, NumPy, RegEx
Exploration: Matplotlib, Seaborn, SciPy
Modeling: Sklearn to scale, model and evaluate our results
Delivery: Tableau, Google Slides, Jupyter Notebook
What did the data look like?
Where did it come from? US Census Bureau Building Permits Surveys
What date range did it include? 1997-2019
What was the size of the data? 390 metro areas
What is the data showing? High Density Multifamily Housing
What is an observation in the data? One metro area in one year, for example San Antonio 2017
Information in the data and information we calculated:
Market Size
Number of buildings
Number of units
The market performance
Overall market growth
Next step - Modeling
What do we need to create a model?
Should TestFit target
San Antonio, TX in 2021?
Compares the data to historical data
Prediction
Modeling
No labels = No predictions
Challenges:
How do you define a hot market?
How do you measure a hot market?
Labels
Solution:
Clustering
What is clustering?
[Figure: scatter plot of Evolution Index (underperforming, on pace, outpacing) against Average Units per Building (fewer to more), comparing our city with all cities; points fall into Clusters 0-5, with top performing and low performing markets marked]
Modeling
How do we use clusters?
[Figure: cluster scatter plot of Evolution Index (underperforming, on pace, outpacing) against Average Units per Building (fewer to more), showing Clusters 0-5 with top performing and low performing markets marked]
Modeling
How do we use clusters?
Modeling
We identified markets that were underperforming at one point... (San Antonio TX, 2012)
...and two years later were outpacing the greater U.S. market. (San Antonio TX, 2014)
We applied this logic to all observations to create labels for our model.
[Figure: Evolution Index (underperforming, on pace, outpacing) against Average Units per Building (fewer to more), comparing our city with all cities]
What do our clusters actually look like?
What groups exist in our data?
[Figure: clusters plotted by Evolution Index against Average Units per Building, comparing our city with all cities]
Next step - Modeling
Should TestFit enter San Antonio, TX in 2021?
Compares the data to historical data
Prediction
Now that we have our labels - we are ready to model
Our solution
Our solution
How does the model work?
We used a classification model called K-Nearest Neighbors.
Create predictions for 2021 using trends from 2018 and 2019
We take our predictions for 2021
San Antonio, TX in 2021
Market Size
Number of buildings
The market performance
The cluster the city belongs to
Historical Data
Prediction
Should Enter
Difference of $$
Difference of $$$$
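As an illustration of the K-Nearest Neighbors step named on this slide, here is a minimal sketch using Sklearn to scale, model and evaluate. The data is synthetic and the feature values, the neighbor count of 5, and the train/test split are all assumptions made for the example; the accuracy it prints describes only this toy data, not the project's result.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-in features (assumed, not the project's data):
# market size, number of buildings, market performance, cluster id.
# Treating the cluster id as a plain number is a simplification.
not_enter = np.column_stack([
    rng.normal(100, 10, n), rng.normal(20, 3, n),
    rng.normal(0.9, 0.03, n), rng.integers(0, 3, n),
])
enter = np.column_stack([
    rng.normal(160, 10, n), rng.normal(35, 3, n),
    rng.normal(1.2, 0.03, n), rng.integers(3, 6, n),
])
X = np.vstack([not_enter, enter])
y = np.array([0] * n + [1] * n)  # 1 = "should enter"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale first so that no single feature dominates the distance metric.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Scaling matters here because K-Nearest Neighbors compares observations by distance, and the features sit on very different ranges.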
Number of apartments per building
Model Results
Likelihood of accurately predicting emerging markets?
91%
From our predictions, we identified 72 markets that we recommend entering in 2020 and 2021.
Findings
Model Results
We further broke these markets into subcategories, for prioritization:
Markets with high ROI: 9 Cities, +$330M
Medium Markets: 44 Cities, +$270M
Steady growth markets: 19 Cities, +$100M
Markets we don’t recommend entering: 58 Cities, -$332M
Findings
Key Takeaways
Objective: Predict emerging markets
Findings: 91% model accuracy, 72 profitable markets, 9 markets / ~$330M growth
Future improvements: Bring in census population data
Thank you!
We are the Data and Urban Development Team
Alec Hartman
Noah Melngailis
Daniel Guerrero
Nick Joseph
You can read more about our project here:
Appendix
Introduction
Project Introduction
Stakeholder
U.S. everyday?
Acquisition
Census Bureau Building Permit Surveys
390 metro areas
1997-2019
US Department of Commerce Building Permits Survey
Features of this data set:
 - Number of units
 - Number of structures
 - Value of structures
What was the size of the data?
What dates?
Where did it come from?
What did it say?
We took the data and turned it into a usable dataframe by transforming it into unique city_state_year observations.
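A minimal sketch of what that transformation could look like in Pandas. The column names and the sample rows are hypothetical, chosen only for illustration; they are not the survey's real headers or values.

```python
import pandas as pd

# Hypothetical raw rows: one row per city, state and year.
raw = pd.DataFrame({
    "city": ["San Antonio", "San Antonio", "Dallas"],
    "state": ["TX", "TX", "TX"],
    "year": [2012, 2014, 2012],
    "buildings": [10, 20, 40],
    "units": [250, 700, 1200],
})


def to_observations(df: pd.DataFrame) -> pd.DataFrame:
    """Index each row by a city_state_year key and add units per building."""
    out = df.copy()
    out["city_state_year"] = (
        out["city"].str.replace(" ", "_")
        + "_" + out["state"]
        + "_" + out["year"].astype(str)
    )
    out["avg_units_per_building"] = out["units"] / out["buildings"]
    return out.set_index("city_state_year")


observations = to_observations(raw)
print(observations[["units", "buildings", "avg_units_per_building"]])
```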
Acquisition
Prepare
Explore
Label Creation
The clustering model looks at all the points and assigns them to groups based on their proximity to other points.
[Figure: points grouped into Group 1, Group 2 and Group 3]
So we created clusters using:
 - How the city was performing when compared to the rest of the country
 - The average number of units per building, as an indicator of urban growth / population density
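The grouping by proximity described here can be sketched with Sklearn. This example assumes a K-means model, which the slides do not name explicitly, and runs on synthetic points with three groups to mirror the illustration; the feature values are invented.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic points: columns are [evolution_index, avg_units_per_building].
# Three well-separated groups of 50 points each, values invented.
group_centers = np.array([[0.8, 10.0], [1.0, 30.0], [1.3, 60.0]])
points = np.vstack([
    center + rng.normal(scale=[0.02, 1.0], size=(50, 2))
    for center in group_centers
])

# Put both features on the same scale before measuring proximity.
scaled = StandardScaler().fit_transform(points)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
labels = kmeans.labels_              # cluster id assigned to each point
centroids = kmeans.cluster_centers_  # center point of each cluster

print(np.bincount(labels))
```

The same call with `n_clusters=6` would produce six clusters like those shown in the deck.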
Exploration
How we created labels for the data
We looked for all the cities that at one point were underperforming...
...and then two years later had been overperforming...
...and we used these observations as our labels for the model.
[Figure: EI against Average Units per Building, from low performing cities to top performing cities, with San Antonio TX, 2012 and San Antonio TX, 2014 marked]
Exploration
What do the clusters look like?
[Figure: EI against Average Units per Building, showing Clusters 0-5 with top performing and low performing markets marked]
Exploration
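What did our clusters look like?
[Figure: cluster scatter plot; one axis runs from fewer units to more units, the other from underperforming through on pace to outpacing]
Each point represents a city at a specific year, for example Dallas, TX in 2012.
*X marks are the center points of each cluster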
Then we had to label our clusters
The city was low performing in 2009
After two years, there was significantly higher investment in infrastructure
We noticed that when cities were underperforming, they would be in cluster 0 or cluster 4. If they started overperforming, they would move to cluster 3 or 1.
So we began doing this for random cities and studying their behavior. Using our observations, we were able to come up with a label for each group or cluster.
What could we do with our clusters?
Underperforming markets building an average number of units per building
Mixed growth markets building a high number of units per building
Underperforming markets building a low number of units per building
Mixed growth markets building a low number of units per building
Markets outpacing the population building an average number of units per building
Mixed growth markets building an average number of units per building
What could we do with our clusters?
We want markets that are here in 2021
And will move here by 2023
How we labeled the data
We told the computer to look for all the cities that at one point were underperforming and then two years later were overperforming, and we used these observations as our labels for the model.
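The labeling rule can be sketched in Pandas as a self-join that looks two years ahead. The cluster groupings follow the observation earlier in this appendix (clusters 0 and 4 underperforming, clusters 3 and 1 overperforming); the column names, the `should_enter` label name and the sample cluster assignments are hypothetical.

```python
import pandas as pd

UNDERPERFORMING = {0, 4}  # clusters noted for underperforming cities
OUTPACING = {1, 3}        # clusters noted for overperforming cities

# Hypothetical observations; the cluster ids here are invented.
obs = pd.DataFrame({
    "city_state": ["San Antonio_TX"] * 3 + ["Dallas_TX"] * 3,
    "year": [2012, 2013, 2014] * 2,
    "cluster": [0, 4, 3, 2, 2, 2],
})


def add_labels(df: pd.DataFrame, horizon: int = 2) -> pd.DataFrame:
    """Flag rows that underperform now and outpace `horizon` years later."""
    future = df[["city_state", "year", "cluster"]].copy()
    # Shift years back so each future row lines up with its earlier year.
    future["year"] = future["year"] - horizon
    future = future.rename(columns={"cluster": "future_cluster"})
    merged = df.merge(future, on=["city_state", "year"], how="left")
    merged["should_enter"] = (
        merged["cluster"].isin(UNDERPERFORMING)
        & merged["future_cluster"].isin(OUTPACING)
    )
    return merged


labeled = add_labels(obs)
print(labeled)
```

Rows with no observation two years ahead get a missing `future_cluster` and are labeled `False`; in practice they might be dropped from training instead.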
Model
How do we make predictions?
We use data from 2018 and 2019 to create predictions on what the markets would look like in 2020 and 2021
Values for 2018
Values for 2019
Use the trends and values
Calculate values for 2021
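One simple way to turn 2018 and 2019 values into a 2021 estimate is straight-line extrapolation of the year-over-year change. The slides do not say which trend method was used, so the linear rule, the function name and the sample numbers below are all assumptions.

```python
import pandas as pd

# Hypothetical unit counts for two markets.
history = pd.DataFrame(
    {"units_2018": [1000, 500], "units_2019": [1100, 450]},
    index=["Market A", "Market B"],
)


def project_linear(v2018: pd.Series, v2019: pd.Series,
                   target_year: int = 2021) -> pd.Series:
    """Extend the 2018-to-2019 change in a straight line to target_year."""
    yearly_change = v2019 - v2018
    return v2019 + yearly_change * (target_year - 2019)


history["units_2021"] = project_linear(
    history["units_2018"], history["units_2019"]
)
print(history)
```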
Modeling