1 of 47

Region of Boom

Finding growth predictors for major US metropolitan housing markets

by: Alec Hartman, Daniel Guerrero, Noah Melngailis and Nick Joseph

2 of 47

Team

Data & Urban Development

Alec Hartman - Labeling Data for Modeling

Noah Melngailis - Tools, Modeling Issues

Daniel Guerrero - Model, Results, and Takeaways

Nick Joseph - Project Overview

3 of 47

Agenda

Executive Summary

Project Introduction

The Problem / Our Solution

Our Findings

Takeaways

4 of 47

Executive Summary

  • Objective: Identify markets that will experience significant housing infrastructure growth over the next two years

  • Solution: Using historical multifamily housing data, we can predict markets that will see growth in property investment & construction

  • Results: Our model predicts emerging housing markets with 91% accuracy, using historical multifamily housing data

5 of 47

The Problem

Our stakeholder is TestFit, a software company that designs building diagrams for architects and developers.

Our Solution

Create a machine learning model that can help us recommend markets TestFit should target in 2021, using historical construction data from cities all over the US

Case Study

Should TestFit invest in targeting San Antonio, TX in 2021 in order to maximize their gains in 2023?

6 of 47

Skills and tools used throughout the data science pipeline

Acquisition: Web Scraping, Pandas

Preparation: Pandas, Numpy, RegEx

Exploration: Matplotlib, Seaborn, SciPy

Modeling: Sklearn to scale, model and evaluate our results

Delivery: Tableau, Google Slides, Jupyter Notebook

7 of 47

What did the data look like?

Where did it come from? US Census Bureau Building Permits Surveys

What date range did it include? 1997-2019

What was the size of the data? 390 metro areas

What is the data showing? High Density Multifamily Housing

What is an observation in the data? One metro area in one year, such as San Antonio 2017

Information in the data: Number of buildings, Number of units

Information we calculated: Market Size, The market performance, Overall market growth

8 of 47

Next step - Modeling

Should TestFit target San Antonio, TX in 2021?

Modeling: Compares the data to historical data

Prediction

What do we need to create a model?

Labels: No labels = No predictions

Challenges:

How do you define a hot market?

How do you measure a hot market?

Solution:

Clustering

9 of 47

What is clustering?

[Figure: markets plotted by Evolution Index (our city compared with all cities: underperforming, on pace, outpacing) and Average Units per Building (fewer to more). The points fall into Clusters 0 through 5, ranging from low performing markets to top performing markets.]

Modeling

10 of 47

How do we use clusters?

[Figure: Clusters 0 through 5, ranging from low performing markets to top performing markets. Axes: Evolution Index (underperforming, on pace, outpacing) and Average Units per Building (fewer to more), comparing our city with all cities.]

Modeling

11 of 47

How do we use clusters?

Modeling

We identified markets that were underperforming at one point (San Antonio TX, 2012) and two years later were outpacing the greater U.S. market (San Antonio TX, 2014).

We applied this logic to all observations to create labels for our model.

[Figure: Evolution Index (underperforming, on pace, outpacing) against Average Units per Building (fewer to more), comparing our city with all cities.]

12 of 47

What do our clusters actually look like?

What groups exist in our data?

[Figure: clusters plotted by Average Units per Building and Evolution Index, comparing our city with all cities.]

13 of 47

Next step - Modeling

Should TestFit enter San Antonio, TX in 2021?

Compares the data to historical data

Prediction

Now that we have our labels - we are ready to model

Our solution

14 of 47

How does the model work?

Our solution

We used a classification model called K-Nearest Neighbors

Create predictions for 2021 using trends from 2018 and 2019

We take our predictions for 2021 (for example, San Antonio, TX in 2021):

  • Market Size
  • Number of buildings
  • Number of apartments per building
  • The market performance
  • The cluster the city belongs to

The model compares them with Historical Data (Difference of $$, Difference of $$$$) to reach a Prediction: Should Enter
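A minimal sketch of how a K-Nearest Neighbors classifier could be scaled, fit and scored with Sklearn. The feature columns mirror the inputs listed on this slide, but the data, the label rule, `n_neighbors=5` and the 75/25 split are assumptions for illustration, not the team's actual settings or results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up observations (one row per city-year); not the real permit data.
rng = np.random.default_rng(42)
n = 400
X = np.column_stack([
    rng.lognormal(18, 1, n),   # market size
    rng.integers(5, 500, n),   # number of buildings
    rng.normal(30, 10, n),     # apartments per building
    rng.normal(100, 15, n),    # market performance (evolution index)
    rng.integers(0, 6, n),     # cluster id, treated as numeric for simplicity
])
# Made-up label rule, used only so the example has something to learn.
y = (X[:, 3] < 95).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scaling matters for KNN because it relies on distances between rows.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)   # share of correct test predictions
prediction = model.predict(X_test[:1])   # 1 = "should enter", 0 = "should not"
```

Putting the scaler and the classifier in one pipeline keeps the scaling fitted on the training rows only, so the test score is not inflated by leakage.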

15 of 47

Model Results

Likelihood of accurately predicting emerging markets?

91%

From our predictions, we identified 72 markets that we recommend entering in 2020 and 2021.

Findings

16 of 47

Model Results

Findings

We further broke these markets into subcategories, for prioritization:

  • Markets with high ROI: +$330M (9 Cities)
  • Medium Markets: +$270M
  • Steady growth markets: +$100M
  • Markets we don’t recommend entering: -$332M (58 Cities)

The Medium and Steady growth groups account for the remaining 44 Cities and 19 Cities.

17 of 47

Key Takeaways

Takeaways

Objective: Predict emerging markets

Findings:

  • 91% model accuracy
  • 72 profitable markets
  • 9 markets / ~$330M growth

Future improvements: Bring in census population data

18 of 47

Thank you!

We are the Data and Urban Development Team

Alec Hartman

Noah Melngailis

Daniel Guerrero

Nick Joseph

You can read more about our project here:

To learn more about TestFit:

https://blog.testfit.io/testfit-home

19 of 47

Appendix

20 of 47

Introduction

21 of 47

Project Introduction

Stakeholder

  • TestFit.io
    • Founded in 2015 and based in Dallas, TX
    • Clifton Harness, CEO
    • Streamlines feasibility studies, helping architects craft buildings more quickly
  • Research Question
    • How many high-density, multifamily structures are being built in the U.S. every day?

22 of 47

Acquisition

23 of 47

Acquisition

Where did it come from? US Department of Commerce, Census Bureau Building Permits Survey

What was the size of the data? 390 metro areas

What dates? 1997-2019

What did it say? Features of this data set:

  • Number of units
  • Number of structures
  • Value of structures

We took the data and turned it into a usable dataframe by transforming it into unique city_state_year observations.
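A rough sketch of the reshaping step with Pandas, assuming the raw survey rows carry `city`, `state`, `year`, `units`, `structures` and `value` columns. Those column names and the sample numbers are made up for illustration; the real survey files are laid out differently.

```python
import pandas as pd

# Made-up rows standing in for the permit survey data.
raw = pd.DataFrame({
    "city": ["San Antonio", "San Antonio", "Dallas"],
    "state": ["TX", "TX", "TX"],
    "year": [2016, 2017, 2017],
    "units": [5200, 6100, 14800],
    "structures": [110, 125, 260],
    "value": [540_000_000, 610_000_000, 1_900_000_000],
})


def to_observations(df):
    """Return one row per unique city_state_year key."""
    out = df.copy()
    out["city_state_year"] = (
        out["city"].str.strip()
        + "_" + out["state"].str.strip()
        + "_" + out["year"].astype(str)
    )
    # Sum any duplicate rows so that each key appears exactly once.
    out = out.groupby("city_state_year", as_index=False)[
        ["units", "structures", "value"]
    ].sum()
    # Derived feature used later for clustering.
    out["avg_units_per_building"] = out["units"] / out["structures"]
    return out.set_index("city_state_year")


observations = to_observations(raw)
```

Each row of `observations` is then one city in one year, which is the unit the later clustering and labeling steps work on.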

24 of 47

Prepare

25 of 47

Explore

26 of 47

Label Creation

Exploration

The clustering model looks at all the points and assigns them to groups (Group 1, Group 2, Group 3) based on their proximity to other points.

So we created clusters using:

  • How the city was performing when compared to the rest of the country
  • The average number of units per building, as an indicator of urban growth / population density
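The slides do not give the formula for the Evolution Index (EI). A common definition, assumed here, divides a city's growth factor by the overall market's growth factor and multiplies by 100, so that 100 means "on pace", above 100 means "outpacing" and below 100 means "underperforming". The function name and the sample values are hypothetical.

```python
import pandas as pd


def evolution_index(city_value, market_value):
    """EI = (1 + city growth) / (1 + market growth) * 100, year over year.

    Both inputs are Series indexed by year. The first year has no prior
    year to compare against, so its EI is NaN.
    """
    city_growth = city_value.pct_change()
    market_growth = market_value.pct_change()
    return (1 + city_growth) / (1 + market_growth) * 100


# Made-up values: the city grows 20% while the whole market grows 10%.
city = pd.Series([100.0, 120.0], index=[2018, 2019])
market = pd.Series([1000.0, 1100.0], index=[2018, 2019])
ei = evolution_index(city, market)  # ei[2019] is about 109.1 (outpacing)
```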

27 of 47

How we created labels for the data

Exploration

We look for all the cities that at one point were underperforming (San Antonio TX, 2012) and then two years later had been overperforming (San Antonio TX, 2014), and we used these observations as our labels for the model.

[Figure: EI against Average Units per Building, with clusters ranging from low performing cities to top performing cities.]

28 of 47

What do the clusters look like?

[Figure: EI (underperforming, on pace, outpacing) against Average Units per Building (fewer units to more units). Clusters 0 through 5 range from low performing markets to top performing markets.]

Exploration

29 of 47

What did our clusters look like?

Each point represents a city at a specific year

Dallas, TX in 2012

*X are centerpoints of each cluster
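The slides do not name the clustering algorithm. The centerpoints marked with X suggest something like k-means, so this sketch assumes Sklearn's `KMeans` with six clusters on the two plotted features. The data is randomly generated and the settings are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up observations: evolution index and average units per building.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(100, 15, 300),  # evolution index
    rng.normal(30, 10, 300),   # average units per building
])

# Scale first so that neither feature dominates the distance calculation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)  # one cluster id (0-5) per city-year

# Centerpoints converted back to the original units, for plotting as X marks.
centers = scaler.inverse_transform(kmeans.cluster_centers_)
```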

30 of 47

Then we had to label our clusters

City was low performing in 2009

After two years, there was significantly higher investment in infrastructure

We noticed that when cities were underperforming, they would be in cluster 0 or cluster 4. If they started overperforming, they would move to cluster 3 or 1.

So we began doing this for random cities and studying their behavior. Using our observations, we were able to come up with a label for each group or cluster.

31 of 47

What could we do with our clusters?

Underperforming markets building an average number of units per building

Mixed growth markets building a high number of units per building

Underperforming markets building a low number of units per building

Mixed growth markets building a low number of units per building

Markets outpacing the population building an average number of units per building

Mixed growth markets building an average number of units per building

32 of 47

What could we do with our clusters?

We want markets that are here in 2021

And will move here by 2023

33 of 47

How we labeled the data

We told the computer to look for all the cities that at one point were underperforming and then two years later were overperforming, and we used these observations as our labels for the model.
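The labeling rule can be sketched with Pandas as below. The cluster groupings follow the appendix (underperforming in cluster 0 or 4, overperforming in cluster 3 or 1). The column names, the `label_emerging` helper and the sample cluster assignments are hypothetical.

```python
import pandas as pd

UNDERPERFORMING = [0, 4]  # clusters described as underperforming
OVERPERFORMING = [1, 3]   # clusters described as overperforming


def label_emerging(df, horizon=2):
    """Flag city-years that underperform now and overperform `horizon` years later."""
    future = df[["city", "year", "cluster"]].copy()
    # Shift years back so each future row lines up with the earlier observation.
    future["year"] = future["year"] - horizon
    future = future.rename(columns={"cluster": "future_cluster"})
    merged = df.merge(future, on=["city", "year"], how="left")
    merged["emerging"] = (
        merged["cluster"].isin(UNDERPERFORMING)
        & merged["future_cluster"].isin(OVERPERFORMING)
    )
    return merged


# Made-up cluster assignments for illustration only.
history = pd.DataFrame({
    "city": ["San Antonio, TX"] * 3 + ["Dallas, TX"] * 2,
    "year": [2012, 2013, 2014, 2012, 2014],
    "cluster": [0, 4, 3, 1, 1],
})
labeled = label_emerging(history)
```

In this made-up history only the San Antonio 2012 row gets `emerging == True`. Rows with no observation two years ahead get a missing `future_cluster` and are labeled False.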

34 of 47

Model

35 of 47

How do we make predictions?

We use data from 2018 and 2019 to create predictions on what the markets would look like in 2020 and 2021

Values for 2018

Values for 2019

Use the trends and values

Calculate values for 2021

Modeling
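The slides do not say how the 2018 and 2019 values were turned into 2021 values. One simple possibility, assumed here, is a straight-line extrapolation that carries the 2018 to 2019 change forward two more years. The function name and the numbers are hypothetical.

```python
def extrapolate_2021(value_2018, value_2019):
    """Project a 2021 value by extending the 2018 -> 2019 change linearly."""
    yearly_change = value_2019 - value_2018
    # 2021 is two years after 2019, so apply the yearly change twice.
    return value_2019 + 2 * yearly_change


# Made-up example: 100 units in 2018 and 110 in 2019 project to 130 in 2021.
projected_units = extrapolate_2021(100, 110)
```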
