1 of 47

Region of Boom

Finding growth predictors for major US metropolitan housing markets

By Alec Hartman, Daniel Guerrero, Noah Melngailis, and Nick Joseph


2 of 47

Team

Data & Urban Development

Alec Hartman - Labeling Data for Modeling

Noah Melngailis - Tools, Modeling Issues

Daniel Guerrero - Model, Results, and Takeaways

Nick Joseph - Project Overview


3 of 47

Agenda

Executive Summary

Project Introduction

The Problem / Our Solution

Our Findings

Takeaways


4 of 47

Executive Summary

Objective: Identify markets that are going to experience significant housing infrastructure growth over the next two years

Solution: Using historical multifamily housing data, we can predict markets that will see growth in property investment & construction

Results: Using historical multifamily housing data, our model predicts emerging housing markets with 91% accuracy on new data


5 of 47

The Problem

Our Solution

Case Study

Create a machine learning model that can help us recommend markets TestFit should target in 2021, using historical construction data from cities all over the US

Our stakeholder is TestFit, a software company that designs building diagrams for architects and developers.

Should TestFit invest in targeting San Antonio, TX in 2021 in order to maximize their gains in 2023?


6 of 47

Skills and tools used throughout the data science pipeline

- Acquisition: Web Scraping, Pandas
- Preparation: Pandas, NumPy, RegEx
- Exploration: Matplotlib, Seaborn, SciPy
- Modeling: Sklearn to scale, model and evaluate our results
- Delivery: Tableau, Google Slides, Jupyter Notebook


7 of 47

What did the data look like?

Where did it come from? US Census Bureau Building Permits Surveys
What date range did it include? 1997-2019
What was the size of the data? 390 metro areas
What is the data showing? High-density multifamily housing
What is an observation in the data? A single metro area in a single year (e.g., San Antonio 2017)

Information in the data
- Market size: number of buildings, number of units

Information we calculated
- The market performance
- Overall market growth


8 of 47

Next step - Modeling

What do we need to create a model?

Should TestFit target San Antonio, TX in 2021? → Modeling (compares the data to historical data) → Prediction

Labels

No labels = No predictions

Challenges:
- How do you define a hot market?
- How do you measure a hot market?

Solution: Clustering


Why did we have to create labels?

If we want to train a model to predict which markets will be considered “hot” in the future, we need to give it historical data labeled with that definition; otherwise, the model won’t know how to make those predictions.

How did we create them?

We used a type of modeling that groups similar data points together into clusters. The model looks at all the points and assigns them to groups based on their position relative to other points.

Challenges:

There is no universal definition for when a market is “hot”
There are many combinations of features that could be used to define a booming market

So we used a type of machine learning called clustering to create labels marking when, in the past, entering a market would have given us the best return on investment.
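As a rough illustration of the clustering step, the sketch below groups city-year observations on the two features shown on the cluster plots, the Evolution Index and the average units per building. It assumes k-means with six clusters (matching Clusters 0 through 5), scikit-learn's `KMeans`, and made-up data; the team's exact algorithm and settings may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up city-year observations: [Evolution Index, average units per building]
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([80, 5], [5, 1], size=(20, 2)),     # underperforming, small buildings
    rng.normal([100, 20], [5, 3], size=(20, 2)),   # on pace, mid-size buildings
    rng.normal([125, 40], [5, 5], size=(20, 2)),   # outpacing, large buildings
])

# Scale both features so neither dominates the distance calculation
X_scaled = StandardScaler().fit_transform(X)

# Assumption: k-means with six clusters, one per cluster ID (0-5) in the deck
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_scaled)

# Each observation gets a cluster ID; cluster_centers_ are the "X" center
# points on the plots (expressed here in scaled units)
print(cluster_ids[:10])
print(kmeans.cluster_centers_)
```

Scaling first matters because k-means assigns points by distance, and the raw Evolution Index values are much larger than typical units-per-building values.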

9 of 47

What is clustering?

[Figure: scatter plot of city-year observations grouped into Clusters 0 through 5. Axes: Evolution Index (our city vs. all cities; underperforming, on pace, outpacing) and Average Units per Building (fewer to more). Top-performing and low-performing markets are highlighted.]

Modeling

10 of 47

How do we use clusters?

[Figure: the same cluster plot (Clusters 0 through 5; Evolution Index vs. Average Units per Building), with top-performing and low-performing markets highlighted]

Modeling

11 of 47

How do we use clusters?

Modeling

We identified markets that were underperforming at one point (San Antonio, TX, 2012)...

...and two years later were outpacing the greater U.S. market (San Antonio, TX, 2014).

We applied this logic to all observations to create labels for our model.

[Figure: San Antonio, TX on the cluster plot in 2012 and 2014. Axes: Evolution Index (our city vs. all cities; underperforming, on pace, outpacing) and Average Units per Building (fewer to more).]

12 of 47

What do our clusters actually look like?

What groups exist in our data?

[Figure: cluster plot of Evolution Index (our city vs. all cities) vs. Average Units per Building]

13 of 47

Next step - Modeling

Our solution

Now that we have our labels, we are ready to model.

Should TestFit enter San Antonio, TX in 2021? → Model (compares the data to historical data) → Prediction


14 of 47

How does the model work?

Our solution

Create predictions for 2021 using trends from 2018 and 2019.

We take our predictions for 2021 (e.g., San Antonio, TX in 2021) and compare them to the historical data on:
- Market size (number of buildings, number of apartments per building)
- The market performance
- The cluster the city belongs to

We used a classification model called K-Nearest Neighbors.

Prediction: Should Enter

[Figure: San Antonio, TX in 2021 plotted against historical data, with distances to each group marked "Difference of $$" and "Difference of $$$$"]

We create prediction data for 2021 using trends from 2018 and 2019

We then use a classification model called K-Nearest Neighbors (KNN). The model takes our 2021 predictions and compares them to the historical data. As with clustering, it measures the distance between points and assigns a label based on which group a point is closest to. For example, San Antonio is closer in value to the green group than to the red group, so the model predicts that San Antonio is green. In reality, our model is a bit more complex: it looks at five different features and uses all of them together to decide which label to assign. In our case, the model predicts that San Antonio will in fact begin to boom in 2023, and it recommends that TestFit enter the market in 2021.
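To make the KNN step concrete, here is a minimal scikit-learn sketch. The five feature columns, the synthetic data and labels, `n_neighbors=5`, and the 70/30 train/test split are all assumptions for illustration; the accuracy it prints is not the project's 91% result.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic historical city-year rows with five hypothetical features:
# buildings, units, avg units per building, Evolution Index, cluster ID
buildings = rng.integers(5, 300, n)
units = buildings * rng.uniform(2, 60, n)
X = np.column_stack([
    buildings,
    units,
    units / buildings,
    rng.normal(100, 15, n),
    rng.integers(0, 6, n),  # cluster ID treated as numeric for simplicity
])

# Synthetic "should enter" label, for illustration only
y = (X[:, 3] < 95).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features, then label each row by a vote of its 5 nearest neighbors
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Hypothetical projected 2021 row for one city (values made up)
city_2021 = np.array([[40, 1_200, 30.0, 92.0, 4]])
prediction = model.predict(city_2021)
print("Should enter" if prediction[0] == 1 else "Hold off")
```

Measuring accuracy on a held-out test split mirrors the idea of checking the model "on new data"; one-hot encoding the cluster ID would be a reasonable refinement, since cluster numbers have no natural order.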

15 of 47

Model Results

Likelihood of accurately predicting emerging markets: 91%

From our predictions, we identified 72 markets that we recommend entering in 2020 and 2021.

Findings


16 of 47

Model Results

We further broke these markets into subcategories for prioritization:

- Markets with high ROI: +$330M
- Medium markets: +$270M
- Steady growth markets: +$100M
- Markets we don’t recommend entering: -$332M

Findings


58 Cities

9 Cities

44 Cities

19 Cities

We further broke the 72 markets into subcategories for prioritization.

The first group is what we labeled “markets with high return on investment.” These are markets that we predict will be underperforming in 2021 and overperforming by 2023. We identified 9 cities in this group, and we expect each of them to grow by about $330M, on average, over the next two years.

The next group is the medium markets: markets that we predict will perform about the same as the rest of the market in 2021 but will likely be overperforming by 2023. We expect each of these cities to increase in value by about $270M over the next two years.

Third, we have the steady growth markets, which we predict will be overperforming in 2021 and will continue to overperform through 2023. These markets will likely increase, on average, by $100M over the next two years.

The last group is the markets we don’t recommend TestFit enter in 2021. We predict these markets will decrease, on average, by about $330M over the next two years.

17 of 47

Key Takeaways

Objective
- Predict emerging markets

Findings
- 91% model accuracy
- 72 profitable markets
- 9 markets with ~$330M average growth

Future improvements
- Bring in census population data

Takeaways


Now, to bring everything together

As a reminder, we were trying to predict which cities in the US would be “emerging markets” by 2023, so we could recommend which cities our stakeholder, TestFit, should invest in during 2021 to obtain the best return on investment.

We found that we could use historical multifamily housing data to predict, with 91% accuracy on new data, which markets would be “booming” in the future. Using our predictive model, we identified 72 markets that would be profitable for our stakeholder to enter, and within those, 9 markets that we believe would offer the greatest return on investment.

With more time, we would bring in census population data. Because we are trying to predict multifamily housing trends, understanding population trends could help improve our model’s accuracy.

18 of 47

Thank you!

We are the Data and Urban Development Team

Alec Hartman

Noah Melngailis

Daniel Guerrero

Nick Joseph

You can read more about our project here:

To learn more about TestFit:

https://blog.testfit.io/testfit-home


https://alumni.codeup.com/

https://bit.ly/3eWMMiP

19 of 47

Appendix

20 of 47

Introduction

21 of 47

Project Introduction

Stakeholder

TestFit.io

Founded in 2015 and based in Dallas, TX
Clifton Harness, CEO
Streamlines feasibility studies, helping architects craft buildings more quickly

Research Question

How many high-density, multifamily structures are being built in the U.S. every day?

22 of 47

Acquisition

23 of 47

Census Bureau Building Permit Surveys

Where did it come from? US Department of Commerce Building Permits Survey
What was the size of the data? 390 metro areas
What dates? 1997-2019
What did it say? Features of this data set:
- Number of units
- Number of structures
- Value of structures

We took the data and turned it into a usable dataframe by transforming it into unique city_state_year observations.

Acquisition
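A minimal pandas sketch of that reshaping step, assuming the survey tables have already been parsed into one DataFrame per year. The column names and numbers below are made up for illustration; only the city_state_year key idea comes from the slide.

```python
import pandas as pd

# Hypothetical per-year tables parsed from the survey files (values are made up)
yearly = {
    2017: pd.DataFrame({
        "city": ["San Antonio", "Dallas"],
        "state": ["TX", "TX"],
        "units": [100, 200],
        "structures": [10, 20],
        "value": [1_000, 2_000],
    }),
    2018: pd.DataFrame({
        "city": ["San Antonio", "Dallas"],
        "state": ["TX", "TX"],
        "units": [110, 190],
        "structures": [11, 19],
        "value": [1_100, 1_900],
    }),
}

# Stack the years into one frame, tagging each row with its year
combined = pd.concat(
    [frame.assign(year=year) for year, frame in yearly.items()],
    ignore_index=True,
)

# Build a unique city_state_year key, e.g. "San_Antonio_TX_2017"
combined["city_state_year"] = (
    combined["city"].str.replace(" ", "_")
    + "_" + combined["state"]
    + "_" + combined["year"].astype(str)
)
combined = combined.set_index("city_state_year")

# Each observation should appear exactly once
assert combined.index.is_unique
print(combined)
```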

24 of 47

Prepare

25 of 47

Explore

26 of 47

Label Creation

The clustering model looks at all the points and assigns them to groups based on their proximity to other points.

So we created clusters using:
- How the city was performing compared to the rest of the country
- The average number of units per building, as an indicator of urban growth and population density

[Figure: example clusters labeled Group 1, Group 2, and Group 3]

Exploration
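The slides do not spell out how the Evolution Index (EI) is calculated. A common definition, assumed here, divides a city's growth by the growth of all cities combined and scales the result so that 100 means "on pace". The function names and the plus-or-minus 5 point band used to bucket cities are hypothetical.

```python
def evolution_index(city_prev: float, city_curr: float,
                    all_prev: float, all_curr: float) -> float:
    """Growth of one city relative to growth of all cities; 100 = on pace."""
    city_growth = city_curr / city_prev  # e.g., units permitted this year vs. last year
    all_growth = all_curr / all_prev     # the same ratio for all cities combined
    return city_growth / all_growth * 100


def performance_band(ei: float, band: float = 5.0) -> str:
    """Bucket an EI value; the +/- band is a hypothetical threshold."""
    if ei < 100 - band:
        return "underperforming"
    if ei > 100 + band:
        return "outpacing"
    return "on pace"


# A city growing 20% while all cities grow 10% is outpacing the market
ei = evolution_index(1_000, 1_200, 100_000, 110_000)
print(round(ei, 1), performance_band(ei))  # prints: 109.1 outpacing
```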

27 of 47

How we created labels for the data

We looked for all the cities that at one point were underperforming (e.g., San Antonio, TX, 2012)...

...and then two years later were overperforming (San Antonio, TX, 2014)...

...and we used these observations as our labels for the model.

[Figure: EI vs. Average Units per Building, highlighting top-performing and low-performing cities]

Exploration

28 of 47

What do the clusters look like?

[Figure: Clusters 0 through 5 plotted by EI (underperforming, on pace, outpacing) and Average Units per Building (fewer units to more units), with top-performing and low-performing markets highlighted]

Exploration

29 of 47

What did our clusters look like?

Each point represents a city in a specific year (e.g., Dallas, TX in 2012).

*X marks the center point of each cluster.

30 of 47

Then we had to label our clusters

City was low-performing in 2009

After two years, there was significantly higher investment in infrastructure

We noticed that when cities were underperforming, they were in cluster 0 or cluster 4. When they started overperforming, they moved to cluster 3 or cluster 1.

So we repeated this for randomly chosen cities and studied their behavior. From these observations, we came up with a label for each cluster.

31 of 47

What could we do with our clusters?

Underperforming markets building an average number of units per building

Mixed growth markets building a high number of units per building

Underperforming markets building a low number of units per building

Mixed growth markets building a low number of units per building

Markets outpacing the population building an average number of units per building

Mixed growth markets building an average number of units per building

32 of 47

What could we do with our clusters?

We want markets that are here in 2021

And will move here by 2023

33 of 47

How we labeled the data

We told the computer to look for all the cities that at one point were underperforming and two years later were overperforming, and we used these observations as our labels for the model.
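That labeling rule can be sketched in pandas using the cluster pattern described on the cluster-labeling slide: a city-year in cluster 0 or 4 whose city lands in cluster 1 or 3 two years later gets a positive label. The DataFrame, its column names, and the cluster IDs in the example are hypothetical.

```python
import pandas as pd

# Hypothetical cluster assignments, one row per city-year
df = pd.DataFrame({
    "city_state": ["San Antonio, TX"] * 4 + ["Dallas, TX"] * 4,
    "year": [2011, 2012, 2013, 2014] * 2,
    "cluster": [0, 4, 2, 3, 1, 1, 0, 5],
})

UNDERPERFORMING = [0, 4]  # clusters cities sat in while underperforming
OUTPACING = [1, 3]        # clusters they moved to once they started overperforming

# Look up each city's cluster two years later by shifting the year back by 2
future = (
    df.assign(year=df["year"] - 2)
      .rename(columns={"cluster": "cluster_in_2_years"})
)
df = df.merge(future, on=["city_state", "year"], how="left")

# No data two years out (e.g., the last two years) -> -1, which never matches
df["cluster_in_2_years"] = df["cluster_in_2_years"].fillna(-1).astype(int)

df["should_enter"] = (
    df["cluster"].isin(UNDERPERFORMING)
    & df["cluster_in_2_years"].isin(OUTPACING)
)
print(df[df["should_enter"]])
```

Merging on `year - 2` instead of shifting rows keeps the rule correct even if a city is missing a year in the data.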

34 of 47

Model

35 of 47

How do we make predictions?


We use data from 2018 and 2019 to create predictions on what the markets would look like in 2020 and 2021

Values for 2018 + Values for 2019 → Use the trends and values → Calculate values for 2021

Modeling
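The slides say the 2021 values come from the 2018 and 2019 values and their trend, without giving the formula. One simple reading, assumed here, is straight-line extrapolation: repeat the 2018-to-2019 change for each later year. The function name and the floor at zero are hypothetical choices.

```python
def project_values(value_2018: float, value_2019: float) -> dict:
    """Straight-line projection of a feature to 2020 and 2021 (an assumption)."""
    yearly_change = value_2019 - value_2018
    return {
        # Counts such as buildings or units cannot go below zero
        2020: max(0.0, value_2019 + yearly_change),
        2021: max(0.0, value_2019 + 2 * yearly_change),
    }


# A city that permitted 100 buildings in 2018 and 120 in 2019
print(project_values(100, 120))  # prints: {2020: 140, 2021: 160}
```

Applied to each feature of each city, this produces the projected 2021 rows that the classifier then compares against historical data.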

36 of 47
