After Repair Property Value Prediction Tool
(Patent Pending)
ARV Holdings LLC
Background and Theory
Real Estate Industry Background
Accurate pricing estimations
An Industry-Wide Problem: Zestimate® Model Accuracy is Insufficient for Investors
Zillow® Errors Plot
Median Error:
7.3% (± $20,200)
Solution: Automation of After Repair Valuation (ARV)
What is ARV?
“After Repair Value” (ARV) is a pricing metric that estimates a property’s value once it is fully renovated
Why ARV?
Hypothesis:
An ARV model trained only on renovated properties will have lower prediction errors than a comparable price prediction model trained on all sold properties
Implementation
Data: Red Cedar Real Estate®
ARV Estimator Process
SQL call to remotely import data
Process Property Data
Derive Renovation Term weights using TfidfVectorizer
Predict renovation status with classification model
Geocoding Lat/Lon to Census Tract (Script)
Variables normalized by Census Tract
Predict ARV with regression model
Visualize ARVs on a Web Tool
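The tract normalization step in the process above can be illustrated with a minimal sketch. It is not the project's actual code: the field names `tract` and `price_per_sqft`, the sample values, and the choice of "percent difference from the tract median" as the normalization are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def percent_diff_from_tract_median(properties, field):
    """Express each property's `field` as a fractional difference from
    the median of that field within the property's census tract."""
    # Group the raw values by tract (hypothetical "tract" key).
    values_by_tract = defaultdict(list)
    for prop in properties:
        values_by_tract[prop["tract"]].append(prop[field])

    # One median per tract.
    tract_median = {t: median(v) for t, v in values_by_tract.items()}

    # (value - tract median) / tract median, in the original row order.
    return [
        (prop[field] - tract_median[prop["tract"]]) / tract_median[prop["tract"]]
        for prop in properties
    ]


# Hypothetical sample data: two tracts with different price levels.
sample = [
    {"tract": "A", "price_per_sqft": 100.0},
    {"tract": "A", "price_per_sqft": 120.0},
    {"tract": "A", "price_per_sqft": 140.0},
    {"tract": "B", "price_per_sqft": 200.0},
    {"tract": "B", "price_per_sqft": 300.0},
]
normalized = percent_diff_from_tract_median(sample, "price_per_sqft")
print(normalized)
```

Normalizing within a tract lets the regression compare a property against its immediate neighborhood rather than against the whole market.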
Identifying Renovations vs. Non-Renovations
13% determined
Deriving Ground Truth for Renovation Status
Identifying Renovations
Identifying Non-Renovations
Renovation Classification Model Results
A Peek Under the Hood: Renovation Classification Models Tested
Classification Model | F1 Score | Accuracy |
Linear SVC | 83.8% | 95.0% |
Logistic Regression Classifier | 83.6% | 94.8% |
Extra Trees Classifier | 81.8% | 94.2% |
SGD Classifier | 81.8% | 93.8% |
Random Forest Classifier | 81.7% | 94.4% |
A Peek Under the Hood: Linear SVC Renovation Classification Performance
Accuracy: 0.950 ± 0.008
F1 Score: 0.838 ± 0.021
Methodology:
Results were summarized by averaging the scores obtained from a k-fold cross-validation run.
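As a rough illustration of how per-fold scores can be summarized into a "mean ± spread" figure, here is a minimal sketch. It assumes the held-out predictions for each fold are already available, and it assumes the ± value is a standard deviation across folds, which the slides do not state. The fold data is made up.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_folds(fold_results):
    """fold_results: one (y_true, y_pred) pair per held-out fold.
    Returns (mean, standard deviation) for each metric."""
    accs = [accuracy(t, p) for t, p in fold_results]
    f1s = [f1_score(t, p) for t, p in fold_results]
    return {
        "accuracy": (mean(accs), stdev(accs)),
        "f1": (mean(f1s), stdev(f1s)),
    }


# Hypothetical held-out results for three folds (1 = renovated).
folds = [
    ([1, 0, 0, 1, 0], [1, 0, 0, 0, 0]),
    ([0, 1, 0, 0, 1], [0, 1, 0, 0, 1]),
    ([0, 0, 1, 0, 0], [0, 1, 1, 0, 0]),
]
summary = summarize_folds(folds)
for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.3f} ± {spread:.3f}")
```

With only about 13% of properties in the positive class, F1 is the more informative of the two metrics, since accuracy can look high even when many renovations are missed.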
A Peek Under the Hood: Linear SVC Renovation Feature Importances
Renovated.
Vibrant words like “granite”, “stunning”, “gorgeous”, or “stainless” are strong indicators of a renovated property.
Non-Renovated.
Characteristics of the sale itself like “estate sale”, “investor”, “opportunity”, or “sold” indicate a non-renovated property.
Property descriptions were fed into TfidfVectorizer to derive term weights indicating renovation status.
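A minimal sketch of this idea with scikit-learn follows. The listing descriptions and labels are invented for illustration, and the default `TfidfVectorizer` and `LinearSVC` settings are assumptions, not the project's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical listing descriptions; 1 = renovated, 0 = non-renovated.
descriptions = [
    "stunning remodel with granite counters and stainless appliances",
    "gorgeous updated kitchen with granite and stainless finishes",
    "stunning gorgeous renovation featuring new granite",
    "estate sale investor opportunity sold as is",
    "investor opportunity needs work sold as is",
    "estate sale sold to investor bring contractor",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each description into a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(descriptions)

# Fit a linear classifier; its coefficients act as per-term weights.
clf = LinearSVC(random_state=0)
clf.fit(X, labels)

# Map each vocabulary term to its learned coefficient.
weights = {term: clf.coef_[0][idx] for term, idx in vectorizer.vocabulary_.items()}

# Terms sorted from most "non-renovated" to most "renovated".
ranked = sorted(weights, key=weights.get)
print("Most non-renovated terms:", ranked[:3])
print("Most renovated terms:", ranked[-3:])
```

Positive coefficients push a description toward the renovated class and negative ones toward the non-renovated class, which is how term lists like those above can be read off a fitted linear model.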
ARV Regression Model Results
A Peek Under the Hood: Raw ARV Regression Models Compared
(Results trained/tested on an 80/20 split)
Regression Model | R2 Score | 50th Percentile of Absolute Errors | 75th Percentile of Absolute Errors |
Extra Trees Regression | 0.851 | 6.62% | 12.85% |
Random Forest Regression | 0.840 | 6.87% | 13.24% |
Gradient Boosting Regression | 0.817 | 7.84% | 14.28% |
KNN Regression | 0.737 | 8.85% | 16.73% |
Linear Regression | 0.780 | 10.80% | 19.97% |
A Peek Under the Hood: Extra Trees Regression Model Performance
Percentile Errors:
25%: 0.0297
50%: 0.0662
75%: 0.129
R2 Score: 0.983
Methodology:
Results were obtained using a train-test method on an 80/20 split.
[Figure: Bins of Properties by Percent Error. Axes: Percent Error, Number of Properties per Bin]
The predicted values of most renovated properties are within 5% to 10% of the true value, with some properties showing larger errors in the tail of the distribution.
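The percentile errors reported above can be computed along the lines of the following sketch. It assumes the error for each property is the absolute difference between predicted and true price divided by the true price, and it uses linear interpolation between ranks; both are assumptions, and the prices are made up.

```python
def absolute_percent_errors(actual, predicted):
    """|predicted - actual| / actual for each property."""
    return [abs(p - a) / a for a, p in zip(actual, predicted)]


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical true sale prices and model predictions for a test set.
actual = [200_000, 250_000, 300_000, 400_000, 500_000]
predicted = [210_000, 240_000, 330_000, 380_000, 600_000]

errors = absolute_percent_errors(actual, predicted)
for q in (25, 50, 75):
    print(f"{q}th percentile of absolute errors: {percentile(errors, q):.4f}")
```

Reporting the 25th, 50th, and 75th percentiles rather than a mean keeps a few badly mispriced properties in the tail from dominating the summary.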
Comparison of ARV Model with Generic Value Models
Full Model Errors vs ARV Model Errors
Data used for Extra Trees Regression Model | R2 Score | 25th Percentile of Absolute Errors | 50th Percentile of Absolute Errors | 75th Percentile of Absolute Errors | Mean Dollar Error | Median Dollar Error |
Full Data | 0.854 | 4.55% | 10.38% | 21.66% | $45,100 | $26,138 |
Renovated Data | 0.851 | 2.93% | 6.62% | 12.85% | $42,214 | $20,425 |
ARV Median Error Rate: 6.62%
Full Model Median Error Rate: 10.38%
Full Model Errors vs ARV Model Errors vs Researched Model Errors
Median Error Rates of researched price prediction models: 7.3% to 12.27%
ARV Median Error Rate: 6.62%
The End Game
Potential ARV Visualizations: Aggregated into a Tableau® Map
Note: Data displayed in mouseover is currently pre-COVID
Potential Visualizations: Posted Individually on Website
Big Lesson Learned
Big Lesson Learned: Complexities in Value From Variable Interactions
Big Lesson Learned: Complexities in Value From Variable Interactions
diffFrom_Med_ARV_SqftPerc
Big Lesson Learned: Complexities in Value From Variable Interactions
SqftPerBaths
Future of ARV Prediction Tool
Questions?
Appendix
Tools and Methods
Median Absolute Error Rates of All Models
Model | Median Absolute Error |
Kummerow’s OLS model | 12.27% |
Clapp’s Local Regression | 11.31% |
This Project’s Model (Full Data) | 10.38% |
Freddie Mac Model-Use Requirement | 10.00% |
Dubin’s Kriging Model | 8.34% |
Case’s Homogeneous Districts | 8.07% |
Kintzel’s Random Forest | 7.60% |
Zillow | 7.30% |
This Project’s Model (Renovation Data) | 6.62% |
Reflections and Future Work
Works Cited