Final Presentation

  • Liam Owens
  • Xuan Liu
  • Chensheng Wen

Agenda:

  • Our Goals
  • Top Model 1 -- Ridge Regression
  • Top Model 2 -- Multiple Linear Regression
  • Top Model 3 -- Support Vector Regression
  • Comparison of Prediction
  • Dash Visualization
  • Conclusion

Our Goals

  1. Which features are most important for predicting the target variable (“Quantity”)?
  2. Which models perform best on Corona’s data?
  3. How does the data that includes 2020 differ from the data that excludes 2020?

Top Model 1 -- Ridge Regression

Equation:

Quantity = -0.02247693 + (0.32916839 * Housing_total_sales_NO_SI) + (0.07424731 * Gray_cement_dispatch_contractor) + (-0.13035303 * RADAR_Plumbinglag_3) + (0.05395233 * Housing_total_launch_NO_SI) + (0.19237967 * Housing_total_sales) + (0.14497477 * Gray_cement_dispatch_other) + (-0.15374323 * RADAR_Tools) + (0.08755184 * Housing_total_launch_NO_SIlag_4) + (0.160274 * Seasonal) + (-0.01935941 * Offshore_microcredit_loans) + Error

Training & Testing Metrics:

Training Metrics:

R squared: 0.8297569444513031

Mean Absolute Error: 0.3018142569168214

Mean Squared Error: 0.1300680362503906

Root Mean Squared Error: 0.36064946450867025

Testing Metrics:

R squared: 0.9021347876241389

Mean Absolute Error: 0.32789154500662643

Mean Squared Error: 0.1599865167572618

Root Mean Squared Error: 0.3999831455914884
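The slides report four metrics without defining them. Below is a minimal pure-Python sketch of the standard definitions (the same ones scikit-learn's `r2_score`, `mean_absolute_error` and `mean_squared_error` use). The slides do not say which library produced the numbers, so treat this as illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R squared, MAE, MSE and RMSE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # RMSE is the square root of MSE
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)     # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # Raises ZeroDivisionError if y_true is constant (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot
    return {"R squared": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}
```

As a consistency check, `math.sqrt(0.1300680362503906)` reproduces the reported training RMSE of about 0.3606.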

Top Model 2 -- Multiple Linear Regression

Equation:

Quantity = -0.02250867 + (0.33720736 * Housing_total_sales_NO_SI) + (0.07147608 * Gray_cement_dispatch_contractor) + (-0.12998208 * RADAR_Plumbinglag_3) + (0.05093813 * Housing_total_launch_NO_SI) + (0.19213823 * Housing_total_sales) + (0.14664222 * Gray_cement_dispatch_other) + (-0.15276257 * RADAR_Tools) + (0.08666063 * Housing_total_launch_NO_SIlag_4) + (0.16080055 * Seasonal) + (-0.0191351 * Offshore_microcredit_loans) + Error
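To make the equation concrete, here is a small sketch that evaluates the Multiple Linear Regression equation for one observation, leaving out the Error term. The coefficients are copied from the equation; `predict_quantity` is a hypothetical helper name. It assumes the inputs are preprocessed exactly as during training (for example, any scaling), which the slides do not describe.

```python
# Coefficients copied from the Multiple Linear Regression equation.
MLR_INTERCEPT = -0.02250867
MLR_COEFFICIENTS = {
    "Housing_total_sales_NO_SI": 0.33720736,
    "Gray_cement_dispatch_contractor": 0.07147608,
    "RADAR_Plumbinglag_3": -0.12998208,
    "Housing_total_launch_NO_SI": 0.05093813,
    "Housing_total_sales": 0.19213823,
    "Gray_cement_dispatch_other": 0.14664222,
    "RADAR_Tools": -0.15276257,
    "Housing_total_launch_NO_SIlag_4": 0.08666063,
    "Seasonal": 0.16080055,
    "Offshore_microcredit_loans": -0.0191351,
}


def predict_quantity(features):
    """Evaluate intercept + sum(coefficient * feature) for one observation.

    `features` maps each feature name to an already-preprocessed value.
    """
    missing = set(MLR_COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return MLR_INTERCEPT + sum(
        coef * features[name] for name, coef in MLR_COEFFICIENTS.items()
    )
```

With every feature at 0 the prediction equals the intercept, and raising a single feature by one unit moves the prediction by that feature's coefficient.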

Training & Testing Metrics:

Training Metrics:

R squared: 0.829783817328369

Mean Absolute Error: 0.30181505506879586

Mean Squared Error: 0.13004750500265722

Root Mean Squared Error: 0.3606209991149395

Testing Metrics:

R squared: 0.9021575129730091

Mean Absolute Error: 0.3278331114982842

Mean Squared Error: 0.15994936617719807

Root Mean Squared Error: 0.39993670271331444
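Ridge Regression and Multiple Linear Regression produce nearly identical coefficients and metrics here. Ridge minimizes the same least-squares loss plus an L2 penalty, alpha * (sum of squared slopes), and the intercept is not penalized. With a small alpha the fit stays close to ordinary least squares. The one-feature sketch below uses toy data, not the project's data, and the slides do not report which alpha was used.

```python
def fit_one_feature(x, y, alpha=0.0):
    """Fit y = b0 + b1 * x by least squares; alpha > 0 adds a ridge penalty on b1.

    Minimizes sum((y - b0 - b1 * x) ** 2) + alpha * b1 ** 2.
    alpha = 0 gives ordinary least squares (multiple linear regression with one feature).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / (sxx + alpha)   # the penalty shrinks the slope toward 0
    b0 = y_mean - b1 * x_mean  # the intercept is left unpenalized
    return b0, b1


ols = fit_one_feature([1, 2, 3, 4], [2, 4, 6, 8])                # slope 2.0
ridge = fit_one_feature([1, 2, 3, 4], [2, 4, 6, 8], alpha=5.0)   # slope shrunk to 1.0
```

In this toy case alpha is as large as the feature's centered sum of squares, so the slope is halved. When alpha is small relative to that quantity, the two fits barely differ, which would be consistent with the close match between the two models above.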

Residual Plot

Top Model 3 -- Support Vector Regression

With 2020 data:

Training Metrics:

R squared: 0.7008272989028212

Mean Absolute Error: 0.33637523479818615

Mean Squared Error: 0.16905712162286324

Root Mean Squared Error: 0.41116556473379823

Testing Metrics:

R squared: 0.9090677902662546

Mean Absolute Error: 0.33724280668116485

Mean Squared Error: 0.20035180371388273

Root Mean Squared Error: 0.4476067511933692

Without 2020 data:

Training Metrics:

R squared: 0.8229853456278702

Mean Absolute Error: 0.30454132150801483

Mean Squared Error: 0.13524163089952712

Root Mean Squared Error: 0.367752132420095

Testing Metrics:

R squared: 0.8987589015416149

Mean Absolute Error: 0.3220724685425985

Mean Squared Error: 0.1655052934727105

Root Mean Squared Error: 0.40682341804855643
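The slides compare models trained "with 2020 data" and "without 2020 data" but do not show how the second dataset was built. Here is a minimal sketch, assuming each row carries a hypothetical `date` field and that "without 2020" means dropping only the rows dated in calendar year 2020. The field names and toy rows are illustrative, not the project's schema.

```python
from datetime import date


def split_by_2020(rows, date_key="date"):
    """Return (rows_with_2020, rows_without_2020).

    Assumption: "with 2020" keeps every row and "without 2020" drops
    rows whose date falls in calendar year 2020.
    """
    without_2020 = [row for row in rows if row[date_key].year != 2020]
    return list(rows), without_2020


# Toy rows for illustration only; "date" and "Quantity" are hypothetical field names.
example_rows = [
    {"date": date(2019, 12, 1), "Quantity": 1.0},
    {"date": date(2020, 3, 1), "Quantity": 2.0},
    {"date": date(2021, 1, 1), "Quantity": 3.0},
]
with_2020, without_2020 = split_by_2020(example_rows)
```

Each resulting dataset would then go through the same train/test split and model fitting, so the only difference between the two sets of metrics is the presence of the 2020 rows.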

Weight of Each Feature for Predicting Quantity
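One common way to read feature weights from a linear model is to rank features by the absolute value of their coefficients. The sketch below does this with the Ridge Regression coefficients from Top Model 1. The figure on this slide may be based on a different model's weights. Comparing magnitudes is only meaningful if the features share a common scale (for example, standardized inputs), and the slides do not state whether they do.

```python
# Ridge Regression coefficients copied from the Top Model 1 equation.
RIDGE_COEFFICIENTS = {
    "Housing_total_sales_NO_SI": 0.32916839,
    "Gray_cement_dispatch_contractor": 0.07424731,
    "RADAR_Plumbinglag_3": -0.13035303,
    "Housing_total_launch_NO_SI": 0.05395233,
    "Housing_total_sales": 0.19237967,
    "Gray_cement_dispatch_other": 0.14497477,
    "RADAR_Tools": -0.15374323,
    "Housing_total_launch_NO_SIlag_4": 0.08755184,
    "Seasonal": 0.160274,
    "Offshore_microcredit_loans": -0.01935941,
}


def rank_by_magnitude(coefficients):
    """Sort (feature, coefficient) pairs by |coefficient|, largest first.

    The sign is kept in the output: negative weights lower the prediction.
    """
    return sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = rank_by_magnitude(RIDGE_COEFFICIENTS)
```

With these coefficients, Housing_total_sales_NO_SI ranks first and Offshore_microcredit_loans last.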

Comparison of Prediction

Dash Visualization

Conclusion

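  • The most important features are:
    • Housing_total_sales_NO_SI
    • Gray_cement_dispatch_contractor
    • RADAR_Tools
    • Housing_total_launch_NO_SI
    • Housing_total_sales
    • Gray_cement_dispatch_other
    • RADAR_Plumbinglag_3
    • Housing_total_launch_NO_SIlag_4
    • Seasonal
    • Offshore_microcredit_loans
  • The best model is Support Vector Regression, which has the highest testing R squared (0.909, trained with 2020 data).
  • The data trend with 2020 differs from the trend without 2020: for Support Vector Regression, training R squared is 0.701 with 2020 data and 0.823 without it.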
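Which model counts as "best" depends on the metric used. The sketch below picks a winner from the testing metrics reported on the model slides. The `best_model` helper is hypothetical, and the values are copied from the slides, not recomputed.

```python
# Testing metrics as reported on the model slides.
TEST_METRICS = {
    "Ridge": {"R squared": 0.9021347876241389, "RMSE": 0.3999831455914884},
    "MLR": {"R squared": 0.9021575129730091, "RMSE": 0.39993670271331444},
    "SVR (with 2020)": {"R squared": 0.9090677902662546, "RMSE": 0.4476067511933692},
    "SVR (without 2020)": {"R squared": 0.8987589015416149, "RMSE": 0.40682341804855643},
}


def best_model(metrics, metric, higher_is_better):
    """Return the model name with the best value of `metric`."""
    chooser = max if higher_is_better else min
    return chooser(metrics, key=lambda name: metrics[name][metric])


best_by_r2 = best_model(TEST_METRICS, "R squared", higher_is_better=True)
best_by_rmse = best_model(TEST_METRICS, "RMSE", higher_is_better=False)
```

Selecting by testing R squared returns Support Vector Regression trained with 2020 data, which matches the conclusion. Selecting by testing RMSE favors Multiple Linear Regression by a small margin, so it helps to state the selection metric explicitly.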

All of the Plotly graphs we made are available on this website: Website Link