1 of 17

Team: E-Z Coders

Daniel, Darren, Dwayne, Jessica, and Uzma

2 of 17

Overview

  • One in five U.S. adults live with a mental illness

  • Any Mental Illness (AMI) and Serious Mental Illness (SMI)

  • Educational attainment, median household income, and AMI cases

3 of 17

Questions

1. What impact did COVID-19 have on Any Mental Illness (AMI) among adults?

2. Was every state/region affected the same way, or were some states affected more than others?

3. How accurately can our ML model predict AMI cases across states/regions from the selected features?

4 of 17

5 of 17

6 of 17

Primary Data Search

  • Finding a suitable variable for analysis
  • Recognizing and adapting to limitations of data as a byproduct of COVID-19
  • Initial analysis focused on four regions and three age groups
  • Shifting from prevalence estimates to raw totals in hundreds of thousands for AMI

7 of 17

Data Transformation

  • How best to organize the data for use by the ML model
  • Working from and modifying a high-level summary table to avoid wide data
  • Additional years
  • Organizing by Year rather than by State
  • Avoiding compromising analysis by expanding search
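
The reshaping described above (avoiding wide data and organizing by year) can be sketched with pandas. This is only an illustration: the column names, state names, and counts are made-up placeholders, not the project's actual table.

```python
import pandas as pd

# Hypothetical wide summary table: one row per state, one column per year.
# All names and numbers here are placeholders for illustration only.
wide = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "2019": [700, 100],
    "2020": [750, 120],
})

# Melt the year columns into rows so each record is one (state, year) observation.
tidy = wide.melt(id_vars="state", var_name="year", value_name="ami_cases")
tidy["year"] = tidy["year"].astype(int)

# Organize by year first, then by state.
tidy = tidy.sort_values(["year", "state"]).reset_index(drop=True)
print(tidy)
```

A long table like this can take additional years as extra rows instead of extra columns, which is one reason it tends to be easier to feed into a model.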

8 of 17

Database Process

  • PostgreSQL as our database to store the final AMI and socio-economic tables
  • SQLAlchemy handled the connection used to load DataFrames from the Jupyter Notebook into the local PostgreSQL server
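
A minimal sketch of this kind of load, assuming pandas and SQLAlchemy. To keep it runnable without a database server it uses an in-memory SQLite engine; the table name, columns, and values are hypothetical, and the PostgreSQL URL in the comment uses placeholders rather than the project's real settings.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite stands in for the local server in this sketch.
# For PostgreSQL the URL would take the form
# "postgresql://<user>:<password>@localhost:5432/<database>" (placeholders).
engine = create_engine("sqlite:///:memory:")

# Hypothetical DataFrame standing in for the final AMI table.
ami = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "year": [2020, 2020],
    "ami_cases": [750, 120],
})

# Write the DataFrame to a table, replacing it if it already exists.
ami.to_sql("ami_cases", engine, if_exists="replace", index=False)

# Read it back to confirm the round trip.
with engine.connect() as conn:
    loaded = pd.read_sql(text("SELECT * FROM ami_cases ORDER BY state"), conn)
print(loaded)
```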

9 of 17

MACHINE LEARNING MODEL SELECTION

  • Exploratory data analysis (EDA) was performed to gain insight into the dataset.

  • Supervised linear regression was chosen because our output feature is numeric and continuous rather than a discrete class label.

10 of 17

FEATURE ENGINEERING & SELECTION

11 of 17

Find the Best Performing Model using GridSearchCV

Linear Regression

DecisionTreeRegressor

GradientBoostingRegressor
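
One way such a comparison could be set up with scikit-learn is sketched below. The data is synthetic and the parameter grids are illustrative assumptions, not the grids used in the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the real features (education, income, ...) are not used here.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Each candidate pairs an estimator with a small, hypothetical parameter grid.
candidates = {
    "LinearRegression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "DecisionTreeRegressor": (
        DecisionTreeRegressor(random_state=42),
        {"max_depth": [2, 4, 8]},
    ),
    "GradientBoostingRegressor": (
        GradientBoostingRegressor(random_state=42),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

# Run a 5-fold cross-validated grid search per model, scored by R^2.
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="r2", cv=5)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the model with the highest mean cross-validated R^2.
best_model = max(results, key=lambda name: results[name][0])
for name, (score, params) in results.items():
    print(f"{name}: R^2={score:.3f}, params={params}")
print("Best:", best_model)
```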

12 of 17

REGRESSION MODEL TESTING

Model Name | Data Processing | R^2 Score

Multivariate Linear Regression | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=42) | Testing Score: 0.89

Gradient Boosting Regressor | gbr_params = {'n_estimators': 1000, 'max_depth': 3, 'min_samples_split': 5, 'learning_rate': 0.01, 'loss': 'ls'} | Testing Score: 0.95

Ridge Regression | Ridge_regressor = GridSearchCV(ridge, parameters, scoring='r2', cv=5) | Testing Score: 0.49
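
For reference, a testing R^2 score of this kind could be computed as sketched below, using the same split settings and gradient boosting parameters on synthetic data. The data is a stand-in, so the printed score will not match the figures reported here. The `loss` setting is left at its default (squared error), which is what `'ls'` selected in older scikit-learn releases.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the project's features are not reproduced.
X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=42)

# Hold out 25% of the rows for testing, with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Same hyperparameters as listed; loss is omitted so the default squared error is used.
gbr_params = {
    "n_estimators": 1000,
    "max_depth": 3,
    "min_samples_split": 5,
    "learning_rate": 0.01,
}
gbr = GradientBoostingRegressor(**gbr_params, random_state=42)
gbr.fit(X_train, y_train)

# R^2 on the held-out rows is the "testing score".
test_r2 = r2_score(y_test, gbr.predict(X_test))
print(f"Testing Score: {test_r2:.2f}")
```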

13 of 17

Gradient Boosting Regressor Predictions vs Actual Values

Predictions vs Actual Data

14 of 17

Interactive Tableau Dashboard

15 of 17

Conclusion

16 of 17

  • Find the total cases for different types of mental illness at the county level within each state

  • Collect more data on COVID-19 cases after two years

  • Population and geographic factors could be included

17 of 17

Things We Would Have Done Differently

  • Add more mental illnesses to compare how each type increased as a result of COVID-19
  • Go back as many as five more years to get the AMI cases
  • Create a web app to deploy our machine learning model so end users can predict outcomes