1 of 20

Cardiovascular Disease Project

Azuka Atum

2 of 20

Cardiovascular disease (CVD), colloquially known as “heart disease,” is a prominent illness that affects the heart and the vascular system throughout the body. It is the leading cause of death in the United States each year[1].

According to the CDC, one person dies from heart disease every 34 seconds[2]. Up to 80% of CVD is estimated to be preventable through changes to modifiable risk factors, which include high blood pressure, high cholesterol, cigarette smoking, diet, physical activity, and obesity[3].

Background: Context

Figure 1. Heat map of Heart Disease death rate by county.

3 of 20

There are many types of cardiovascular disease. Two major ones are hypertension and stroke, which occurs when a blood vessel in the brain is blocked or bursts. In 2020, 1 in 6 deaths from cardiovascular disease was due to stroke[4].

Many factors affect risk. This raises the question of whether an immutable characteristic such as height, in combination with modifiable risk factors such as exercise or BMI, can change the odds of developing cardiovascular disease.

According to a study by Samaras, Elrick, and Storms, “…the results…indicate that shorter people have substantially lower rates of CHD mortality and moderately lower levels of stroke mortality. For example…shorter ethnic groups vs taller groups in California had substantially lower mortality rates”[5].

Background: Context

Figure 2. Stroke diagram.

4 of 20

Background: Dataset

  • https://www.kaggle.com/datasets/aiaiaidavid/cardio-data-dv13032020
  • Contains the variables listed in Figure 3
  • 68,783 rows, 12 columns. The dataset originally had 70,000 rows; cleaning reduced it to 68,783.
  • .csv file
  • Obtained from patient data; the specific source of the data is unknown

Figure 3. Variables list contained within dataset.

5 of 20

Background: Dataset Summary

Table 1. Means and standard deviation by gender and overall figures.

6 of 20

Background: Dataset Summary

  • Variables approximately fit the normality assumption
    • The histograms are roughly bell-shaped, although some are skewed

Figure 4a-f. Histograms of: a) SBP, b) DBP, c) Height, d) Weight, e) BMI, f) Age.

7 of 20

Analysis: Definitions

  • Body Mass Index (BMI)
    • The data contained height and weight, so I combined them into a new variable, BMI, calculated as weight (kg) / [height (m)]^2.
  • I re-coded the following: AP_HIGH, AP_LOW, CVD, AGE
    • AP_HIGH became Systolic Blood Pressure (SBP)
    • AP_LOW became Diastolic Blood Pressure (DBP)
    • CVD became “Yes” and “No”
    • AGE became six age categories
  • Re-coding SBP/DBP let me group blood pressure readings into the categories used in official guidelines, which simplifies the analysis.
    • Trade-off: categorizing a continuous variable discards information and costs degrees of freedom in statistical tests

Figure 5. Blood pressure chart adapted from the American Heart Association.
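
The blood pressure categorization described above can be sketched in a few lines. This is a minimal illustration, not the code used for the project: the function name `bp_category` is hypothetical, and the cut-offs are assumed to follow the published American Heart Association adult categories that Figure 5 is adapted from.

```python
def bp_category(sbp: float, dbp: float) -> str:
    """Classify one reading (mm Hg) into an AHA-style category.

    Assumption: thresholds follow the AHA adult chart; the project's
    own re-coding may have used different labels or cut-offs.
    """
    # Check the most severe category first so that the "or" rules
    # (either number out of range) are applied correctly.
    if sbp > 180 or dbp > 120:
        return "Hypertensive crisis"
    if sbp >= 140 or dbp >= 90:
        return "Hypertension stage 2"
    if sbp >= 130 or dbp >= 80:
        return "Hypertension stage 1"
    if sbp >= 120:
        # DBP is already known to be below 80 at this point.
        return "Elevated"
    return "Normal"


if __name__ == "__main__":
    for reading in [(118, 76), (125, 78), (135, 78), (150, 85)]:
        print(reading, bp_category(*reading))
```

Checking the categories from most to least severe means a reading is assigned to the highest category that either number qualifies for.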

8 of 20

Analysis: Definitions

  • Body Mass Index
    • BMI combines the height and weight columns into one variable: weight (kg) / [height (m)]^2.
    • Levels:
      • <18.5 Underweight
      • 18.5-24.9 Normal
      • 25.0-29.9 Overweight
      • 30.0+ Obese
  • Age:
    • 6 categories:
      • 1 = "<30"
      • 2 = "30-37"
      • 3 = "37-44"
      • 4 = "44-51"
      • 5 = "51-58"
      • 6 = "58-65"

Figure 6. BMI chart adapted from the American Heart Association.
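
A short sketch of the BMI and age re-coding defined above. The function names are hypothetical, height is assumed to be recorded in centimetres, and because the age labels share boundary values (for example 37 appears in both "30-37" and "37-44"), the sketch assumes a boundary age falls into the upper group. None of these choices are confirmed by the slides.

```python
from bisect import bisect_right


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / [height (m)]^2; height is assumed to be in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_level(value: float) -> str:
    """Map a BMI value to the four levels listed on the slide."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    return "Obese"


# Lower edges of age groups 2-6; group 1 is everything below 30.
AGE_EDGES = [30, 37, 44, 51, 58]
AGE_LABELS = ["<30", "30-37", "37-44", "44-51", "51-58", "58-65"]


def age_group(age_years: float) -> tuple:
    """Return (code, label). Assumption: a boundary age such as 37 goes
    to the upper group, and ages above 65 stay in the last group."""
    idx = bisect_right(AGE_EDGES, age_years)
    return idx + 1, AGE_LABELS[idx]


if __name__ == "__main__":
    value = bmi(70, 175)
    print(round(value, 1), bmi_level(value), age_group(40))
```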

9 of 20

Proposal

Do tall people who work out, have an “overweight” body mass index, and have high blood pressure have higher odds of developing CVD? Do the odds differ by gender?

10 of 20

Analysis: Parameters

  • For this analysis, “tall” is arbitrarily defined as a height more than two standard deviations above the mean, computed both for the overall dataset and separately by gender.
    • By the empirical rule (68-95-99.7), about 95% of the data should fall within 2 standard deviations of the mean, so heights beyond that range are rare.
  • “High blood pressure” is defined as SBP 130-139 mm Hg or DBP 80-89 mm Hg.
  • Cardiovascular disease is a binary response variable, where 0 is “No” and 1 is “Yes”.
  • Gender is a binary predictor, where 1 is “female” and 2 is “male”.
  • The significance level (alpha) is 0.05.
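
The parameter definitions above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the sample standard deviation is used (the slides do not say which form was used), and the input is a plain list of (gender, height) pairs rather than the project's actual data structure.

```python
from statistics import mean, stdev


def tall_cutoff(heights):
    """Height above which someone counts as tall: mean + 2 SD.
    Assumption: sample standard deviation."""
    return mean(heights) + 2 * stdev(heights)


def tall_cutoffs_by_gender(records):
    """records: iterable of (gender, height) pairs, gender coded 1 or 2.
    Returns one cutoff per gender code."""
    groups = {}
    for gender, height in records:
        groups.setdefault(gender, []).append(height)
    return {g: tall_cutoff(h) for g, h in groups.items()}


def is_high_bp(sbp, dbp):
    """High blood pressure as defined on the slide:
    SBP 130-139 or DBP 80-89 (mm Hg)."""
    return 130 <= sbp <= 139 or 80 <= dbp <= 89


if __name__ == "__main__":
    # Made-up heights in cm, purely for illustration.
    sample = [(1, 150), (1, 160), (1, 170), (2, 170), (2, 180), (2, 190)]
    print(tall_cutoffs_by_gender(sample))
    print(is_high_bp(134, 78))
```

Computing the cutoff separately by gender keeps the definition of tall relative to each group's own height distribution.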

11 of 20

Analysis: Parameters (Code)

Figure 7a-e. Code depicting variable changes and logistic regression analysis.

12 of 20

Analysis: Results

  • The odds of heart disease for someone who is tall are 0.898 times those of someone who is not tall (about 10% lower). This was statistically significant. Converting the odds ratio with OR/(1+OR) gives 0.47, or 47%; this is the probability of CVD for a tall person only if the comparison group has even odds (probability 0.5).
  • The odds of CVD for someone who is hypertensive are 1.865 times those of someone with normal blood pressure. Converting with OR/(1+OR) gives 0.65.
  • The odds of CVD for someone who works out are 0.795 times those of someone who does not. Converting with OR/(1+OR) gives 0.44.
  • The odds of having CVD were 3.1 times greater for the 37-44 age group than for the 58-65 age group.

log[p/(1-p)] = -0.054*HH + 1.1533*BP + 0.0629*BMI + 0.1153*Alcohol + 0.07*Smoke - 0.5067*Cholesterol - 0.11*Physical_Activity + 0.099*Glucose - 0.0542*Gender + 2.6102

Table 2. MLE estimates of each variable.
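
To show how the fitted equation above turns predictor values into a predicted probability, here is a minimal sketch. The coefficients are copied from the equation; how each predictor was coded in the model (levels, reference groups) is not stated on the slide, so the inputs below are hypothetical placeholders and the resulting numbers are for illustration only.

```python
import math

# Coefficients copied from the fitted equation on the slide.
INTERCEPT = 2.6102
COEFFS = {
    "HH": -0.054,
    "BP": 1.1533,
    "BMI": 0.0629,
    "Alcohol": 0.1153,
    "Smoke": 0.07,
    "Cholesterol": -0.5067,
    "Physical_Activity": -0.11,
    "Glucose": 0.099,
    "Gender": -0.0542,
}


def log_odds(x):
    """Linear predictor log[p/(1-p)]; predictors missing from x count as 0."""
    return INTERCEPT + sum(beta * x.get(name, 0.0) for name, beta in COEFFS.items())


def probability(x):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


if __name__ == "__main__":
    # Hypothetical coding: every predictor set to 1. The real coding of
    # each variable is an assumption here.
    example = {name: 1.0 for name in COEFFS}
    print(round(log_odds(example), 4), round(probability(example), 3))
```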

13 of 20

Analysis: Results

  • The odds ratio for CVD given an “overweight” BMI is 1.214. Converting with OR/(1+OR) gives 0.55.
  • Based on the dataset, the odds of a woman having CVD given the predictors are 1.056 times those of a man. Converting with OR/(1+OR) gives 0.51, or 51%.

Table 3. Odds ratios of each comparison group.
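
The probabilities quoted on the results slides appear consistent with the conversion OR/(1+OR) applied to each odds ratio. The sketch below reproduces that arithmetic; the function name is hypothetical. Note the assumption built into this conversion: it gives the probability of CVD in the exposed group only when the comparison group has even odds (a probability of 0.5).

```python
def odds_ratio_to_share(odds_ratio: float) -> float:
    """OR / (1 + OR). Equals a probability only if the comparison
    group's odds are exactly 1 (probability 0.5)."""
    return odds_ratio / (1.0 + odds_ratio)


# Odds ratios as reported on the results slides.
SLIDE_ODDS_RATIOS = {
    "tall": 0.898,
    "hypertensive": 1.865,
    "works out": 0.795,
    "overweight BMI": 1.214,
    "female vs male": 1.056,
}

if __name__ == "__main__":
    for name, value in SLIDE_ODDS_RATIOS.items():
        print(f"{name}: OR = {value}, OR/(1+OR) = {odds_ratio_to_share(value):.2f}")
```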

14 of 20

Analysis: Results

  • The area under the ROC curve is 0.7783, meaning there is about a 78% chance that the model assigns a higher predicted probability to a randomly chosen person with CVD than to a randomly chosen person without it.

Figure 8. ROC curve.
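
The pairwise interpretation of the area under the ROC curve can be computed directly, as in this minimal sketch. It is not the procedure that produced Figure 8: the function name is hypothetical, and the labels and scores in the example are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison: the share of
    (CVD, no-CVD) pairs where the CVD case gets the higher score.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels (1 = CVD) and predicted probabilities.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The double loop is quadratic in the number of rows, so for a dataset of this size a rank-based formula or a statistics package would be the practical choice.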

15 of 20

Conclusion: Discussion

So, if you’re a woman between the ages of 37 and 44 who is tall, exercises, is hypertensive, and is overweight, you are more likely to have CVD than a man with the same characteristics. Overall, the analysis suggested that height had a protective effect.

One way to reduce the incidence of CVD in this group is to create programs targeted at this population.

16 of 20

Conclusion: Limitations

  • Not all potential predictors of CVD risk are included in the dataset.
  • Because the country of origin of the data was not given, it was hard to compare values.
  • The average age of the participants was 53 years, 11 days, and the data were collected in a clinical setting. Hospitalized people tend to have more serious disease, and older people tend to be hospitalized more often, so the results generalize only to this specific group.
  • Defining tallness as more than 2 standard deviations above the mean means that only about 2.1% of the height data is represented.

17 of 20

Questions

  • Q & A

18 of 20

Appendix

20 of 20

END