STAT-155 Quiz 1 Review
9/21/2025
Overview
1. Descriptive Statistics
2. Plots
3. Simple Linear Regression
4. Interpretations
5. Model Evaluation
6. Transformations (Optional)
Data Context
National Health and Nutrition Examination Survey (NHANES)
Descriptive Statistics
Mean: The average. Sensitive to outliers.
Median: The middle value. Resistant to outliers.
Standard Deviation: Measure of variation from the mean.
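A minimal sketch of why the mean is sensitive to outliers while the median is resistant. The sleep values below are made up for illustration and are not NHANES data.

```python
import statistics

# Hypothetical nightly sleep hours (illustration only, not NHANES values).
sleep_hours = [6, 7, 7, 8, 7]
with_outlier = sleep_hours + [20]  # add one extreme observation

mean_before = statistics.mean(sleep_hours)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(sleep_hours)
median_after = statistics.median(with_outlier)

# Sample standard deviation (divides by n - 1).
sd_before = statistics.stdev(sleep_hours)
sd_after = statistics.stdev(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # pulled toward the outlier
print("median:", median_before, "->", median_after)  # stays at 7
print("sd:    ", sd_before, "->", sd_after)          # grows with the outlier
```

One extreme value moves the mean and inflates the standard deviation, while the median stays put.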
Descriptive Statistics
Maximum: The largest value of a variable.
Minimum: The smallest value of a variable.
Range: Max - Min. Measure of spread in our data.
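A short sketch of these three quantities, using made-up values for the number of bad mental health days (not NHANES data).

```python
# Hypothetical DaysMentHlthBad values (illustration only).
bad_days = [0, 2, 5, 30, 3]

smallest = min(bad_days)          # minimum
largest = max(bad_days)           # maximum
data_range = largest - smallest   # range = Max - Min

print(smallest, largest, data_range)
```

Because the range uses only the two most extreme observations, a single unusual value can change it a lot.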
Simple Linear Regression
E[DaysMentHlthBad | SleepHrsNight] = 10.99461 - 0.98457(SleepHrsNight)
Intercept Interpretation
On average, we expect Y to be 𝛽0 y-units for groups with X = 0.
(Intercept) 10.99461
SleepHrsNight -0.98457
On average, we expect individuals who get 0 hours of sleep a night to report 10.99461 bad mental health days within the last 30 days.
Slope Interpretation
On average, we expect a 1 x-unit increase in X to be associated with a 𝛽1 y-unit increase in Y.
(Intercept) 10.99461
SleepHrsNight -0.98457
On average, we expect a 1 hour increase in hours of sleep a night to be associated with a 0.98457 day decrease in the number of reported bad mental health days.
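The intercept and slope interpretations can be checked by plugging values into the fitted equation. This is a sketch using the two coefficients from the output above. The function name `expected_bad_days` is made up for illustration.

```python
# Coefficients from the fitted model shown above.
INTERCEPT = 10.99461   # (Intercept)
SLOPE = -0.98457       # SleepHrsNight

def expected_bad_days(sleep_hours):
    """Expected DaysMentHlthBad for a given SleepHrsNight (hypothetical helper)."""
    return INTERCEPT + SLOPE * sleep_hours

# Intercept: the expected value when X = 0.
at_zero = expected_bad_days(0)

# Slope: the change in the expected value for a 1 hour increase in sleep.
one_hour_change = expected_bad_days(8) - expected_bad_days(7)

print(at_zero)
print(one_hour_change)
```

Any two sleep values that are 1 hour apart give the same difference, which is why the slope is described as the change per 1 x-unit increase.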
Intercept Interpretation - Categorical Predictor
On average, we expect Y to be 𝛽0 y-units for the group in the reference category.
(Intercept) 30445 (8th grade edu. reference)
On average, we expect those with an 8th grade education to have a household (HH) income of 30445 dollars.
Slope Interpretation - Categorical Predictor
On average, we expect Y to be 𝛽1 y-units higher for the group we’re looking at than for the reference category.
Education High School 17403 (8th grade edu. reference)
On average, we expect those with a high school education to have a HH income that is 17403 dollars higher than those with an 8th grade education.
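With a categorical predictor, the coefficient acts on a 0/1 indicator, so the expected value for a group is the intercept plus that group's coefficient. A sketch using the two numbers above. The function name and the indicator argument are assumptions for illustration.

```python
# Coefficients from the output above (8th grade education is the reference).
INTERCEPT = 30445          # expected HH income for the reference category
HIGH_SCHOOL_COEF = 17403   # High School minus reference

def expected_income(is_high_school):
    """Expected HH income; is_high_school is 1 for High School, 0 for 8th grade."""
    return INTERCEPT + HIGH_SCHOOL_COEF * is_high_school

reference_income = expected_income(0)
high_school_income = expected_income(1)

print(reference_income)                       # the intercept
print(high_school_income)                     # intercept + coefficient
print(high_school_income - reference_income)  # the coefficient
```

The coefficient is a difference between two group averages, not the average for the high school group itself.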
Model Evaluation - R^2
The percentage of variation in Y that can be explained by the variation in X
Multiple R-Squared: 0.02823
About 2.8% of the variation in reported days of bad mental health within the last 30 days can be explained by the variation in the reported hours of sleep a night.
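A sketch of how R^2 can be computed by hand for a simple linear regression: fit the least-squares line, then compare the leftover (residual) variation to the total variation in Y. The data and the helper names `fit_line` and `r_squared` are made up for illustration. This is not the NHANES fit.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(xs, ys):
    """1 - (residual sum of squares) / (total sum of squares)."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical sleep hours and bad mental health days.
sleep = [5, 6, 7, 8, 9]
bad_days = [8, 3, 6, 2, 4]

r2 = r_squared(sleep, bad_days)
print(round(100 * r2, 1), "% of the variation in Y is explained by X")
```

An R^2 near 0 means the line explains little of the variation in Y. An R^2 of 1 means every point falls exactly on the line.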
Model Evaluation - Residual/Fitted Plots
Is this model wrong? strong? fair? R^2 = 0.5609
Wrong: Answers may vary. The line is mostly centered on .resid = 0, but the predictions become unreliable at about .fitted = 60.
Strong: Depends on context. R^2 is 0.56, so the model explains a bit more than half of the variation in Y, which is at least a moderate fit.
Fair: Depends on data. What’s going on in the green circle? Red circle?
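A sketch of what goes into a residual/fitted plot. The fitted value is the model's prediction for each point, and the residual is observed minus fitted. The data below are made up and deliberately curved, so a straight line is the "wrong" shape. The list names `fitted` and `resid` mirror the `.fitted` and `.resid` labels on the plot.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical curved data: y = x squared.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]   # predictions
resid = [y - f for y, f in zip(ys, fitted)]    # observed - predicted

for f, r in zip(fitted, resid):
    print(round(f, 2), round(r, 2))
```

Least-squares residuals always average to 0 overall, so being centered on 0 is not enough. Here the residuals are positive at both ends and negative in the middle, and that systematic pattern is the sign that the model is wrong.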
Transformations - Location
The intercept is roughly -92. If the minimum height is about 100 cm, what would a logical transformation be?
After shifting height so that 0 corresponds to the minimum (about 100 cm), the intercept is now positive, which in this context makes it meaningful.
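A sketch of a location transformation, assuming the shift is "subtract 100 cm from height". The heights and outcome values are made up, so the numbers will not match the -92 on the slide. The point is that the slope stays the same and only the intercept moves.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical heights (cm) and outcome values (illustration only).
height = [100, 120, 140, 160, 180]
outcome = [10, 22, 35, 50, 62]

SHIFT = 100  # assumed shift: the approximate minimum height
height_shifted = [h - SHIFT for h in height]

intercept, slope = fit_line(height, outcome)
intercept_shifted, slope_shifted = fit_line(height_shifted, outcome)

print(intercept, slope)                  # intercept describes height = 0 cm
print(intercept_shifted, slope_shifted)  # intercept describes height = 100 cm
```

After the shift, the intercept is the expected outcome at a height of 100 cm, a value that actually occurs in the data.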
Transformations - Scale
Notice how the scale on the X axis is from 0-1. A “one unit increase” would compare a school with a 0% admission rate to one with a 100% admission rate. What would a logical transformation be?
Multiplying both graduation and admission rates by 100 would make our slope easier to interpret: one unit becomes one percentage point.
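A sketch of a scale transformation on made-up admission and graduation rates. It fits the line once with rates as proportions (0-1) and once with both variables multiplied by 100.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical rates as proportions (illustration only).
admit_prop = [0.2, 0.4, 0.6, 0.8]
grad_prop = [0.9, 0.8, 0.75, 0.6]

# Same data expressed in percentage points.
admit_pct = [100 * a for a in admit_prop]
grad_pct = [100 * g for g in grad_prop]

intercept_prop, slope_prop = fit_line(admit_prop, grad_prop)
intercept_pct, slope_pct = fit_line(admit_pct, grad_pct)

print(intercept_prop, slope_prop)
print(intercept_pct, slope_pct)
```

When both variables are multiplied by the same factor, the slope keeps the same numeric value and the intercept is multiplied by 100. What changes is the unit: the slope now describes the change in graduation rate, in percentage points, for a 1 percentage point increase in admission rate.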
Transformations - Log
The data are not linear. What transformation would be logical?
It turns out that if we take the log of the x variable in this case, the relationship becomes much more linear.
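A sketch of a log transformation on made-up data where Y grows with the natural log of X. It compares R^2 for a line fit on X with R^2 for a line fit on log(X). The helper `r_squared` is written out here for illustration.

```python
import math

def r_squared(xs, ys):
    """R^2 of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data generated from y = 2 + 3 * log(x).
xs = [1, 10, 100, 1000, 10000]
ys = [2 + 3 * math.log(x) for x in xs]

log_xs = [math.log(x) for x in xs]  # natural log of x

r2_raw = r_squared(xs, ys)
r2_log = r_squared(log_xs, ys)

print(r2_raw)  # a straight line in x fits poorly
print(r2_log)  # a straight line in log(x) fits almost perfectly
```

After the transformation, the slope is interpreted per 1 unit increase in log(X) rather than per 1 unit increase in X.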