Applied Data Analysis (CS401)
Maria Brbić
Lecture 5
Regression for disentangling data
08 Oct 2025
Announcements
Feedback
Give us feedback on this lecture here: https://go.epfl.ch/ada2025-lec5-feedback
Linear regression
Credits
What you should already know about linear regression
POLLING TIME
Linear regression as you know it
Scalar product (a.k.a. dot product) of 2 vectors
Example with one predictor
y ≈ 𝛽1 + 𝛽2X
𝛽1: intercept
𝛽2: slope
[Figure: scatter of y vs. X with fitted line]
Linear regression as you know it
Optimality criterion: least squares
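The least-squares criterion has a direct linear-algebra solution. A minimal sketch with toy numbers (the data and variable names are illustrative, not from the lecture):

```python
import numpy as np

# Least-squares fit of y ≈ b1 + b2*x on toy data chosen so the fit is exact.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])          # exactly y = 1 + 2x

X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept column + predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2 = beta                               # b1 ≈ 1 (intercept), b2 ≈ 2 (slope)
```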
Use cases of regression
Regression as comparison of average outcomes
Example with one binary predictor Xi
yi = 𝛽1 + 𝛽2Xi + 𝜖i .
kid_score = 78 + 12 · mom_hs + error
[Figure: kid_score vs. mom_hs (No = 0, Yes = 1), with fitted line]
mean kid_score for moms who didn’t finish high school: 78
mean kid_score for moms who finished high school: 78 + 12 = 90
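The slide's fact that OLS with one binary predictor reproduces the two group means is easy to verify numerically. The four toy scores below are made up so the group means come out to 78 and 90, as on the slide:

```python
import numpy as np

# With a binary predictor, OLS gives: intercept = mean(y | x=0),
# intercept + slope = mean(y | x=1).
mom_hs    = np.array([0.0, 0.0, 1.0, 1.0])
kid_score = np.array([70.0, 86.0, 85.0, 95.0])   # group means: 78 and 90

X = np.column_stack([np.ones_like(mom_hs), mom_hs])
(b1, b2), *_ = np.linalg.lstsq(X, kid_score, rcond=None)
# b1 ≈ 78 (mean for mom_hs = 0), b1 + b2 ≈ 90 (mean for mom_hs = 1)
```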
One binary predictor Xi: Interpretation of fitted parameters 𝛽
yi = 𝛽1 + 𝛽2Xi + 𝜖i .
So why not just compute the two means separately and then compare them?
What a mean monkey!
Example with one continuous predictor Xi
yi = 𝛽1 + 𝛽2Xi + 𝜖i .
kid_score = 26 + 0.6 · mom_iq + error
[Figure: kid_score vs. mom_iq with fitted line]
estimated (hypothetical) mean kid_score for moms with IQ = 0: 26
estimated mean kid_score for moms with IQ = 100: 26 + 0.6 · 100 = 86
One continuous predictor Xi: Interpretation of fitted parameters 𝛽
yi = 𝛽1 + 𝛽2Xi + 𝜖i .
Example with multiple predictors
yi = 𝛽1 + 𝛽2Xi2 + 𝛽3Xi3 + 𝜖i .
kid_score = 26 + 6 · mom_hs + 0.6 · mom_iq + error
Example with multiple predictors
kid_score = 26 + 6 · mom_hs + 0.6 · mom_iq + error
[Figure: kid_score vs. mom_iq, separate fitted lines for mom_hs = No and mom_hs = Yes]
kids of moms who didn’t finish high school:
intercept = 26
slope = 0.6
kids of moms who finished high school:
intercept = 26 + 6 = 32
slope = 0.6
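Without an interaction term, the two groups get parallel lines. A tiny sketch using the slide's fitted coefficients (the `predict` helper is ours, for illustration):

```python
# Fitted model from the slide: kid_score = 26 + 6*mom_hs + 0.6*mom_iq.
# No interaction term, so the slope in mom_iq is the same for both groups.
def predict(mom_hs, mom_iq):
    return 26 + 6 * mom_hs + 0.6 * mom_iq

# At any mom_iq, the gap between the groups is the mom_hs coefficient, 6:
gap = predict(1, 100) - predict(0, 100)   # ≈ 6
```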
Example with interaction of predictors
yi = 𝛽1 + 𝛽2Xi2 + 𝛽3Xi3 + 𝛽4Xi2Xi3 + 𝜖i .
kid_score = −11 + 51 · mom_hs + 1.1 · mom_iq − 0.5 · mom_hs · mom_iq + error
Example with interaction of predictors
kid_score = −11 + 51 · mom_hs + 1.1 · mom_iq − 0.5 · mom_hs · mom_iq + error
[Figure: kid_score vs. mom_iq, separate fitted lines (different slopes) for mom_hs = No and mom_hs = Yes]
kids of moms who didn’t finish high school:
intercept = −11
slope = 1.1
kids of moms who finished high school:
intercept = −11 + 51 = 40
slope = 1.1 − 0.5 = 0.6
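With the interaction term, the two mom_hs groups get different slopes. Again a sketch using the slide's fitted coefficients (the `predict` helper is ours):

```python
# Fitted model with interaction, from the slide:
# kid_score = -11 + 51*mom_hs + 1.1*mom_iq - 0.5*mom_hs*mom_iq
def predict(mom_hs, mom_iq):
    return -11 + 51 * mom_hs + 1.1 * mom_iq - 0.5 * mom_hs * mom_iq

# Slope in mom_iq for each group = change in prediction per IQ point:
slope0 = predict(0, 101) - predict(0, 100)   # ≈ 1.1   (mom_hs = 0)
slope1 = predict(1, 101) - predict(1, 100)   # ≈ 0.6   (mom_hs = 1: 1.1 - 0.5)
```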
So why not just compute the two means separately and then compare them?
Mean kid_score:
                                Mom drives Mercedes   Mom doesn’t drive Mercedes
Mom finished high school                 90                       90
Mom didn’t finish high school            78                       78

Number of women:
                                Mom drives Mercedes   Mom doesn’t drive Mercedes
Mom finished high school                990                       10
Mom didn’t finish high school            10                      990
THINK FOR A MINUTE:
What is the mean outcome for Mercedes-driving moms vs. for non-Mercedes-driving moms? Compare the two means! What does the comparison tell you about the link between Mercedes-driving and kid_score?
(Feel free to discuss with your neighbor.)
Mean kid_score:
                                Mercedes   No Mercedes
Mom finished high school            90          90
Mom didn’t finish high school       78          78

Number of women:
                                Mercedes   No Mercedes
Mom finished high school           990          10
Mom didn’t finish high school       10         990
Aha!
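The punchline can be reproduced numerically: Mercedes-driving "predicts" kid_score on its own, but once mom_hs is in the model its coefficient vanishes. A sketch assuming the table's counts and means; the `ols` helper is ours, not the course's:

```python
import numpy as np

# Build 2000 rows matching the table: (mom_hs, mercedes, kid_score).
rows = ([(1, 1, 90.0)] * 990 + [(1, 0, 90.0)] * 10 +
        [(0, 1, 78.0)] * 10  + [(0, 0, 78.0)] * 990)
mom_hs, mercedes, kid_score = map(np.array, zip(*rows))

def ols(predictors, y):
    # OLS with an intercept column prepended to the given predictor columns.
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_naive = ols([mercedes], kid_score)[1]            # ≈ 11.76: a big apparent "effect"
b_adjusted = ols([mercedes, mom_hs], kid_score)[1] # ≈ 0: Mercedes adds nothing given mom_hs
```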
Course eval (“indicative feedback”) open until Sun Oct 12th. Go to https://isa.epfl.ch now!
Quantifying uncertainty
Quantifying uncertainty
p-value: probability of estimating such an extreme coefficient if the true coefficient were zero (= null hypothesis)
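One way to make this definition concrete is a permutation test: shuffle the predictor to simulate the null, then count how often the shuffled slope is at least as extreme as the observed one. A sketch with simulated data (not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)            # true slope is 0.5, so the null is false

def slope(x, y):
    # OLS slope for a single predictor: cov(x, y) / var(x)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

obs = abs(slope(x, y))
# Simulate the null "true coefficient = 0" by shuffling x, breaking any link to y:
null = np.array([abs(slope(rng.permutation(x), y)) for _ in range(1000)])
p_value = np.mean(null >= obs)                # fraction of null slopes at least as extreme
```

With a real effect in the data, the observed slope sits far out in the null distribution and the p-value is small.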
Aha!
Residuals and R2
Variance of outcomes y
[Figure: fitted line through the data; the residual is the vertical distance from a point to the line]
Aha!
Coefficient of determination: R2
[Figures: two example fits, one with R2 = 0.147 (poor fit), one with R2 = 0.865 (good fit)]
Coefficient of determination: R2
Coefficient of determination: R2
R2 = 0.67 everywhere!
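R2 = 1 − SS_res/SS_tot, and identical R2 across very different-looking datasets is the classic lesson of Anscombe's quartet (assuming that is what the figure shows; the quartet's first dataset, used below, indeed gives R2 ≈ 0.67):

```python
import numpy as np

def r_squared(x, y):
    # Fit y ≈ b1 + b2*x, then compute 1 - SS_res / SS_tot.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Anscombe's first dataset:
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
r2 = r_squared(x, y)   # ≈ 0.67
```

The other three datasets of the quartet (a curve, an outlier, a leverage point) give the same R2, which is why you should always plot the data.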
Assumptions made in regression modeling
Assumptions for regression modeling
Assumptions for regression modeling (2)
But very flexible: we require linearity in predictors (not necessarily in raw inputs); predictors can be arbitrary functions of raw inputs, e.g.,
- logarithms, polynomials, reciprocals, …
- interactions (i.e., products) of multiple inputs
- discretization of raw inputs, coded as indicator variables
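Concretely, "linear in predictors" means the design matrix may hold any functions of the raw inputs. A sketch (the column choices are illustrative):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])   # raw inputs (toy values)
x2 = np.array([0.0, 1.0, 0.0, 1.0])

X = np.column_stack([
    np.ones_like(x1),        # intercept
    np.log(x1),              # logarithm of a raw input
    x1**2,                   # polynomial term
    x1 * x2,                 # interaction (product) of two inputs
    (x1 > 2).astype(float),  # discretization, coded as an indicator
])
# The model is still linear in these 5 columns, so ordinary OLS applies unchanged.
```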
Assumptions for regression modeling (3)
less important in practice
Transformations of predictors and outcomes
Transformations of predictors
Mean-centering of predictors
[Figures: kid_score vs. mom_iq, before and after mean-centering of mom_iq]
(hypothetical) mean kid_score for moms with IQ = 0: 26
mean kid_score for moms with mean IQ: 86
After mean-centering of predictors, you have a convenient interpretation of the coefficients 𝛽j of main predictors (i.e., non-interaction predictors):
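A quick numeric check that centering changes only the intercept's meaning, not the slope (toy numbers mimicking the kid_score example):

```python
import numpy as np

x = np.array([90.0, 100.0, 110.0])   # toy predictor, mean 100
y = np.array([80.0, 86.0, 92.0])     # exactly y = 26 + 0.6*x

def fit(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(x, y)                  # intercept ≈ 26: predicted y at x = 0
b_centered = fit(x - x.mean(), y)  # intercept ≈ 86: predicted y at the mean x
# The slopes agree: b_raw[1] ≈ b_centered[1] ≈ 0.6
```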
Standardization via z-scores
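Standardization divides a centered predictor by its standard deviation, so its coefficient reads "change in y per one standard deviation of x". A one-line sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy predictor
z = (x - x.mean()) / x.std()         # z-score; use ddof=1 for the sample sd convention
# z now has mean 0 and standard deviation 1
```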
Logarithmic outcomes
Logarithmic outcomes: Interpreting coefficients
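The standard reading of a coefficient b when the outcome is log y: each unit of x multiplies y by exp(b), which for small b is roughly a 100·b % change. A sketch with an illustrative b:

```python
import numpy as np

# If log(y) = a + b*x, then a one-unit increase in x multiplies y by exp(b).
b = 0.05
exact_factor = np.exp(b)   # ≈ 1.051: y grows by about 5.1%
approx_pct = 100 * b       # = 5: the quick "100*b percent" reading for small b
```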
Going beyond linear regression for comparing means
Beyond linear regression: generalized linear models
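Logistic regression is the GLM workhorse for binary outcomes: same linear predictor, but a logit link. A bare-bones gradient-ascent sketch on simulated data (illustrative only; in practice you would use a library fitter):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))   # true model: logit(p) = 0.5 + 2x
y = (rng.random(200) < p_true).astype(float)  # binary outcomes

X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(1000):
    mu = 1 / (1 + np.exp(-X @ beta))          # inverse logit link
    beta += 1.0 * X.T @ (y - mu) / len(y)     # gradient ascent on the log-likelihood
# beta should land near the true (0.5, 2.0), up to sampling noise
```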
Beyond comparing means; or, A taste of causality: “Difference in differences”
Beyond comparing means; or, A taste of causality: “Difference in differences” (2)
[Diagram: difference in differences; four group means labeled a, b, c, d]
What a treat!
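Assuming a, b, c, d in the diagram denote the four group means (treated before/after = a, b; control before/after = c, d — this mapping is our assumption, check it against the slide), the difference-in-differences estimate is:

```python
# Difference in differences: the treated group's change, minus the control
# group's change (which stands in for the shared time trend).
def diff_in_diff(a, b, c, d):
    return (b - a) - (d - c)

# Illustrative numbers: treated rises by 10, control by 4 -> estimated effect 6.
effect = diff_in_diff(50, 60, 40, 44)
```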
Summary
Feedback
Give us feedback on this lecture here: https://go.epfl.ch/ada2025-lec5-feedback
Credits
Bonus: Logarithmic outcomes and predictors
Interpretation of coefficient of logarithmic predictor:
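The slide's body is not in this extract; the standard interpretation, shown as a sketch: with y = a + b·log(x), a 1% increase in x adds about b/100 to y.

```python
import numpy as np

# Coefficient b on a log predictor: y = a + b*log(x), so multiplying x by a
# factor adds b*log(factor) to y; a 1% increase in x adds ≈ b/100.
a, b = 2.0, 3.0
def y(x):
    return a + b * np.log(x)

change_1pct = y(1.01) - y(1.0)   # ≈ b * 0.01 ≈ 0.03
```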