Linear mixed models
BIOM285
May 16 2025
Outline
Linear mixed models - Friday May 16
Biological examples using mixed models - Mon May 19
Generalized linear models
Generalized linear models (GLMs) allow us to define a linear relationship between one or more predictor variables and an outcome variable that can follow a variety of distributions
A key assumption of GLMs is that the effect of each predictor is ‘constant’ across samples - this is known as a ‘fixed effect’
Linear models
The ‘intercept’ of the line is 10 - because we start at hour 0 with a happiness of 10
The ‘slope’ of the line is -0.5 because we lose 0.5 units of happiness per hour
We can write the model for this line as:
happiness = 10 + -0.5 * Zoom hours
happiness = 10 + -0.5 * 5 = 7.5
y = b0 + b1*x
b0 - intercept, b1 - slope
In a linear model the effects are the same across individuals - i.e. it is assumed that everyone's happiness decreases at the same rate
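As a quick sketch, we can recover this line with lm() in R (using simulated data, since the happiness example is hypothetical):

# Simulate the hypothetical happiness example: intercept 10, slope -0.5
set.seed(1)
zoom_hours <- 0:8
happiness <- 10 - 0.5 * zoom_hours + rnorm(length(zoom_hours), sd = 0.2)
fit <- lm(happiness ~ zoom_hours)
coef(fit)   # estimates should be close to b0 = 10 and b1 = -0.5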
Limitations of GLMs
The constraint of a fixed effect for each predictor is a limitation of GLMs that prohibits their application to certain kinds of questions
For example:
- Repeated measures such as across a time course
- Longitudinal measurements from a cohort
- Technical replicates of measurements within independent samples
- Experimental measurements of varying effect within blocks/groups
- ‘Pseudo-replication’, e.g. profiling many cells in a sample, such as from single cell genomics or imaging
Overcoming these limitations
For these and other questions we can use linear mixed models (aka mixed-effect models)
Linear mixed models contain a mixture of effects including both ‘fixed’ effects and ‘random’ effects to account for additional patterns, or non-independence, in aspects of your data
This enables modeling of nested - or hierarchical - measurements organized in groups, e.g. within a sample or within another categorical variable
What are random effects?
Overcoming these limitations
[Figures comparing a fixed effect, a random intercept, a random slope, and a random intercept and slope. Colors represent different groups of a categorical variable - in this case university department.]
Mixed/random effects
Mixed effects can vary by intercept, by slope, or both:
The choice of whether to vary the intercept or the slope or both depends on the question being asked
A fully random effect (random intercept and slope) is the most flexible and generalizable (and in that sense preferred), but it requires more levels/data to accurately estimate the parameters for each group
Mixed/random effects
Mixed effects can vary by intercept, by slope, or both:
What is an example of where a random intercept and fixed slope might be appropriate?
Mixed/random effects
Mixed effects can vary by intercept, by slope, or both:
What is an example of where a fixed intercept and random slope might be appropriate?
Creating mixed effects models
Mixed effects can vary by intercept, by slope, or both:
The ‘lmer’ function in the lme4 package can be used to fit linear mixed-effects models
For linear regression/GLMs - the syntax for predictor x and response y with a fixed effect is: y ~ x
For mixed models, the syntax for considering a fixed and random effect is:
y ~ x + (1|z)      fixed effect of x, random intercept for z
y ~ x + (0 + x|z)  fixed effect of x, random slope of x within z
y ~ x + (x|z)      fixed effect of x, random intercept and random slope of x within z
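As a minimal sketch (assuming a hypothetical data frame dat with response y, predictor x, and grouping variable z), the three specifications look like:

library(lme4)
m_int   <- lmer(y ~ x + (1 | z), data = dat)      # random intercept for each level of z
m_slope <- lmer(y ~ x + (0 + x | z), data = dat)  # random slope of x within z, common intercept
m_both  <- lmer(y ~ x + (x | z), data = dat)      # correlated random intercept and slope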
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on the weight of mice, when using different strains of mice?
model <- lm(Weight ~ Dose,data=mouse_data)
summary(model)
Call:
lm(formula = Weight ~ Dose, data = mouse_data)
Residuals:
Min 1Q Median 3Q Max
-5.6239 -1.5714 -0.3714 2.1863 4.8653
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.9636 0.9371 28.77 < 2e-16 ***
Dose -2.4387 0.2949 -8.27 8.63e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.711 on 48 degrees of freedom
Multiple R-squared: 0.5876, Adjusted R-squared: 0.579
F-statistic: 68.4 on 1 and 48 DF, p-value: 8.627e-11
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on weight of mice? Consider that there are multiple strains of mice
library(lme4)
lmm <- lmer(Weight ~ Dose + (1|Strain),data=mouse_data)
Linear mixed model fit by REML ['lmerMod']
Formula: Weight ~ Dose + (1 | Strain)
Data: mouse_data
REML criterion at convergence: 156.5756
Random effects:
Groups Name Std.Dev.
Strain (Intercept) 2.842
Residual 0.939
Number of obs: 50, groups: Strain, 5
Fixed Effects:
(Intercept) Dose
25.839 -2.051
How can we model the effect of drug dose on weight considering different starting weights of each strain?
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on weight of mice? Consider that there are multiple strains of mice
coef(lmm)
$Strain
(Intercept) Dose
m1 27.93008 -2.050958
m2 25.17448 -2.050958
m3 22.13342 -2.050958
m4 24.65351 -2.050958
m5 29.30412 -2.050958
How can we model the effect of drug dose on weight considering different starting weights of each strain?
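For reference, coef() reports the per-strain coefficients (the fixed effect combined with each strain's random effect); to see the fixed effects and the group-level deviations separately, one can use:

fixef(lmm)   # overall intercept and Dose slope
ranef(lmm)   # per-strain deviations of the intercept from the overall intercept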
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on weight of mice? Consider that there are multiple strains of mice
library(lme4)
lmm2 <- lmer(Weight ~ Dose + (Dose|Strain),data=mouse_data)
Linear mixed model fit by REML ['lmerMod']
Formula: Weight ~ Dose + (Dose | Strain)
Data: mouse_data
REML criterion at convergence: 153.4312
Random effects:
Groups Name Std.Dev. Corr
Strain (Intercept) 3.4405
Dose 0.2004 -1.00
Residual 0.9053
Number of obs: 50, groups: Strain, 5
Fixed Effects:
(Intercept) Dose
25.795 -2.052
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
How can we model the effect of drug dose on weight considering different starting weights of each strain and different effects within each strain?
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on weight of mice? Consider that there are multiple strains of mice
coef(lmm2)
$Strain
(Intercept) Dose
m1 28.37558 -2.202011
m2 25.06797 -2.009368
m3 21.20428 -1.784338
m4 24.43676 -1.972606
m5 29.88961 -2.290192
attr(,"class")
[1] "coef.mer"
How can we model the effect of drug dose on weight considering different starting weights of each strain and different effects within each strain?
Example of where linear mixed models can be effective
What is the effect of increasing doses of a new drug on weight of mice? Consider that there are multiple strains of mice
library(lme4)
lmm2 <- lmer(Weight ~ Dose + (Dose|Strain),data=mouse_data)
Linear mixed model fit by REML ['lmerMod']
Formula: Weight ~ Dose + (Dose | Strain)
Data: mouse_data
REML criterion at convergence: 153.4312
Random effects:
Groups Name Std.Dev. Corr
Strain (Intercept) 3.4405
Dose 0.2004 -1.00
Residual 0.9053
Number of obs: 50, groups: Strain, 5
Fixed Effects:
(Intercept) Dose
25.795 -2.052
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
A lower REML criterion corresponds to a higher likelihood - here the random intercept and slope model (REML 153.4) is a better fit than the random intercept only model (REML 156.6)
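A minimal sketch of this comparison (both models have the same fixed effects, so their REML fits can be compared directly; refit = FALSE keeps the REML fits rather than refitting with ML):

anova(lmm, lmm2, refit = FALSE)   # compare random intercept vs random intercept + slope
REMLcrit(lmm)    # REML criterion of the random intercept model
REMLcrit(lmm2)   # REML criterion of the random intercept + slope model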
Example of where linear mixed models can be effective
What about data which contains both biological and technical replicates?
Here we measure enzymatic activity across 10 samples and want to test for association with a disease phenotype, performing 5 technical replicates of the measurement for each sample
md <- lm(enz$Value2 ~ enz$Value)
summary(md)
Call:
lm(formula = enz$Value2 ~ enz$Value)
Residuals:
Min 1Q Median 3Q Max
-4.3232 -1.5239 -0.0828 1.7889 4.0199
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.56206 2.30423 -1.112 0.272
enz$Value 0.47312 0.06109 7.745 5.36e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.33 on 48 degrees of freedom
Multiple R-squared: 0.5555, Adjusted R-squared: 0.5462
F-statistic: 59.98 on 1 and 48 DF, p-value: 5.364e-10
[Scatter plot: enzymatic activity vs. phenotype]
Example of where linear mixed models can be effective
What about data which contains both biological and technical replicates?
We could ‘collapse’ the technical replicates (average the values within each sample), which would allow us to use ordinary linear regression - one way to do this is sketched below
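A minimal sketch of the collapsing step (assuming enz has columns Sample, Enzyme, and Pheno, as used in the models on this and the next slide):

# Average the technical replicates within each sample
enz_mean <- aggregate(cbind(Enzyme, Pheno) ~ Sample, data = enz, FUN = mean)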
md <- lm(enz_mean$Enzyme ~ enz_mean$Pheno)
summary(md)
Call:
lm(formula = enz_mean$Enzyme ~ enz_mean$Pheno)
Residuals:
Min 1Q Median 3Q Max
-5.142 -2.780 -1.456 2.427 9.973
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.7986 2.3410 8.457 4.52e-11 ***
enz_mean$Pheno 1.1611 0.1511 7.683 6.67e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.687 on 48 degrees of freedom
Multiple R-squared: 0.5515, Adjusted R-squared: 0.5422
F-statistic: 59.02 on 1 and 48 DF, p-value: 6.666e-10
[Scatter plot: enzymatic activity vs. phenotype, after averaging the technical replicates]
Example of where linear mixed models can be effective
What about data which contains both biological and technical replicates?
Or we could model the replicate experiments using a random effect
md2 <- lmer(Enzyme ~ Pheno + (1|Sample),data=enz)
summary(md2)
Linear mixed model fit by REML ['lmerMod']
Formula: Enzyme ~ Pheno + (1 | Sample)
Data: enz
REML criterion at convergence: 271.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.3948 -0.7540 -0.3951 0.6584 2.7051
Random effects:
Groups Name Variance Std.Dev.
Sample (Intercept) 0.00 0.000
Residual 13.59 3.687
Number of obs: 50, groups: Sample, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 19.7986 2.3410 8.457
Pheno 1.1611 0.1511 7.683
Correlation of Fixed Effects:
(Intr)
Pheno -0.975
[Scatter plot: enzymatic activity vs. phenotype, with technical replicates modeled via a random effect]
Example of where linear mixed models can be effective
Here we collect data on the effects of a new drug on weight gain in a cohort at a hospital clinic
m1 <- lm(Value ~ Dose,data=clinic_data)
summary(m1)
Call:
lm(formula = Value ~ Dose, data = clinic_data)
Residuals:
Min 1Q Median 3Q Max
-79.02 -23.39 8.50 18.39 71.03
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 126.64 14.97 8.460 2.62e-13 ***
Dose 149.23 38.53 3.873 0.000194 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 32.94 on 98 degrees of freedom
Multiple R-squared: 0.1328, Adjusted R-squared: 0.1239
F-statistic: 15 on 1 and 98 DF, p-value: 0.0001938
Example of where linear mixed models can be effective
Here we collect data on the effects of a new drug on weight gain in a cohort at a hospital clinic - what if we look at samples separated by gender? The effects are very different
m2 <- lmer(Value ~ Dose + (Dose|Gender),data=clinic_data)
summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Dose + (Dose | Gender)
Data: clinic_data
REML criterion at convergence: 907.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.2807 -0.4143 0.0302 0.3313 3.5559
Random effects:
Groups Name Variance Std.Dev. Corr
Gender (Intercept) 4532 67.32
Dose 58513 241.89 -1.00
Residual 564 23.75
Number of obs: 100, groups: Gender, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 126.64 48.81 2.594
Dose 149.23 173.29 0.861
coef(m2)
$Gender
(Intercept) Dose
F 173.98091 -20.87315
M 79.29753 319.34242
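A minimal base-R sketch (assuming clinic_data has columns Value, Dose, and Gender with levels F and M; the colors are arbitrary) for visualizing the gender-specific fits from coef(m2):

cols <- c(F = "tomato", M = "steelblue")
plot(Value ~ Dose, data = clinic_data, col = cols[as.character(clinic_data$Gender)], pch = 16)
cf <- coef(m2)$Gender
for (g in rownames(cf)) abline(a = cf[g, "(Intercept)"], b = cf[g, "Dose"], col = cols[g])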
Other examples of linear mixed models
In this example we determine whether levels of a new biomarker of interest are associated with a quantitative measure of disease
mod <- lm(Disease ~ Biomarker,data=biomarker_data)
summary(mod)
Call:
lm(formula = Disease ~ Biomarker, data = biomarker_data)
Residuals:
Min 1Q Median 3Q Max
-2.40826 -0.90334 -0.04954 0.64218 2.40477
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 130.5009 9.0963 14.347 <2e-16 ***
Biomarker -0.2382 0.1150 -2.071 0.0429 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.254 on 58 degrees of freedom
Multiple R-squared: 0.06883, Adjusted R-squared: 0.05278
F-statistic: 4.287 on 1 and 58 DF, p-value: 0.04286
Other examples of linear mixed models
What if we account for the cohort that the samples come from? The pattern is the opposite - the effects are positive within each cohort. What is this phenomenon called?
mod2 <- lmer(Disease ~ Biomarker + (Biomarker|Cohort),data=biomarker_data)
summary(mod2)
Linear mixed model fit by REML ['lmerMod']
Formula: Disease ~ Biomarker + (Biomarker | Cohort)
Data: biomarker_data
REML criterion at convergence: 137.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.21860 -0.76260 -0.05759 0.83586 2.32448
Random effects:
Groups Name Variance Std.Dev. Corr
Cohort (Intercept) 0.6711521 0.81924
Biomarker 0.0001034 0.01017 0.98
Residual 0.4601512 0.67834
Number of obs: 60, groups: Cohort, 3
Fixed effects:
Estimate Std. Error t value
(Intercept) 83.90322 6.49943 12.909
Biomarker 0.35127 0.08219 4.274
Other examples of linear mixed models
In this example we determine whether levels of a new biomarker of interest are associated with a quantitative measure of disease
coef(mod2)
$Cohort
(Intercept) Biomarker
A 83.01421 0.3402402
B 84.08135 0.3534811
C 84.61410 0.3600939
Repeated measures and longitudinal data
Both are examples of multiple measurements performed within a sample over short or long periods of time
In this example we collected measurements of abundance of a protein biomarker on repeated visits every year - do the levels of the biomarker vary linearly with year?
mod2 <- lmer(Value ~ Year + (Year|Sample),data=rmdata)
summary(mod2)
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Year + (Year | Sample)
Data: rmdata
REML criterion at convergence: 95.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.51418 -0.62538 -0.02284 0.42124 1.62411
Random effects:
Groups Name Variance Std.Dev. Corr
Sample (Intercept) 27.089 5.2048
Year 1.805 1.3436 -0.56
Residual 0.200 0.4472
Number of obs: 30, groups: Sample, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 48.4253 2.1334 22.698
Year 1.5750 0.5516 2.856
The fixed effect of year is an increase of ~1.5 units in value per year
Some samples show the effect and some don't, but the overall effect is positive
Diagnosing linear mixed effects models
Residuals (errors) from the model should be normally distributed and centered around 0
There should be no clear relationship between the residuals and the fitted values
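A minimal sketch of these checks, here using the mouse model lmm2 fit earlier:

# Residuals vs fitted values: look for no trend and roughly constant spread
plot(fitted(lmm2), resid(lmm2)); abline(h = 0, lty = 2)
# Normality of residuals
qqnorm(resid(lmm2)); qqline(resid(lmm2))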
Hypothesis testing with linear mixed models
The LMM will estimate the fixed and random effects but not provide significance testing for whether these effects are different than null expectation (i.e. true effect=0)
One way to perform hypothesis testing of the fixed effect using LMMs is to perform model comparison of models with and without the fixed effect variable
From the previous example, is year a significant predictor of biomarker levels?
Biomarker levels increase by ~1.5 units/year
mod2 <- lmer(Value ~ Year + (Year|Sample),data=rmdata)
summary(mod2)
. . . .
Fixed effects:
Estimate Std. Error t value
(Intercept) 48.4253 2.1334 22.698
Year 1.5750 0.5516 2.856
Hypothesis testing with linear mixed models
The LMM will estimate the fixed and random effects but not provide significance testing for whether these effects are different than null expectation (effect=0)
One way to perform hypothesis testing of the fixed effect using LMMs is to perform model comparison of models with and without the fixed effect variable
From the previous example, is year a significant predictor of biomarker levels?
Biomarker levels increase by ~1.5 units/year
We can reject the null hypothesis (p = 0.01) and conclude that year does affect biomarker levels
library(lmtest)
mod2 <- lmer(Value ~ Year + (Year|Sample),data=rmdata)
mod3 <- lmer(Value ~ (Year|Sample),data=rmdata)
lrtest(mod3,mod2)
Likelihood ratio test
Model 1: Value ~ (Year | Sample)
Model 2: Value ~ Year + (Year | Sample)
#Df LogLik Df Chisq Pr(>Chisq)
1 5 -50.775
2 6 -47.596 1 6.3578 0.01169 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
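An equivalent route within lme4 itself is anova(), which refits both models with maximum likelihood (rather than REML) before computing the likelihood-ratio test:

anova(mod3, mod2)   # refits with ML and reports the chi-square statistic and p-value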
Hypothesis testing with linear mixed models
The LMM will estimate the fixed and random effects but not provide significance testing for whether these effects are different than null expectation (effect=0)
Another way to perform hypothesis testing of the fixed effect is to use the lmerTest package (which provides its own lmer function with the same syntax) and obtain significance tests directly
From the previous example, is year a significant predictor of biomarker levels?
Biomarker levels increase by ~1.5 units/year
We can reject the null hypothesis (p = 0.036) and conclude that year affects biomarker levels
library(lmerTest)
mod5 <- lmer(Value ~ Year + (Year|Sample),data=rmdata)
summary(mod5)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Value ~ Year + (Year | Sample)
Data: rmdata
REML criterion at convergence: 95.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.51418 -0.62538 -0.02284 0.42124 1.62411
Random effects:
Groups Name Variance Std.Dev. Corr
Sample (Intercept) 27.089 5.2048
Year 1.805 1.3436 -0.56
Residual 0.200 0.4472
Number of obs: 30, groups: Sample, 6
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 48.4253 2.1334 4.9999 22.698 3.09e-06 ***
Year 1.5750 0.5516 4.9996 2.856 0.0356 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Linear mixed models with different outcome distributions
Just as GLMs are a generalization of LMs, we can use GLMMs to model outcomes that follow different distributions
For example, gene count data from cells (e.g. scRNA-seq) - cells are ‘pseudo-replicates’ within each sample and likely not truly independent measurements
[Histograms of gene counts per cell for samples s4 and s5; n=50 cells/sample]
Linear mixed models with different outcome distributions
Just as GLMs are a generalization of LMs, we can use GLMMs to model outcomes that follow different distributions
For example, gene count data from cells (e.g. scRNA-seq)
n=50 cells/sample
out <- glmer(counts ~ disease + (1|sample), data=tab, family="poisson")
summary(out)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: poisson ( log )
Formula: counts ~ disease + (1 | sample)
Data: tab
AIC BIC logLik deviance df.resid
1637.8 1650.4 -815.9 1631.8 497
Scaled residuals:
Min 1Q Median 3Q Max
-1.7608 -1.0119 -0.0238 0.5116 4.4865
Random effects:
Groups Name Variance Std.Dev.
sample (Intercept) 1.121e-05 0.003348
Number of obs: 500, groups: sample, 10
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.13140 0.03600 31.43 <2e-16 ***
diseaseND -1.10769 0.07212 -15.36 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Effect of disease on gene counts:
coefficient for diseaseND ≈ -1.1 on the log scale
exp(1.1) ≈ 3
roughly a 3-fold difference in counts between disease and non-disease
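A sketch of pulling that fold change directly from the fitted model (the coefficient name diseaseND comes from the output above):

b <- fixef(out)["diseaseND"]   # log-scale effect of the non-disease group
exp(abs(b))                    # ~3, i.e. roughly a 3-fold difference in counts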
Assumptions/limitations of linear mixed models
Linear relationship between predictors and outcome variable
Residuals (error) are normally distributed and have constant variance
Random effects are defined over a categorical grouping variable, and ideally that grouping variable should have more than 2-3 levels so the group-level variance can be estimated accurately
Summary
Linear regression and GLMs are limited by the effects of each predictor being fixed across samples/units
Mixed-effects linear models estimate both fixed effects and random effects within a unit/group, which enables modeling of both independent and non-independent data
Can capture nested patterns within groups including those that may not be apparent - or even opposite - in the overall data
Mixed-effects models can be generalized to model outcome data following different distributions in the same manner as GLMs