Applied Data Analysis (CS401)
Maria Brbic
Lecture 14
ADA in action
17 Dec 2025
Announcements
Course evaluation available on Moodle!
Course evaluation
Today we will
show how everything you’ve learned in lectures 1–13 comes together in one project
Paper available at https://doi.org/10.1073/pnas.2106152118
Philip Seymour Hoffman † 2014
Amy Winehouse † 2011
Questions
#6 Observational studies
Why should we care about postmortem collective memory?
Fact: Humans care a lot about being remembered after death
Ara Pacis, Rome
Damnatio memoriae [Wikipedia]
An ADA approach
Let’s use lots of data and count stuff!
Stuffed Count von Count counting stuff
Part 1:
Getting the data
The raw data:
#12 Scaling to massive data
Detecting news and social media in Spinn3r
#2 Handling data
Data volume: Number of docs per day
#3 Visualizing data
Detecting mentions of people names
Wikipedia-based solution:
The city of Gary is known as the birthplace of singer Michael Jackson.
Beer and whisky expert Michael Jackson was born in Leeds.
…
#2 Handling data
#13 Handling network data
#10+11 Handling text data
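The two example sentences show why a name string alone is not enough: "Michael Jackson" can refer to different people. The sketch below is a toy illustration of one possible approach (picking the candidate whose context words overlap most with the sentence). It is not the pipeline used in the lecture; the candidate table, entity labels, and context words are all hypothetical and would in practice have to be derived from a resource such as Wikipedia.

```python
import re

# Hypothetical candidate table: surface name -> {entity label: context words}.
# The labels and word sets are made up for illustration only.
CANDIDATES = {
    "Michael Jackson": {
        "Michael Jackson (singer)": {"singer", "pop", "album", "gary"},
        "Michael Jackson (beer and whisky expert)": {"beer", "whisky", "expert", "leeds"},
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def resolve_mentions(sentence):
    """Return (name, entity) pairs for every known name found in the sentence.

    The entity whose context words overlap most with the sentence wins.
    A tie or zero overlap yields None, i.e. the mention stays unresolved.
    """
    words = tokenize(sentence)
    results = []
    for name, entities in CANDIDATES.items():
        if name not in sentence:
            continue
        scores = {entity: len(ctx & words) for entity, ctx in entities.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best, best_score = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == best_score
        results.append((name, None if best_score == 0 or tied else best))
    return results
```

Leaving ambiguous mentions unresolved (rather than guessing) is a deliberate choice in this sketch: for counting mentions, a missed mention is usually less harmful than one credited to the wrong person.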
Recruiting the army of the dead:
#2 Handling data
#13 Handling network data
Tools used
#12 Scaling to massive data
#2 Handling data
Our protagonist: Mention time series
#1–13 ADA loves logs!
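A minimal sketch of how a log-scale mention time series could be built from daily counts. It assumes that mentions are normalized by the number of documents per day and that a small constant handles days with zero mentions; neither detail is stated on the slide, and the function name and the `eps` parameter are hypothetical.

```python
import math


def log_mention_series(mentions_per_day, docs_per_day, eps=1e-9):
    """Return log10 of the daily fraction of documents mentioning a person.

    eps is a hypothetical smoothing constant that keeps days with zero
    mentions finite; the slides do not say how such days are handled.
    """
    if len(mentions_per_day) != len(docs_per_day):
        raise ValueError("both series must have the same length")
    series = []
    for mentions, docs in zip(mentions_per_day, docs_per_day):
        # Normalizing by the daily volume removes fluctuations in data volume.
        fraction = mentions / docs if docs > 0 else 0.0
        series.append(math.log10(fraction + eps))
    return series
```

Working in log space turns multiplicative changes (for example, a tenfold spike on the day of death) into additive ones, which makes curves of very popular and barely known people comparable.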
Mention time series: examples
News
Smoothed via “Friedman’s
#3 Visualizing data
Part 2:
The shape of postmortem memory
Average mention time series
News
#4 Describing data
Curve characteristics
Median over people: 1.98, 95% CI [1.90, 2.03]
Median over people: 0.00055, 95% CI [-0.00091, 0.0017]
Premortem mean: arithmetic mean of days 360 through 30 before death
Short-term boost: maximum of days 0 through 29 after death, minus the premortem mean
Long-term boost: arithmetic mean of days 30 through 360 after death, minus the premortem mean
Halving time: number of days required to accumulate half of the total area between the postmortem curve (including the day of death) and the minimum postmortem value
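The four definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the original analysis code: the series is assumed to be a dict keyed by day offset relative to death (0 = day of death), the postmortem window is assumed to end at day 360, and the halving time counts the day of death as day 1. The function name and return format are hypothetical.

```python
def curve_characteristics(series):
    """Compute the four curve characteristics of one mention time series.

    series maps the day offset relative to death (0 = day of death) to the
    value of the series; all offsets from -360 to 360 must be present.
    """
    # Premortem mean: days 360 through 30 before death.
    pre = [series[d] for d in range(-360, -29)]
    premortem_mean = sum(pre) / len(pre)

    # Short-term boost: maximum of days 0 through 29 after death.
    short_term = max(series[d] for d in range(0, 30)) - premortem_mean

    # Long-term boost: mean of days 30 through 360 after death.
    late = [series[d] for d in range(30, 361)]
    long_term = sum(late) / len(late) - premortem_mean

    # Halving time: days needed to accumulate half of the area between the
    # postmortem curve and its minimum (assumed window: days 0 through 360).
    post = [series[d] for d in range(0, 361)]
    floor = min(post)
    total = sum(v - floor for v in post)
    halving_time = None  # undefined for a completely flat postmortem curve
    if total > 0:
        running = 0.0
        for day, value in enumerate(post):
            running += value - floor
            if running >= total / 2:
                halving_time = day + 1  # assumption: day of death counts as day 1
                break

    return {
        "premortem_mean": premortem_mean,
        "short_term_boost": short_term,
        "long_term_boost": long_term,
        "halving_time": halving_time,
    }
```

For example, a series that is flat except for a single spike on the day of death has a long-term boost of 0 and, under the counting convention assumed here, a halving time of 1 day.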
#4 Describing data
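The curve characteristics are summarized as a median over people with a 95% confidence interval. The slide does not say how the interval was obtained; the sketch below assumes a percentile bootstrap, one common way to get a confidence interval for a median. The function name, the number of resamples, and the seed are hypothetical choices.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Assumption: resample people with replacement, recompute the median for
    each resample, and read the interval off the sorted bootstrap medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper
```

An interval that contains 0, such as the second one on the slide, means the data are compatible with no change relative to the premortem level.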
Commercial break
Are there prototypical curve shapes?
Q: How to find the best number k of clusters?
#8 Applied ML
#9 Unsupervised learning
Average silhouette width!
News
#9 Unsupervised learning
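A small sketch of the average silhouette width, written in plain Python so that the formula stays visible. For each point, a is the mean distance to the other points in its own cluster and b is the smallest mean distance to any other cluster; the silhouette is (b - a) / max(a, b). The Euclidean distance and the convention that singleton clusters score 0 are assumptions of this sketch, and the function name is hypothetical.

```python
import math


def average_silhouette_width(points, labels):
    """Mean silhouette over all points (Euclidean distance assumed)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

To choose k, one would cluster the data for each candidate k, compute this score for each resulting labeling, and keep the k with the highest value.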
Cluster analysis
“Blip”
“Silence”
“Rise”
“Decline”
#4 Describing data
Part 3:
Biographic correlates of postmortem memory
A first stab
Problem: Biographic properties are correlated
E.g., leaders (politicians, CEOs, etc.) are
compared to artists
Regression analysis allows us to compare averages across subgroups of the data while accounting for correlations among averaged values!
#5 Regression analysis
Linear regression
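$$
y_i = \beta_0 + \beta_1\,\mathrm{premortem\_mention\_freq}_i + \beta_2\,\mathrm{age\_at\_death}_i + \beta_3\,\mathrm{manner\_of\_death}_i + \beta_4\,\mathrm{notability\_type}_i + \beta_5\,\mathrm{language}_i + \beta_6\,\mathrm{gender}_i
$$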
y_i: outcome for person i
premortem_mention_freq: rank-transformed, then linearly scaled/shifted to [–0.5, 0.5]; i.e., median has value 0
age_at_death: 8 discrete levels (dummy-coded): 20–29, 30–39, …, 70–79, 80–89, 90–99
manner_of_death: 2 levels: natural, unnatural
notability_type: 6 levels: arts, sports, leadership, known for death, general fame, academia/engineering
language: 3 levels: anglophone, non-anglophone, unknown
gender: 2 levels: male, female
β0: avg. outcome for “baseline persona”: male anglophone artist of median premortem popularity who died a natural death at age 70–79
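The predictor encoding described above can be sketched as follows. The rank transform maps the median to 0, and dummy coding with a baseline level maps the baseline to all zeros, so the intercept equals the average outcome of the baseline persona. The function names are hypothetical, and averaging the ranks of tied values is an assumption of this sketch, not something stated on the slide.

```python
def rank_transform(values):
    """Map values to [-0.5, 0.5] by rank, so that the median maps to 0.

    Assumption: tied values share the average of their ranks.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2  # average 0-based rank of the tied group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return [r / (n - 1) - 0.5 for r in ranks]


def dummy_code(value, levels, baseline):
    """Return one 0/1 indicator per non-baseline level.

    The baseline level is encoded as all zeros, so its effect is absorbed
    by the intercept.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != baseline]


# The 8 age levels from the slide, written with ASCII hyphens.
AGE_LEVELS = [f"{low}-{low + 9}" for low in range(20, 100, 10)]
```

With this encoding, each coefficient of a dummy variable reads as the difference in outcome relative to the baseline persona, holding the other predictors fixed.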
Linear regression results
Age at death vs. postmortem memory
News plays two simultaneous roles (more so than Twitter):
Part 4:
Discussion
Summary: The shape of postmortem memory
Summary: Biographic correlates
Merry Christmas ADA happy New Year!