Human-data interaction
(things we don’t know)
@jtleek
jtleek.com/talks
The optimality paradox
...Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Source: http://orfe.princeton.edu/~jqfan/papers/01/penlike.pdf
https://www.theguardian.com/technology/2017/dec/07/alphazero-google-deepmind-ai-beats-champion-program-teaching-itself-to-play-four-hours
https://www.healthcaredive.com/news/google-rolls-out-ai-tool-for-understanding-genomes/
https://www.forbes.com/sites/stevensalzberg/2017/12/11/no-googles-new-ai-cant-build-your-genome-sequence/
A hypothesis
Let’s make an example of someone at random
“In the first stage, single lag and distributed lag overdispersed Poisson regression models were used for estimating county-specific RRs of hospital admissions associated with ambient levels of PM2.5”
Source: Dominici, Peng et al. 2006 JAMA
log(E[Hospital Admissions | X]) = PM2.5
  + log(# at risk) + factor(day of week) + ns(day of year, 8)
  + ns(temperature, 6) + ns(dew point temperature, 3)
  + ns(lagged temperature, 6) + ns(lagged dew point temperature, 3)
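A minimal R sketch of this first-stage model, assuming hypothetical column names (admissions, pm25, n_at_risk, dow, doy, temp, dew_point and their lags) and a quasi-Poisson family for the overdispersion:

# Sketch only: column names are hypothetical, not from the JAMA paper
library(splines)
fit <- glm(
  admissions ~ pm25 + factor(dow) + ns(doy, 8) +
    ns(temp, 6) + ns(dew_point, 3) +
    ns(temp_lag, 6) + ns(dew_point_lag, 3) +
    offset(log(n_at_risk)),
  family = quasipoisson(link = "log"),  # overdispersed Poisson
  data = county_data
)
exp(coef(fit)["pm25"])  # county-specific relative risk per unit PM2.5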
Eight outcomes were considered based on the ICD-9 codes for 5 cardiovascular outcomes (heart failure [428], heart rhythm disturbances [426-427], cerebrovascular events [430-438], ischemic heart disease [410-414, 429], peripheral vascular disease [440-448]), 2 respiratory outcomes (chronic obstructive pulmonary disease [COPD; 490-492], respiratory tract infections [464-466, 480-487]), and hospitalizations caused by injuries and other external causes (800-849). The county-wide daily hospitalization rates for each outcome for 1999-2002 appear in Table 1.

The study population includes 11.5 million Medicare enrollees residing an average of 5.9 miles from a PM2.5 monitor. The analysis was restricted to the 204 US counties with populations larger than 200 000. Of these 204 counties, 90 had daily PM2.5 data across the study period and the remaining counties had PM2.5 data collected once every 3 days for at least 1 full year. The locations of the 204 counties appear in Figure 1. The counties were clustered into 7 geographic regions by applying the K-means clustering algorithm to longitude and latitude for the counties.10,11
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
googlesheets
dplyr
Base R
gams
google slides
amazon s3
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
spreadsheets :(
David Robinson told me
bad life choices
my advisor likes them
gifs
Chris Volinsky told me
chromebook
Source: Tversky and Kahneman 1984
Suppose we have a sample with 30 lawyers and 70 engineers.
What is the probability Dick is a lawyer?
Source: Tversky and Kahneman 1984
Dick is a 30-year-old man. He is married with no children. A man of high ability and high motivation, he promises to be quite successful in his field. He is well liked by his colleagues.
What is the probability Dick is a lawyer?
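Made explicit with Bayes' rule: if the description is equally consistent with both professions, the answer should stay at the 30% base rate. A sketch, with the equal likelihoods assumed for illustration:

# Uninformative description: equally likely for lawyers and engineers
prior_lawyer <- 0.30
lik_lawyer <- 0.5
lik_engineer <- 0.5  # assumed: description fits both groups equally
posterior <- (prior_lawyer * lik_lawyer) /
  (prior_lawyer * lik_lawyer + (1 - prior_lawyer) * lik_engineer)
posterior  # 0.30: the description should not move the base rate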
Source: Cleveland and McGill JASA 1984
Human-data interaction
(observational epidemiology)
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
Source: https://fivethirtyeight.com/features/science-isnt-broken/
Human-data interaction
(our lab)
Data Science: 10 Courses, 4.4M+ Enrollments
Executive Data Science: 5 Courses, 150K+ Enrollments
Genomic Data Science: 9 Courses, 230K+ Enrollments
MSD in R: 6 Courses, 35K+ Enrollments
Adapted from Aboozar Hadavand
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
Adapted from Sara Wang
Thanks Hadley!
# read code as text
library(stringr)
code_txt <- readLines('codefile.r')
# look for library calls; fixed() because parentheses are regex metacharacters
str_detect(code_txt, fixed('library(dplyr)'))
Adapted from Sara Wang
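The same static scan extends to a whole directory of scripts; a sketch, with the directory name and regex assumed for illustration:

# Tally library() calls across all R files in a (hypothetical) directory
library(stringr)
files <- list.files("analysis", pattern = "\\.[rR]$", full.names = TRUE)
pkgs <- unlist(lapply(files, function(f) {
  txt <- readLines(f, warn = FALSE)
  str_match(txt, "library\\(([A-Za-z0-9.]+)\\)")[, 2]
}))
sort(table(pkgs[!is.na(pkgs)]), decreasing = TRUE)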
Source: https://github.com/jhudsl/matahari
library(matahari)
# Start logging your commands
dance_start(value = TRUE)
dat = data.frame(x=rnorm(100),y=rnorm(100))
library(dplyr)
glimpse(dat)
plot(dat)
# Pause logging
dance_stop()
# Get data
code_df = dance_tbl()
code_df %>% select(expr,value,dt) %>% View()
# Plain text
dance_report()
"WAoAAAACAAMFAAACAwAAAAMTAAAABgAAABMAAAAIAAAABgAAAAEABAAJAAAAC3Nlc3Npb25J\nbmZvAAAA/gAAAAYAAAABAAQACQAAAAtkYW5jZV9zdGFydAAABAIAAAABAAQACQAAAAV2YWx1\nZQAAAAoAAAABAAAAAQAAAP4AAAAGAAAAAQAEAAkAAAABPQAAAAIAAAABAAQACQAAAANkYXQA\nAAACAAAABgAAAAEABAAJAAAACmRhdGEuZnJhbWUAAAQCAAAAAQAEAAkAAAABeAAAAAYAAAAB\nAAQACQAAAAVybm9ybQAAAAIAAAAOAAAAAUBZAAAAAAAAAAAA/gAABAIAAAABAAQACQAAAAF5\nAAAABgAACP8AAAACAAAADgAAAAFAWQAAAAAAAAAAAP4AAAD+AAAA/gAAAAYAAAABAAQACQAA\nAAdsaWJyYXJ5AAAAAgAAAAEABAAJAAAABWRwbHlyAAAA/gAAAAYAAAABAAQACQAAAAdnbGlt\ncHNlAAAAAgAABf8AAAD+AAAABgAAAAEABAAJAAAABHBsb3QAAAACAAAF/wAAAP4AAAAGAAAB\n/wAAAP4AAAAGAAAB/wAAAP4AAAATAAAACAAAAxMAAAAKAAACEwAAAA4AAAAQAAAAAQAEAAkA\nAAATeDg2XzY0LXBjLWxpbnV4LWdudQAAABAAAAABAAQACQAAAAZ4ODZfNjQAAAAQAAAAAQAE\nAAkAAAAJbGludXgtZ251AAAAEAAAAAEABAAJAAAAEXg4Nl82NCwgbGludXgtZ251AAAAEAAA\nAAEABAAJAAAAAAAAABAAAAABAAQACQAAAAEzAAAAEAAAAAEABAAJAAAAAzUuMAAAABAAAAAB\nAAQACQAAAAQyMDE4AAAAEAAAAAEABAAJAAAAAjA0AAAAEAAAAAEABAAJAAAAAjIzAAAAEAAA\nAAEABAAJAAAABTc0NjI2AAAAEAAAAAEABAAJAAAAAVIAAAAQ....”
# Decode
rprt <- dance_report()
rprt %>% base64_to_df()
"WAoAAAACAAMFAAACAwAAAAMTAAAABgAAABMAAAAIAAAABgAAAAEABAAJAAAAC3Nlc3Npb25J\nbmZvAAAA/gAAAAYAAAABAAQACQAAAAtkYW5jZV9zdGFydAAABAIAAAABAAQACQAAAAV2YWx1\nZQAAAAoAAAABAAAAAQAAAP4AAAAGAAAAAQAEAAkAAAABPQAAAAIAAAABAAQACQAAAANkYXQA\nAAACAAAABgAAAAEABAAJAAAACmRhdGEuZnJhbWUAAAQCAAAAAQAEAAkAAAABeAAAAAYAAAAB\nAAQACQAAAAVybm9ybQAAAAIAAAAOAAAAAUBZAAAAAAAAAAAA/gAABAIAAAABAAQACQAAAAF5\nAAAABgAACP8AAAACAAAADgAAAAFAWQAAAAAAAAAAAP4AAAD+AAAA/gAAAAYAAAABAAQACQAA\nAAdsaWJyYXJ5AAAAAgAAAAEABAAJAAAABWRwbHlyAAAA/gAAAAYAAAABAAQACQAAAAdnbGlt\ncHNlAAAAAgAABf8AAAD+AAAABgAAAAEABAAJAAAABHBsb3QAAAACAAAF/wAAAP4AAAAGAAAB\n/wAAAP4AAAAGAAAB/wAAAP4AAAATAAAACAAAAxMAAAAKAAACEwAAAA4AAAAQAAAAAQAEAAkA\nAAATeDg2XzY0LXBjLWxpbnV4LWdudQAAABAAAAABAAQACQAAAAZ4ODZfNjQAAAAQAAAAAQAE\nAAkAAAAJbGludXgtZ251AAAAEAAAAAEABAAJAAAAEXg4Nl82NCwgbGludXgtZ251AAAAEAAA\nAAEABAAJAAAAAAAAABAAAAABAAQACQAAAAEzAAAAEAAAAAEABAAJAAAAAzUuMAAAABAAAAAB\nAAQACQAAAAQyMDE4AAAAEAAAAAEABAAJAAAAAjA0AAAAEAAAAAEABAAJAAAAAjIzAAAAEAAA\nAAEABAAJAAAABTc0NjI2AAAAEAAAAAEABAAJAAAAAVIAAAAQ....”
Human-data interaction
(randomized trials)
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
“...Coming up with nothing is unacceptable…”
Source: http://bmjopen.bmj.com/content/7/11/e018491
[It has been widely noted that there is a relationship between x1 and Y.] We are interested in studying the magnitude of this relationship and reporting results to the president of the company. Using the data available at [data1/data2], use a linear regression model to investigate this relationship.
Would you be confident telling the company president that there is a meaningful relationship between x1 and Y?
A. Yes
B. No
data1: x2 confounds the relationship between Y and x1
data2: x2 is not a confounder but is predictive of Y
Source: Myint et al (in prep)
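A sketch of how two such datasets could be simulated; the coefficients and sample size are illustrative, not the study's:

set.seed(42)
n <- 500
# data1: x2 confounds the x1-Y relationship
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n)
Y  <- 2 * x2 + rnorm(n)  # Y depends on x1 only through x2
data1 <- data.frame(Y, x1, x2)
# data2: x2 predicts Y but is independent of x1 (not a confounder)
x1 <- rnorm(n)
x2 <- rnorm(n)
Y  <- 1 * x1 + 2 * x2 + rnorm(n)
data2 <- data.frame(Y, x1, x2)
# In data1 the naive x1 effect shrinks toward zero once x2 is adjusted for
coef(lm(Y ~ x1, data = data1))
coef(lm(Y ~ x1 + x2, data = data1))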
Confounded
Primed
68% found signal
(n = 263)
Not Primed
57% found signal
(n = 252)
Source: Myint et al (in prep)
Not confounded
Primed
78% found signal
(n = 263)
Not Primed
87% found signal
(n = 258)
Source: Myint et al (in prep)
[Four-panel figure: Confounded + Not Primed, Confounded + Primed, Not Confounded + Not Primed, Not Confounded + Primed]
Frequency of fitting y ~ x1 + x2 + x3
|            | Confounded | Not Confounded |
| Primed     | 73%        | 70%            |
| Not Primed | 60%        | 62%            |
Human-data interaction
(interventional randomized trials)
Import
Tidy
Transform
Visualize
Model
Publish
Adapted from Hadley Wickham
Communicate
Design and collect
Source: http://journals.sagepub.com/doi/pdf/10.1177/0956797611417632
Your assignment is to study how income varies across different categories of college majors. You will be using data from a study of recent college graduates. Make sure to use good practices that you have learned so far in this course and previous courses in the specialization. [In particular, it is good practice to specify an analysis plan early in the process to avoid the “p-hacking” behavior of trying many analyses to find one that has desired results. If you want to learn more about “p-hacking”, you can visit https://projects.fivethirtyeight.com/p-hacking/.]
To proceed with the analysis, click “Yes”; otherwise, click “No”.
Source: Myint et al (in prep)
|            | Significant | Not significant |
| Warned     | 66          | 90              |
| Not Warned | 59          | 88              |
Source: Myint et al (in prep)
head(college)
fit <- lm(rank ~ major_category - 1, college)
summary(fit)
# Setup
str(college)
library(ggplot2)
library(dplyr)

# EDA
g1 <- ggplot(college, aes(x = major_category, y = median, fill = major_category))
g1 <- g1 + geom_boxplot()
g1 <- g1 + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none")
g1
tapply(college$sample_size, college$major_category, sum)
majormean <- tapply(college$sample_size, college$major_category, mean)
max(majormean)

# Munging
college$major_category <- as.factor(college$major_category)
college2 <- college
college2$major_category <- relevel(college$major_category, ref = "Business")

# Modeling
fit1 <- lm(median ~ major_category, data = college2)
fit2 <- update(fit1, median ~ major_category + perc_men)
fit3 <- update(fit1, median ~ major_category + perc_men + perc_employed)
fit4 <- update(fit1, median ~ major_category + perc_men + perc_employed + perc_employed_fulltime_yearround)
fit5 <- update(fit1, median ~ major_category + perc_men + perc_employed + perc_employed_fulltime_yearround + perc_college_jobs)
anova(fit1, fit2, fit3, fit4)
summary(fit5)

# Evaluation
tail(hatvalues(fit5)[order(hatvalues(fit5, decreasing = T))], 5)
tail(cooks.distance(fit5)[order(cooks.distance(fit5, decreasing = T))], 5)
Setup
Exploratory
Munging
Modeling
Evaluation
Source: Myint et al (in prep)
Y = X + e
Source: https://simplystatistics.org/2013/07/31/the-researcher-degrees-of-freedom-recipe-tradeoff-in-data-analysis/
Y = X + e_analyst + e_sampling
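A toy simulation of this decomposition, where both the sample and the analyst's modeling choice vary (all numbers illustrative):

set.seed(1)
true_effect <- 2
estimates <- replicate(1000, {
  n <- 100
  x2 <- rnorm(n)
  x1 <- 0.5 * x2 + rnorm(n)
  y  <- true_effect * x1 + 3 * x2 + rnorm(n)   # e_sampling
  adjust <- runif(1) < 0.5                     # e_analyst: adjust for x2 or not
  fit <- if (adjust) lm(y ~ x1 + x2) else lm(y ~ x1)
  coef(fit)["x1"]
})
var(estimates)  # spread reflects sampling AND analyst variability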
Data Science: 10 Courses, 4.4M+ Enrollments
Executive Data Science: 5 Courses, 150K+ Enrollments
Genomic Data Science: 9 Courses, 230K+ Enrollments
MSD in R: 6 Courses, 35K+ Enrollments
Source: https://simplystatistics.org/2015/04/29/data-analysis-subcultures/
Y = X + e_population + e_analyst + e_sampling
The optimality paradox is due to unknowns about human-data interaction
We can now study human-data interaction as its own science
Leek group
Collaborators
Alumni