1 of 14

Empirical Investigation of Causal Inference in the Development of Smoking Behavior among the Adolescent Population in the PATH Dataset

Advisors: Dr. Shu Xu, Dr. Raymond Niaura, Dr. Jennifer Cantrell

PRESENTED BY Zhihao Chen

Department of Biostatistics, NYU School of Global Public Health

05/10/2023

2 of 14

Outline

  • Introduction
    • Motivation and gaps in research
  • Methods
    • Study data, design, and BN algorithms
  • Results
    • Univariate analysis
    • Causal pathway leading to e-cig initiation
    • Average Causal Effect
  • Discussion
    • Future research


3 of 14

Gaps in Previous Research

  • Traditional general/generalized linear models suffer from limitations including multicollinearity, difficulty capturing non-linear relationships, increased computational cost, and low model interpretability.
  • Limited guidance on selecting confounder variables


4 of 14

Causal Inference with Bayesian Network

  • Causal inference allows researchers to infer “the response of an effect when a cause of the effect variable is changed.”
  • Among the available variable selection methods, we focus on selecting variables along the causal pathway and visualizing that pathway intuitively as a probabilistic graphical model, namely the Bayesian Network (BN).


5 of 14

Bayesian Network

  • A directed acyclic graph (DAG)
    • Each node represents a variable
    • The presence of an edge between two nodes indicates that they are conditionally dependent
  • The DAG of BN determines which nodes are conditionally (in)dependent
  • The BN structure is learned by searching over candidate graphs: edges are added, removed, or reversed to optimize the Minimum Description Length (MDL) score
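
A minimal sketch of how a candidate DAG can be scored, assuming a common textbook form of the MDL score (negative log-likelihood plus (log2 N)/2 bits per free parameter); the exact score used by BayesiaLab may differ. The function name `mdl_score` and the toy variables `x` and `y` are hypothetical and are not taken from the PATH data.

```python
import math
from collections import Counter

def mdl_score(data, parents):
    """MDL score (in bits) of a discrete Bayesian network; lower is better.

    data    : list of dicts mapping variable name -> observed category
    parents : dict mapping each variable name -> tuple of its parent names (the DAG)
    """
    n = len(data)
    levels = {v: {row[v] for row in data} for v in parents}
    total = 0.0
    for child, pa in parents.items():
        # Counts of (parent configuration, child value) and of parent configuration alone.
        joint = Counter((tuple(row[p] for p in pa), row[child]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Data term: negative log-likelihood under maximum-likelihood parameters.
        nll = -sum(c * math.log2(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Model term: (log2 n)/2 bits per free parameter of the conditional table.
        n_params = (len(levels[child]) - 1) * math.prod(len(levels[p]) for p in pa)
        total += nll + 0.5 * math.log2(n) * n_params
    return total

# Toy data (made up): y always equals x, so an edge x -> y should pay for itself.
toy = [{"x": i % 2, "y": i % 2} for i in range(40)]
with_edge = {"x": (), "y": ("x",)}
no_edge = {"x": (), "y": ()}
print(mdl_score(toy, with_edge), mdl_score(toy, no_edge))
```

A search procedure would compute this score after each candidate edge addition, removal, or reversal and keep the change only if the score improves.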


6 of 14

Research Question: What is the average causal effect of Wave 2 e-cigarette initiation on Wave 3 past-30-day combustible cigarette use?
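
One common way to formalize this question is sketched below. The symbols are introduced here for illustration: X is a binary exposure (Wave 2 e-cigarette initiation), Y is a binary outcome (Wave 3 past-30-day combustible cigarette use), and Z is a set of Wave 1 covariates assumed to satisfy the back-door criterion.

```latex
\mathrm{ACE} = P\bigl(Y = 1 \mid do(X = 1)\bigr) - P\bigl(Y = 1 \mid do(X = 0)\bigr),
\qquad
P\bigl(Y = 1 \mid do(X = x)\bigr) = \sum_{z} P(Y = 1 \mid X = x, Z = z)\, P(Z = z)
```

Under that assumption, the interventional probabilities on the left can be estimated from observational data by averaging stratum-specific risks over the distribution of Z.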


7 of 14

Data: Population Assessment of Tobacco and Health (PATH)

  • This longitudinal cohort survey collects data on smoking behavior across multiple follow-up waves, with a baseline wave population of 45,971 adults and youth.
  • Study sample: n = 7,340
  • Youth who were tobacco-naïve at Wave 1 and cigarette-naïve at Wave 2, followed for new combustible cigarette use at Wave 3
  • Imputed dataset from Xu et al. (2022)


8 of 14

Measures

  • Exposure (Wave 2): E-cig initiation
  • Outcome (Wave 3): Past-30-day cigarette use
  • Covariates (Wave 1):
    • Age, ethnicity, smoking susceptibility, access to free tobacco products, exposure to second-hand smoke, time since last self-related or environment-related mental health crisis


9 of 14

Bayesian Network (cont.)

  • The Sum of Products Linear Equation (SopLEQ) algorithm was used for structure learning and parameter learning
    • It constructs a causal network among the variables by iteratively testing and selecting the most relevant variables for the network based on a scoring function.

[Violet: Put tech details in bullet points, e.g., software package, algorithm, name of the scoring function, etc. Remove the network on the right from this page because it is part of your results (it does not belong in the methods section).]


10 of 14

Results

  • Univariate analysis [Violet: Need a description of univariate analysis results; e.g., xx.x% male …]
  • [Add: causal pathway]
  • A 4.6% increase in risk was found
    • E-cigarette initiation at Wave 2 among youth was associated with a higher risk of past-30-day combustible cigarette use at Wave 3: a 4.6% increase in risk compared with the counterfactual outcome in which the sampled population did not frequently use e-products at Wave 2.
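
For illustration, a risk difference of this kind can be estimated by standardization (back-door adjustment). The sketch below assumes a binary exposure and outcome and a covariate set that blocks all back-door paths; it is not the BayesiaLab procedure used in the study. The function name, the variable names `x`, `y`, `z`, and all counts are made up and are not PATH results.

```python
from collections import defaultdict

def average_causal_effect(rows, exposure, outcome, covariates):
    """Risk difference P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)) by standardization.

    Assumes 0/1 exposure and outcome, and that `covariates` block all back-door paths.
    """
    n = len(rows)
    # stratum -> exposure level -> [number of rows, number of outcome events]
    cells = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for r in rows:
        z = tuple(r[c] for c in covariates)
        cell = cells[z][r[exposure]]
        cell[0] += 1
        cell[1] += r[outcome]
    ace = 0.0
    for z, by_x in cells.items():
        (n0, y0), (n1, y1) = by_x[0], by_x[1]
        if n0 == 0 or n1 == 0:
            raise ValueError(f"positivity violated in stratum {z}")
        # Weight each stratum-specific risk difference by the stratum's share of the sample.
        ace += (n0 + n1) / n * (y1 / n1 - y0 / n0)
    return ace

def expand(z, x, n, events):
    """Build n toy rows for one (stratum, exposure) cell, the first `events` with y = 1."""
    return [{"z": z, "x": x, "y": 1 if i < events else 0} for i in range(n)]

# Made-up counts for two covariate strata.
toy = expand(0, 1, 10, 2) + expand(0, 0, 40, 4) + expand(1, 1, 20, 10) + expand(1, 0, 30, 9)
print(average_causal_effect(toy, "x", "y", ["z"]))
```

The explicit positivity check matters in practice: a stratum with no exposed or no unexposed participants provides no basis for comparison, so the estimate is undefined there rather than silently biased.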


11 of 14


12 of 14

Discussions and Conclusions

  • E-cigarette initiation and subsequent combustible cigarette use
  • Advantages of a BN approach
    • Flexible nonparametric approach
    • Able to handle complex interactions
    • Visualization of data structure
    • Infer causal effect directly from data


13 of 14

Conclusions

  • Limitations
    • Bayesian Network methods are continually evolving, so the analysis may be limited by not using the most recent state-of-the-art algorithms
    • Bias from data collection and the original PATH study
  • Future research
    • Policy application in tobacco reduction
    • Adolescent intervention programs on e-cigarette use


14 of 14

References

  • Soneji S. Errors in data input in meta-analysis on association between initial use of e-cigarettes and subsequent cigarette smoking among adolescents and young adults. JAMA Pediatrics. 2018;172(1):92. doi:10.1001/jamapediatrics.2017.4200
  • BayesiaLab. BayesiaLab Overview. Bayesia. Accessed April 4, 2023. https://www.bayesia.com/articles/#!bayesialab-knowledge-hub/bayesialab-overview
  • Xu S, Coffman DL, Liu B, Kollath-Cattano C, Pesko MF. Relationships Between E-cigarette Use and Subsequent Cigarette Initiation Among Adolescents in the PATH Study: an Entropy Balancing Propensity Score Analysis. Prev Sci. 2022. doi:10.1007/s11121-021-01326-4
  • Conrady S, Jouffe L. Chapter 10: Causal Effect Identification and Estimation. In: Bayesian Networks and BayesiaLab: A Practical Introduction for Researchers. Franklin, TN: Bayesia USA; 2015. Accessed April 4, 2023. https://www.bayesia.com/articles/#!bayesialab-knowledge-hub/book
