The ‘Decision Tool for Causal Inference and Observational Data Analysis Methods in Comparative Effectiveness Research’ (DECODE CER)
Comparative effectiveness research (CER) seeks to compare interventions in real-world settings, often through use of observational data and/or causal inference1 methods. Although such methods are increasingly popular for CER, debate exists on the utility of different approaches and the underlying frameworks (e.g. potential outcomes2) that seek to better estimate true causal effects. We therefore sought to develop a decision tool (funded by the Patient-Centered Outcomes Research Institute) to guide researchers in formulating questions, recognizing assumptions, conducting analyses, and interpreting results. The goal of this tool is to facilitate the process, point researchers toward other resources, and highlight the need for appropriate expert guidance, not provide a “right answer”.
DECODE CER
Version 2.2
User’s Guide to DECODE CER
DECODE CER is a series of Google Slides connected through an overall framework, beginning with a question and then outlining assumptions, methods, and other considerations for observational CER. Links to relevant resources are embedded throughout the decision tool.
The next slide is the overall framework with links to all other slides, each of which has embedded links to relevant resources. Topics are categorized as follows:
General Concepts
Propensity Scores
Instrumental Variables
The boxes with black borders denote topics that highlight assumptions of the methods, including assumptions for data adequacy, causal inference, and PS-based and IV methods; at each of these steps, the user should assess whether their research approach can meet such assumptions.
Most of the slides are divided into two columns, specific to Background and Resources.
Navigation: In addition to embedded links, the Back to Tool link in subsequent slides returns the user to the overall framework slide. You can use the back arrow in your browser to return to previous slides after clicking any link.
Overall Framework
Decision Tool for Causal Inference and Observational Data Analysis Methods in Comparative Effectiveness Research
DECODE CER
A1. Research Question
CER can be described as research that informs a clinical decision through the direct comparison of two or more efficacious interventions, in terms of harms and benefits at the patient or population level, in real-world settings, using appropriate methods and data. CER, as defined by the Federal Coordinating Council and the National Academy of Medicine, is supported by organizations such as the Agency for Healthcare Research and Quality (AHRQ), the Patient-Centered Outcomes Research Institute (PCORI), and the National Library of Medicine (NLM). While some CER questions may require randomized trials, the sheer volume of questions, and the inability to randomize certain treatments or interventions, also requires use of observational methods. Causal inference therefore has substantial utility for questions in CER.
Our group developed an online course on Writing a Successful Concept Proposal; Module 1 and Module 2 of this course provide specific guidance on formulating the question. This course references the PCORI videos on formulating research questions through Category 1 of their Methodology Standards; see their Academic Curriculum for the total series of videos and associated materials (created by Johns Hopkins University) on the PCORI Methodology Standards. The six standards in Category 1 describe how to develop a protocol, address gaps in the literature, and specify the population, the intervention/comparator(s), and the outcomes that matter to patients. Chapter 1 of the AHRQ document Developing a Protocol for Observational CER: A User’s Guide also serves as a valuable reference on the topic.
The first step of any research project is to specify the question. DECODE CER specifically addresses comparative effectiveness of two treatments assigned at baseline. This section provides guidance on formulating the CER question.
Background:
Resources:
Design and Data
Once the CER question is specified, one must assess the adequacy of the design and data. As stated in Chapter 2 of the AHRQ document on Observational CER, “The choice of study design often has profound consequences for the causal interpretation of study results”, or as stated by Rubin1, “For objective causal inference, design trumps analysis”. Although the design is specific to the CER question, there are several overriding considerations; Rubin1 emphasizes 1) the nature of data collection, where potential confounders and outcome predictors are measured before treatment assignment, 2) completeness of data for key covariates, and 3) overlap in the propensities for treatment status across treatment groups (addressed further in Slide 10).
CER practitioners are increasingly moving toward “big data”2-10, registries, and multi-institutional data networks, such as those developed through PCORnet. These topics are addressed across numerous categories by the PCORI Methodology Standards and Academic Curriculum on Data Registries and Data Networks. Regardless of whether we use “big data” for CER, there are other critical concerns in the design of observational studies, as discussed under the Methodology Standards for Data Integrity and Rigorous Analysis, Preventing and Handling Missing Data, and Modules 1-6a of Category 8 on Causal Inference. Chapter 2 of the AHRQ document, Rosenbaum’s text on Observational Studies11, and Module 6 of the ENACT course serve as other valuable references on the topic.
Although DECODE CER focuses on the analysis (i.e. after data collection is completed), the study design must be recognized as a critical factor for identifying the optimal analysis methods and deciding whether causal inference is appropriate.
Background:
Resources:
A3. Scope of Methods
DECODE CER addresses a range of approaches in observational CER, from asking the question, to assessing design adequacy and key aspects of potential approaches. However, the total scope of our tool is still somewhat limited given the following restrictions:
Although most of the tool focuses on the analysis, Slide #5 does emphasize the critical role of design. Other slides focus on concepts that cut across both design and analysis (e.g. causal graphs). In terms of the potential outcomes framework, we acknowledge that many other approaches exist (e.g. Structural Causal Models1) and substantial debate remains on how to best estimate causal effects. Approaches such as marginal structural models2, g-methods3, and targeted learning approaches address time-varying treatments and associated complexities and biases4. Generalized propensity scores5-6 may also be applied to continuous treatment levels and to assessing dose-response effects. The same issues described in DECODE CER may also arise with non-adherence and/or pragmatic trials.
Causal inference includes a wide range of approaches with a rapidly evolving body of literature, making a complete synthesis of methods impractical; DECODE CER is limited to propensity score-based methods and instrumental variables.
The goal of this tool is limited to providing sufficient guidance for a well-formulated observational CER analysis using the commonly applied techniques and alerting the users to potential utility of other methods.
A4. Causal Graphs
A number of authors, including Pearl1, Greenland et al.2, Glymour and Greenland3, and others4-6, advocate the use of causal graphs and, more specifically, directed acyclic graphs, for both formulating the models and conducting the analyses to draw causal inferences. In very basic terms, the process described extensively by Pearl7 seeks to 1) develop a directed acyclic graph with nodes and edges (and associated functions), 2) use rules associated with different types of graphs (e.g. chains and colliders) to specify joint probabilities and rules for independence and d-separation, and 3) evaluate causal effects of interventions by conditioning on appropriate variables with do-expressions and other graph modifications. As noted by Hernan and others, expert knowledge is critical in specifying the relationships.
Although the details of these methods go beyond the scope of this tool, it is important to emphasize that developing modeling strategies for observational CER should include causal graphs as a mechanism to better understand the underlying conceptual models, general approaches, and variables needed for those models. Causal graphs should be an early step in both planning and analyzing the data.
Formulating causal models requires a clear understanding of relationships between treatments, confounders, mediators, and other covariates. Specifying relationships through causal graphs facilitates development of the causal model.
Background:
Resources:
Further guidance is provided in the PCORI Academic Curriculum across the Modules in Category 3 (Data Integrity and Rigorous Analysis) and, in particular, in Module 4 (Thinking about Causality). A number of examples of causal graphs4-6 are published in the literature.
A5. Causal Inference Assumptions
Although a number of different frameworks exist for causal inference, and the assumptions depend on the framework being utilized, DECODE CER focuses on use of the potential outcomes framework, as described by Rubin1-2 and others3-4. Briefly, in the simple case of a dichotomous treatment with two levels, the individual causal effect (which cannot be directly estimated) is defined as the difference between the observed outcome and the other potential outcome, i.e. the counterfactual (or some function of that difference depending on the outcome distribution). Average treatment effects, and other such contrasts, are defined using expectations of those differences. In general, the causal effects are not estimated by the simple difference in expectations, or by regression adjustment3.
One assumption, as described by Imbens and Rubin1, is that potential outcomes must be “a priori observable”, i.e. there exists some non-zero probability for each potential outcome for each treatment level. Other assumptions include adequacy of the data and the ability to formulate the causal contrast for the given research question. Another related set of considerations leads to the Stable Unit Treatment Value Assumption (SUTVA)1,4, which states that, across all units, 1) the treatments and assignment mechanisms do not interfere with other units, and 2) the potential treatments are the same. Assessing these assumptions depends on the underlying science. Inability to satisfy assumptions may require restricting analyses to a subset where the assumptions do hold (e.g. the range of data with overlapping propensity scores).
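The definitions above can be made concrete with a small numeric sketch (all numbers hypothetical): each unit carries two potential outcomes, only one of which is ever observed, and the naive difference in observed means can differ sharply from the true average treatment effect when treatment is confounded.

```python
# Toy illustration of the potential outcomes framework (hypothetical data).
# Each unit is (Y0, Y1, A): potential outcomes under control and treatment,
# and the treatment A actually received.
units = [
    (3.0, 5.0, 1),
    (2.0, 4.0, 0),
    (6.0, 6.5, 1),
    (1.0, 3.0, 0),
]

# Individual causal effect: Y(1) - Y(0) (never directly observable).
effects = [y1 - y0 for y0, y1, _ in units]

# Average treatment effect (ATE) over all units.
ate = sum(effects) / len(effects)

# Naive estimate: difference in observed means by treatment group.
treated = [y1 for y0, y1, a in units if a == 1]
control = [y0 for y0, y1, a in units if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(ate)    # true ATE
print(naive)  # biased here, because treated units have higher Y(0)
```

In this toy data the treated units would have done better even without treatment, so the naive contrast overstates the true effect.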
As with any statistical approach, causal inference methods require strong assumptions to achieve advertised properties. While different methods have different assumptions, there are several overall assumptions that should always be considered.
The Potential Outcomes Framework:
Assumptions:
A6. Causal Question
Not all comparisons are causal questions. Petersen1, for instance, describes a 7-step roadmap, whereas Hernan asks whether the question can be framed as a randomized experiment using a 4-step process (slide 35).
For the most part, questions that are well formulated as a CER question, i.e. directly comparing two or more treatments in real-world settings, are also sufficiently well formulated as causal questions, especially if one follows the PCORI guidance on clearly defined interventions (although this might not be the case for an ill-defined “usual care” comparator).
Another prerequisite to appropriately applying causal inference methods is formulating the research question as a testable causal effect. More specifically, one must be able to specify the research question in the form of a randomized experiment.
Background:
Relationship to CER:
Inference also requires well-defined causal states2 and causal estimands3, and correctly interpreting estimates, e.g. as average treatment effect (ATE), conditional effects within the treated or untreated4, local ATE, or complier-average causal effects5.
The PCORI Academic Curriculum only briefly addresses the causal question and causal effects in Category 8 on Causal Inference, in particular in Module 3a. In general, however, we found little in the way of tutorials or introductory articles on this topic outside of some discussion in the referenced textbooks and published articles5-11 on the individual methods.
Resources
A7. Additional Resources & Methods
Throughout this tool, we highlight videos from the PCORI Academic Curriculum that were developed by Johns Hopkins University to provide training materials on the PCORI Methodology Standards (and the Methodology Report), which are a set of minimum standards for conducting patient-centered CER. Our online course, developed through the Expanding National Capacity in PCOR through Training (ENACT) Program, leverages these materials for instruction on writing a concept proposal (with a new course on writing a full contract proposal coming later in 2017). Four other programs funded under the same RFA as ENACT also offer training for their specific research communities. Online modules in CER methods are also available through websites from UC-Davis and the Ohio State University.
These resources address both the methods for observational data described in DECODE CER, and other aspects of patient-centered CER, such as heterogeneity of treatment effects and the conduct of pragmatic trials. The University of Pittsburgh CER Center provides a more complete description of these modules. Other resources are more specific to causal inference methods, such as websites by Pearl, Stuart (including publications and software), the HSPM Program on Causal Inference (at Harvard University), and others. As described on some of these websites, software routines for causal inference are being developed for numerous programs, including R, Stata, and SAS. PCORI has also funded other projects that target related training efforts, such as a causal inference toolkit, guidelines for causal inference in pragmatic trials, and others.
As research in observational CER, and causal inference for observational data, continues to expand, a number of resources have been developed for practitioners. DECODE CER seeks to highlight, rather than reproduce, these resources.
B1. PS Methods & Assumptions
Propensity score (PS)-based methods were motivated as an approach for causal inference by Rosenbaum and Rubin1. Although PS-based methods are often referred to as a single technique, they actually comprise a series of steps for estimating effects: assessing assumptions, estimating the propensity score, specifying the pseudo-population, and estimating effects through the outcomes model.
The main assumptions1,3 of PS-based methods are unconfoundedness (all confounders are measured) and positivity (a non-zero probability of each treatment level for every unit).
Propensity score-based methods represent an increasingly popular set of approaches that require assessing assumptions, estimating the propensity score, specifying the pseudo-population, and estimating effects through the outcomes model.
Methods:
Assumptions:
In addition, the PS, or the logit(PS), may be used as a regression adjustment factor, but this approach generally has worse properties2 and is not discussed further.
Unconfoundedness is usually assessed via sensitivity measures, while positivity is assessed by overlap in propensity scores and by restricting analyses to the area of common support. Subsequent balance4-5 of covariates should also be checked.
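Balance checking can be illustrated with a short sketch (hypothetical numbers): the standardized mean difference (SMD) of a covariate, with the common rule of thumb that an absolute SMD below about 0.1 indicates adequate balance.

```python
import math

# Standardized mean difference of a covariate between treatment groups,
# using the pooled standard deviation. All data below are hypothetical.

def smd(treated, control):
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((v - mt) ** 2 for v in treated) / (len(treated) - 1)
    vc = sum((v - mc) ** 2 for v in control) / (len(control) - 1)
    return (mt - mc) / math.sqrt((vt + vc) / 2)

age_treated = [70, 68, 72, 75]
age_control = [55, 60, 58, 62]        # imbalanced before matching
age_matched = [68, 71, 73, 74]        # hypothetical PS-matched controls

print(smd(age_treated, age_control))  # large: clear imbalance
print(smd(age_treated, age_matched))  # near zero after matching
```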
The PCORI Academic Curriculum describes PS-based methods in Category 3 (Data Integrity and Rigorous Analysis) Module 8c, and Category 8 (Causal Inference) Modules 6a and 6b. See websites by Stuart and Love and an overview by Austin6 for further background.
Resources
B2. Assignment Mechanism
Implementing PS-based methods begins by fitting a model to estimate each individual’s propensity for a given treatment level. Several of the usual modeling concerns are not, however, relevant for PS models, including estimating variable effects, the presence of collinearity, and generalizability; instead, the goal is strictly prediction for a given data set. While some debate exists on variable inclusion strategies, the literature suggests including both outcome predictors and confounders1-3. Although others argue for a ‘kitchen sink’ approach that includes all available data, inclusion of an instrumental variable may negatively affect subsequent statistical properties4. Variable specification should also depend on prior knowledge, and not be limited to variables showing an imbalance between treatment groups.
Although logistic regression is a common choice for the propensity score model, strong motivation exists for using more complex models that implicitly fit interactions and non-linear terms. Specifically, the logistic model likely underestimates the complexity of the PS, and its gains in interpretability offer no benefit in this context. Machine learning methods5-8, such as trees or random forests, neural networks, or support vector machines, therefore have significant appeal for PS estimation.
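As a minimal sketch of PS estimation (hypothetical data; a plain logistic regression fit by gradient ascent, rather than the packaged routines or machine learning methods one would typically use):

```python
import math

# Fit a one-covariate logistic propensity score model by gradient ascent
# on the log-likelihood. Data are hypothetical: x is a confounder, a is
# the observed treatment indicator.
x = [0.5, 1.5, 1.0, 2.0, 0.2, 1.8, 1.6, 0.9]
a = [0,   1,   0,   1,   0,   1,   0,   1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = 0.0, 0.0
lr = 0.5
n = len(x)
for _ in range(5000):
    p = [sigmoid(b0 + b1 * xi) for xi in x]   # current fitted propensities
    b0 += lr * sum(a[i] - p[i] for i in range(n)) / n
    b1 += lr * sum((a[i] - p[i]) * x[i] for i in range(n)) / n

ps = [sigmoid(b0 + b1 * xi) for xi in x]      # estimated propensity scores
```

Because treated units tend to have higher x in this toy data, the fitted slope is positive and the estimated PS increases with the confounder.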
Substantial literature exists on developing an optimal prognostic model; calculating an individual’s propensity for treatment (or a given treatment level) is essentially the same problem as any prognostic model with some exceptions.
Background:
Resources: (In addition to those in Slide #10)
Although calculation of the PS, and all other steps in the PS-based approaches, do not necessarily require special routines, such routines are commonly available in packages such as R, SAS, and Stata; the website by Stuart provides a number of examples.
B3. Pseudo Population
Although the PS is often viewed strictly as an analysis method, it can serve as a critical tool in designing an observational study1. More specifically, propensity scores can be used to specify the pseudo-population, which may use matching2, reweighted data3-5, or separate populations stratified6 by the propensity score. Another approach is standardization through g-methods. The distributions of the PS by treatment group7-8 provide critical information about whether causal inferences are possible and how the pseudo-populations should be formulated. In the extreme case of confounding by indication, for instance, propensity scores would have little to no overlap, and thus prohibit us from drawing any causal inferences. For less extreme cases, one must consider how different methods best utilize data within the area of PS overlap.
Matching subjects across treatment groups on the PS (within some caliper) represents a conceptually intuitive approach, but with many computational variations and the potential to (appropriately or not) use only a fraction of the available sample. Weighting by the PS essentially reweights the study data to form a population representative of a randomized trial, in the same way that probability samples are reweighted to achieve a nationally representative sample. While PS weighting uses the entire sample, weights may be unstable when PS distributions are only partially overlapping. Stratifying, which also uses the entire data set, may yield effect estimates outside of the PS overlap, but has the advantage of quantifying heterogeneity of effects over different ranges of the PS.
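Weighting can be sketched as follows (hypothetical data, with the true propensity scores assumed known for simplicity): a Horvitz-Thompson style inverse-probability-weighted (IPW) estimate of the ATE.

```python
# Sketch of inverse-probability weighting to form a pseudo-population.
# Each record is (e, a, y): propensity score, treatment, observed outcome.
# All numbers are hypothetical and the true PS is taken as known.
data = [
    (0.2, 0, 3.0), (0.2, 1, 5.0),
    (0.5, 0, 4.0), (0.5, 1, 6.0),
    (0.8, 0, 5.0), (0.8, 1, 7.0),
]

# Horvitz-Thompson style IPW estimate of the ATE:
# E[Y(1)] estimated by mean of a*y/e, E[Y(0)] by mean of (1-a)*y/(1-e).
n = len(data)
mean_y1 = sum(a * y / e for e, a, y in data) / n
mean_y0 = sum((1 - a) * y / (1 - e) for e, a, y in data) / n
ate_ipw = mean_y1 - mean_y0
```

In practice, normalized (Hajek) or stabilized weights are often preferred, precisely because raw weights become unstable when propensity scores approach 0 or 1, as noted in the text.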
Once propensity scores are estimated, the pseudo-population is formed based on matching, weighting, or stratifying. The chosen method must consider the causal question and the distribution of propensity scores across treatment groups.
B4. Outcomes Model
Once the PS is calculated, and used to assess assumptions and form the pseudo-population, the outcomes model is then formulated to estimate the treatment effect. The outcomes model may correspond to any type of statistical approach appropriate for the given outcome distribution that incorporates the pseudo-population formed in the previous step. For instance, for 1:1 PS matching, a simple t-test, a regression of the paired differences, or conditional logistic regression for a binary outcome would be appropriate. Other examples include using a regression model with standard routines for survey weights for PS weighting1-2, or use of the Mantel-Haenszel estimator3 for the pooled estimate with stratification by the PS.
More recently, propensity score models have been applied to survival data. Austin4, for instance, describes methods that depend on either matching or weighting for estimating marginal survival curves and hazard ratios. Another variation on the outcomes model is the use of doubly robust estimators5, which, for unbiasedness, require either a correctly specified outcomes model or a correctly specified PS model, but not both; if both hold, they are also the most efficient. As with all aspects of PS-based methods, approaches used for the outcomes model can still vary substantially within a given scenario, and substantial literature exists on the range of possible approaches; see the systematic review for literature on the statistical properties.
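For example, with stratification by the PS and a binary outcome, the Mantel-Haenszel pooled odds ratio can be computed directly; the sketch below uses three hypothetical PS strata.

```python
# Mantel-Haenszel pooled odds ratio across propensity score strata.
# Each stratum is (a, b, c, d): events and non-events among treated
# (a, b) and among controls (c, d). All counts are hypothetical.
strata = [
    (10, 90, 5, 95),    # low-PS stratum
    (20, 80, 12, 88),   # middle stratum
    (30, 70, 20, 80),   # high-PS stratum
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
or_mh = num / den  # pooled odds ratio, adjusted for the PS strata
```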
The causal question, the estimated propensity score, the approach for forming the pseudo-population, and the outcome distribution all inform the selection of the outcomes model, which is then used to estimate the treatment effect.
B5. Sensitivity Analysis
A key limitation of PS methods is the inability to account for unobserved confounders. Although using big data and machine learning may better approximate the true PS, some unmeasured confounding is expected. Sensitivity measures1,2 estimate the critical magnitude of unmeasured confounding at which results change from significant to nonsignificant, or vice versa. Morgan3 describes 5 steps for carrying out such a sensitivity analysis.
Module 8 of Category 8 (Causal Inference) of the PCORI Academic Curriculum specifically discusses sensitivity analyses, beginning with a general discussion, noting that there are many assumptions made, all of which could be wrong and subject to sensitivity analyses. The majority of the video, however, specifically addresses propensity score analysis, including Methodology Standard CI-5, sensitivity to unmeasured confounding, and assumptions of IV methods. For propensity scores, a number of different sensitivity methods have been developed specific to the corresponding propensity score-based model. Stuart mentions several of these for R, Stata and SAS on her website. In applying these methods, the sensitivity measures must be specific to the utilized PS-based method.
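The flavor of a sensitivity analysis can be conveyed with a simplified bias formula (all numbers hypothetical): for a binary unmeasured confounder U with additive effect gamma on the outcome, and prevalences p1 and p0 among treated and untreated, the bias in a difference-in-means estimate is roughly gamma * (p1 - p0). One can then ask how strong U would have to be to explain away the observed effect.

```python
# Toy sensitivity analysis via a simple additive bias formula.
# All quantities below are hypothetical assumptions for illustration.
estimate = 2.0          # observed adjusted treatment effect
p1, p0 = 0.6, 0.3       # hypothesized prevalence of U by treatment group

# Sweep over candidate strengths of U's effect on the outcome and see
# where the bias-corrected estimate crosses zero.
for gamma in [1.0, 3.0, 5.0, 7.0]:
    corrected = estimate - gamma * (p1 - p0)
    print(gamma, round(corrected, 2))
```

Under these assumed numbers, the effect is only explained away once gamma exceeds roughly 6.7, which a subject-matter expert could judge as plausible or not.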
A key, but often unrealistic assumption of propensity score methods is measurement of all confounders. One approach to dealing with this limitation is assessing the sensitivity of results to different levels of unmeasured confounding.
Background:
Resources:
C1. IV Methods & Assumptions
The basic concept behind instrumental variable (IV) methods1 is to identify a variable that is strongly predictive of treatment status but otherwise independent (conditional on treatment) of the outcome measures, with randomization being the ultimate IV. More specifically, for an IV Z, treatment A, and outcome Y, there are 3 IV conditions2: 1) Z is associated with A (relevance); 2) Z affects Y only through A (the exclusion restriction); and 3) Z and Y share no common causes.
Modules 7a and 7b of Category 8 (Causal Inference) of the PCORI Academic Curriculum describe the concept of IV methods and present other examples, including IVs with randomization (e.g. randomized encouragement) or strictly observational data (e.g. insurance coverage). They also discuss the IV assumptions and Standard CI-6. Lesson 15 of the UC-Davis online CER modules and Module 5 of the Ohio State University online course also describe IV methods. The OSU course in particular describes useful texts and the estimation of local average treatment effects. Module 8 of Category 8 (Causal Inference) of the PCORI Academic Curriculum provides further description of sensitivity analyses for instrumental variables.
When potential exists for significant unmeasured confounding, IV methods represent a common approach. The IV, which must be strongly predictive of treatment but conditionally independent of the outcome, acts in the role of randomization.
Background:
Resources:
Potential choices for an IV include treatment assignment in the case of non-compliance, hospital formulary, geographic location, and preference-based variables3-9. Various methods exist for combining instruments and conducting sensitivity analyses.
C2. IV Estimation of Effects
While other approaches exist for IV estimation of treatment effects, such as two-stage predictor substitution, the two-stage residual inclusion (2SRI) method1 and the two-stage least squares method are the most common. The basic process for 2SRI is to 1) regress treatment on the IV (and covariates) and save the residuals, and then 2) include those residuals as an additional covariate in the outcome model.
The Ohio State University online CER video from Module 5 provides a clear introduction to the basic method. Software packages, including R, Stata, and SAS, also provide instructional files, and YouTube videos (also for R, Stata, and SAS) are available on running IV regression.
A number of textbooks on causal inference also describe IV methods in detail, including Chapter 9 of Morgan and Winship2, Chapters 23-25 of Imbens and Rubin3, Chapter 16 of Hernan and Robins4, and Bowden and Turkington5.
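As a minimal numeric sketch (hypothetical data), the simplest IV estimator for a binary instrument, the Wald or ratio estimator, divides the instrument's effect on the outcome by its effect on treatment received; two-stage least squares reduces to this in the one-instrument, no-covariate case.

```python
# Wald/ratio IV estimator with a binary instrument Z (e.g. randomized
# encouragement), treatment received A, and outcome Y. Data hypothetical.
#   effect = (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0])
data = [
    (1, 1, 10.0), (1, 1, 12.0), (1, 0, 5.0), (1, 1, 11.0),
    (0, 0, 4.0),  (0, 0, 6.0),  (0, 1, 9.0), (0, 0, 5.0),
]

def mean(vals):
    return sum(vals) / len(vals)

y1 = mean([y for z, a, y in data if z == 1])   # outcome mean, encouraged
y0 = mean([y for z, a, y in data if z == 0])   # outcome mean, not encouraged
a1 = mean([a for z, a, y in data if z == 1])   # uptake among encouraged
a0 = mean([a for z, a, y in data if z == 0])   # uptake among not encouraged

iv_effect = (y1 - y0) / (a1 - a0)  # local average treatment effect
```

Under the IV conditions, this estimates the effect among compliers (the local ATE discussed above), not necessarily the population ATE.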
Once the IV is specified, and assumptions are checked, the IV is then used to estimate the treatment effects (as done with the outcomes model for PS methods) using methods such as 2-stage residual inclusion or 2-stage least squares methods.
Background:
Resources:
The systematic review resulted in 168 papers; the main elements of each study are labeled in the data extraction spreadsheet. A further description of each study is also provided in the article summary. Initially, the systematic review was intended to create a set of decision points for when to use which method. However, most of the articles either derive a new method or assess properties for very specific cases that are difficult to generalize. We therefore suggest that users of this tool with PhD-level training in statistics use these articles to check what the literature says about the specific methods in their scenario, and to decide whether other methods are preferable and/or whether specific limitations are apparent for the intended approach.
This review sought to summarize the literature on statistical properties of different approaches to observational CER; the review excluded standard approaches, such as matching and linear or generalized linear modeling, and included, but was not limited to, PS-based and IV methods, g-methods, and associated variations. The review did not include tutorials, and was limited to studies of two treatment groups. The review focused on studies that could inform DECODE CER using theoretical or simulation results to assess properties of one or more methods. See the protocol for details. Although other reviews have been conducted for such methods, none specifically addressed statistical properties to this extent. Glymour, however, does provide a review of key properties for these and other methods.
Substantial literature exists on the statistical properties associated with causal inference methods. We conducted a systematic review of publications on the associated statistical properties including simulations and/or theoretical results.
Analytical Plan
Developing the analytical plan essentially follows the overall framework of DECODE CER, although the specific methods do not necessarily need to include and/or be limited to PS-based and/or IV methods. Even when using other methods, however, the general outline of topics, assumptions, and formulating questions represents a fairly representative set of analytical considerations. Specifically, the analytical plan should always start with the research question and assessment of data adequacy. Creating a causal graph then facilitates appropriate model formulation, and assessing assumptions is also critical (e.g. Slides 8, 10, and 15). This tool is limited to use of PS-based and IV methods as examples of two popular approaches relevant to CER (for observed and unobserved confounders, respectively), but many other methods may apply.
Resources: The online course developed by our group (for writing a concept proposal for patient-centered CER) also provides another resource. The course features 12 modules beginning with the research question and identifying gaps, through designing an observational study or pragmatic trial, addressing other analytical issues.
These components of the course again leverage the PCORI Academic Curriculum with guidelines for describing the analysis plan, including an overview of design considerations, selection of primary or secondary data, specific study designs, and thinking about causality, as well as application of propensity score and IV methods in their modules for Category 8 on Causal Inference Methods.
The analytical plan for causal inference involves all of the standard considerations for statistical analysis plans, plus the further complexities specific to the causal question and the specific method, as described throughout DECODE CER.