Designs and analytic techniques for studies that contribute to causal determinations are in a continuous state of development and evolution. There have been significant shifts in the way study goals are framed (e.g., as in the causal inference framework described in Appendix C), data are collected or assembled (e.g., increasing volume and dimensionality in space and time, data harmonization, and integration), data are analyzed (e.g., use of machine learning techniques), and scientific findings are synthesized within and across scientific disciplines. Statistical techniques for causal inference are increasingly applied in research related to public health or welfare, accompanied by methodological innovations to address nuances that are of particular importance to air pollution–related research. It is important that the U.S. Environmental Protection Agency (EPA) consider the implications of these newer developments for causal determinations made during the Integrated Science Assessment (ISA) process, especially in terms of how the methods are used, their strengths and limitations, and what expertise, and balance of expertise, are needed during the ISA process to adequately incorporate studies employing these rapidly evolving techniques.
This chapter outlines features of emerging research methods that are salient to the process of causal determinations, especially with respect to the design of individual studies that may inform an ISA process, but the chapter is not meant to be an exhaustive review of different methods. Appendix C introduces causal inference principles that can increase rigor in randomized and nonrandomized individual studies that investigate causality. More specifically, this chapter discusses emerging approaches for exposure assessment, methods for confounder selection and control, recent approaches for estimation of causal effects, how to deal with post-treatment variables and unmeasured confounders, and how to handle multiple exposures. While advances in these areas may seem most relevant to the design and analysis of individual studies, understanding these advances can also inform evaluation, synthesis, and integration of evidence, and the causal determinations subsequently made within the ISAs. To this end, each of the following sections provides an assessment of how the ISA causal determination framework guides consideration of studies using the methods discussed, and what EPA could do to incorporate attention to those methods into the causal determination framework. Whether or how guidance regarding those methods is incorporated into the framework will often depend on the relevance, readiness, and level of maturity of the methods.
Advances in techniques to measure, store, combine, harmonize, process, and analyze exposure data with high temporal and spatial resolutions are revolutionizing exposure assessment and resultant air pollution-related studies for both health and welfare effects. The exposure assessment methods and data used are important aspects of overall study quality and merit in relation to causal assessments, including considerations of “biological gradient” (i.e., exposure-response relationship) and strength of the observed association. Given the rapid evolution of exposure assessment methods, it is crucial that the ISA causal determination framework laid out in the Preamble be nimble enough to incorporate and appropriately evaluate studies that use new exposure estimation methods.
It is typical and good practice in scientific investigations to assess emerging exposure assessment methods in terms of improved accuracy, and hence validity, by addressing major challenges such as (1) typical variability or intermittency of exposure, (2) diversity and complexity of pollutants to which people and ecological (or environmental) systems are exposed, (3) the role of complex and varying behavioral components of exposure, and (4) differential responses to exposure (leading to vulnerable subpopulations) (Brauer, 2010; Nieuwenhuijsen, 2003). Regardless of study design, improved precision and accuracy in the exposure variables used in epidemiologic models (whether for the exposure of interest or potential co-pollutants treated as confounders) increase the validity of findings drawn from health- and welfare-related studies and therefore the influence those studies should have on the body of evidence used for causal determination.
Exposure estimation errors could invalidate conclusions by misrepresenting the magnitude or significance of a given pollutant, or by attributing effects to the wrong pollutant (Zidek et al., 1996)—a topic of active methodologic research and implementation supported by additional data availability from satellites and low-cost sensors (Yi et al., 2021). For example, advances in exposure assessment include short- and long-term exposures that focus on temporal variability (e.g., for time-series studies) or spatio-temporal settings (e.g., for cohort studies). Methods such as machine learning that allow estimation of pollutant concentrations at unmeasured locations based on limited fixed-site measurements can lead to greater accuracy and precision (Breiman et al., 2017; Brokamp et al., 2017). There are new approaches to address statistical challenges, relative to linear regression, due to multicollinearity and complex interactions (Hastie et al., 2005). These approaches build on more established approaches (e.g., spatial interpolation/averaging, nearest neighbor, inverse distance weighting [IDW], kriging, land-use regression [LUR] modeling [Henderson et al., 2007], and dispersion modeling [Xie et al., 2017]). Estimation at multiple scales is enabled through combined use of several methods such as those listed above (Xie et al., 2017). Hybrid models for exposure (Sorek-Hamer et al., 2016) are emerging approaches that use both ground-level and remote sensing data to allow spatiotemporally continuous estimation of air pollutant levels; they generally incorporate LUR (Henderson et al., 2007), which can account for local effects of spatial predictors as well as temporal variables such as pollution data from satellite-based remote sensing, meteorology, green space, and pollutant level estimates from chemical transport models (CTMs) (Yarza et al., 2020). Low-cost sensor technology may be used to fill in data gaps (Brokamp et al., 2019; Morawska et al., 2018).
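The more established interpolation approaches can be conveyed with a minimal sketch of inverse distance weighting; the monitor coordinates and PM2.5 readings below are hypothetical, and a real application would use a dense, validated monitoring network and a tuned distance exponent:

```python
import numpy as np

def idw_estimate(target, sites, values, power=2.0):
    """Inverse distance weighting: estimate a concentration at an unmonitored
    location as a distance-weighted average of fixed-site measurements."""
    d = np.linalg.norm(sites - np.asarray(target, dtype=float), axis=1)
    if np.any(d == 0):                 # target coincides with a monitor
        return float(values[np.argmin(d)])
    w = d ** -power                    # nearer monitors receive larger weights
    return float(np.sum(w * values) / np.sum(w))

# Hypothetical monitor coordinates (km) and PM2.5 readings (ug/m^3)
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([10.0, 14.0, 12.0])
est = idw_estimate((0.2, 0.2), sites, values)  # pulled toward the nearest monitor
```

Hybrid models differ from this sketch in that the interpolation is informed by covariates (land use, satellite retrievals, CTM output) rather than distance alone, but the underlying idea of borrowing information from nearby measurements is the same.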
Data-driven spatial prediction approaches that are designed to capture dependencies among heterogeneous data (e.g., data on pollutants and meteorology) via machine learning to model spatial correlations among different locations (Candes et al., 2006; Donoho, 2006; Zheng et al., 2015) have been utilized in studies at differing scales, ranging from local and geographically limited to national and global levels (Bind et al., 2016). The resulting exposure estimates are used in health and ecological effects models ranging from traditional regression models to emerging causal inference approaches at scale (Wu et al., 2020).
Developing guidance for assessing the quality of individual studies that use various emerging technologies for measuring exposure, as described above, and modeling techniques that combine data from multiple sources to assign exposure levels may improve causal determinations. As discussed in Chapter 4, both the 2020 Ozone ISA (EPA, 2020a) and the 2019 particulate matter (PM) ISA (EPA, 2019c) include discussions of such emerging hybrid approaches. The 2019 PM ISA concluded that developments in hybrid spatiotemporal modeling have improved the spatial resolution of PM2.5 exposure estimates and consequently reduced bias and uncertainty in health effect estimates (EPA, 2019c, pp. 3–121). However, in subsequent steps of the NAAQS review process, findings from studies that assigned exposures based on fixed-site monitoring data were selected over findings from recent epidemiology studies that used hybrid exposure models (e.g., Di et al., 2017a,b; Shi et al., 2016). Incorporating potentially valid and informative evidence into causal determinations would be aided by clearly articulated guidelines in the framework for evaluating emerging exposure assessment methods.
Most of the exposure assessment methods discussed in this section could be included among the studies used in the ISA causality assessment (assuming study quality criteria are met). For example, the hybrid models discussed above that use both ground level and remote sensing data (i.e., Sorek-Hamer et al., 2016) have been discussed in two of the most recent ISAs (i.e., the 2020 Ozone ISA [EPA, 2020a] and the 2019 PM ISA [EPA, 2019c]), but with mixed reception, as discussed in Chapter 4. The recommendations outlined in Chapter 10 might help clarify how best to incorporate emerging methods such as these into the ISA causal determination framework. Emerging causal inference approaches that incorporate exposure estimates from hybrid approaches into health effects analyses (Wu et al., 2020) are nearing the level of maturity that would warrant guidance in the causal determination framework in the near future, while quantification of the associated estimation uncertainties, and their subsequent accounting in health effects analyses, is an active area of research that may need more time before guidance is incorporated in the framework.
Appendix C introduces causal inference principles. As described in that appendix, recent decades have seen sharp growth in the development of statistical methods to address causal questions based on data from randomized and nonrandomized (observational) studies. Causal inference methods provide formal tools that can increase rigor in the design, analysis, and interpretation of studies that aim to establish causality, and clarify the conditions under which a causal determination can be made. While to date these methods have seen more application to human health effects studies, many are also applicable to studies of pollutant exposure or deposition effects on ecological endpoints or other welfare effects such as visibility and climate. As described in Appendix C, well-designed studies for assessing causality involve articulating the scientific question in terms of potential and counterfactual outcomes; specifying the available data and a causal model; articulating assumptions on the causal model that allow the causal parameter of interest to be identified as an observable statistical quantity; analyzing the data to estimate the identified statistical quantity; and interpreting statistical results as causal relations according to the validity of the assumptions in the model, with statistical uncertainties quantified. Assessing whether a study followed good design principles provides important information regarding how that study should influence causal determinations under a weight of evidence approach.
A focus on careful design of nonexperimental studies is of growing importance in the field of causal inference, with attention to designing studies in ways that do not rely on the outcome data. A study design framework—recently referred to as trial emulation, and building on ideas that go back to work in the 1970s and earlier—aims to design nonexperimental studies in a manner that parallels the design of randomized experiments. Exposures, sample selection, and outcome assessment are clearly specified, and there are similarities across treatment groups in the observed covariates, as in an experiment (although, of course, an experiment will create similarity on observed and unobserved factors) (Bind and Rubin, 2019; Hernán and Robins, 2016; Rubin, 2008). Bind and Rubin refer to this idea of careful design of nonexperimental studies, including the idea of not using the outcome data itself in the design stage, as the “causal pipeline.” A recent paper bridges some of those design ideas to Bayesian methods within causal inference (Li et al., 2022). Use of such careful design approaches in individual studies might be encouraged if the benefits of separating design from (outcome) analysis were described in the ISA causal determination framework. Not using outcome data in the design of a study can help prevent researchers from choosing a particular design strategy because it leads to a desired estimated effect.
An additional aspect of study design particularly relevant to air pollution studies is potential “interference” across individuals or communities (e.g., where air pollution sources in one geographic area spill over and have impacts in other geographic areas). Many standard study designs assume what is known as the Stable Unit Treatment Value Assumption (SUTVA; Imbens and Rubin, 2015), which requires (1) that the potential outcomes for any unit do not vary with the treatments assigned to other units and (2) that for each unit, there are no different forms or versions of each treatment level that lead to different potential outcomes. There are situations, such as the spillover example above, where this assumption is violated (Papadogeorgou et al., 2019).
Confounding is another common challenge in non-experimental studies. Discussion of methods for assessing whether an appropriate set of confounders has been included in the analysis is provided below, and a later section of the chapter considers methods for handling measured and unmeasured confounders.
Directed acyclic graphs (DAGs) are a valuable approach for clarifying assumptions in causal models. As discussed in Appendix C, DAGs are used to postulate, visualize, and convey assumptions (implicit or explicit) in causal models. A DAG is a graphical model in which each relevant variable is represented as a node, and nodes are connected by directed edges representing causal relations (see Appendix C, Figure C.1). DAGs are increasingly used to guide design and interpretation of observational studies. They have also been proposed as a tool to improve weight of evidence causal assessments (Brewer et al., 2017), and could be applicable in the Preamble’s causal determination framework. DAGs can aid evaluation of most aspects of the causal models described throughout this report.
An example of the utility of DAGs for postulating and validating assumptions made in causal models is their use when deciding on a set of variables that is sufficient for confounder adjustment. Specifically, graphical tools such as d-separation and the backdoor criterion can be used to decide which variables should and should not be included as confounders (Cinelli et al., 2022). Attention to such tools in the ISA causality framework can help reviewers assess whether individual studies controlled for an appropriate set of confounders.
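The consequence of leaving a backdoor path open can be shown with a small simulation; the DAG, coefficients, and sample size below are arbitrary choices made for illustration. Data are drawn from the graph Z → X → Y with Z → Y, so Z confounds the exposure–outcome relation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                       # confounder (e.g., regional meteorology)
x = 0.8 * z + rng.normal(size=n)             # exposure influenced by Z
y = 1.0 * x + 1.5 * z + rng.normal(size=n)   # true causal effect of X on Y is 1.0

# Regressing Y on X alone leaves the backdoor path X <- Z -> Y open
naive = np.polyfit(x, y, 1)[0]

# Z satisfies the backdoor criterion here, so adjusting for Z identifies the effect
design = np.column_stack([np.ones(n), x, z])
adjusted = np.linalg.lstsq(design, y, rcond=None)[0][1]
```

In this simulation the unadjusted slope substantially overstates the true effect of 1.0, while the regression adjusting for the backdoor adjustment set {Z} recovers it; in real studies the graph is posited, not known, which is precisely why DAG-based reasoning about adjustment sets merits scrutiny during study evaluation.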
The literature on DAGs is vast and a complete discussion of the available tools is beyond the scope of this report; extensive discussions are available in causal inference textbooks (e.g., Pearl, 2009; Spirtes et al., 1993). DAGs may be used in an ISA to organize and clarify the underlying structure of the data used in an individual study, providing a visual way to represent key concepts such as confounding, causation, and experimentation (Shrier and Platt, 2008). Specifically, reviewers might build DAGs and apply expert judgment to state and assess the validity of the assumptions required to interpret the results of a study causally, even if DAGs were not used in the original study design and analysis.
Nonexperimental study designs are widely used to study air pollution topics and thus often inform ISAs. Assessing the quality of those studies requires knowledge of the approaches—including emerging approaches—used to control for confounding. Many studies aiming to estimate causal effects—especially those based on observational comparison group designs (e.g., propensity score and other matching methods and standard regression adjustment)—are predicated on knowledge of the baseline characteristics (covariates) necessary to satisfy the assumption of no unmeasured confounding (i.e., that there are no unobserved differences between exposed and unexposed groups on variables that also relate to outcomes after adjusting for observed confounders) (Rosenbaum and Rubin, 1983a; Rubin, 1974; VanderWeele and Shpitser, 2013). However, the exact set of covariates needed to control confounding is rarely known. Assessment of the appropriateness of the underlying causal model (an important step in study quality assessment; see Chapter 3 and Appendix C) thus can depend on whether an appropriate set of confounders is included in the analysis (see Brookhart et al., 2006; Zhu et al., 2015; Zigler and Dominici, 2014b, for examples when using propensity score methods), and there has been an emphasis on approaches that involve pre-specification of confounders (Rubin, 2008). In addition, assessment of confounders, and methods to deal with high-dimensional confounders, is a rapidly evolving methodological area (Antonelli et al., 2019; Athey et al., 2017, 2018; Laubach et al., 2021; Schneeweiss et al., 2009; VanderWeele, 2019; Zigler and Dominici, 2014b). It is important for a causal determination framework to acknowledge that different methods incorporate different underlying assumptions, and to provide strategies for those trying to understand the validity of the method applied in any particular study.
When assessing the underlying assumption, it is important to recognize that adjusting for more covariates is not always better. Although increasing the number of adjustment covariates often improves the chances that the no-unmeasured-confounders assumption holds, including many covariates can complicate estimation of causal effects. Questions, not necessarily statistical, may arise as to which confounders should be included in the statistical analysis (Austin et al., 2020; Schneeweiss et al., 2009).
DAGs constructed with all the covariates (observed or unobserved) may also aid in assessing whether causal effects are estimable from data when the underlying causal structure is correctly specified or can be agreed upon by experts. Specifically, graphical tools such as the backdoor criterion and d-separation (Greenland et al., 1999) can be used to identify the sets of covariates (known as “adjustment sets”) that allow unbiased estimation of a causal effect (Pearl, 2009; Perkovic et al., 2015; van der Zander et al., 2014). However, knowledge of the true graph is often unavailable and alternative strategies are necessary. There is an extensive literature on statistical methods for causal inference that acknowledges uncertainty in confounder selection (see, e.g., Brookhart et al., 2006; De Luna et al., 2011; Ferraro et al., 2019; Joffe et al., 2004; Schneeweiss et al., 2009; Shahar, 2013; VanderWeele and Shpitser, 2011; Vansteelandt et al., 2012; Wang et al., 2012, 2015; Wilson and Reich, 2014; Zigler and Dominici, 2014b). Future iterations of the causal determination framework might benefit from explicit consideration of these strategies when assessing the relevance and quality of individual studies, and in particular whether they adjusted for an appropriate set of variables.
Several study designs and analysis approaches often used in air pollution epidemiology aim to establish conditions under which adjustment for a set of observed confounders can reliably estimate potentially causal effects. Such approaches are rapidly evolving. The Preamble neither provides specific guidance for handling measured confounders in evaluating individual study quality nor documents the exact statistical methodology used to control for potential confounders, and it may not be feasible or appropriate for the Preamble to be fully prescriptive in that regard: there are many possible data analysis methods that may yield correct or incorrect conclusions for a given study design. However, a future framework might include a mechanism to document these foundational designs and assumptions, offer guidance on how to assess the quality of studies with respect to addressing confounders, and provide structures for evaluating emerging methodologies that may be informative in the process.
The next sections discuss some of the most commonly used methods for handling measured confounders. The definitions and principles outlined may inform the assessment of an individual study for inclusion in an ISA as well as the potential influence of that study in the overall synthesis and integration of evidence for determining causality.
A class of study design methods emerging in scientific literature uses strategies to make the observed characteristics of the exposed and unexposed groups as similar as possible, thereby reducing or removing confounding bias due to those characteristics. This covariate similarity is known as balance. The specific methods described here are known as “matching” methods and include weighting and subclassification (Stuart, 2010). Such strategies have been applied in studies on the effects of air pollution (Baccini et al., 2017) and the effects of parents’ smoking on children’s lung functioning (Bind and Rubin, 2019). The matching is sometimes a simple “1:1” matching, where for each exposed individual, one unexposed individual with similar background characteristics is selected as a match. Other “matching” methods are more complex. For example, studies that match multiple exposed individuals to multiple unexposed individuals include full matching and variable ratio matching approaches (Stuart, 2010). Some approaches rely on tools such as the propensity score (Rosenbaum and Rubin, 1983) to summarize the covariates into a single score in order to facilitate matching; other approaches aim to obtain balance by matching directly on the covariates themselves (e.g., Iacus et al., 2012). The latter can be complex to implement in practice, especially with high-dimensional covariates; however, new fine balance approaches provide computational tools for creating well-matched groups on large sets of covariates (e.g., Visconti and Zubizarreta, 2018; Zubizarreta, 2012).
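A minimal sketch of the 1:1 matching idea, using simulated data with a single confounder (age) and an assumed true exposure effect of 2.0; real matching studies would match on many covariates or on an estimated propensity score rather than one variable:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
age = rng.uniform(30, 70, size=n)                    # single measured confounder
exposed = rng.random(n) < (age - 30) / 40            # older units exposed more often
y = 2.0 * exposed + 0.5 * age + rng.normal(size=n)   # true exposure effect = 2.0

# Unadjusted comparison is biased upward: exposed units are older on average
naive = y[exposed].mean() - y[~exposed].mean()

# 1:1 nearest-neighbor matching (with replacement): for each exposed unit,
# select the unexposed unit with the closest age, then compare matched groups
ctrl_age, ctrl_y = age[~exposed], y[~exposed]
match_idx = np.abs(age[exposed][:, None] - ctrl_age[None, :]).argmin(axis=1)
matched = y[exposed].mean() - ctrl_y[match_idx].mean()
```

The unadjusted contrast absorbs the age difference between groups, while the matched contrast approximates the true effect; the balance achieved by the matching (here, similarity of age distributions) is the key diagnostic, as discussed below.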
Other analytic approaches incorporating aspects of design and analysis include inverse probability of treatment weighting (IPTW) (Lunceford and Davidian, 2004) and related weighting approaches (such as overlap weights) (Li et al., 2019), which use a function of the propensity score as a weight to equate the groups on the observed covariates. IPTW is a particularly common approach, assigning weights so that the exposed and unexposed groups each represent the combined population, thus equating the groups. However, if not implemented carefully, IPTW does not necessarily maintain the separation of “design” from “analysis” found in other sample equating approaches—the outcome variable is used more directly in the design itself (Austin and Stuart, 2015). Propensity scores sometimes are also included as covariates in a regression model, such as with propensity score–based splines (Zhou et al., 2019); however, this approach lacks the covariate balancing and sample equating elements of the other propensity score approaches, making diagnostics harder to assess.
An important diagnostic for any of these confounder adjustment approaches, and one of particular relevance for assessing studies under a weight of evidence approach, is covariate balance—how well the matching or weighting approach worked in terms of creating exposed and unexposed groups that are similar on the observed covariates. As part of a study quality determination, any study using such a weighting or matching approach needs to be assessed in terms of its success at creating that balance; this can be done through both numeric and graphical approaches (Austin and Stuart, 2015; Greifer, 2022¹; Stuart, 2010). Methods that quantify covariate balance are well developed, and the causal determination framework could be augmented by providing guidance for assessing whether individual studies used these diagnostic methods.
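The weighting and balance-diagnostic ideas can be sketched together in a small simulation; the propensity model and its coefficient are arbitrary, and, unlike in practice, the true propensity score is known here rather than estimated. The standardized mean difference (SMD) is one common numeric balance diagnostic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=n)                     # measured confounder
p = 1.0 / (1.0 + np.exp(-1.2 * z))         # propensity score (known in this sketch)
a = rng.random(n) < p                      # exposure indicator

def smd(x, a, w=None):
    """Standardized mean difference of covariate x between exposure groups,
    optionally weighted; values near zero indicate good balance."""
    w = np.ones_like(x) if w is None else w
    m1 = np.average(x[a], weights=w[a])
    m0 = np.average(x[~a], weights=w[~a])
    pooled_sd = np.sqrt((x[a].var() + x[~a].var()) / 2.0)
    return (m1 - m0) / pooled_sd

# IPTW: weight each unit by the inverse probability of its observed exposure
w = np.where(a, 1.0 / p, 1.0 / (1.0 - p))

before = smd(z, a)        # substantial imbalance on the confounder
after = smd(z, a, w)      # weighting approximately restores balance
```

Reporting such before/after balance measures (a common rule of thumb flags absolute SMDs above 0.1) is the kind of diagnostic evidence a reviewer could look for when assessing a weighting or matching study.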
As discussed in Chapter 3 and Appendix C, the result of an identification exercise in a causal inference study is to express the causal effect of interest in terms of a formula that depends only on the distribution of the observed data. In contrast to matching and propensity score methods, which use estimates of the exposure mechanism (e.g., propensity scores), g-computation methods rely on estimating the outcome process to correct for observed confounding. G-computation and other substitution estimators generally proceed by directly estimating the identifying formula. For example, this can be done by (a) fitting regression functions of the outcome on confounders separately for the exposed and unexposed groups, (b) using these regressions to “predict” the counterfactual outcomes under exposure and under no exposure for each subject in the study, and (c) averaging the individual-level effects to obtain an overall estimate of the effect of exposure in the population. The simplest form of g-computation is regression standardization, a technique utilized in economics, epidemiology, statistics, and the social sciences (Vansteelandt and Keiding, 2011) for adjusting for covariates in studies that require it. Development of g-computation methods has expanded to address problems such as confounding in longitudinal studies (Bang and Robins, 2005) and methods for mediation analysis (Valeri and VanderWeele, 2013). Outcome regression estimators are also available for quasi-experimental settings such as instrumental variables (Angrist et al., 1996) and difference-in-differences (Wing et al., 2018), among others. G-computation estimators might be Bayesian or frequentist, depending on whether the models fitted to the data are Bayesian or frequentist (see Keil et al., 2021, for an example of Bayesian g-computation in air pollution research).
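Steps (a)–(c) can be sketched in a few lines on simulated data; the linear models and coefficients are illustrative choices, and the exposure effect is deliberately made to vary with the confounder so that the averaged contrast, rather than any single coefficient, is the target:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30_000
z = rng.normal(size=n)                                    # confounder
a = (rng.random(n) < 1.0 / (1.0 + np.exp(-z))).astype(float)
y = 1.5 * a + 2.0 * z + 0.5 * a * z + rng.normal(size=n)  # effect varies with z

def fit_ols(zs, ys):
    X = np.column_stack([np.ones_like(zs), zs])
    return np.linalg.lstsq(X, ys, rcond=None)[0]

# (a) Fit outcome regressions separately in the exposed and unexposed groups
b1 = fit_ols(z[a == 1], y[a == 1])
b0 = fit_ols(z[a == 0], y[a == 0])

# (b) Predict each subject's outcome under exposure and under no exposure,
# (c) then average the individual-level contrasts over the whole sample
X_all = np.column_stack([np.ones(n), z])
ate = float(np.mean(X_all @ b1 - X_all @ b0))  # g-computation estimate
```

Because the confounder here is standard normal, the averaged contrast targets 1.5; fitting the two groups separately is what allows the exposure effect to differ across strata of the confounder, in contrast to the single-regression approach discussed next.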
A common approach to account for measured confounders in epidemiology and clinical research is to fit regression models for the outcome as a function of exposure and covariates, often omitting interactions. Examples of this approach include logistic regression, linear regression, and Cox proportional hazards regression. Typically, studies that use this approach proceed to interpret the coefficient on the exposure variable (or a transformation thereof) as a causal effect. This is another example of g-computation, albeit one in which the effects estimated are conditional effects (within strata of confounders) rather than marginal (averaged over the whole population) (Stanghellini and Doretti, 2019). Studies using this mode of analysis—where no interactions are present in the regression models—implicitly assume that the causal effect is constant across strata of the covariates (i.e., these analyses assume no treatment effect heterogeneity). Furthermore, in some instances, such as the Cox regression model with time-to-event outcomes, the coefficients cannot be interpreted as causal effects (Hernán, 2010). Assessing the quality of studies that use Cox regression models also needs to address the lack of interpretability of the model-derived hazard ratios as causal effects. As part of study quality determination, the framework might include guidelines for assessing whether the assumption of no treatment effect heterogeneity is appropriately supported based on subject-matter knowledge or auxiliary data analyses.

___________________
1 See https://cran.r-project.org/web/packages/cobalt/vignettes/cobalt.html (accessed July 18, 2022).
In general, the correctness of conclusions stemming from g-computation estimation strategies rests on the degree to which the posited outcome regression models appropriately reflect the true mathematical relations between the variables at hand (outcomes, treatment, and confounders). It is therefore critical that the regression models used in a particular study be selected based on subject-matter knowledge, if available, or on data-adaptive regression procedures for model selection. In evaluating study quality, any study that uses outcome regression methods needs to be assessed in terms of whether appropriate model selection has been performed. Studies that use flexible machine learning regression methods are more likely to yield correct regression functions, especially when a large number of confounders is considered (Van der Laan et al., 2007).
The ISA causal determination framework could be improved with guidance on up-rating studies that incorporate model diagnostics for assessing whether scientifically reasonable models were used. However, the methodologies for such model diagnostics may need to mature before specific guidance can be developed. This is an area where EPA could stay abreast of advances in the literature and encourage or sponsor research.
The correctness of the matching/weighting and g-computation/substitution estimators relies on correct specification of a mathematical model for the relation between the exposure and confounders, or between the outcome and confounders, respectively. Doubly robust methods are a family of estimation techniques that combine ideas from inverse probability weighting and substitution estimators to obtain estimators whose consistency relies on correct specification of at least one of those models, thus providing an additional safeguard against model misspecification. Importantly, doubly robust estimators also allow researchers to use novel developments in machine learning to fit the regressions required for estimation (e.g., outcome regressions, propensity scores). Machine learning methods enable flexible regressions that can accommodate interactions and nonlinearities automatically, which may reduce the threat of bias due to misspecification of the mathematical models relating the outcome and treatment to the confounders (Díaz, 2019). Increased use of these methods can be found in the scientific literature, specifically doubly robust estimating equations (Scharfstein et al., 1999), targeted minimum loss-based estimation (van der Laan and Rose, 2011), and double machine learning (Chernozhukov et al., 2018).
While studies that use doubly robust estimators provide opportunities to address model misspecification bias through machine learning, correctly quantifying uncertainty (e.g., producing confidence intervals) in such studies can be complicated. One approach that typically yields correct uncertainty quantification is “cross-fitting” (i.e., training the regression functions out-of-sample) (Rose and Rizopoulos, 2020). Furthermore, the stability of doubly robust estimators hinges on a technical assumption known as the positivity or overlap assumption, which requires “enough experimentation” in the data such that all study participants have a non-zero probability of exposure at each level under consideration. Assessment of study quality for studies that use doubly robust estimation needs to consider whether the study appropriately considered violations of the positivity assumption. Guidance regarding core aspects of these methods, such as whether model diagnostics were conducted, whether positivity was assessed, and whether a scientifically justifiable set of confounders was adjusted for, could be incorporated into the ISA causality determination framework currently. As the research literature evolves, a future causal determination framework may guide study quality assessments in the weight of evidence to consider whether studies that use doubly robust estimators with machine learning contain provisions for correct uncertainty quantification.
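The double robustness property can be illustrated with an augmented inverse probability weighting (AIPW) estimator on simulated data; for brevity the sketch omits cross-fitting and machine learning, and it deliberately misspecifies the propensity model (ignoring the confounder entirely) while keeping the outcome model correct, so that the estimator remains consistent:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30_000
z = rng.normal(size=n)
a = (rng.random(n) < 1.0 / (1.0 + np.exp(-z))).astype(float)
y = 2.0 * a + 1.0 * z + rng.normal(size=n)   # true exposure effect = 2.0

def ols_predict(z_fit, y_fit, z_new):
    X = np.column_stack([np.ones_like(z_fit), z_fit])
    b = np.linalg.lstsq(X, y_fit, rcond=None)[0]
    return b[0] + b[1] * z_new

# Correctly specified outcome regressions, fit separately by exposure group
m1 = ols_predict(z[a == 1], y[a == 1], z)
m0 = ols_predict(z[a == 0], y[a == 0], z)

# Deliberately misspecified propensity model that ignores the confounder
e = np.full(n, a.mean())

# AIPW combines both models; it stays consistent here because the outcome
# model is correct even though the propensity model is not
ate = float(np.mean(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)))
```

The symmetric case (correct propensity model, wrong outcome model) also yields a consistent estimate, which is the sense in which the estimator offers two chances at correct specification; the augmentation terms also illustrate the positivity requirement, since the estimated propensities appear in denominators.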
Longitudinal studies measure the exposure or outcomes of interest (health or welfare) at several points in time. Common longitudinal designs include cohort studies, time-series analyses, and difference-in-differences approaches, with many recent advances and continuing methods development. These designs are introduced in Appendix C and discussed further below. Assessing the validity of the underlying causal assumptions can be particularly challenging in longitudinal studies given that there are two types of confounders, each having different considerations: time-varying and time-invariant confounders. Emerging methods for the correct design and analysis of each of these cases are discussed below, with particular attention to the implications for assessing study quality within an ISA process and therefore improving the accuracy of causal determinations. The causal determination framework might be revised to provide specific guidance for assessing the quality and relevance of many of these types of studies.
In cohort studies, a sample of participants from a population of interest is followed over time, and measurements are obtained at points in time during the observation period. These measurements often include health outcomes, pollutant exposures, baseline variables (i.e., time-invariant variables such as biological sex and place of birth), as well as time-varying variables such as social determinants of health (SDH). An SDH (e.g., economic stability) can, for example, influence the location of a family residence; disadvantaged families might live in areas that are more likely to be affected by pollution. Some SDH may therefore be important confounders of the relation between pollutants and health outcomes (Schulz and Northridge, 2004). The analysis of cohort studies would typically proceed by postulating a time-varying statistical model for the outcome of interest (e.g., wheezing) at time t as a function of all the data measured up until time t (baseline variables, exposures prior to t and SDH prior to t), and then interpreting the adjusted association between exposure and wheezing (e.g., as measured by a hazard ratio in a Cox model) as the causal effect. Although this practice is pervasive in epidemiologic research, recent advances in causal inference methodology have uncovered important limitations that prohibit the interpretation of such adjusted regressions as causal effects in the presence of time-varying confounders (Joffe et al., 2004; VanderWeele et al., 2016; Vansteelandt and Joffe, 2014). Time-varying confounders can also be problematic when the exposure is not time-varying, but the study is subject to informative loss-to-follow-up (i.e., when individuals drop out of the study in ways that relate to the outcome[s] of interest). 
Solutions to this problem include the use of marginal structural models (Robins et al., 2000), regression adjustment using the longitudinal g-computation formula (Bang and Robins, 2005; Hernán and Robins, 2006), longitudinal inverse probability weighting (Ertefaie and Stephens, 2010), and doubly robust sequential regression estimators (Díaz et al., 2021; Stitelman et al., 2012). Studies that do not correctly handle time-varying confounders are at high risk of yielding incorrect causal conclusions. As a result, causal determination frameworks might include guidance related to the careful handling of time-varying confounders in longitudinal studies, including strategies for weighing the evidence appropriately with respect to whether time-varying confounding was handled properly.
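As a simplified illustration of why these methods matter, the following Python sketch simulates a two-period study in which a time-varying confounder is itself affected by the earlier exposure, and estimates an "always exposed" versus "never exposed" contrast using the stabilized inverse probability weights of a marginal structural model. All quantities are simulated and purely illustrative.

```python
import random
from collections import defaultdict

random.seed(1)

# Two-period simulation: the time-varying confounder L2 is affected by the
# earlier exposure A1 and confounds the later exposure A2 -- the setting where
# naive regression adjustment fails but inverse probability weighting does not.
n = 50000
rows = []
for _ in range(n):
    l1 = 1 if random.random() < 0.5 else 0
    a1 = 1 if random.random() < 0.3 + 0.4 * l1 else 0
    l2 = 1 if random.random() < 0.2 + 0.3 * a1 + 0.3 * l1 else 0
    a2 = 1 if random.random() < 0.3 + 0.4 * l2 else 0
    y = 2.0 * a2 + 3.0 * l2 + 1.0 * l1 + random.gauss(0.0, 1.0)
    rows.append((l1, a1, l2, a2, y))

def cond_prob(key_fn, a_fn):
    """Empirical P(A = 1 | key) as a lookup table over the simulated rows."""
    hits, tot = defaultdict(int), defaultdict(int)
    for r in rows:
        tot[key_fn(r)] += 1
        hits[key_fn(r)] += a_fn(r)
    return {k: hits[k] / tot[k] for k in tot}

p_a1 = cond_prob(lambda r: None, lambda r: r[1])        # P(A1 = 1)
p_a1_l1 = cond_prob(lambda r: r[0], lambda r: r[1])     # P(A1 = 1 | L1)
p_a2_a1 = cond_prob(lambda r: r[1], lambda r: r[3])     # P(A2 = 1 | A1)
p_a2_hist = cond_prob(lambda r: r[:3], lambda r: r[3])  # P(A2 = 1 | L1, A1, L2)

def bern(p, x):
    return p if x == 1 else 1 - p

def stabilized_weight(r):
    """Numerator conditions on exposure history only; denominator also
    conditions on the confounder history."""
    l1, a1, l2, a2, _ = r
    num = bern(p_a1[None], a1) * bern(p_a2_a1[a1], a2)
    den = bern(p_a1_l1[l1], a1) * bern(p_a2_hist[(l1, a1, l2)], a2)
    return num / den

def weighted_mean_y(regime):
    """IPW estimate of E[Y] under the static regime (a1, a2)."""
    num = den = 0.0
    for r in rows:
        if (r[1], r[3]) == regime:
            w = stabilized_weight(r)
            num += w * r[4]
            den += w
    return num / den

# Contrast of "always exposed" vs "never exposed". The truth here is 2.9:
# a direct A2 effect of 2.0 plus 0.9 flowing from A1 through L2.
effect = weighted_mean_y((1, 1)) - weighted_mean_y((0, 0))
```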
Time-series studies are similar to cohort studies, with the difference that often only one study unit is measured longitudinally. For example, researchers might be interested in examining the relationship between changes in pollution levels over time and changes in the low birth-weight rate in a given city, where temperature is a time-varying confounder. In a welfare context, changes in pollution levels over time affect environmental attributes, including nutrient levels, biodiversity, food webs, and ecosystem processes such as primary and heterotrophic productivity, trace gas fluxes, and net ecosystem production (Seabloom et al., 2021; Simkin et al., 2016). The considerations of the previous paragraphs regarding time-varying confounding also apply to time-series studies.
Difference-in-differences approaches compare changes over time between groups that do or do not experience some exposure of interest. Such approaches are used increasingly to study air pollution, and the literature regarding best practices for their use is growing rapidly. Difference-in-differences designs are most useful in individual studies when the exposure of interest is an event that occurs at a fixed point in time and otherwise is not time-varying. As an example, consider an accountability study aimed at assessing the effect of the introduction of E-ZPass in the U.S. Northeast on infant health. It is inappropriate to simply compare infant health across regions of the country because of potential confounding by observed and unobserved variables. In this case, difference-in-differences is an alternative identification strategy, which assumes that the trends in infant health within the exposed (Northeast) and unexposed (other geographic regions) groups are parallel over time. Then, the counterfactual outcome that would have been observed had the intervention not taken place in the exposed group can be "imputed" by extrapolating what occurred in the unexposed group. Difference-in-differences approaches use specific causal models, causal assumptions (such as the assumption of parallel counterfactual trends), and often specific statistical models, all of which must be checked before the conclusions can be interpreted causally (Callaway and Sant'Anna, 2021; Zeldow and Hatfield, 2021). In particular, assessment of the quality of studies using difference-in-differences should consider whether the comparison locations form a useful proxy for what would have happened in the "treated" locations in the absence of the treatment or policy change of interest, and should assess the validity of the underlying statistical models used to model trends over time. Haber et al. (2021) provide examples of the sorts of analysis diagnostics and checks that can be useful for these designs.
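The basic difference-in-differences computation itself is simple; the following Python sketch uses hypothetical, purely illustrative region-level rates.

```python
# Hypothetical region-level means of an infant-health outcome (say, a low
# birth-weight rate per 1,000 births); the numbers are illustrative only.
pre_treated, post_treated = 62.0, 54.0   # exposed region, before / after
pre_control, post_control = 58.0, 56.0   # comparison regions, before / after

# Under the parallel-trends assumption, subtracting the change in the
# comparison group removes the shared secular trend.
did = (post_treated - pre_treated) - (post_control - pre_control)
print(did)  # -6.0: an estimated reduction of 6 per 1,000 attributable to the policy
```

The entire causal content of the design lies not in this arithmetic but in the parallel-trends assumption it relies on, which is why the diagnostics discussed above matter.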
Causal determinations in future ISAs could benefit from a framework that includes specific guidance for evaluating studies that employ difference-in-differences approaches. There are best practices that could be incorporated into the guidance in the causality determination framework, such as identifying whether model diagnostics were conducted, whether comparison group data were available, and whether attempts were made to compare treatment and comparison locations with similar trends in the pre-period. However, given the pace with which these methods are evolving in the literature, it will be particularly important to include in the ISA process individuals who have expertise in these methods and who reflect a balance of perspectives.
Beyond consideration and appropriate treatment of expected confounders, methods are being developed to assess the robustness of individual study results to potential unobserved confounders that could cause spurious associations, or to enable interpretation of an effect as causal without relying on the assumption that there are no unobserved confounders. New strategies are also being developed to handle post-treatment or postexposure variables that may themselves be affected by the exposure of interest.
An aspect of study design and analysis that the causal determination framework could address is the assessment of the robustness of study results to an unobserved confounder. It is also noted that a particular variable might be observable (or even observed in a particular dataset), but if it is not used in the analysis then it is essentially unobserved for purposes of the causal methods used. Approaches exist for assessing how results would change given the existence of an unobserved confounder. These include, for example, bounding approaches (Richardson et al., 2014), quantitative bias analysis approaches (Lash et al., 2009, 2014; Weuve et al., 2018), and sensitivity analyses, which posit a potential unobserved confounder and obtain quantitative estimates of how different the study results would be given different relationships between the unobserved confounder and the exposure and outcome(s) of interest. The relevance of such approaches, especially quantitative bias analysis in the FDA regulatory context, is described in several studies (Everitt and Howell, 2005; Lash et al., 2016; Liu et al., 2013; Rosenbaum, 2005; Rosenbaum and Rubin, 1983b; VanderWeele and Ding, 2017; Zubizarreta et al., 2013). The causal determination framework should include discussion of how to incorporate the results of such robustness checks, and whether they were done at all, when weighing and integrating individual study results.
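As one concrete example of such a sensitivity analysis, the E-value of VanderWeele and Ding (2017) can be computed in closed form; a minimal Python sketch:

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (VanderWeele and Ding, 2017): the
    minimum strength of association, on the risk-ratio scale, that an
    unobserved confounder would need with both the exposure and the outcome
    to fully explain away the observed association."""
    rr = max(rr, 1.0 / rr)  # treat protective estimates symmetrically
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(1.2), 2))  # 1.69
print(round(e_value(2.0), 2))  # 3.41
```

A modest observed risk ratio of 1.2 thus requires only a fairly weak unobserved confounder (risk ratios of about 1.7 with both exposure and outcome) to be explained away, which is one reason such calculations are informative when weighing evidence.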
When the threat of unobserved confounding is particularly high—for example, if the dataset does not include many potential confounders—alternative study designs that do not rely on the assumption of no unmeasured confounding of the exposure/outcome relationship may be more appropriate. The ISA causal determination framework might thus also acknowledge and discuss these designs and provide guidance for how to assess their quality and relevance when weighing and integrating evidence. An instrumental variable approach (Greenland, 2000; Hernán and Robins, 2006) is commonly applied in such cases. This approach relies on finding a variable (the “instrument”) that relates to the exposure of interest, can be thought of as being randomly assigned (i.e., the instrument/outcome relationship is not subject to unobserved confounding), and does not directly influence outcomes except through the exposure of interest. These designs, however, are rarely used in air pollution epidemiology with a few exceptions (e.g., Deryugina et al., 2019; Schwartz et al., 2015, 2017), and do have their own underlying assumptions, most notably that the instrument/outcome relationship is unconfounded, and that there is no direct link between the instrument and the outcome (any effect of the instrument on the outcome “flows through” the exposure of interest). As with all studies, the plausibility of the underlying assumptions needs to be assessed within the context of the research question and data available; the ISA causal determination framework could provide guidance on how this can be done when weighing and integrating results from studies using these approaches. For example, guidance could be provided about assessing whether an individual study has properly discussed the assumptions made, whether the assumptions were likely to be violated, and whether sensitivity analyses were conducted.
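A minimal simulation can illustrate the logic of the approach. In the Python sketch below (all parameter values are illustrative), a naive regression is biased by an unobserved confounder, while the Wald-type instrumental variable estimator recovers the true effect.

```python
import random

random.seed(2)

# Hypothetical instrumental-variable setup: Z is effectively randomized (the
# instrument), U is an unobserved confounder of exposure A and outcome Y, and
# Z affects Y only through A. The true causal effect of A on Y is 1.5.
n = 100000
z_, a_, y_ = [], [], []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0
    u = random.gauss(0.0, 1.0)                       # unobserved confounder
    a = 0.5 * z + 0.8 * u + random.gauss(0.0, 1.0)   # continuous exposure
    y = 1.5 * a + 2.0 * u + random.gauss(0.0, 1.0)
    z_.append(z); a_.append(a); y_.append(y)

def mean(x):
    return sum(x) / len(x)

def cov(x, v):
    mx, mv = mean(x), mean(v)
    return sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / len(x)

def diff_by_z(v):
    """Difference in means of v between the Z = 1 and Z = 0 groups."""
    v1 = [vi for vi, zi in zip(v, z_) if zi == 1]
    v0 = [vi for vi, zi in zip(v, z_) if zi == 0]
    return mean(v1) - mean(v0)

naive_slope = cov(a_, y_) / cov(a_, a_)      # about 2.4: biased upward by U
iv_estimate = diff_by_z(y_) / diff_by_z(a_)  # Wald estimator: close to 1.5
```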
One particularly challenging type of variable to address in observational studies is what is known as a “post-treatment” variable. Post-treatment variables may, themselves, be affected by the exposure of interest. Such variables are also referred to as “intermediate variables,” “mediators,” or “colliders.” The instrumental variables approach mentioned in the previous section is a specific example. Examples in air pollution epidemiology include studying mortality related to air pollution levels, but only among individuals hospitalized for a heart attack; adjusting for birth weight when studying the link between prenatal air pollution exposure and cognitive development during childhood; adjusting for the level of reduction in indoor PM concentrations in the context of a randomized air cleaner intervention trial; or adjusting for nutrient levels in a welfare context when those nutrient levels may have already been affected by previous levels of air pollution. Theoretical advances have shown that inappropriately adjusting for post-treatment variables can lead to post-treatment confounding (also called collider bias or selection bias) (Cole et al., 2010). Inappropriate adjustments that can lead to bias include selecting a sample on the basis of a collider (Grace and Irvine, 2020; Griffith et al., 2020; Weuve et al., 2018) or simple adjustment for the collider in a regression model (Groenwold et al., 2021). As articulated by the methodology of principal stratification (Frangakis and Rubin, 2002), the particular challenge is that for any post-treatment variable
there is actually a set of potential outcomes (the values of the post-treatment variable under exposure and without exposure) that need to be accounted for in design and analysis for accurate results.
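A small simulation makes the danger concrete. In the Python sketch below (with illustrative parameter values), the exposure has no effect on mortality at all, yet restricting the analysis to hospitalized individuals, a collider, manufactures an association.

```python
import random

random.seed(3)

# Collider (selection) bias sketch: exposure A and an independent risk factor
# U both increase the chance of hospitalization H, while mortality Y depends
# only on U. A has NO effect on Y by construction.
n = 200000
pop = []
for _ in range(n):
    a = 1 if random.random() < 0.5 else 0
    u = 1 if random.random() < 0.5 else 0
    h = 1 if random.random() < 0.05 + 0.4 * a + 0.4 * u else 0
    y = 1 if random.random() < 0.1 + 0.5 * u else 0
    pop.append((a, u, h, y))

def risk_diff(rows):
    """P(Y = 1 | A = 1) - P(Y = 1 | A = 0) within the given rows."""
    def risk(a):
        sel = [r for r in rows if r[0] == a]
        return sum(r[3] for r in sel) / len(sel)
    return risk(1) - risk(0)

full_sample = risk_diff(pop)                             # near 0 (the truth)
hospitalized = risk_diff([r for r in pop if r[2] == 1])  # spuriously negative
```

Conditioning on hospitalization induces a negative association between A and U among the hospitalized (those hospitalized despite low exposure tend to have high U), which transfers to a spurious protective-looking association between exposure and mortality.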
There are multiple strategies to address post-treatment confounding in situations where appropriate adjustment of such variables could be important. As a starting point, directed acyclic graphs (DAGs) can be useful in describing the underlying structure of the variables and their relationships to help examine the potential implications of different analysis choices (Bind, 2019; Laubach et al., 2021; Weisskopf et al., 2015). Analysis approaches for dealing with post-treatment variables fall into three main types: (1) principal stratification–based approaches (Frangakis and Rubin, 2002; Hackstadt et al., 2014; Zigler et al., 2012), (2) mediation approaches (e.g., Bind et al., 2016, 2017), and (3) marginal structural model/target trial emulation approaches that can be used in complex longitudinal data and ensure that adjustment is made only for preexposure factors, not post-exposure confounders (Hernán and Robins, 2016; Schwartz et al., 2018). Mediation analysis can provide a way to examine possible "mechanisms of action" by disentangling what are commonly referred to as "indirect" and "direct" effects, that is, those that flow through some mediator versus those that are a direct result of the exposure of interest (e.g., whether the effects of air pollution on pregnancy outcomes are mediated by maternal metabolomics; Inoue et al., 2020). However, care needs to be taken when conducting and, particularly relevant for the ISA process, interpreting results of studies using mediation analysis, as the interpretation of the effects of interest can be challenging and the assumptions required to interpret the results as causal are substantial and rarely discussed in manuscripts using mediation analysis (Naimi et al., 2014; Stuart et al., 2020; VanderWeele, 2016).
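As an illustration of the simplest mediation decomposition, the following Python sketch applies the product-of-coefficients method under a linear model, assuming no unmeasured confounding of the exposure-mediator, exposure-outcome, or mediator-outcome relationships. Those assumptions are substantial and must be defended in any real application; all parameter values here are illustrative.

```python
import random

random.seed(4)

# Simulated linear mediation: exposure A affects mediator M, and outcome Y
# depends on both. Truth: indirect effect = 0.6 * 0.5 = 0.30, direct = 0.30.
n = 50000
a_, m_, y_ = [], [], []
for _ in range(n):
    a = random.gauss(0.0, 1.0)                       # exposure
    m = 0.6 * a + random.gauss(0.0, 1.0)             # mediator
    y = 0.5 * m + 0.3 * a + random.gauss(0.0, 1.0)   # outcome
    a_.append(a); m_.append(m); y_.append(y)

def mean(x):
    return sum(x) / len(x)

def cov(x, z):
    mx, mz = mean(x), mean(z)
    return sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / len(x)

# Slope of the mediator on the exposure.
alpha = cov(a_, m_) / cov(a_, a_)

# Partial slopes of Y on (M, A), solved from the two-variable least-squares
# normal equations.
saa, smm, sam = cov(a_, a_), cov(m_, m_), cov(a_, m_)
say, smy = cov(a_, y_), cov(m_, y_)
det = saa * smm - sam * sam
beta = (saa * smy - sam * say) / det    # effect of M on Y, holding A fixed
gamma = (smm * say - sam * smy) / det   # effect of A on Y, holding M fixed

indirect = alpha * beta   # effect flowing through the mediator (~0.30)
direct = gamma            # effect not through the mediator (~0.30)
```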
The process of assessing the quality of any individual study needs to consider whether post-exposure variables are handled properly and appropriately adjusted for, or whether more appropriate methods such as those listed in the previous paragraph are used. The causal determination framework could be revised to provide explicit guidance regarding such variables and any potential bias resulting from their treatment, as well as to incorporate guidance for examining whether individual studies followed emerging best practices for mediation analysis. As with difference-in-differences designs, there are best practices regarding mediation, such as whether confounders are adjusted for, whether exposure measurement precedes the mediator, and whether the mediator precedes the outcome. However, given the pace with which the methods in this area are evolving, it will be important to include individuals with expertise in these methods and a balance of perspectives in the ISA process.
Given EPA's focus on regulating single criteria pollutants, incorporating studies of exposure to a mixture of air pollutants in causal assessment is challenging. The effects of the criteria pollutant in question need to be disentangled from results associated with the mixture. A traditional approach to modeling health effects from multiple pollutants is to build a regression model including all the pollutants of interest as covariates, as well as other variables that may act as confounders or effect modifiers. Traditional approaches, such as those based on variable selection, may not efficiently identify health effects that arise from combinations of pollutants. Emerging techniques for handling large numbers of individual exposure variables or exposure mixtures include Bayesian kernel machine regression (BKMR) (Bobb et al., 2015), weighted quantile sum (WQS) regression (Curtin et al., 2021; Czarnota et al., 2015), and structural equation modeling. However, structural equation modeling is often considered to be more appropriate for exploratory analyses rather than causal analyses (Grace et al., 2010; VanderWeele and Shpitser, 2013).
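A stripped-down sketch of the WQS idea in Python follows (illustrative simulated data; real WQS implementations estimate the weights and the index-outcome association jointly, with bootstrapping and a held-out validation split).

```python
import random

random.seed(5)

# Two simulated pollutants; the outcome is driven mostly by pollutant 1.
n = 20000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * a + 0.2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def quartile_scores(x):
    """Convert each exposure to quartile scores 0-3, as in WQS."""
    ranked = sorted(x)
    cuts = [ranked[n // 4], ranked[n // 2], ranked[3 * n // 4]]
    return [sum(v > c for c in cuts) for v in x]

q1, q2 = quartile_scores(x1), quartile_scores(x2)

def corr(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / (suu * svv) ** 0.5

# Grid search for the nonnegative weight on pollutant 1 (weights sum to one)
# whose weighted quantile index best tracks the outcome -- a crude stand-in
# for fitting the WQS regression.
best_w = max((w / 20 for w in range(21)),
             key=lambda w: corr([w * s1 + (1 - w) * s2
                                 for s1, s2 in zip(q1, q2)], y))
# best_w assigns the majority of the weight to pollutant 1, reflecting its
# larger contribution to the simulated outcome.
```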
The BKMR approach uses kernels to represent multivariable effects, including variable selection in a hierarchical Bayesian context. Bobb et al. (2015) described the methodology and application based on multipollutant mixtures in a toxicology study of air pollution and hemodynamics. Recent extensions of the BKMR approach include lagged kernel machine regression (LKMR) (Liu et al., 2018) to address situations where there is doubt about the timescale of the air pollution effects. The model effectively allows for a combination of effects at different time lags.
Several other emerging methods are designed to handle cluster effects of pollutants or focus on sources of pollution rather than the pollutants themselves. With respect to the former, Coker et al. (2018) reviewed approaches using Bayesian profile regression (Molitor et al., 2010). This is a Bayesian methodology for identifying “profiles” of covariates that are clustered into groups and associated with relevant outcomes for the response of interest. The method could also be applied spatially to identify locations with the most health-relevant exposure-mixture profiles. This approach is potentially useful in identifying subpopulations with increased susceptibility to complex nonlinear interactions among pollutants.
A different approach to multipollutant modeling is to focus on the sources of pollution rather than the pollutants. Source apportionment refers to a very general class of methods for taking monitor data on multiple pollutants at multiple sites and representing them as mixtures of contributions from different sources. The most challenging problems occur when the number and locations of the sources are unknown. Park et al. (2014) presented a Bayesian approach for attributing health effects to an unknown number of sources that considers model uncertainty as well as parameter uncertainty, where the response of interest is assumed to be normally distributed. Park and Oh (2018) extended this approach to cover Poisson regression, which is the most common type of model when the response of interest is mortality. There are several other studies utilizing multivariate source receptor modeling from a Bayesian viewpoint (Park et al., 2018, 2021).
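In its simplest form, with the source profiles assumed known, apportionment reduces to a regression of observed species concentrations on the profiles. The following Python sketch uses two hypothetical source profiles and noise-free data; real applications (e.g., positive matrix factorization or the Bayesian approaches cited above) must also estimate the profiles and the number of sources.

```python
# Hypothetical source profiles: each row gives the fraction of one chemical
# species in the emissions of (traffic, coal) sources. Values are illustrative.
profiles = [
    [0.7, 0.1],  # species 1
    [0.2, 0.6],  # species 2
    [0.1, 0.3],  # species 3
]
true_contrib = (10.0, 5.0)  # source masses (unknown in practice)
observed = [sum(p[j] * true_contrib[j] for j in range(2)) for p in profiles]

# Solve the 2x2 least-squares normal equations (P^T P) c = P^T x by hand.
ptp = [[sum(p[i] * p[j] for p in profiles) for j in range(2)] for i in range(2)]
ptx = [sum(p[i] * x for p, x in zip(profiles, observed)) for i in range(2)]
det = ptp[0][0] * ptp[1][1] - ptp[0][1] * ptp[1][0]
contrib = (
    (ptp[1][1] * ptx[0] - ptp[0][1] * ptx[1]) / det,
    (ptp[0][0] * ptx[1] - ptp[1][0] * ptx[0]) / det,
)
# contrib recovers (10.0, 5.0) here because the data are noise-free.
```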
Ultimately, assessing emerging techniques for modeling exposure mixtures may be based on their ability to disentangle effects of the criteria pollutant in question, given exposure to a mix of other pollutants. From a statistical point of view, when focusing on a single pollutant at a time, the "other" pollutants are typically treated as confounders (Dominici et al., 2010). Emerging methods for studying a discrete set of multiple treatments (e.g., two pollutants), which build on the design of nonexperimental studies, may be relevant (Lopez and Gutman, 2017; Oulhote et al., 2019). These methods include work using a factorial design to study the relationship of four pesticides to body mass index (Pashley and Bind, 2022).
Many methods that aim to assess effects of a single criteria pollutant in the presence of other multi-collinear pollutants are mature enough to be used in individual studies, and those studies may be included in the ISA process. However, given that these methods are rapidly evolving, it is important that the ISA causality determination framework provide a structure for assessing the quality of studies that use such approaches. That quality assessment should include consideration of the causal question being asked by each study, and of how well the study answers that question. More specifically, when evaluating the causal question formulated in a particular study, it will be important to identify which of two estimands the study targets: (1) the causal effect of a given component of the mixture (or a given criteria pollutant), "adjusted" for the potential confounding effects of the other components of the mixture (or other criteria pollutants); or (2) the causal effect of the whole mixture, which in turn includes hypothetical increases in each element of the mixture plus potential complex interactions between the elements of the mixture (see, e.g., Wilson et al., 2018). The ISA causal determination framework will need to guide how to evaluate which causal estimand is being considered in a study (i.e., a single element of the mixture or the whole mixture) and to assess whether adequate methods to estimate that estimand, such as confounding adjustment, have been implemented. This will necessitate involving experts with knowledge of these emerging methods in the ISA process.
Environmental exposures may have particularly large effects in certain subgroups defined by study population characteristics, ecosystems, geographic locations, and other factors. Statistically this is known as treatment effect heterogeneity or effect "moderation" (Baron and Kenny, 1986; Kraemer et al., 2002); there is a growing literature on statistical methods to examine whether causal effects vary based on characteristics measured before the exposure or treatment of interest (often called "moderators" or "effect modifiers"). Traditional moderation analysis involves examining whether treatment effects vary across levels of individual moderators (such as age, preexisting health status, or the type of ecosystem), for example by including interactions between the exposure variable and potential moderators in a regression model of the outcome of interest, or by conducting analyses separately for groups defined by the moderators. This traditional approach is limited, however, in its ability to identify complex moderation functions. Concerns have been raised about spurious findings if many moderators are examined separately and only those with statistically significant results reported; in response, authors have proposed analysis and reporting standards (Kent et al., 2010, 2020).
Recently, a newer set of methods has been developed to examine whether effects vary across a potentially complex function of covariates. Many of the new methods use nonparametric machine learning tools that fit flexible functions of the outcome as a function of the exposure of interest and the potential moderators (e.g., Athey et al., 2019; Hahn et al., 2020; Kunzel et al., 2019; Nie and Wager, 2019; Powers et al., 2018) to capture variation in treatment effects. This general approach does not require pre-selecting the effect moderators or specifying their functional forms in models, and thus reduces the influence of the analyst's subjective modeling choices. Although some work exists examining effect heterogeneity in air pollution (e.g., Lee et al., 2021), an area for potential future development is to extend these methods to contexts relevant for air pollution, such as accounting for the clustering of individuals within communities and for the fact that exposures of interest are often measured at an ecological rather than individual level.
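The simplest version of this idea, with a single binary moderator and a randomized exposure, can be sketched in Python (illustrative simulated data; the machine learning "metalearners" cited above generalize the stratum-specific contrasts below to many, possibly continuous, moderators).

```python
import random

random.seed(6)

# Effect-moderation sketch: a randomized exposure whose effect differs by a
# baseline characteristic (say, an older age group). Truth: effect 1.0 in the
# younger group, 3.0 in the older group.
n = 40000
cells = {(g, a): [0.0, 0] for g in (0, 1) for a in (0, 1)}
for _ in range(n):
    g = 1 if random.random() < 0.5 else 0   # moderator (1 = older group)
    a = 1 if random.random() < 0.5 else 0   # randomized exposure
    y = (1.0 + 2.0 * g) * a + 0.5 * g + random.gauss(0.0, 1.0)
    cells[(g, a)][0] += y
    cells[(g, a)][1] += 1

def cell_mean(g, a):
    s, m = cells[(g, a)]
    return s / m

# Stratum-specific effect estimates (conditional average treatment effects).
effect_young = cell_mean(0, 1) - cell_mean(0, 0)  # close to 1.0
effect_old = cell_mean(1, 1) - cell_mean(1, 0)    # close to 3.0
```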
Given potential interest in vulnerable populations and ecosystems and explicit wording in the Clean Air Act (1990) to provide additional protections for susceptible subpopulations (Marchant, 2008), it is important that EPA include in the ISA causal determination framework guidance for how to assess studies that examine variation in treatment effects, and ensure that the necessary expertise and balance of perspectives are included to understand the scientific validity of the methods used in individual studies and thus how those studies might contribute to the weight of evidence approach. EPA might investigate how to assess diagnostics of method performance and how to account for the danger of spurious findings when many subgroups are examined, including whether studies conducted multiple comparisons adjustments or addressed concerns about the number of subgroups examined (e.g., Burke et al., 2015).