Estimating the natural indirect effect and the mediation proportion via the product method

Background The natural indirect effect (NIE) and mediation proportion (MP) are two measures of primary interest in mediation analysis. The standard approach for mediation analysis is through the product method, which involves a model for the outcome conditional on the mediator and exposure and another model describing the exposure–mediator relationship. The purpose of this article is to comprehensively develop and investigate the finite-sample performance of NIE and MP estimators via the product method. Methods With four common data types with a continuous/binary outcome and a continuous/binary mediator, we propose closed-form interval estimators for NIE and MP via the theory of multivariate delta method, and evaluate its empirical performance relative to the bootstrap approach. In addition, we have observed that the rare outcome assumption is frequently invoked to approximate the NIE and MP with a binary outcome, although this approximation may lead to non-negligible bias when the outcome is common. We therefore introduce the exact expressions for NIE and MP with a binary outcome without the rare outcome assumption and compare its performance with the approximate estimators. Results Simulation studies suggest that the proposed interval estimator provides satisfactory coverage when the sample size ≥500 for the scenarios with a continuous outcome and sample size ≥20,000 and number of cases ≥500 for the scenarios with a binary outcome. In the binary outcome scenarios, the approximate estimators based on the rare outcome assumption worked well when outcome prevalence less than 5% but could lead to substantial bias when the outcome is common; in contrast, the exact estimators always perform well under all outcome prevalences considered. Conclusions Under samples sizes commonly encountered in epidemiology and public health research, the proposed interval estimator is valid for constructing confidence interval. For a binary outcome, the exact estimator without the rare outcome assumption is more robust and stable to estimate NIE and MP. An R package mediateP is developed to implement the methods for point and variance estimation discussed in this paper. Supplementary Information The online version contains supplementary material available at (10.1186/s12874-021-01425-4).

repeated their work and developed a simplified version of mediation measure expressions, with notations in our manuscript. At the end of this section, we compared the exact expressions and approximate expressions under a rare outcome assumption and showed that the exact expressions degenerated to the approximate expressions if the outcome was rare.
If model (1) has a logistic link function and the error term in (4) follows N (0, σ 2 ), P (Y xMx = 1|W = w) is shown as Therefore, , then the above quantity simplifies to Similarly, we can show logit P (Y xM x * = 1|W = w) = β 0 + β 1 x + β T 3 w + log can use some integrate methods to numerically calculate the integrals, such as the Gauss-Hermite Quadrature (abbreviated by GHQ, Liu and Pierce, 1994) and the QUADPACK approach (Piessens et al., 2012). When using 30 or more knots in the GHQ approach and setting the relative accuracy in the QUADPACK to 10 −4 or less, we observed that both integrate approaches performed very close for calculating NIE and MP estimates, where the differences of their calculations were generally less than 10 −7 under the parameter settings of γ, β and σ 2 in our simulation studies. Therefore, we only used the GHQ method in the simulation studies and our R package.
Instead of calculating the integrals numerically, Gaynor et al. (2019) used a probit function to approximate the logit function in the integrals and obtained closed-form expressions for the mediation measures (see Table   2 in the manuscript for the expressions). However, this probit approximation method could be crude as the outcome prevalence further deviates from 50%, as discussed in Gaynor et al. (2019). We conducted a brief numerical study to compare the percent bias of the probit approximation method for computing the NIE and MP, where the numerical integral calculation based on the GHQ were treated as the gold standard. We set TE and MP, defined for X in change from 0 to 1, as log(1.5) and 0.5, respectively, while changing the baseline outcome prevalence from 1% to 95%, where other settings and parameter values are same to the simulation study section. Web Figure 1 visualized the percent bias of the probit approximation approach. Although the precent bias for MP were minimal, the probit approximation method exhibited high bias for calculating NIE if the baseline prevalence is less than 10% or greater than 80%, where the percent bias exceeds 40% when outcome prevalences below 5%.
Generally, the above expressions for NIE, NDE and TE can not be further simplified, as the integrals in the expressions do not have closed forms. However, if the outcome is sufficient rare, the denominator in τ (x , x, m, w), i.e., 1 + exp(β 0 + β 1 x + β 2 m + β T 3 w), approximates to 1. If we apply this approximation to the NIE expression, we have Using a similar strategy, we can show NDE ≈ β 1 (x − x * ) and TE ≈ (β 2 γ 1 + β 1 )(x − x * ). As a result MP ≈ β2γ1 β2γ1+β1 .
Web Appendix B: Causal mediation measures under binary outcome and binary mediator As given in Gaynor et al. (2019), P (YxM x = 1|W = w), P (YxM x * = 1|W = w), and P (Yx * M x * = 1|W = w) can be shown as Next, substituting above probabilities into the expressions of NIE, NDE and TE leads to , and TE = log .

Web Appendix D: Simulations in the presence of a binary confounder
We conducted additional simulation studies to examine the robustness of the product method in the presence of a binary confounder, W . This could be a representative case accounting for many confounders simultaneously. We investigate the percent bias, variance ratio, and 95% confidence interval coverage rate for Cases #1 to #4. With a focus on investigating the impact from confounding effect, in this simulation we will fix the sample size, TE, MP, and outcome prevalence while changing the confounder-exposure, confoundermediator, and confounder-outcome associations. Specifically, we will fix the sample size at 5,000, MP to 0.5, the TE to 0.5 for the scenarios with a continuous outcome, and fix the sample size at 5,000, MP to 0.5, TE to log(1.5), and the baseline outcome prevalence at 3% for the scenarios with a binary outcome. The data generations are described as below.
First, generate W from a Bernoulli distribution with P (W = 1) = 0.5. Then, given W , generate a binary exposure X based on the following logistic model where ψ 1 is chose to log(1.2) and log (2) such that the odds ratio of X = 1 for W = 1 versus W = 0 is 1.2 and 2.0 to represent a small and large confounder-exposure association, respectively. We choose ψ 0 such that the marginal probability of X = 1 equals 0.5. Next, we generate M from N (γ 0 +γ 1 X+γ 2 W, 1) for the scenarios with a continuous mediator (Cases #1 and #3) and from a logistic model logit(P (M = 1|X, W )) = γ 0 + γ 1 X + γ 2 W for the scenarios with a binary mediator (Cases #2 and #4). The values of γ 0 and γ 1 are selected to agree with what they were in the original simulation studies without confounder adjustment, and γ 2 is selected to 0.1 and 1 to represent a small and large confounder-mediator association, respectively. Finally, we generate the outcome Y by a normal distribution N (β 0 + β 1 X + β 2 M + β 3 W, 1) for the scenarios with a continuous outcome (Cases #1 and #2) and from a logistic model logit( for the scenarios with a binary outcome (Cases #3 and #4). The values of β 0 , β 1 , and β 2 is selected to agree with what they were in the original simulation studies without confounder adjustment, and β 3 are selected to 0.1 and 1 to represent a small and large confounder-outcome association, respectively.
The simulation results for the continuous outcome scenarios (Cases #1 and #2) are shown in Web Table   6 in additional file 1. We observe that both the NIE and MP estimators have minimal bias associated with accurate 95% confidence interval coverage rates among all combinations of ψ 1 , γ 2 , and β 3 considered. The simulation results for the binary outcome scenarios (Cases #3 and #4) are shown in Web Table 7 in additional file 1. Similar results were observed, where the percent bias of the NIE and MP estimators are controlled within 2% and the coverage rates are also close to the nominal value. All these indicate that the product method with the proposed estimation methods are robust with regard to confounding adjustment.

Web Appendix E: Causal mediation measures defined on the odds ratio scale
Here we use OR N IE , OR N DE , OR T E , and OR M P to denote the mediation measures on the odds ratio scale. The relationship between the mediation measures on the log odds ratio scale and those on the odds ratio scale is given by For NIE, NDE and TE, we can simply take exponential of point estimates and confidence interval boundaries shown in the Methodology section to construct the corresponding point and interval estimates on an odds ratio scale. In the aspect of MP, there is no one-to-one function that can map MP to OR M P . However, if TE expansion on exp(TE) around TE = 0, i.e., Similarly, TE ≈ 0 indicates NDE ≈ 0, therefore we also have exp(NDE) − 1 ≈ NDE. It follows that , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p denotes the true value of the causal mediation measure, andp is the point estimate of the simulated causal mediation measure. The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications.  , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p denotes the true value of the causal mediation measure, andp is the point estimate of the simulated causal mediation measure.
The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications. , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p denotes the true value of the causal mediation measure, andp is the point estimate of the simulated causal mediation measure.
The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications. Note: Prev, Ncase, Bias(%), CR (d) , and VR denote the baseline outcome prevalence, the average number of cases under each setting across 5,000 replications, the median percent bias, 95% confidence interval coverage rate of multivariate delta method, and mediation variance ratio, respectively. The coverage rates outside the 95% confidence boundary, i.e., q ± 1.96 × q(1−q) B , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p is the true value of the causal mediation measure evaluated based on equations (5) and (6) in manuscript, andp is the point estimate of the simulated causal mediation measure. The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications. , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p is the true value of the causal mediation measure evaluated based on equations (5) and (6) in manuscript, andp is the point estimate of the simulated causal mediation measure. The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications. , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p denotes the true value of the causal mediation measure, andp is the point estimate of the simulated causal mediation measure. The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications. , were highlighted in bold, where q denotes the nominal confidence interval threshold (95%) and B denotes number of replication (5,000). The median percent bias was calculated as the median of the ratio of bias to the true value over 5,000 replications, i.e., Bias(%) = median(p −p p ) × 100%, where p denotes the true value of the causal mediation measure, andp is the point estimate of the simulated causal mediation measure. The median variance ratio is defined by the ratio of median delta-method variance estimators across 5,000 replications to the empirical variance of causal mediation measure estimates from the 5,000 replications.