Research article  Open  Open Peer Review  Published:
Adjustment for unmeasured confounding through informative priors for the confounderoutcome relation
BMC Medical Research Methodologyvolume 18, Article number: 174 (2018)
Abstract
Background
Observational studies of medical interventions or risk factors are potentially biased by unmeasured confounding. In this paper we propose a Bayesian approach by defining an informative prior for the confounderoutcome relation, to reduce bias due to unmeasured confounding. This approach was motivated by the phenomenon that the presence of unmeasured confounding may be reflected in observed confounderoutcome relations being unexpected in terms of direction or magnitude.
Methods
The approach was tested using simulation studies and was illustrated in an empirical example of the relation between LDL cholesterol levels and systolic blood pressure. In simulated data, a comparison of the estimated exposureoutcome relation was made between two frequentist multivariable linear regression models and three Bayesian multivariable linear regression models, which varied in the precision of the prior distributions. Simulated data contained information on a continuous exposure, a continuous outcome, and two continuous confounders (one considered measured one unmeasured), under various scenarios.
Results
In various scenarios the proposed Bayesian analysis with an correctly specified informative prior for the confounderoutcome relation substantially reduced bias due to unmeasured confounding and was less biased than the frequentist model with covariate adjustment for one of the two confounding variables. Also, in general the MSE was smaller for the Bayesian model with informative prior, compared to the other models.
Conclusions
As incorporating (informative) prior information for the confounderoutcome relation may reduce the bias due to unmeasured confounding, we consider this approach one of many possible sensitivity analyses of unmeasured confounding.
Background
Inferences from observational epidemiological studies are often hampered by confounding [1, 2]. To estimate the causal effect of exposure on the outcome, adjustment for a minimal set of confounding variables (or confounders) is required [3,4,5,6]. However, there may be unmeasured variables that result in unmeasured (or residual) confounding. Several design and analytical methods to account for unmeasured confounding have been proposed [7], including crossover designs e.g., [8, 9], instrumental variable analysis e.g., [10, 11], the use of negative controls [12], and approaches to collect information on unmeasured confounding variables in a subsample e.g., [13, 14]. In addition, sensitivity analysis of unmeasured confounding is used to quantify the potential impact of unmeasured confounding [15,16,17].
Sensitivity analyses can be performed within a frequentist framework as well as within a Bayesian framework. The latter requires for example assumptions on prior distributions for the unknown parameters of the unmeasured confounder and its relations with exposure and outcome [18,19,20,21]. However, eliciting prior distributions for these unknown parameters can be very challenging as unmeasured confounders may actually be unknown. So far, Bayesian sensitivity analyses focused on allocating informative priors to the effect of the unmeasured confounders on the exposure or on the outcome [18, 19, 22]. Instead, it may be more straightforward to elicit prior distributions for the parameters of the effects of the observed confounders on the outcome.
Unmeasured confounding of the exposureoutcome relation may not only affect that relation, but may also bias the observed relations between confounders and outcome [23]. Constraining the estimation of the confounderoutcome relation, or incorporating (informative) prior information for the confounderoutcome relation, may (indirectly) reduce the bias due to unmeasured confounding of the exposureoutcome relation.
The aim of this research was to assess to what extent using prior information on parameters for an observed relation between a measured confounder and the outcome in a Bayesian analysis can reduce bias due to unmeasured confounding in an estimator of the exposure outcome relation. The remainder of this article is structured as follows. The bias due to omitting one or more confounders from a regression model is quantified in section 2. In section 3, the use of informative priors for the observed confounderoutcome relation was tested using simulation studies. Section 4 illustrates the approach using an empirical example of the relation between LDL cholesterol levels and systolic blood pressure. Section 5 provides a general discussion to the paper.
Methods
Notation
We consider studies of a continuous exposure (denoted by X), a continuous outcome (Y), and two continuous confounders (Z and U). All relations are assumed to be linear. All variables are considered related to the outcome, according to the model: y_{i} = β_{yx}x_{i} + β_{yz}z_{i} + β_{yu}u_{i} + ε_{i}, where lower case letters represent the realisations of the random variables Y, X, Z, and U, i is a subject indicator (i = 1, …, n), and ε ~ N(0,σ^{2}). The confounders are considered related to the exposure: x_{i} = β_{xz}z_{i} + β_{xu}u_{i} + ζ_{i}, and the confounders are also related to each other: z_{i} = β_{zu}u_{i} + ξ_{i}, with ζ ~ N(0,σ_{x}^{2}) and ξ ~ N(0,σ_{z}^{2}). For all models, the intercepts are assumed independent of all other terms in the models and are omitted here and in the following equations. The coefficients of these models represent an increase in the dependent variable by β_{..} for each unit increase in the independent variable. The structural relations between the variables are presented in Fig. 1.
Bias due to unmeasured confounding
For the fairly simple model outlined in Fig. 1, there are three possible scenarios of confounding adjustment: scenario 1.) both confounders Z and U are measured and adjusted for (e.g., by a multivariable regression analysis of Y on X, including Z and U as covariates); scenario 2.) none of the confounders are measured and hence none is adjusted for; and scenario 3.) one confounder (Z) is measured and adjusted for, while the other (U) is not. Because our interest is in situations in which unmeasured confounding is present, we only consider scenarios 2 and 3.
In both scenarios, the effect of X on Y can be estimated by means of a linear regression model. In the following, we assume all assumptions of the linear regression model are met, except that unmeasured confounding may be present. As a result, the estimator for the effect of X on Y is expected to be biased due to unmeasured confounding. Details about the bias due to unmeasured confounding are provided in Additional file 1: Appendix 1.
In scenario 2, the bias due to omitting Z and U from the data analytical model can be expressed as:
where Var(Z), Var(X), and Var(U) denote the marginal variances of Z, X, and U, respectively. Equation (1) indicates that the bias resulting from omitting two confounders is independent of the true exposureoutcome relation β_{yx}. Furthermore, the bias increases with increasing strength of the relation between each of the confounders and the outcome or the exposure (β_{yz}, β_{yu}, β_{xz}, and β_{xu}). The bias is the result of different backdoor paths [24] from X to Y: X ← Z → Y, X ← U → Y, X ← Z ← U → Y, and X ← U → Z → Y, which can be identified in the equation.
In scenario 3 the bias due to omitting U from the data analytical model, while adjusting for Z, can be expressed as:
where \( {\rho}_{uz}^2 \) is the squared (Pearson’s) correlation between U and Z, \( {\rho}_{xz}^2 \) is the squared correlation between X and Z, and \( Var(U)\left(1{\rho}_{uz}^2\right) \) and \( Var(X)\left(1{\rho}_{xz}^2\right) \), represent the conditional variances of U given Z and of X given Z, respectively. Equation (2) shows that the bias resulting from omitting one confounder from the adjustment model is independent of the true exposureoutcome relation β_{yx}. Furthermore, the bias increases as the relation between the unmeasured confounder and the outcome (β_{yu}) or the exposure (β_{xu}) increases.
As the correlation between the confounders (ρ_{uz}) increases, the bias of the estimator of the exposureoutcome relation decreases. Intuitively, when two confounders are correlated, adjusting for one accounts for some of the variability (and thus confounding effect) in the other. Therefore, adjustment for one confounder may reduce the bias that is caused by the other [25, 26]. In addition, in a linear model, Var(XZ) ≤ Var(X) and the larger the absolute value of ρ_{xz} the smaller Var(XZ). Because of this decreased Var(XZ), the residual bias carried by U, i.e.\( {\beta}_{xu}{\beta}_{yu} Var(U)\left(1{\rho}_{uz}^2\right) \), is amplified. This bias amplification particularly happens when the confounder (Z) that is adjusted for acts like an instrumental variable (IV) or nearIV, meaning that it has a stronger relation with the exposure (X) than with the outcome (Y) [27, 28].
In scenario 3, the linear regression analysis of Y on X and Z, yielding an estimate of β_{yx ∣ z}, is a biased estimator of the relation between X and Y. However, this linear regression analysis is also a biased estimator of the relation between Z and Y (β_{yz ∣ x}). When we assume all variables follow a multivariate standard normal distribution, the bias in the β_{yz ∣ x} relation can be expressed as:
where \( {\beta}_{yu}^{\prime } \) represents the conditional (or direct) effect of U on Y if both are standardized. Equation (3) shows that the unmeasured confounder (U) of the exposureoutcome relation may also confound the observed relation between the measured confounder (Z) and the outcome. If Z and X are independent (i.e., ρ_{xz} = 0), the bias is simply the result of the backdoor path from Z to Y via U (i.e., \( {\beta}_{yu}^{\prime }{\rho}_{zu} \)). Note that even if Z and U are independent, the observed relation between Z and Y is biased, due to conditioning on X, which is a collider of Z and U and hence conditioning on X opens a path from Z to Y via U [24].
Reducing unmeasured confounding using a Bayesian model
As indicated above, unmeasured confounding of the exposureoutcome relation can also bias the relation between an observed confounder and the outcome. Hence, an unexpected relation between a confounder and the outcome may suggest the presence of unmeasured confounding. Allocating informative priors to the observed confounderoutcome relation may not only reduce the bias in that parameter, but also may reduce the bias due to unmeasured confounding of the exposureoutcome relation.
In the absence of information about the confounder U, the relation between X and Y only can be controlled for confounding by Z. In a Bayesian framework, we can specify a linear model of Y as a function of X and Z. The parameters of interest, β_{yx}, β_{yz} and σ^{2}, can then be estimated using their joint posterior distribution given the data for Y, X, and Z. The joint posterior distribution is proportional to the product of the density of the data times the joint prior distribution of the parameters:
where g(β_{xy}, β_{yz}, σ^{2}) is the joint prior distribution and f(Y X, Z, β_{yx}, β_{yz}, σ^{2}) is the probability density of Y conditional on the parameters:
Assuming independent priors for the different parameters, the joint prior is simply a product of all marginal priors.
Incorporating (informative) prior information for the confounderoutcome relation, may (indirectly) reduce the bias due to unmeasured confounding (by the unmeasured variable U) of the exposureoutcome relation. This was tested through simulation studies, which are described in the next section.
Simulation study of Bayesian analysis to control for unmeasured confounding
Objective
A simulation study was performed to test the possible decrease in bias in the estimator of the exposureoutcome relation by using informative priors for the confounderoutcome relation. In simulated data, a comparison of the estimated relation between the exposure (X) and the outcome (Y) was made between two frequentist (OLS) multivariable linear regression models and three Bayesian multivariable linear regression models.
Data analysis
Every simulated data set was analysed in five different ways: two frequentist analyses and three Bayesian analyses. The two frequentist regression models included none or one of the two confounding variables: linear regression analysis without and with adjustment for the measured confounder Z. The three Bayesian regression analyses all incorporated the information about one confounder, but used different informative priors for the confounderoutcome relation. The performance of these methods was compared in terms of bias and precision of the estimator of the exposureoutcome relation. The simulation study was performed in R, version 3.1.1 [29].
The Bayesian model described in section 2.3 was used. All Bayesian regression analyses were adjusted for Z, but not for U. We used uninformative priors for σ^{2} and β_{yx}: σ ∼ U(0,100) and β_{yx} ∼ N(μ = 0, τ = 0.001), where τ indicates the precision of the distribution. We used informative priors for the parameter β_{yz}, but with different levels of precision. A normal informative prior was assumed for β_{yz}, with the true value for β_{yz} as the mean and different values for the precision, which were proportionate to the sample size n of the simulated data sets: β_{yz} ∼ N(μ = β_{yz}, τ = n, n/10, n/100). The precision could take three different values representing different degrees of certainty in the prior information. The Bayesian models were specified using the rjags package in R [30], which provides an interface from R to JAGS (http://mcmcjags.sourceforge.net).
Since the priors for σ_{y} and β_{yx} were noninformative, the posterior distributions could be approximated by the product of the density of the data and the prior of β_{yz}. The Gibbs sampler was used with four parallel chains for 2000 iterations. The first 1000 iterations were discarded as burnin runs. Since the marginal posterior was normal, we chose to present the mean of the posterior distribution as an estimate of β_{yxz}.
Data generation
Data were generated according to the structure depicted in Fig. 1 and consisted of a continuous exposure (X), a continuous outcome (Y), and two continuous confounders (Z and U). First, U was sampled from a normal distribution: U ~ N(0, σ_{u}^{2}). Second, Z was generated based on U: z_{i} = β_{zu}u_{i} + ξ_{i}, with ξ ~ N(0, σ_{z}^{2}). Then, X was generated based on U and Z: x_{i} = β_{xz}z_{i} + β_{xu}u_{i} + ζ_{i}, with ζ ~ N(0, σ_{x}^{2}). Finally, Y was generated based on U, Z, and X: y_{i} = β_{yx}x_{i} + β_{yz}z_{i} + β_{yu}u_{i} + ε_{i}, with ε ~ N(0, σ^{2}).
In all simulations, the variances σ_{u}^{2}, σ_{z}^{2}, σ_{x}^{2}, and σ^{2} were set to 1. Furthermore, the exposureoutcome relation was fixed at β_{yx} = 0 (i.e. zero relation). The parameter β_{zu} was set at 0, or 1. The parameters β_{yz}, and β_{xz} were set at 1 or 2, indicating that the observed confounder Z was related to X and to Y in all scenarios. The parameters β_{yu} and β_{xu} were set at 0, 1, or 2. All combinations of the parameters settings were evaluated through simulations, leading to 72 different scenarios.
Comparison of methods
For each scenario 100 datasets of 1000 subjects each were generated. In each dataset the methods described above were applied. For each scenario separately, the performance of these methods was compared in terms of bias of the estimator of the relation between X and Y, the empirical standard deviation (SD) of the estimated relations between X and Y, and the mean squared error (MSE). For the frequentist models, we computed the average of the estimated regression coefficients (bias), their standard deviation (SD), and the mean of the squared difference between the estimated regression coefficient and the true exposureoutcome relation (MSE). For the Bayesian models, we computed the average of the posterior means (bias), their standard deviation (SD), and the mean of the squared difference between the posterior mean and the true exposureoutcome relation (MSE).
Example study of the relation between cholesterol levels and blood pressure
To illustrate the application of the use of informative priors for the observed confounderoutcome relation we used data on the relation between lowdensity lipoprotein (LDL cholesterol) levels and systolic blood pressure (SBP). This example was based on the Second Manifestations of Arterial disease (SMART) study, which is an ongoing prospective cohort study of patients with manifest vascular disease of vascular risk factors [31]. For this example, we assumed that there are two possible confounders of the LDLSBP relation, namely body mass index (BMI) and blood glucose levels (BGL). A data set of 1000 observations was simulated based on the variancecovariance matrix and the vector of means of these four variables in the cohort study. In all analyses, BMI was considered to be a measured confounder, while BGL was considered to be unmeasured.
Comparison of methods
The different methods described in section 3.2.1 were applied to the example data. As a reference, we fitted a linear regression model of SBP on LDL, including BMI and BGL as covariates (referred to as the ‘full model’). BMI was considered to be a measured confounder, while BGL was considered to be unmeasured. The performance of the different methods was assessed by the difference between the estimated LDLSBP relations from the different models and the LDLSBP relation obtained from the full model.
The Bayesian approach was implemented in two ways. We first used the estimated regression coefficient of the effect of BMI on systolic blood pressure from the full model (i.e., 0.32), as the mean for the prior distribution of the measured confounder on the outcome, and precision equal to the sample size (i.e., τ = 1000). We then used an relation from the literature as the prior mean. A previous study on the relation between BMI and SBP in adults found a linear regression coefficient of 0.77 [32]. This relation was used as the mean of the prior distribution of the measured confounder and outcome. Since we were less certain about this prior information, we used a smaller precision (τ = 100). For all the other relations we used uninformative priors as described in Section 3.2.1.
Results
Simulation study
Table 1 shows the results of the simulation study for the scenarios where β_{xz} = β_{yz} = 2. Similar patterns were observed for other values of β_{xz} and β_{yz}; these are omitted from the Table for brevity. Results for all simulated scenarios can be found in Additional file 2: Appendix 2. The Bayesian model with precision 100 (i.e., n/10) showed results that were in between those of the Bayesian models with precision 1000 (i.e., n) and precision 10 (i.e., n/100). Results for the Bayesian model with precision 100 are omitted for clarity (see Additional file 2: Appendix 2).
In most scenarios, the Bayesian model with precision 1000 showed less bias than the frequentist model with covariate adjustment. Noticeable exceptions in Table 1 are scenarios 8 and 14, in which the Bayesian model with precision 1000 was more biased than the frequentist model with covariate adjustment (which was actually unbiased). The reason for this is that in these scenarios U is not a confounder of the XY relation (because β_{xu} = 0), yet it is a confounder of the ZY relation (e.g., in scenario 8 \( \widehat{\beta_{yz\mid x}} \)= 1.50, while β_{yz} = 1). As the Bayesian model corrects the bias in the ZY relation, it induces a bias in the XY relation. In scenarios 10 and 16 in Table 1, the Bayesian models and the frequentist model with covariate adjustment yielded similar, yet biased, results. In these scenarios, the estimated relation between Z and Y from the frequentist model with covariate adjustment corresponded with the mean of the prior distribution of this relation (i.e., \( \widehat{\beta_{yz\mid x}} \)= 1.00 and β_{yz} = 1). Hence, the Bayesian model did not reduce bias, compared to the frequentist model. In scenarios 1–7, all methods that adjusted for the measured confounder Z yielded unbiased results, because the variable U was not a confounder in these scenarios (β_{yu} = 0). The extent to which the Bayesian model reduced bias was substantially smaller when the precision was 10 instead of 1000.
The standard deviation (SD) of the empirical distribution of the parameter estimates was smaller for the Bayesian model with precision 1000, compared to the frequentist model with covariate adjustment and the Bayesian model with precision 10 (the latter two showing approximately the same SD). Also, in general MSE was smaller for the Bayesian model with precision 1000, compared to the other models.
Empirical example
In the empirical example of the relation between lowdensity lipoprotein (LDL cholesterol) levels and systolic blood pressure (SBP)., LDL increased BP, after adjustment for BMI and BGL, but omitting BGL from the data analytical model reduced the estimated effect substantially from 1.24 to 1.03 (Table 2). The amount of bias of the LDLSBP relation slightly decreased when using an informative prior for the confounder outcome relation (i.e., for the BMISBP relation). However, even when the ‘correct’ prior, based on the full model, was used, the estimated effect of LDL on SBP remained substantially different from the reference value.
Discussion
This simulation study on the value of Bayesian analysis with informative priors for the relation between the measured confounder and the outcome in the presence of unmeasured confounding shows that such an analysis can reduce the bias due to unmeasured confounding substantially. The magnitude of the remaining bias decreases as the precision of the (correct) informative prior increases.
An obvious prerequisite when using the proposed Bayesian approach to correct for unmeasured confounding is prior knowledge about the relation between the measured confounder and the outcome. We argue that in many clinical research situations, such prior knowledge exists for many observed confounders, at least in terms of direction and order of magnitude of the relation. That information may be obtained from rigorously designed and conducted large epidemiological studies or from metaanalysis of individual patient data of randomised trials. Obviously, the impact of the Bayesian approach depends on the precision of the prior distribution. Informative priors with relatively small precision have little impact in term of confounding correction, yet allow Bayesian algorithms to be used. In practice it might be difficult – or researchers may be reluctant – to specify relatively highly informative priors.
If only the direction (but not the magnitude) of the confounderoutcome relation is included in the prior, the precision of the prior will be relatively small and the impact of the Bayesian analysis may be relatively small too. We did not include this particular form of prior distribution in our simulation study, but instead focused on distributions with the same mean, yet different precision.
As with any simulation study, an obvious limitation to our work is the finite number of simulated scenarios that we evaluated. For example, we only considered situations with two confounders, one being measured, one unmeasured. Although the two confounders Z and U could be considered as representing two sets of measured and unmeasured confounders, respectively, future research could address scenarios of multiple confounders with, e.g., different distributions of the confounders. Another scenario that we did not evaluate and could be the topic of future research is specification of the priors, such that these do not correspond to the ‘true’ confounderoutcome relation. The robustness to various levels of misspecifications of the prior distribution still needs to be studied.
Where to position this Bayesian approach in the toolbox of the researcher doing observational epidemiologic research? Given that many observational studies potentially suffer from unmeasured confounding, sensitivity analysis of unmeasured confounding is often important. Eliciting priors for unobserved (and possibly unknown) confounding variables is likely to be difficult. On the other hand, focusing on the approximate size of the relations between measured confounders and the outcome provides the opportunity to perform a Bayesian sensitivity analysis as outlined in this paper.
Informative priors for the measured confounderoutcome relations can reduce unmeasured confounding bias of the exposureoutcome relation. In case of observing unexpected confounderoutcome relations a sensitivity analysis of unmeasured confounding could be considered, in which prior information about the observed confounderoutcome relations is incorporated through Bayesian analysis.
Conclusions
In this paper we proposed a Bayesian approach to reduce bias due to unmeasured confounding by expressing an informative prior for a measured confounderoutcome relation. A simulation study on the value of this Bayesian analysis with informative priors for the relation between the measured confounder and the outcome in the presence of unmeasured confounding shows that such an analysis can indeed reduce the bias due to unmeasured confounding substantially. The magnitude of the remaining bias decreases as the precision of the (correct) informative prior increases. We consider this approach one of many possible sensitivity analyses of unmeasured confounding.
Abbreviations
 BGL:

Blood glucose levels
 BMI:

Body mass index
 LDL:

Lowdensity lipoprotein
 MSE:

Mean squared error
 SBP:

Systolic blood pressure
 SD:

Standard deviation
References
 1.
Hernan MA, Robins JM. Causal inference. Boca Raton: Chapman & Hall / CRC, forthcoming; 2016.
 2.
Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12(3):313–20.
 3.
VanderWeele TJ, Shpitser I. On the definition of a confounder. Ann Stat. 2013;41(1):196–220.
 4.
VanderWeele TJ, Shpitser I. A new criterion for confounder selection. Biometrics. 2011;67(4):1406–13.
 5.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
 6.
Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Ass. 1984;79(387):516–24.
 7.
Uddin MJ, Groenwold RH, Ali MS, de Boer A, Roes KC, Chowdhury MA, Klungel OH. Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int J Clin Pharm. 2016;38(3):714–23.
 8.
Hallas J, Pottegård A. Use of selfcontrolled designs in pharmacoepidemiology. J Intern Med. 2014;275(6):581–9.
 9.
Whitaker HJ, Hocine MN, Farrington CP. The methodology of selfcontrolled case series studies. Stat Methods Med Res. 2009;18(1):7–26.
 10.
Chen Y, Briesacher BA. Use of instrumental variable in prescription drug research with observational data: a systematic review. J Clin Epidemiol. 2011;64(6):687–700.
 11.
Martens EP, Pestman WR, de Boer A, Belitser SV, Klungel OH. Instrumental variables: application and limitations. Epidemiology. 2006;17(3):260–7.
 12.
Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology. 2010;21(3):383–8.
 13.
Stürmer T, Schneeweiss S, Avorn J, Glynn RJ. Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am J Epidemiol. 2005;162(3):279–89.
 14.
White JE. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am J Epidemiol. 1982;115:119–28.
 15.
Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54(3):948–63.
 16.
Diaz I, van der Laan MJ. Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems. Int J Biostat. 2013;9(2):149–60.
 17.
Groenwold RH, Nelson DB, Nichol KL, Hoes AW, Hak E. Sensitivity analyses to estimate the potential impact of unmeasured confounding in causal research. Int J Epidemiol. 2010;39(1):107–17.
 18.
McCandless LC, Gustafson P, Levy AR, Richardson S. Hierarchical priors for bias parameters in Bayesian sensitivity analysis for unmeasured confounding. Stat in Med. 2012;31(4):383–96.
 19.
McCandless LC, Gustafson P, Levy A. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat in Med. 2007;26(11):2331–47.
 20.
Greenland S. The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. J Am Stat Ass. 2003;98(461):47–54.
 21.
Dorie V, Harada M, Bohme Carnegie N, Hill J. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Stat in Med. 2016;35:3453–70.
 22.
Gustafson P, McCandless L, Levy A, Richardson S. Simplified Bayesian sensitivity analysis for mismeasured and unobserved confounders. Biometrics. 2010;66(4):1129–37.
 23.
Schuit E, Groenwold RH, Harrell FE, de Kort WL, Kwee A, Mol BWJ, et al. Unexpected predictor–outcome associations in clinical prediction research: causes and solutions. CMAJ. 2013;185(10):E499–505.
 24.
Pearl J. Causality: models, reasoning, and inference. 2nd ed. 2009. Cambridge University press, N Y.
 25.
Fewell Z, Smith GD, Sterne JA. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol. 2007;166(6):646–55.
 26.
Groenwold RH, Sterne JA, Lawlor DA, Moons KG, Hoes AW, Tilling K. Sensitivity analysis for the effects of multiple unmeasured confounders. Ann Epidemiol. 2016 Sep;26(9):605–11.
 27.
Bhattacharya J, Vogt WB. Do instrumental variables belong in propensity scores? Int J Stat Econ. 2012;9(A12):107–27.
 28.
Pearl J. Invited commentary: understanding bias amplification. Am J Epidemiol. 2011;174(11):1223–7.
 29.
R Development Core Team. R: A Language and Environment for Statistical Computing Vienna, Austria; 2008. ISBN 3900051070. Available from: http://www.Rproject.org.
 30.
Plummer M. rjags: Bayesian Graph Model using MCMC; 2016. R package version 4–5. Available from: http://CRAN.Rproject.org/package=rjags.
 31.
Simons PCG, Algra A, Van de Laak M, Grobbee D, Van der Graaf Y. Second manifestations of ARTerial disease (SMART) study: rationale and design. Eur J Epidemiol. 1999;15(9):773–81.
 32.
Stamler J. Epidemiologic findings on body mass and blood pressure in adults. Ann Epidemiol. 1991;1(4):347–62.
Acknowledgements
We thank prof Y. van der Graaf for allowing us to use a subset of the dataset of the SMART cohort as an illustration.
Funding
We gratefully acknowledge financial contribution from the Netherlands Organisation for Scientific Research (NWO, projects 917.16.430 and 452–12010).
Availability of data and materials
Simulation scripts are available upon request.
Author information
Affiliations
Contributions
RG, IS, and IK drafted the concept for the current paper. RG and IS wrote the initial version of the paper, performed statistical programming for the simulations and conducted analyses. MM and MvS contributed to the design of the simulation study and the interpretation of the simulation results. All authors commented on drafts of the article and approved the manuscript.
Corresponding author
Correspondence to Rolf H. H. Groenwold.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Appendix 1. Expressions of bias. (PDF 105 kb)
Additional file 2:
Appendix 2 Table A1. Results of the simulation study of different methods to control for confounding (PDF 395 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Bias
 Confounding
 Bayesian statistics
 Sensitivity analysis