Model
Let U denote a set of unobserved confounders which could possibly distort the causal relationship between the exposure X and the outcome Y. Let Z be an instrumental SNP (multiple instruments are considered later) which satisfies the following three assumptions ([10]):

A_{1}: Z is associated with X;

A_{2}: Z is independent of U;

A_{3}: Z is independent of Y, conditioning on (X,U).
These assumptions are expressed graphically in Fig. 1. The Z→X arrow represents a nonzero association between the IV and the exposure, in accordance with A_{1}. Assumption A_{2} follows from the graph (it would not if an arrow directly connected Z and U). A_{3} is also satisfied (it would not be if an arrow pointed directly from Z to Y). The X→Y arrow represents the putative causal effect of X on Y, which is our primary interest.
Consider a study with multiple exposures in which each instrumental SNP is associated with at least one exposure. Without loss of generality, we shall hereafter restrict attention to the case of three exposures X=(X_{1},X_{2},X_{3}) and three IVs Z=(Z_{1},Z_{2},Z_{3}). Figure 2 depicts our model and parameter notation, with β_{1},β_{2} and β_{3} representing the putative causal effects of the exposures X_{1},X_{2} and X_{3} on the outcome Y, respectively, and δ_{1},δ_{2},δ_{3} and δ_{4} representing the effects of the confounder U on the three exposures and on Y, respectively. For simplicity, we assume that there are no direct associations between the exposures and that the IVs are mutually independent. The Z_{1}→X_{1} association is quantified by the parameter α_{1} and the Z_{2}→X_{2} association by α_{2}. Both Z_{2} and Z_{3} are associated with X_{3}, with association strengths quantified by α_{3} and α_{4}, respectively. Z_{1} and Z_{3} are “valid instruments” in that they satisfy all three MR assumptions stated above. Z_{2}, however, violates assumption A_{3}, because its effect on Y is mediated by both X_{2} and X_{3}: a case of horizontal pleiotropy [15].
By assuming linearity and additivity of the conditional dependencies, and in accordance with the graph of Fig. 2, we specify our model as follows.
$$ U \sim N(0,\, 0.1), $$
(1)
$$ X_{1} \mid Z_{1}, U \sim N\!\left(\omega_{1}+\alpha_{1}Z_{1}+\delta_{1}U,\; \sigma_{1}^{2}\right), $$
(2)
$$ X_{2} \mid Z_{2}, U \sim N\!\left(\omega_{2}+\alpha_{2}Z_{2}+\delta_{2}U,\; \sigma_{2}^{2}\right), $$
(3)
$$ X_{3} \mid Z_{2}, Z_{3}, U \sim N\!\left(\omega_{3}+\alpha_{3}Z_{2}+\alpha_{4}Z_{3}+\delta_{3}U,\; \sigma_{3}^{2}\right), $$
(4)
$$ Y \mid X_{1}, X_{2}, X_{3}, U \sim N\!\left(\omega_{Y}+\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\delta_{4}U,\; \sigma_{Y}^{2}\right), $$
(5)
where N(a,b) stands for a normal distribution with mean a and variance b. The ω parameters are unknown intercepts and the σs are standard deviations of independent random noise terms. The parameters of primary inferential interest are the βs, each representing the causal effect of a particular exposure on the outcome. The α parameters quantify the strengths of the pairwise associations between the instruments and the exposures. They are often referred to as “instrument strengths”, and should differ appreciably from zero to attenuate weak instrument bias [16]. Finally, U denotes a sufficient scalar summary of the unobserved confounders, which we fix to follow a N(0, 0.1) distribution because its parameters are not identifiable from the likelihood. This model builds on the Bayesian approach developed by Berzuini et al. [5], extended to multiple exposures and allowing an IV to be associated with more than one exposure.
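As a concrete illustration, the data-generating process of Models (1)–(5) can be sketched in a few lines. The additive SNP coding, the minor allele frequency of 0.3 and the zero intercepts below are illustrative assumptions, not part of the model specification above.

```python
import numpy as np

def simulate_data(n, alpha=(0.5, 0.5, 0.5, 0.5), beta=(0.3, 0.3, 0.3),
                  delta=(1.0, 1.0, 1.0, 1.0), sigma=(1.0, 1.0, 1.0, 1.0),
                  maf=0.3, seed=0):
    """Draw n individuals from Models (1)-(5).

    Intercepts (omega) are set to 0 and the SNPs are coded as additive
    allele counts with a hypothetical minor allele frequency `maf`;
    both choices are illustrative, not prescribed by the text.
    """
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, maf, size=(n, 3))        # instruments Z1, Z2, Z3
    U = rng.normal(0.0, np.sqrt(0.1), size=n)    # confounder, Model (1): variance 0.1
    X1 = rng.normal(alpha[0]*Z[:, 0] + delta[0]*U, sigma[0])                          # Model (2)
    X2 = rng.normal(alpha[1]*Z[:, 1] + delta[1]*U, sigma[1])                          # Model (3)
    X3 = rng.normal(alpha[2]*Z[:, 1] + alpha[3]*Z[:, 2] + delta[2]*U, sigma[2])       # Model (4)
    Y = rng.normal(beta[0]*X1 + beta[1]*X2 + beta[2]*X3 + delta[3]*U, sigma[3])       # Model (5)
    return Z, np.column_stack([X1, X2, X3]), Y
```

Note that Z_{2} enters both the X_{2} and X_{3} equations, reproducing the horizontal pleiotropy of Fig. 2.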
Our method
We propose a Bayesian MR method that works in all sample settings (one-sample, two-sample or overlapping-sample), and is therefore more flexible in this respect than existing non-Bayesian methods. In essence, our method treats two-sample or overlapping-sample data as a single sample in which the data of some (or all) individuals are incomplete, in the sense that these individuals have Z and X (but not Y) measured, or Z and Y (but not X) measured. Our method clearly allows for quite general patterns of missingness, although it is not our purpose here to explore their full repertoire. In addition, for simplicity, we illustrate our method by restricting attention to the special case of a continuous outcome variable Y. We assume that all individuals are drawn from the same population and that unmeasured data are missing at random. The unobserved values of X or Y are then treated on a par with the model parameters as unknown quantities, and can therefore be imputed iteratively from their distributions conditional on the observed variables and current parameter estimates using Markov chain Monte Carlo (MCMC) [17], in the best Bayesian tradition.
We start with introducing three disjoint datasets as follows (see Fig. 3a):

Dataset A: all individuals with observed values of Z,X and Y;

Dataset B: all individuals with observed values of Z and X only;

Dataset C: all individuals with observed values of Z and Y only.
No individuals are common to A, B and C, and all are drawn independently from the same population. As discussed earlier, A may be collected from a previous study; B and C may represent data from two independent IV-exposure and IV-outcome association studies. If we combine A with B

$$D_{1} = A \cup B, $$

and A with C

$$D_{2} = A \cup C, $$

the two resulting datasets, D_{1} and D_{2}, will not be disjoint, as they have dataset A in common: a typical overlapping-sample problem. If A is empty, B and C form a two-sample problem.
Our method treats missing data like all other unknown quantities in the model. That is, at each iteration we impute the missing values of Y in B by randomly drawing from the conditional distribution of Y given the observed exposures and the current parameter estimates. Likewise, the missing values of X in C are imputed by a random draw from the conditional distribution of X given the observed instruments Z and the current parameter estimates. Let Y^{∗} denote the imputed values of Y and \(\mathbf {X}^{*} = (X_{1}^{*}, X_{2}^{*}, X_{3}^{*})\) the imputed values of X. Given the datasets A, B and C, we proceed with the following Markov chain scheme.

1.
Set initial values for all the unknown parameters in the model. We also fix the desired number of Markov iterations, with iteration index t.

2.
At the tth iteration, in accordance with Fig. 3b, we replace the missing values of Y in B with Y^{∗}, drawn from a normal distribution with mean \(\omega _{Y}^{(t)}+\beta _{1}^{(t)}X_{1}+\beta _{2}^{(t)}X_{2}+\beta _{3}^{(t)}X_{3}\) and standard deviation \(\sigma _{Y}^{(t)}\), where (X_{1},X_{2},X_{3}) are the observed values in dataset B. Similarly, we replace the missing values of X in C with (\(X_{1}^{*}, X_{2}^{*}, X_{3}^{*}\)), drawn from normal distributions with means \(\omega _{1}^{(t)}+\alpha _{1}^{(t)}Z_{1}\), \(\omega _{2}^{(t)}+\alpha _{2}^{(t)}Z_{2}\) and \(\omega _{3}^{(t)}+\alpha _{3}^{(t)}Z_{2}+\alpha _{4}^{(t)}Z_{3}\) and standard deviations \(\sigma _{1}^{(t)}, \sigma _{2}^{(t)}\) and \(\sigma _{3}^{(t)}\), respectively, where (Z_{1},Z_{2},Z_{3}) are the observed values in dataset C. We use superscript (t) for the parameter values at the tth iteration, with t=0 denoting their initial values. Because both U and the random noise terms have zero mean, they do not appear in these imputation means.

3.
Merge all the data, imputed and observed, into a new complete dataset (Fig. 3c).

4.
Estimate model parameters based on the complete dataset obtained in Step 3 using MCMC and set t←t+1.

5.
Repeat Steps 2–4 until t equals the number of iterations specified in Step 1.
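The imputation draws of Step 2 can be sketched as follows, assuming the current-iteration parameter draws are held in a dictionary `pars` (a hypothetical container introduced only for this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def impute_Y(X_B, pars):
    """Impute Y* for dataset B from Y | X (Step 2).

    `pars` holds the current-iteration parameter draws, e.g.
    {'omega_Y': 0., 'beta': [...], 'sigma_Y': 1., ...}. Because U
    and the noise term have zero mean, only the regression mean
    and sigma_Y enter the draw.
    """
    mean = pars['omega_Y'] + X_B @ np.asarray(pars['beta'])
    return rng.normal(mean, pars['sigma_Y'])

def impute_X(Z_C, pars):
    """Impute X* = (X1*, X2*, X3*) for dataset C from X | Z."""
    m1 = pars['omega'][0] + pars['alpha'][0]*Z_C[:, 0]
    m2 = pars['omega'][1] + pars['alpha'][1]*Z_C[:, 1]
    m3 = pars['omega'][2] + pars['alpha'][2]*Z_C[:, 1] + pars['alpha'][3]*Z_C[:, 2]
    return np.column_stack([rng.normal(m1, pars['sigma'][0]),
                            rng.normal(m2, pars['sigma'][1]),
                            rng.normal(m3, pars['sigma'][2])])
```

Steps 3–4 then simply stack the observed and imputed rows into one complete dataset and refresh the parameter draws by MCMC.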
The priors
Here we discuss our choice of prior distributions for the unknown parameters in Models (1)–(5). We let the priors for β=(β_{1},β_{2},β_{3}) be independent normal distributions with mean 0 and standard deviation 10:
$$\left(\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{array} \right) \sim N\left[\left(\begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right),\left(\begin{array}{ccc} 10^{2} & 0 & 0 \\ 0 & 10^{2} & 0 \\ 0 & 0 & 10^{2} \\ \end{array} \right)\right], $$
and the priors for the IV strength parameters α=(α_{1},α_{2},α_{3},α_{4}) be independent normal distributions with mean 1 and standard deviation 0.3:
$$\left(\begin{array}{c} \alpha_{1} \\ \alpha_{2} \\ \alpha_{3} \\ \alpha_{4} \\ \end{array} \right) \sim N\left[\left(\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \\ \end{array} \right),\left(\begin{array}{cccc} {0.3}^{2} & 0 & 0 & 0 \\ 0 & {0.3}^{2} & 0 & 0 \\ 0 & 0 & {0.3}^{2} & 0 \\ 0 & 0 & 0 & {0.3}^{2} \\ \end{array} \right)\right]. $$
U has already been assigned a N(0, 0.1) distribution in Model (1). The standard deviation parameters σ_{1},σ_{2},σ_{3} and σ_{Y} are each given the same inverse-gamma prior, σ_{(·)}∼Inv-Gamma(3, 2).
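Because all components are mutually independent, the multivariate normal priors above reduce to componentwise univariate distributions. Assuming scipy's parameterisation of the inverse-gamma (shape a, scale), they can be written as:

```python
from scipy import stats

# Priors from the text: beta_j ~ N(0, 10^2), alpha_k ~ N(1, 0.3^2),
# sigma_(.) ~ Inv-Gamma(3, 2); independence makes the multivariate
# normals equivalent to componentwise univariate draws.
prior_beta = stats.norm(loc=0.0, scale=10.0)     # each of beta_1..beta_3
prior_alpha = stats.norm(loc=1.0, scale=0.3)     # each of alpha_1..alpha_4
prior_sigma = stats.invgamma(a=3.0, scale=2.0)   # each of sigma_1..sigma_3, sigma_Y

# Inv-Gamma(3, 2) has mean scale/(a-1) = 1 and finite variance,
# a weakly informative choice for the noise scales.
```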
Simulations
In our simulations, following Models (1)–(5), we consider a total of 72 configurations, combining

6 sample overlapping rates (100%, 80%, 60%, 40%, 20%, 0%)

2 IV strengths: α=(α_{1},α_{2},α_{3},α_{4})=0.5 and 0.1

3 levels of the confounding effects of U on X: (δ_{1},δ_{2},δ_{3})=1,0.5 and 0.1

2 levels of causal effects of X on Y: β=(β_{1},β_{2},β_{3})=0.3 and 0.
The effect of U on Y is set to 1 (δ_{4}=1). In each configuration, we simulated 200 datasets, step by step, as follows.

1.
Generate a dataset H which contains observed IVs (Z), exposures (X) and outcome (Y) from 1000 independent individuals.

2.
Randomly sample n_{A} individuals without replacement from H and take their observations of (Z,X,Y) as dataset A;

3.
Randomly sample n_{B} individuals without replacement from H−A={x∈H : x∉A} and take their observations of (Z,X) as dataset B;

4.
Randomly sample n_{C} individuals without replacement from H−A−B={x∈H,x∉A∪B} and take their observations of (Z,Y) as dataset C.
The sample size of B was set equal to that of C (i.e., n_{B}=n_{C}), and the sample sizes of both D_{1}=A∪B and D_{2}=A∪C to 400. The overlapping rate was defined as the percentage of individuals from A (n_{A}) in D_{1} (or, equivalently, in D_{2}). For example, an overlapping rate of 80% gives n_{A}=320 and n_{B}=n_{C}=80; an overlapping rate of 60% gives n_{A}=240 and n_{B}=n_{C}=160. When the overlapping rate was 100%, we used only dataset A with sample size 400 (equivalently, D_{1}=D_{2}=A with n_{A}=400). For an overlapping rate of 0%, we used only datasets B and C (i.e., D_{1}=B with n_{B}=400 and D_{2}=C with n_{C}=400). Imputation of missing data and estimation of causal effects were then performed simultaneously using MCMC in Stan [18, 19].
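The sampling scheme of Steps 2–4, together with the overlap-rate arithmetic above, amounts to a disjoint index partition of H. A minimal sketch (the function name and defaults are illustrative):

```python
import numpy as np

def split_by_overlap(n_H=1000, n_D=400, overlap=0.8, seed=0):
    """Partition the indices of H into disjoint A, B, C (Steps 2-4).

    `overlap` is the fraction of D1 = A u B (equivalently D2 = A u C)
    made up of complete-data individuals from A, so n_A = overlap * n_D
    and n_B = n_C = n_D - n_A.
    """
    n_A = int(round(overlap * n_D))
    n_B = n_C = n_D - n_A
    idx = np.random.default_rng(seed).permutation(n_H)
    A = idx[:n_A]                       # Z, X, Y observed
    B = idx[n_A:n_A + n_B]              # Z, X observed
    C = idx[n_A + n_B:n_A + n_B + n_C]  # Z, Y observed
    return A, B, C

A, B, C = split_by_overlap(overlap=0.8)
# n_A = 320 and n_B = n_C = 80, matching the 80% example in the text
```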
Assessments of the performance of our method
The performance of our method was evaluated using 4 metrics:

mean (posterior mean)

standard deviation

coverage (the proportion of the times that the 95% credible interval contained the true value of the causal effect)

power (the proportion of the times that the 95% credible interval did not contain value zero when the true causal effect was nonzero).
The causal effects of X_{1},X_{2} and X_{3} on the outcome Y (denoted \(\hat {\boldsymbol {\beta }} = (\hat {\beta }_{1}, \hat {\beta }_{2}, \hat {\beta }_{3})\)) were estimated from data simulated under the alternative hypothesis that β=0.3, and from data simulated under the null hypothesis that β=0. Note that, by definition, the power metric applies only under the alternative hypothesis, when β≠0. A high power indicates high sensitivity of the model to deviations of the data from the null or, equivalently, a low expected probability of false negatives.
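Coverage and power, as defined above, can be computed from the simulated credible intervals roughly as follows (the interval endpoints in the usage line are made-up illustrative numbers):

```python
import numpy as np

def coverage_and_power(ci_lower, ci_upper, true_beta):
    """Coverage and power from 95% credible intervals.

    `ci_lower`/`ci_upper` are arrays of interval endpoints over
    simulated datasets; coverage is the fraction of intervals
    containing `true_beta`, power the fraction excluding zero
    (meaningful only when true_beta != 0).
    """
    lo, hi = np.asarray(ci_lower), np.asarray(ci_upper)
    coverage = np.mean((lo <= true_beta) & (true_beta <= hi))
    power = np.mean((lo > 0) | (hi < 0))
    return coverage, power

cov, pw = coverage_and_power([0.1, -0.1, 0.2], [0.5, 0.4, 0.25], 0.3)
# [0.1, 0.5] contains 0.3 and excludes 0; [-0.1, 0.4] contains 0.3 but
# also 0; [0.2, 0.25] excludes both -> coverage = 2/3, power = 2/3
```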
We compared our method with two-stage least squares (2SLS) regression [10] for one-sample MR (100% overlap) and with inverse-variance weighted (IVW) estimation [8] for two-sample MR (0% overlap). For overlapping samples, our method was compared with IVW; in the latter case, only the data of the non-overlapping individuals were included, so as to make it a two-sample MR.