Model
Let U denote a set of unobserved confounders which could possibly distort the causal relationship between the exposure X and the outcome Y. Let Z be an instrumental SNP (multiple instruments are considered later) which satisfies the following three assumptions ([10]):

A_{1}: Z is associated with X;

A_{2}: Z is independent of U;

A_{3}: Z is independent of Y, conditioning on (X,U).
These assumptions are expressed graphically in Fig. 1. The Z→X arrow represents a nonzero association between the IV and the exposure, in accordance with A_{1}. Assumption A_{2} follows from the graph (it would not if an arrow directly connected Z and U). A_{3} is also satisfied (it would not be if an arrow pointed directly from Z to Y). The X→Y arrow represents the putative causal effect of X on Y, which is our primary interest.
Consider a study with multiple exposures in which each instrumental SNP is associated with at least one exposure. Without loss of generality, we shall hereafter restrict attention to the case of three exposures X=(X_{1},X_{2},X_{3}) and three IVs Z=(Z_{1},Z_{2},Z_{3}). Figure 2 depicts our model and parameter notation, with β_{1},β_{2} and β_{3} representing the putative causal effects of the exposures X_{1},X_{2} and X_{3} on the outcome Y, respectively, and δ_{1},δ_{2},δ_{3} and δ_{4} representing the effects of the confounder U on the three exposures and on Y, respectively. For simplicity, we assume that there are no direct associations between the exposures and that the IVs are mutually independent. The Z_{1}→X_{1} association is quantified by the parameter α_{1} and the Z_{2}→X_{2} association by α_{2}. Both Z_{2} and Z_{3} are associated with X_{3}, with association strengths quantified by α_{3} and α_{4}, respectively. Z_{1} and Z_{3} are “valid instruments” in that they satisfy all three MR assumptions stated above. Z_{2}, however, violates assumption A_{3}, because its effect on Y is mediated by both X_{2} and X_{3}: a case of horizontal pleiotropy [15].
By assuming linearity and additivity of the conditional dependencies, and in accordance with the graph of Fig. 2, we specify our model as follows.
$$ U \sim N(0,\, 0.1), $$
(1)
$$ X_{1} \mid Z_{1}, U \sim N\!\left(\omega_{1}+\alpha_{1}Z_{1}+\delta_{1}U,\; \sigma_{1}^{2}\right), $$
(2)
$$ X_{2} \mid Z_{2}, U \sim N\!\left(\omega_{2}+\alpha_{2}Z_{2}+\delta_{2}U,\; \sigma_{2}^{2}\right), $$
(3)
$$ X_{3} \mid Z_{2}, Z_{3}, U \sim N\!\left(\omega_{3}+\alpha_{3}Z_{2}+\alpha_{4}Z_{3}+\delta_{3}U,\; \sigma_{3}^{2}\right), $$
(4)
$$ Y \mid X_{1}, X_{2}, X_{3}, U \sim N\!\left(\omega_{Y}+\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\delta_{4}U,\; \sigma_{Y}^{2}\right), $$
(5)
where N(a,b) stands for a normal distribution with mean a and variance b. The ω parameters are unknown intercepts and the σs are standard deviations of independent random noise terms. The parameters of primary inferential interest are the βs, each representing the causal effect of a particular exposure on the outcome. The α parameters quantify the strengths of the pairwise associations between the instruments and the exposures. They are often referred to as “instrument strengths”, and should differ appreciably from zero to attenuate weak instrument bias [16]. Finally, U denotes a sufficient scalar summary of the unobserved confounders, which we fix to follow a N(0, 0.1) distribution because its parameters are not identifiable from the likelihood. This model builds on the Bayesian approach developed by Berzuini et al. [5], extended to multiple exposures and allowing an IV to be associated with more than one exposure.
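As a concrete illustration, the data-generating process of Models (1)–(5) can be sketched in a few lines. The additive SNP coding, the minor allele frequency of 0.3 and the zero intercepts below are illustrative assumptions, not part of the model specification above.

```python
import numpy as np

def simulate_data(n, alpha=(0.5, 0.5, 0.5, 0.5), beta=(0.3, 0.3, 0.3),
                  delta=(1.0, 1.0, 1.0, 1.0), sigma=(1.0, 1.0, 1.0, 1.0),
                  maf=0.3, seed=0):
    """Draw n individuals from Models (1)-(5).

    Intercepts (omega) are set to 0 and the SNPs are coded as additive
    allele counts with a hypothetical minor allele frequency `maf`;
    both choices are illustrative, not prescribed by the text.
    """
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, maf, size=(n, 3))        # instruments Z1, Z2, Z3
    U = rng.normal(0.0, np.sqrt(0.1), size=n)    # confounder, Model (1): variance 0.1
    X1 = rng.normal(alpha[0]*Z[:, 0] + delta[0]*U, sigma[0])                          # Model (2)
    X2 = rng.normal(alpha[1]*Z[:, 1] + delta[1]*U, sigma[1])                          # Model (3)
    X3 = rng.normal(alpha[2]*Z[:, 1] + alpha[3]*Z[:, 2] + delta[2]*U, sigma[2])       # Model (4)
    Y = rng.normal(beta[0]*X1 + beta[1]*X2 + beta[2]*X3 + delta[3]*U, sigma[3])       # Model (5)
    return Z, np.column_stack([X1, X2, X3]), Y
```

Note that Z_{2} enters both the X_{2} and X_{3} equations, reproducing the horizontal pleiotropy of Fig. 2.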
Our method
We propose a Bayesian MR method that works in all sample settings (one-sample, two-sample or overlapping-sample), and is therefore more flexible in this respect than existing non-Bayesian methods. In essence, our method treats two-sample or overlapping-sample data as a single sample in which the data of some (or all) individuals are incomplete, in the sense that these individuals have Z and X (but not Y) measured, or Z and Y (but not X) measured. Our method clearly allows for quite general patterns of missingness, although it is not our purpose here to explore their full repertoire. In addition, for simplicity, we illustrate our method by restricting attention to the special case of a continuous outcome variable Y. We assume that all individuals are drawn from the same population and that unmeasured data are missing at random. The unobserved values of X or Y are then treated on a par with the model parameters as unknown quantities, and can therefore be imputed iteratively from their distributions conditional on the observed variables and current parameter estimates using Markov chain Monte Carlo (MCMC) [17], in the best Bayesian tradition.
We start with introducing three disjoint datasets as follows (see Fig. 3a):

Dataset A: all individuals with observed values of Z,X and Y;

Dataset B: all individuals with observed values of Z and X only;

Dataset C: all individuals with observed values of Z and Y only.
No individuals are common to A, B and C, and all are drawn independently from the same population. As discussed earlier, A may be collected from a previous study; B and C may represent data from two independent IV-exposure and IV-outcome association studies. If we combine A with B

$$D_{1} = A \cup B, $$

and A with C

$$D_{2} = A \cup C, $$

the two resulting datasets, D_{1} and D_{2}, will not be disjoint, as they have dataset A in common: a typical overlapping-sample problem. If A is empty, B and C form a two-sample problem.
Our method treats missing data like all other unknown quantities in the model. That is, at each iteration we impute the missing values of Y in B by randomly drawing from the conditional distribution of Y given the observed exposures and the current parameter estimates. Likewise, the missing values of X in C are imputed by a random draw from the conditional distribution of X given the observed instruments Z and the current parameter estimates. Let Y^{∗} denote the imputed values of Y and \(\mathbf {X}^{*} = (X_{1}^{*}, X_{2}^{*}, X_{3}^{*})\) the imputed values of X. Given the datasets A, B and C, we proceed with the following Markov chain scheme.

1.
Set initial values for all the unknown parameters in the model. We also fix the desired number of Markov iterations, with iteration index t.

2.
At the tth iteration, in accordance with Fig. 3b, we replace the missing values of Y in B with Y^{∗}, drawn from a normal distribution with mean \(\omega _{Y}^{(t)}+\beta _{1}^{(t)}X_{1}+\beta _{2}^{(t)}X_{2}+\beta _{3}^{(t)}X_{3}\) and standard deviation \(\sigma _{Y}^{(t)}\), where (X_{1},X_{2},X_{3}) are the observed values in dataset B. Similarly, we replace the missing values of X in C with (\(X_{1}^{*}, X_{2}^{*}, X_{3}^{*}\)), drawn from normal distributions with means \(\omega _{1}^{(t)}+\alpha _{1}^{(t)}Z_{1}\), \(\omega _{2}^{(t)}+\alpha _{2}^{(t)}Z_{2}\) and \(\omega _{3}^{(t)}+\alpha _{3}^{(t)}Z_{2}+\alpha _{4}^{(t)}Z_{3}\) and standard deviations \(\sigma _{1}^{(t)}, \sigma _{2}^{(t)}\) and \(\sigma _{3}^{(t)}\), respectively, where (Z_{1},Z_{2},Z_{3}) are the observed values in dataset C. We use superscript (t) for the parameter values at the tth iteration, with t=0 denoting their initial values. Because both U and the random noise terms have zero mean, they do not appear in these imputation means.

3.
Merge all the data, imputed and observed, into a new complete dataset (Fig. 3c).

4.
Estimate model parameters based on the complete dataset obtained in Step 3 using MCMC and set t←t+1.

5.
Repeat Steps 2–4 until t equals the number of iterations specified in Step 1.
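The imputation draws of Step 2 can be sketched as follows, assuming the current-iteration parameter draws are held in a dictionary `pars` (a hypothetical container introduced only for this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def impute_Y(X_B, pars):
    """Impute Y* for dataset B from Y | X (Step 2).

    `pars` holds the current-iteration parameter draws, e.g.
    {'omega_Y': 0., 'beta': [...], 'sigma_Y': 1., ...}. Because U
    and the noise term have zero mean, only the regression mean
    and sigma_Y enter the draw.
    """
    mean = pars['omega_Y'] + X_B @ np.asarray(pars['beta'])
    return rng.normal(mean, pars['sigma_Y'])

def impute_X(Z_C, pars):
    """Impute X* = (X1*, X2*, X3*) for dataset C from X | Z."""
    m1 = pars['omega'][0] + pars['alpha'][0]*Z_C[:, 0]
    m2 = pars['omega'][1] + pars['alpha'][1]*Z_C[:, 1]
    m3 = pars['omega'][2] + pars['alpha'][2]*Z_C[:, 1] + pars['alpha'][3]*Z_C[:, 2]
    return np.column_stack([rng.normal(m1, pars['sigma'][0]),
                            rng.normal(m2, pars['sigma'][1]),
                            rng.normal(m3, pars['sigma'][2])])
```

Steps 3–4 then simply stack the observed and imputed rows into one complete dataset and refresh the parameter draws by MCMC.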
The priors
Here we discuss our choice of prior distributions for the unknown parameters in Models (1)–(5). We let the priors for β=(β_{1},β_{2},β_{3}) be independent normal distributions with mean 0 and standard deviation 10:
$$\left(\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{array} \right) \sim N\left[\left(\begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right),\left(\begin{array}{ccc} 10^{2} & 0 & 0 \\ 0 & 10^{2} & 0 \\ 0 & 0 & 10^{2} \\ \end{array} \right)\right], $$
and the priors for the IV strength parameters α=(α_{1},α_{2},α_{3},α_{4}) be independent normal distributions with mean 1 and standard deviation 0.3:
$$\left(\begin{array}{c} \alpha_{1} \\ \alpha_{2} \\ \alpha_{3} \\ \alpha_{4} \\ \end{array} \right) \sim N\left[\left(\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \\ \end{array} \right),\left(\begin{array}{cccc} {0.3}^{2} & 0 & 0 & 0 \\ 0 & {0.3}^{2} & 0 & 0 \\ 0 & 0 & {0.3}^{2} & 0 \\ 0 & 0 & 0 & {0.3}^{2} \\ \end{array} \right)\right]. $$
U has already been assigned a N(0, 0.1) distribution in Model (1). The standard deviation parameters σ_{1},σ_{2},σ_{3} and σ_{Y} are each given the same inverse-gamma prior, σ_{(·)}∼Inv-Gamma(3, 2).
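Because all components are mutually independent, the multivariate normal priors above reduce to componentwise univariate distributions. Assuming scipy's parameterisation of the inverse-gamma (shape a, scale), they can be written as:

```python
from scipy import stats

# Priors from the text: beta_j ~ N(0, 10^2), alpha_k ~ N(1, 0.3^2),
# sigma_(.) ~ Inv-Gamma(3, 2); independence makes the multivariate
# normals equivalent to componentwise univariate draws.
prior_beta = stats.norm(loc=0.0, scale=10.0)     # each of beta_1..beta_3
prior_alpha = stats.norm(loc=1.0, scale=0.3)     # each of alpha_1..alpha_4
prior_sigma = stats.invgamma(a=3.0, scale=2.0)   # each of sigma_1..sigma_3, sigma_Y

# Inv-Gamma(3, 2) has mean scale/(a-1) = 1 and finite variance,
# a weakly informative choice for the noise scales.
```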
Simulations
In our simulations, following Models (1)–(5), we consider a total of 72 configurations, combining

6 sample overlapping rates (100%, 80%, 60%, 40%, 20%, 0%)

2 IV strengths: α=(α_{1},α_{2},α_{3},α_{4})=0.5 and 0.1

3 levels of the confounding effects of U on X: (δ_{1},δ_{2},δ_{3})=1,0.5 and 0.1

2 levels of causal effects of X on Y: β=(β_{1},β_{2},β_{3})=0.3 and 0.
The effect of U on Y is set to 1 (δ_{4}=1). In each configuration, we simulated 200 datasets, step by step, as follows.

1.
Generate a dataset H which contains observed IVs (Z), exposures (X) and outcome (Y) from 1000 independent individuals.

2.
Randomly sample n_{A} individuals without replacement from H and take their observations of (Z,X,Y) as dataset A;

3.
Randomly sample n_{B} individuals without replacement from H−A={x∈H : x∉A} and take their observations of (Z,X) as dataset B;

4.
Randomly sample n_{C} individuals without replacement from H−A−B={x∈H,x∉A∪B} and take their observations of (Z,Y) as dataset C.
The sample size of B was set equal to that of C (i.e., n_{B}=n_{C}), and the sample sizes of both D_{1}=A∪B and D_{2}=A∪C to 400. The overlapping rate was defined as the percentage of individuals from A (n_{A}) in D_{1} (or, equivalently, in D_{2}). For example, an overlapping rate of 80% gives n_{A}=320 and n_{B}=n_{C}=80; an overlapping rate of 60% gives n_{A}=240 and n_{B}=n_{C}=160. When the overlapping rate was 100%, we used only dataset A with sample size 400 (equivalently, D_{1}=D_{2}=A with n_{A}=400). For an overlapping rate of 0%, we used only datasets B and C (i.e., D_{1}=B with n_{B}=400 and D_{2}=C with n_{C}=400). Imputation of missing data and estimation of causal effects were then performed simultaneously using MCMC in Stan [18, 19].
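The sampling scheme of Steps 2–4, together with the overlap-rate arithmetic above, amounts to a disjoint index partition of H. A minimal sketch (the function name and defaults are illustrative):

```python
import numpy as np

def split_by_overlap(n_H=1000, n_D=400, overlap=0.8, seed=0):
    """Partition the indices of H into disjoint A, B, C (Steps 2-4).

    `overlap` is the fraction of D1 = A u B (equivalently D2 = A u C)
    made up of complete-data individuals from A, so n_A = overlap * n_D
    and n_B = n_C = n_D - n_A.
    """
    n_A = int(round(overlap * n_D))
    n_B = n_C = n_D - n_A
    idx = np.random.default_rng(seed).permutation(n_H)
    A = idx[:n_A]                       # Z, X, Y observed
    B = idx[n_A:n_A + n_B]              # Z, X observed
    C = idx[n_A + n_B:n_A + n_B + n_C]  # Z, Y observed
    return A, B, C

A, B, C = split_by_overlap(overlap=0.8)
# n_A = 320 and n_B = n_C = 80, matching the 80% example in the text
```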
Assessments of the performance of our method
The performance of our method was evaluated using 4 metrics:

mean (posterior mean)

standard deviation

coverage (the proportion of the times that the 95% credible interval contained the true value of the causal effect)

power (the proportion of the times that the 95% credible interval did not contain value zero when the true causal effect was nonzero).
The causal effects of X_{1},X_{2} and X_{3} on the outcome Y (denoted \(\hat {\boldsymbol {\beta }} = (\hat {\beta }_{1}, \hat {\beta }_{2}, \hat {\beta }_{3})\)) were estimated from data simulated under the alternative hypothesis that β=0.3, and from data simulated under the null hypothesis that β=0. Note that, by definition, the power metric applies only under the alternative hypothesis, when β≠0. A high power indicates high sensitivity of the model to deviations of the data from the null or, equivalently, a low expected probability of false negatives.
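Coverage and power, as defined above, can be computed from the simulated credible intervals roughly as follows (the interval endpoints in the usage line are made-up illustrative numbers):

```python
import numpy as np

def coverage_and_power(ci_lower, ci_upper, true_beta):
    """Coverage and power from 95% credible intervals.

    `ci_lower`/`ci_upper` are arrays of interval endpoints over
    simulated datasets; coverage is the fraction of intervals
    containing `true_beta`, power the fraction excluding zero
    (meaningful only when true_beta != 0).
    """
    lo, hi = np.asarray(ci_lower), np.asarray(ci_upper)
    coverage = np.mean((lo <= true_beta) & (true_beta <= hi))
    power = np.mean((lo > 0) | (hi < 0))
    return coverage, power

cov, pw = coverage_and_power([0.1, -0.1, 0.2], [0.5, 0.4, 0.25], 0.3)
# [0.1, 0.5] contains 0.3 and excludes 0; [-0.1, 0.4] contains 0.3 but
# also 0; [0.2, 0.25] excludes both -> coverage = 2/3, power = 2/3
```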
We compared our method with two-stage least squares (2SLS) regression [10] for one-sample MR (100% overlap) and with inverse-variance weighted (IVW) estimation [8] for two-sample MR (0% overlap). For overlapping samples, our method was compared with IVW; in the latter case, only the data of the non-overlapping individuals were included, so as to make it a two-sample MR.