Overlapping-sample Mendelian randomisation with multiple exposures: a Bayesian approach

Background Mendelian randomization (MR) has been widely applied to causal inference in medical research. It uses genetic variants as instrumental variables (IVs) to investigate putative causal relationship between an exposure and an outcome. Traditional MR methods have mainly focussed on a two-sample setting in which IV-exposure association study and IV-outcome association study are independent. However, it is not uncommon that participants from the two studies fully overlap (one-sample) or partly overlap (overlapping-sample). Methods We proposed a Bayesian method that is applicable to all the three sample settings. In essence, we converted a two- or overlapping- sample MR to a one-sample MR where data were partly unmeasured. Assume that all study individuals were drawn from the same population and unmeasured data were missing at random. Then the missing data were treated au pair with the model parameters as unknown quantities, and thus, were imputed iteratively conditioning on the observed data and estimated parameters using Markov chain Monte Carlo. We generalised our model to allow for pleiotropy and multiple exposures and assessed its performance by a number of simulations using four metrics: mean, standard deviation, coverage and power. We also compared our method with classic MR methods. Results In our proposed method, higher sample overlapping rate and instrument strength led to more precise estimated causal effects with higher power. Pleiotropy had a notably negative impact on the estimates. Nevertheless, the coverages were high and our model performed well in all the sample settings overall. In comparison with classic MR, our method provided estimates with higher precision. When the true causal effects were non-zero, power of their estimates was consistently higher from our method. The performance of our method was similar to classic MR in terms of coverage. Conclusions Our model offers the flexibility of being applicable to any of the sample settings. It is an important addition to the MR literature which has restricted to one- or two- sample scenarios. Given the nature of Bayesian inference, it can be easily extended to more complex MR analysis in medical research.


Background
Many questions in medical research address putative causal nature of the relationship between a clinical outcome and a corresponding risk factor or exposure. Randomised controlled trials (RCTs) are ideal for this purpose, but are often infeasible due to cost or ethical considerations. With the aid of proper statistical methodology, causal inference can be made from observational studies. This is a context where Mendelian randomization (MR) [1][2][3] can play an important role. MR uses genetic variants as instrumental variables (IVs) to estimate the causal effect of an exposure on an outcome of interest [4,5]. Without loss of generality, we assume the IVs are single nucleotide polymorphisms (SNPs).
MR analysis requires data from two association studies: IV-exposure and IV-outcome. Most of MR methods [6][7][8][9] have focussed on a two-sample scenario where the IV-exposure and IV-outcome associations were estimated separately from independent studies. In other words, there were no shared individuals between these two studies. A limited number of one-sample MR methods, if a single group of individuals come with a complete set of measurements (IVs, exposure and outcome), are also available in the literature [5,[10][11][12]. Little attention has, however, been devoted to overlapping samples where the two studies have a subset of individuals in common. For example, an investigator has identified a number of SNPs associated with diabetes from a genome-wide association study. Her interest is in whether expression levels of certain genes are causal risk factors of diabetes. But gene expression can be measured only from a subset of participants of the SNP-diabetes association study due to cost. Another example is that the investigator has collected data from two small independent IV-BMI and IV-diabetes association studies, with the aim to estimate the causal effect of BMI on diabetes using MR. To enhance statistical power, a natural way is to incorporate previous studies that have complete sets of observed data on IV, BMI and diabetes into her current studies.
To the best of our knowledge, few studies have investigated overlapping-sample MR. Burgess et al. [13] have shown that sample overlap increases type I error and leads to bias in classic MR methods. LeBlanc et al. [14] have developed a correction method of overlapping samples by decorrelation. There is a pressing need for the development of a flexible MR approach that can be used for one-, two-as well as overlapping-samples. Here we propose a novel MR method based on the Bayesian framework introduced by Berzuini et al [5]. Our method preserves data information while avoiding bias. It exploits two simple, but powerful, ideas: (i) the overlapping-sample problem is a special case of the more general situation where some or all the individuals have missing values in exposure or in outcome; and (ii) a Bayesian approach can coherently deal with missing data by treating them as additional parameters which can be estimated from the data. In this paper we introduce our method and assess its performance by comparing it with classic MR in a simulation study. A further advantage of Bayesian MR is freedom to elaborate the model in various directions of interest. This feature we illustrate by moving away from the standard singleexposure MR model and considering multiple exposures instead.

Model
Let U denote a set of unobserved confounders which could possibly distort the causal relationship between the exposure X and the outcome Y. Let Z be an instrumental SNP (multiple instruments are considered later) which satisfies the following three assumptions ( [10]): These assumptions are graphically expressed in Fig. 1. The Z → X arrow represents a non-zero association between the IV and the exposure, in accord with A 1 . Assumption A 2 follows from the graph (it would not if there were an arrow directly connects Z and U ). A 3 is also satisfied (it would not if an arrow pointed directly from Z to Y ). The X → Y arrow represents the putative causal effect of X on Y, which is our primary interest.
Consider a study with multiple exposures and each instrumental SNP associated with at least one exposure. Without loss of generality, we shall hereafter restrict attention to the case where there are three exposures X = (X 1 , X 2 , X 3 ) and three IVs Z = (Z 1 , Z 2 , Z 3 ). Figure 2 depicts our model and parameter notations, with β 1 , β 2 and β 3 representing the putative causal effects of the exposures X 1 , X 2 and X 3 on the outcome Y, respectively; δ 1 , δ 2 , δ 3 and δ 4 representing the effects of the confounder U on the three exposures and Y, respectively. For simplicity, we assume that there are no direct associations between the exposures and that the IVs are mutually independent. The Z 1 → X 1 association is quantified by parameter α 1 and the Z 2 → X 2 association by α 2 . Both Z 2 and Z 3 are associated with X 3 , and the strengths of these associations are quantified by parameters α 3 and α 4 respectively. Z 1 and Z 3 are "valid instruments" as they satisfy all the above stated MR assumptions. However, Z 2 violates assumption A 3 because the effect of Z 2 on Y is mediated by both X 2 and X 3 -a problem of horizontal pleiotropy [15].
By assuming linearity and additivity of the conditional dependencies, and in accord with the graph of Fig. 1, we specify our model as follows.  1 Schematic representation of the three assumptions required in Mendelian randomization. 1) Instrumental variable Z is associated with the exposure X; 2) Z is independent of the confounder U ; 3) Z is independent of the outcome Y, conditioning on X and U where N(a, b) stands for a normal distribution with mean a and variance b. The ω parameters are unknown intercepts and the σ s are standard deviations of independent random noise terms. The parameters of primary inferential interest are the βs, each representing the causal effect of a particular exposure on the outcome. The α parameters quantify the strengths of pairwise associations between the instruments and the exposures. They are often referred to as "instrument strengths", and should There are three instrumental variables (Z 1 , Z 2 , Z 3 ) and three exposures (X 1 , X 2 , X 3 ). Z 1 is associated with X 1 only. Z 3 is associated with X 3 only. Z 2 is associated with both X 2 and X 3 . For simplicity, we assume there is no direct associations among the exposures and the instrumental variables are mutually independent be significantly different from zero to attenuate weak instrument bias [16]. Finally, U denotes a sufficient scalar summary of the unobserved confounders, which is set to be drawn from a N(0, 0.1) distribution with its parameters not identifiable from the likelihood. This model is built on the basis of the Bayesian approach developed by Berzuini et al. [5], with an extension to multiple exposures and allowing for an IV to be associated with more than one exposure.

Our method
We propose a Bayesian MR method that works in all the sample settings (whether one-sample, two-sample or overlapping-sample), and therefore is more flexible than existing non-Bayesian methods in this respect. In essence, our method treats two-or overlapping-samples as one sample where the data of some (or all) of the individuals are incomplete, in the sense that these individuals have Z and X (but not Y ) measured or Z and Y (but not X) measured. Clearly our method allows for quite general patterns of missingness, although it is not our purpose here to explicitly explore their full repertoire. In addition, for simplicity, we illustrate our method by restricting attention to the special case of a continuous outcome variable Y. We assume that all individuals are drawn from the same population and unmeasured data are missing at random. Then whatever are the unobserved variables X or Y, they are treated au pair with the model parameters as unknown quantities, and thus, can be imputed iteratively from their distributions conditioning on the observed variables and estimated parameters using Markov chain Monte Carlo (MCMC) [17] -the best Bayesian tradition. We start with introducing three disjoint datasets as follows (see Fig. 3a): No individuals are common in A, B and C but they are drawn independently from the same population. As discussed earlier, A may be collected from a previous study; B and C may represent data from two independent IV-exposure and IV-outcome association studies. If we combine A with B the two resulting datasets, D 1 and D 2 , will not be disjoint as they have dataset A in common-a typical overlappingsample problem. If A is empty, B and C will form a twosample problem.
Our method treats missing data as all other unknown quantities in the model. That is, we impute the missing values of Y in B by randomly drawing a value from the conditional distribution of Y given observed exposures and the estimated parameters at each iteration. Likewise, the missing values of X in C will be imputed by a randomly draw from the conditional distribution of X given the observed instruments Z and the estimated parameters at each iteration. Let Y * be imputed values of Y and Given all the datasets A, B and C, we proceed with the following Markov chain scheme.

Set initial values for all the unknown parameters in
the model. We also fix the desired number of Markov iterations, with iteration index t. 2. At the tth iteration, in accord with Fig. 3b, we replace the missing values of Y in B with Y * which is drawn from a Normal distribution with mean We use superscript (t) for the random parameters at the tth iteration, with t = 0 representing their initial values. Because both U and the random errors have zero mean, they have vanished in this step. 3. Merge all the data, imputed and observed, into a new complete dataset (Fig. 3c). 4. Estimate model parameters based on the complete dataset obtained in Step 3 using MCMC and set t ← t + 1.

Repeat Steps 2-4 until t equals the number of
iterations specified in Step 1.

The priors
Here we discuss our choice of prior distributions of the unknown parameters involved in Models (1)- (5). Let the priors for β = (β 1 , β 2 , β 3 ) be independently normally distributed with mean 0 and standard deviation 10 ⎛ U has been assumed to follow a normal distribution U ∼ N(0, 0.1) in Model (1). The priors of the standard deviation parameters σ 1 , σ 2 , σ 3 and σ Y are assumed to follow a same inverse-gamma distribution σ (·) ∼ Inv-Gamma (3,2).
1. Generate a dataset H which contains observed IVs (Z), exposures (X) and outcome (Y ) from 1000 independent individuals. 2. Randomly sample n A individuals without replacement from H and take their observations of (Z, X, Y ) as dataset A ; 3. Randomly Sample n B individuals without replacement from H − A = {x ∈ H, x / ∈ A} and take their observations of (Z, X) as dataset B ; 4. Randomly sample n C individuals without replacement from H − A − B = {x ∈ H, x / ∈ A ∪ B} and take their observations of (Z, Y ) as dataset C.
The sample size of B was set to be the same as that of C (i.e., n B = n C ) and the sample size of both D 1 = A ∪ B and D 2 = A∪C to 400. The overlapping rate was defined as the percentage of the number of individuals from A (n A ) in D 1 (or equivalently, in D 2 ). For example, if overlapping rate was 80%, then n A = 320 and n B = n C = 80. Similarly, for an overlapping rate of 60%, n A = 240 and n B = n C = 160. When overlapping rate was 100%, we only used dataset A of sample size 400 (or equivalently, D 1 = D 2 = A where n A = 400). For an overlapping rate of 0%, we only used datasets B and C (i.e., D 1 = B with n B = 400 and D 2 = C with n C = 400). Imputations of missing data and causal effect estimations were then performed simultaneously using MCMC in Stan [18,19].

Assessments of the performance of our method
The performance of our method was evaluated using 4 metrics: • mean (posterior mean) • standard deviation • coverage (the proportion of the times that the 95% credible interval contained the true value of the causal effect) • power (the proportion of the times that the 95% credible interval did not contain value zero when the true causal effect was non-zero).
The causal effects of X 1 , X 2 and X 3 on the outcome Y (denoted asβ = (β 1 ,β 2 ,β 3 )) were estimated from data simulated under the alternative hypothesis that β = 0.3, and from data simulated under the null hypothesis that β = 0. Note that the metric power was only applicable under the alternative hypothesis when β = 0, by its definition. A high value of power indicates high sensitivity of the model to deviations of the data from the null or, equivalently, a low expected probability of false negatives.
We compared our method with the two-stage least squares (2SLS) regression [10] for one-sample MR (100% overlap) and with the inverse-variance weighted (IVW) estimation [8] for two-sample MR (0% overlap). For overlapping samples, our method was compared with IVW, and in the latter, we only included data of non-overlapping samples to make it a two-sample MR. Table 1 displays a summary of results obtained from the simulated data under the alternative hypothesis (β = 0.3). Each row of the table corresponds to a configuration of specified sample overlapping rate, instrument strength α and degree of confounding δ. Columns are values of the four metrics of the estimated causal effectsβs obtained from our method and classic MR. Let us first focus on our method (grey columns "Bayesian"). If overlapping rate was high (80% or over), both the coverage and the power of theβs were consistently high (≥0.93). When the sample overlap was 60% or under, strong IVs (α = 0.5) resulted in high statistical power (=1) while the impact of reduced IV strength (α = 0.1) became detrimental, even more so when overlapping rate was reduced. When IVs were relatively weak (α = 0.1), power increased as confounding effects δ decreased. Overall, where there was pleiotropy (in estimating β 2 because there were two competing paths from Z 2 to Y via X 2 and X 3 ) the results were notably worse than those without pleiotropy (in estimating β 1 and β 3 ). As α increased, the averages ofβs (mean) became closer to the true value 0.3 and the variations (sd) decreased, concluding that higher IV strength reduced bias and resulted in more precise estimates. We note that despite the greater posterior uncertainty for β 2 across all the configurations (indicated in the "sd" column), the posterior mean for this parameter remained relatively close to the true value. This is evidence that our model is reliable and "honest about uncertainty". Compared with the classic MR, our method showed higher precision in estimated causal effect in all configurations ("sd" columns). When IV strength became weaker (α = 0.1), our method led to higher power throughout. With highly overlapped samples (80%), even for strong IVs, power of the estimates in classic MR (IVW) was relatively low, partly due to smaller sample size after removing data of overlapping samples. Table 2 shows the results of the simulated data under the null (β = 0). Again, we first report results from our method (grey columns "Bayesian"). The coverages of the three estimated causal effectsβs were high (>0.9) in all the configurations, indicating that the method does not produce a large number of false positives -an important property of MR analysis. Therefore, none of the sample overlapping rate, IV strength and confounding effect has had much negative influence on the results, although there seemed to be an increasing trend in bias and varia- Table 1 Causal effects estimated from simulated data using our Bayesian method and classic methods (2SLS when sample overlap = 100% and IVW otherwise) when  Table 2 Causal effects estimated from simulated data using our Bayesian method and classic methods (2SLS when sample overlap = 100% and IVW otherwise) when is set to 1 tion when IV strength decreased. In comparison with classic methods, our method provided estimates with higher precision (smaller values of "sd"), but similar coverage. Figures 4 and 5 depict, respectively, distributions ofβs for each of the IV-strength and overlapping rate configurations when the confounding effects δ = 1, for β = 0.3 and β = 0. Each box represents distribution of the estimated causal effect β based on 200 simulated datasets. Higher overlapping rate led to a gain in precision from our Bayesian method, which can be observed in all panels across the figures as the blue boxes became narrower in height as the sample overlap increased and reached minimum in the one-sample scenario (overlapping rate: 1). It is also shown that estimates were more precise when instruments were stronger (top panels vs bottom panels). Blue boxes were narrower than red boxes in height in all the panels, indicating that our method consistently outperformed classic MR in precision. Interestingly, in classic methods, variations of the estimates in one sample were smaller than those in two-and overlapping-samples, which is probably because no data were excluded in the one-sample 2SLS MR. For two-and overlapping-samples, we adopted two-sample IVW which used summary statistics and required data removal of overlapping samples. Consequently, the sample size in IVW was reduced to some extent for overlapping-samples, and the higher the overlap rate, the lower the precision.

Results
Power ofβ for each IV-strength and overlapping rate configurations when the confounding effects δ = 1 under the alternative β = 0.3 is presented in Fig. 6. In Bayesian method, power was always equal to 1 for strong IVs (top panels) and gradually became lower as overlapping rate decreased for weak IVs (bottom panels). Compared with our method, estimates in classic MR had lower power, especially when IVs were weak. In classic MR, similar to the trend of precision in Fig. 4, power was the highest for one sample and lowest for overlapping samples with highest overlap 80%. Figure 7 displays the coverage ofβ for the same configurations and confounding effects as in Fig. 6 under the alternative β = 0.3. Overall, coverage was high (>0.85) in both of the MR methods. Bayesian method performed better for high overlapping rate while classic MR was better for low overlapping rate. Under the alternative hypothesis, however, it would be sensible to focus on power rather than coverage. When β = 0, it is seen from Fig. 8 that the coverage ofβ was high (>0.91) in both Bayesian and classic MR. Our method consistently outperformed classic MR in power but not in coverage. This is partly because our method provides very precise estimates (narrow 95% credible intervals). Thus, even for Fig. 4 Box plots of causal effects estimated from simulated data using our Bayesian method and classic methods (2SLS when sample overlapping rate = 1 and IVW otherwise) when β = (β 1 , β 2 , β 3 ) = 0.3 and δ = (δ 1 , δ 2 , δ 3 , δ 4 ) = 1. Results with two instrument strengths (top panels: α = (α 1 , α 2 , α 3 ) = 0.5 and bottom panels: α = 0.1) are displayed, from left to right, for estimated causal effect of X 1 , X 2 and X 3 on Y, denoted asβ 1 , β 2 andβ 3 respectively. Each panel consists results of six different sample overlapping rates ranging from 0 (leftmost) to 1 (rightmost) and each box plot represents causal effect estimated from 200 simulated datasets Fig. 5 Box plots of causal effects estimated from simulated data using our Bayesian method and classic methods (2SLS when sample overlapping rate = 1 and IVW otherwise) when β = (β 1 , β 2 , β 3 ) = 0 and δ = (δ 1 , δ 2 , δ 3 , δ 4 ) = 1. Results with two instrument strengths (top panels: α = (α 1 , α 2 , α 3 ) = 0.5 and bottom panels: α = 0.1) are displayed, from left to right, for estimated causal effect of X 1 , X 2 and X 3 on Y, denoted asβ 1 , β 2 andβ 3 respectively. Each panel consists results of six different sample overlapping rates ranging from 0 (leftmost) to 1 (rightmost) and each box plot represents causal effect estimated from 200 simulated datasets Fig. 6 Power from simulated data using our Bayesian method and classic methods (2SLS when sample overlapping rate = 1 and IVW otherwise) when β = (β 1 , β 2 , β 3 ) = 0.3 and δ = (δ 1 , δ 2 , δ 3 , δ 4 ) = 1. Results with two instrument strengths (top panels: α = (α 1 , α 2 , α 3 ) = 0.5 and bottom panels: α = 0.1) are displayed, from left to right, for estimated causal effect of X 1 , X 2 and X 3 on Y, denoted asβ 1 ,β 2 andβ 3 respectively. Each panel consists results of six different sample overlapping rates ranging from 0 (leftmost) to 1 (rightmost) and each box plot represents causal effect estimated from 200 simulated datasets Fig. 7 Coverage from simulated data using our Bayesian method and classic methods (2SLS when sample overlapping rate = 1 and IVW otherwise) when β = (β 1 , β 2 , β 3 ) = 0.3 and δ = (δ 1 , δ 2 , δ 3 , δ 4 ) = 1. Results with two instrument strengths (top panels: α = (α 1 , α 2 , α 3 ) = 0.5 and bottom panels: α = 0.1) are displayed, from left to right, for estimated causal effect of X 1 , X 2 and X 3 on Y, denoted asβ 1 ,β 2 andβ 3 respectively. Each panel consists results of six different sample overlapping rates ranging from 0 (leftmost) to 1 (rightmost) and each box plot represents causal effect estimated from 200 simulated datasets Fig. 8 Coverage from simulated data using our Bayesian method and classic methods (2SLS when sample overlapping rate = 1 and IVW otherwise) when β = (β 1 , β 2 , β 3 ) = 0 and δ = (δ 1 , δ 2 , δ 3 , δ 4 ) = 1. Results with two instrument strengths (top panels: α = (α 1 , α 2 , α 3 ) = 0.5 and bottom panels: α = 0.1) are displayed, from left to right, for estimated causal effect of X 1 , X 2 and X 3 on Y, denoted asβ 1 ,β 2 andβ 3 respectively. Each panel consists results of six different sample overlapping rates ranging from 0 (leftmost) to 1 (rightmost) and each box plot represents causal effect estimated from 200 simulated datasets a slightly biased estimate, it is likely the credible interval does not include the true value, which will affect its performance in coverage.

Discussion
In this paper we have proposed a Bayesian method that effectively enables a one-sample MR whether there are two overlapping samples or disjoint samples. It is noteworthy that our method has the best performance in one-sample setting, which is unsurprising because 1) our method has been developed on the basis of a Bayesian approach tailored for one-sample MR [5]; 2) the percentage of imputed data becomes lower as overlapping rate increases, and consequently, the uncertainty in data decreases. As discussed, our method provides the flexibility of combining data from previous studies with current ones to enhance statistical power, which is particularly advantageous because MR studies are often underpowered partly due to small sample sizes. It is also shown from our simulation results that pleiotropy (effect of Z 2 on Y mediated by both X 2 and X 3 ) resulted in the worst estimated causal effect, because it involves two completing paths which led to higher uncertainty in estimation. This is, however, an issue commonly seen in MR studies [20,21].
The precision of the estimates from our proposed method was higher than that from the classic MR. Our method also resulted in higher power of the estimates under the alternative hypothesis where the true causal effects were different from zero. Coverage of the estimates from both methods was similar under the null as well as the alternative hypotheses.
Our method is limited by the fact that we have focussed on a simple model with only three IVs and three exposures and a moderate number of configurations, although the configurations can never be exhaustive. In the real world, we often encounter much more complex data in which there are possibly many (weak and/or correlated) IVs and (correlated) exposures and outcomes, which triggers a research topic of variable selections in future MR methodology [22]. Bayesian approach, however, is well known for its flexibility of building various models to address different scientific questions.

Conclusions
We have developed a Bayesian MR that can be applied to any of the sample settings by means of treating missing data as unknown parameters which can then be imputed using MCMC. This is an important addition to the existing MR literature where some methods can only be used for one-sample and others for two-sample settings. Because of the nature of Bayesian inference, our model can be easily extended to tackle more complex MR analysis in medical research.