 Research
 Open Access
 Published:
Bayesian mendelian randomization with study heterogeneity and data partitioning for large studies
BMC Medical Research Methodology volume 22, Article number: 162 (2022)
Abstract
Background
Mendelian randomization (MR) is a useful approach to causal inference from observational studies when randomised controlled trials are not feasible. However, study heterogeneity of two association studies required in MR is often overlooked. When dealing with large studies, recently developed Bayesian MR can be computationally challenging, and sometimes even prohibitive.
Methods
We addressed study heterogeneity by proposing a random effect Bayesian MR model with multiple exposures and outcomes. For large studies, we adopted a subset posterior aggregation method to overcome the problem of computational expensiveness of Markov chain Monte Carlo. In particular, we divided data into subsets and combined estimated causal effects obtained from the subsets. The performance of our method was evaluated by a number of simulations, in which exposure data was partly missing.
Results
Random effect Bayesian MR outperformed conventional inversevariance weighted estimation, whether the true causal effects were zero or nonzero. Data partitioning of large studies had little impact on variations of the estimated causal effects, whereas it notably affected unbiasedness of the estimates with weak instruments and high missing rate of data. For the cases being simulated in our study, the results have indicated that the “divide (data) and combine (estimated subset causal effects)” can help improve computational efficiency, for an acceptable cost in terms of bias in the causal effect estimates, as long as the size of the subsets is reasonably large.
Conclusions
We further elaborated our Bayesian MR method to explicitly account for study heterogeneity. We also adopted a subset posterior aggregation method to ease computational burden, which is important especially when dealing with large studies. Despite the simplicity of the model we have used in the simulations, we hope the present work would effectively point to MR studies that allow modelling flexibility, especially in relation to the integration of heterogeneous studies and computational practicality.
Background
Mendelian randomization (MR) [1–3] is a useful approach to causal inference from observational studies when randomised controlled trials are not feasible. It uses genetic variants as instrumental variables (IVs) to explore putative causal relationship between an exposure and an outcome. Conventional MR methods [4–11] have mainly used summary statistics of IVexposure association and IVoutcome association analyses, from a single study (onesample) or two independent studies (twosample). Among recent developments of MR methods, a Bayesian approach [8, 12] has been proposed to tackle overlapping samples in which a subset of participants are common in the two association studies. This comes from the idea that overlapping and two sample settings can be treated as cases of missing data which can be imputed through Markov chain Monte Carlo (MCMC) while estimating causal effects of interest. This way, we take full advantage of all the observed and imputed data. Bayesian MR also offers great flexibility of modelling complex data structure and explicitly quantifies uncertainties of model parameters.
It is not uncommon that studies from different research groups are designed to address similar (but not exactly the same) scientific questions. For example, in a genomewide association study (Study 1), data of genetic variants and hypertension status (outcome) are collected to identify outcomeassociated genetic variants. In another independent study (Study 2), besides this aim, the investigator is also interested in causal effect of blood pressure medication (exposure) on hypertension. Therefore, exposure information is also recorded. To investigate the exposureoutcome causal relationship, a conventional option would be onesample MR using data from Study 2 only. Another option would be a twosample MR which will use genetic variants and the outcome data from Study 1, and genetic variants and the exposure data from Study 2. In other words, the outcome data of Study 2 will be discarded. Both of the options involve removal of data which, in our view, is not necessary. In fact, we can combine observed data from the two studies, and impute exposure data for Study 1 in a Bayesian MR model. However, it is well possible that the two studies are not homogenous, which should be taken into consideration in our modelling.
Another important aspect of Bayesian MR analysis (in fact, all kinds of data analysis) is tractability of computation, as we are in the era of big data. MCMC requires a large number of iterations and a complete scan of data for each iteration [13]. Thus, it can be computationally challenging, and sometimes even prohibitive. An intuitive solution would be dividing data into a number of subsets and enabling data analysis in parallel.
This paper aims to address study heterogeneity and data partitioning for large studies in Bayesian MR. First, we build a Bayesian MR model including multiple IVs, exposures and outcomes based on two independent studies, of which one has exposure data completely missing. To account for study heterogeneity, we propose a random effect model. Second, a data partitioning and subset posterior aggregation method [13] is adopted for analysis of large studies. Third, simulation experiments are carried out for different configurations of IV strength and missing rate of exposure data, followed by evaluation of our proposed method.
Methods
Bayesian MR with study heterogeneity
Let X denote the exposure, Y the outcome, and U a scalar variable summarising the set of unobserved confounders of the relationship between X and Y. Traditional MR [9] requires that an IV (denoted by Z) is : i) associated with the exposure X, ii) not associated with the confounders U, and iii) associated with the outcome Y only through the exposure X. These three assumptions can be graphically expressed as Fig. 1 in which our interest is whether X causes Y (the X→Y arrow). For the purpose of illustration, we consider the data generating process shown in Fig. 2. Z_{1},Z_{2} and Z_{3} are vectors consisting of L,K and M independent IVs respectively. Random scalar variables X_{1} and X_{2} represent two exposures. Random scalar variables Y_{1} and Y_{2} represent two outcomes.
In a twosample (or equivalently, twostudy) MR setting with or without overlapping individuals, it has been shown that, compare to conventional MR analysis, a Bayesian approach may lead to more precise estimates of the causal effect by treating it as a case of incomplete data which may be dealt with through iterative imputations using MCMC [12]. Here, we further generalize the approach by allowing for some degree of heterogeneity between different studies.
Suppose we have data collected from two independent studies:

Study A  observed data for IVs, exposures and outcomes {Z_{1},Z_{2},Z_{3},X_{1},X_{2},Y_{1},Y_{2}}.

Study B  observed data for IVs and outcomes {Z_{1},Z_{2},Z_{3},Y_{1},Y_{2}} only.
Study A includes fully observed data for MR, whereas Study B has exposure data completely missing. We shall include random effect terms in our MR model to capture study heterogeneity. By assuming standardised observed variables and linear additivity, according to Fig. 2, our models are constructed as follows.
For Study A,
For Study B,
In the above prespecified models, αs are instrument strength parameters, and δs are effects of U on Xs or Ys. Causal effects of Xs on Ys are denoted by βs. The study heterogeneity is accounted for by Vs. Note that X_{1} and X_{2} do not have observed data in Study B, but they are part of data generating process, and thus, should be included in the model. U is a sufficient scalar summary of the unobserved confounders. We assume that U∼N(0,1).
The combined dataset of StudiesA and B (\(\mathcal {D}\), say) will contain fully observed data for the instruments and the outcomes. However, all participants in Study B have missing data of X_{1} and X_{2} which will be treated as unknown quantities and imputed from their conditional distributions given the observed data and current estimated parameters using MCMC. Let X^{∗} be imputed values of X. Our approach involves the following sequence of five steps.

1.
Specify initial values for unknown parameters and the number of Markov iterations T.

2.
At the tth iteration, where 0≤t<T, let missing values of X_{1} and X_{2} in Study B be filled with \(X_{1}^{*}\) drawn from \(N\left (V_{X_{1}}^{(t)} + \boldsymbol {\alpha }_{1}^{(t)}\mathbf {Z}_{1} + \boldsymbol {\alpha }_{31}^{(t)}\mathbf {Z}_{3} + \delta _{X_{1}}^{(t)}U, {\sigma _{X_{1B}}^{2}}^{(t)}\right)\) and \(X_{2}^{*}\) drawn from \(N\left (V_{X_{2}}^{(t)} + \boldsymbol {\alpha }_{2}^{(t)}\mathbf {Z}_{2} + \boldsymbol {\alpha }_{32}^{(t)}\mathbf {Z}_{3} + \delta _{X_{2}}^{(t)}U, {\sigma _{X_{2B}}^{2}}^{(t)}\right)\), respectively. Z_{1},Z_{2} and Z_{3} are observed values of IVs in Study B.

3.
Create a single complete dataset including both the observed and the imputed data.

4.
Estimate model parameters using MCMC based on the complete dataset and set t←t+1.

5.
Repeat Steps 24 until t=T.
Now we specify priors in the Bayesian model (2)(10). Previous GWAS studies show that individual SNPs explain a tiny proportion of exposure variance [14–16], corresponding to small magnitudes of the α parameters in our model. In accord with this finding and previous MR simulation studies [17], we set IV strength parameters αs to be independent and identically distributed with mean zero and a small variance: α_{1}∼N_{L}(0,0.3^{2}I),α_{2}∼N_{K}(0,0.3^{2}I),α_{31}∼N_{M}(0,0.3^{2}I), and α_{32}∼N_{M}(0,0.3^{2}I). The priors of both β_{1} and β_{2} are set to a same distribution N(0,10^{2}). Finally, we assign the priors of the standard deviations σs to a same inversegamma distribution InvGamma(3,2), and random effects Vs to N(0,1) in the Model (7)(10) for Study B.
Bayesian MR for large studies
Advantages of an MCMCpowered Bayesian approach to MR are counterpoised by a relatively higher computational burden and a possibly large memory requirement. A natural way of dealing with this problem would be to divide data \(\mathcal {D}\) into a number (J, say) of subsets D_{1},D_{2},...,D_{J} that we assume to contain an equal number (q, say) of individuals for simplicity. By running separate Bayesian MR analyses in parallel on the subsets, we will obtain J subsetspecific posteriors which can then be aggregated in various ways. In this study, we adopt a “divideandcombine” approach proposed by Xue and Liang [13].
Let θ denote the entire set of unknown quantities in the model. For subset D_{j}, where j=1,2,...,J, let π(θD_{j}) denote the joint posterior distribution of θ and \(\widehat {\boldsymbol {\mu }}_{j} = \widehat {E}(\boldsymbol {\theta }D_{j})\) the corresponding estimated mean vector. Let \(\widehat {\boldsymbol {\mu }} = \frac {1}{J}\sum _{j=1}^{J}\widehat {\boldsymbol {\mu }}_{j}\) be the average of the \(\widehat {\boldsymbol {\mu }}_{j}\)s. According to [13], the posterior based on full data, \(\pi (\boldsymbol {\theta }\mathcal {D})\), can be estimated as the average of the recentred subset posteriors.
And it has been proved that ([13])
and
where q is the sample size of the subsets and n the sample size of the full dataset. \(E_{\widetilde {\pi }}(\boldsymbol {\theta })\) and E_{π}(θ) are expectations of the posteriors of θ aggregated from subsets and obtained from full data respectively. \(Var_{\widetilde {\pi }}(\boldsymbol {\theta })\) and Var_{π}(θ) are their variances. It is easily seen that the difference in expectation depends on the sample size of the subsets and the difference in variation depends on the sample size of the full dataset.
Simulations  Bayesian MR with study heterogeneity
We used simulated data to evaluate our Bayesian MR model with study heterogeneity in comparison with a conventional MR method. In particular, we considered 12 configurations including

3 missing rates of the exposures: 20%, 50%, 80%

2 degrees of the IV strength (α_{1},α_{2},α_{31},α_{32}): 0.1 and 0.3

Zero and nonzero causal effects of the exposures on the outcomes (β_{1},β_{2}): 0 and 0.3.
The number of IVs was set to 15, 15 and 5 for Z_{1},Z_{2} and Z_{3} respectively. Data of each IV were randomly drawn from a binomial distribution B(2,0.3) independently. The specified values of the effects of U on the exposures (\(\delta _{X_{1}}, \delta _{X_{2}}\)) and on the outcomes (\(\delta _{Y_{1}}, \delta _{Y_{2}}\)) were set to 1. Standard deviations σs were set to 0.1. We simulated 200 datasets for each configuration.
For each dataset, we

simulated a dataset of sample size n_{A} which contains observations of the IVs, exposures and outcomes (dataset A, denoted by \(\mathcal {D}_{A}\));

simulated a dataset of sample size n_{B} which contains observations of the IVs, exposures and outcomes, then included data of the IVs and outcomes only as if the exposure data were missing (dataset B, denoted by \(\mathcal {D}_{B}\)).
Sample size of \(\mathcal {D}\), the combined data of \(\mathcal {D}_{A}\) and \(\mathcal {D}_{B}\), was set to 400 in all configurations, i.e., n=n_{A}+n_{B}=400. The missing rate of the exposures was defined as \(\frac {n_{B}}{n}\times 100\%\). For example, if missing rate was 50%, we simulated \(\mathcal {D}_{A}\) of sample size 200 and \(\mathcal {D}_{B}\) of sample size 200. To allow for different degrees of study heterogeneity in different datasets, random effects Vs in study B were randomly drawn from a uniform distribution U(−0.5,0.5) independently. Imputations of missing data and estimations of model parameters were then performed simultaneously using MCMC in Stan [18, 19]. \(\hat {R}\) was used to check convergence of the Markov chains [20].
Estimated causal effects obtained from our Bayesian MR and twosample inversevariance weighted (IVW) estimation [6] were compared using 4 metrics: mean, standard deviation (sd), coverage (proportion of the times that the 95% credible/confidence intervals contained the true value of the causal effect) and power (proportion of the times that the 95% credible/confidence intervals did not contain zero when the true causal effect was nonzero, only applicable when β_{1}=β_{2}=0.3 by defination). Higher power indicates lower chance of getting false negative results. In IVW estimation, we used observed IV and exposure data from \(\mathcal {D}_{A}\) and observed IV and outcome data from \(\mathcal {D}_{B}\).
Simulations  Bayesian MR with study heterogeneity for large studies
We also assessed the performance of dividing a big dataset into subsets in our Bayesian MR with study heterogeneity in simulation experiments. The simulation scheme was the same as above. However, the sample size of \(\mathcal {D}\) was set to a much larger value 50,000. For each configuration, a single dataset was simulated by combining \(\mathcal {D}_{A}\) and \(\mathcal {D}_{B}\). We randomly divided data into 5 subsets of equal sample size, separately, for \(\mathcal {D}_{A}\) (\(\mathcal {D}_{A_{1}},..., \mathcal {D}_{A_{5}}\)) and for \(\mathcal {D}_{B}\) (\(\mathcal {D}_{B_{1}},..., \mathcal {D}_{B_{5}}\)). Subset \(\mathcal {D}_{i}\) was then constructed by combining \(\mathcal {D}_{A_{i}}\) and \(\mathcal {D}_{B_{i}}\), where i=1,...,5. This is to ensure that subset \(\mathcal {D}_{i}\) had the same missing rate as that of the full data \(\mathcal {D}\). Causal effects were estimated using \(\mathcal {D}\), and using the 5 subsets in Bayesian MR. To explore the impact of different data partitioning strategies on estimated causal effects, we carried out the same analysis by also dividing data into 50 subsets of sample size 1,000.
Results
\(\hat {R}\) values of all the parameters in the models (2)(5) and (7)(10) were greater than 1 and less than 1.1 across the simulations.
Simulation results  Bayesian MR with study heterogeneity
Table 1 displays simulation results when the true causal effects were nonzero (β_{1}=β_{2}=0.3). Each row of the table corresponds to a configuration of a specified missing rate and a degree of IV strength α. Columns correspond to the estimated causal effects of X_{1} on Y_{1} (\(\hat {\beta }_{1}\)) and of X_{2} on Y_{2} (\(\hat {\beta }_{2}\)) from our Bayesian method and from the IVW method evaluated using the four metrics. Unsurprisingly, the estimated causal effect of X_{1} on Y_{1} was very similar to that of X_{2} on Y_{2} in each configuration from Bayesian MR, because their true values were set to be the same and the model had a symmetric structure as shown in Fig. 2. This was also observed in the results from the IVW method. However, Bayesian MR outperformed IVW uniformly across all the configurations, with less bias, higher precision, coverage and power. The impact of low missing rate was positive on coverage but negative on power in IVW. However, such impact was negligible in Bayesian MR. This was mainly due to much higher variations of the estimates, and consequently, much wider confidence intervals in IVW estimation. Weaker IVs had little influence on unbiasedness of the estimates and power, but resulted in slightly lower precision and coverage in Bayesian MR. However, there was a remarkable decrease in unbiasedness, precision and power in IVW as IV strength decreased.
Table 2 presents simulation results when the true causal effects were zero (β_{1}=β_{2}=0). Again, the results of \(\hat {\beta }_{1}\) was very similar to those of \(\hat {\beta }_{2}\) in each configuration, separately, from Bayesian MR and from IVW. Overall, both methods performed well. However, Bayesian MR still outperformed IVW across all the configurations, with higher coverage and precision and less biased estimates. In both MR methods, missing rate did not have a notable effect on the estimates, whereas weaker IVs led to lower precision.
Simulation results  Bayesian MR with study heterogeneity for large studies
Figure 3 depicts the joint posterior distributions of \(\hat {\beta }_{1}\) (horizontal axis) and \(\hat {\beta }_{2}\) (vertical axis) based on simulated data when the true causal effects were nonzero. Columns corresponds to three missing rates and rows two levels of IV strength. In each panel, the black dot denotes the values of true causal effects (β_{1}=β_{2}=0.3). The red, orange and blue contours are 2dimensional Gaussian kernel density estimation of the joint posterior (GKDEJP) from the full dataset, aggregated GKDEJP from five subsets and aggregated GKDEJP from fifty subsets respectively. When IVs were strong in Bayesian MR analysis (top panels), estimated causal effects were close to their true values, with or without data partitioning. When IVs became weaker (bottom panels), the results from the full data were concordant with those from 5 subsets, but notably different from those based on 50 subsets. The impact of data partitioning was substantial with weak IVs and high missing rate. This could be explained by Equation (12), in which difference in mean of the GKDEJPs depends on the subset sample size q. Difference in variance of the GKDEJPs was, however, not evident in the three sets of contours in each configuration, because it only depends on the sample size of the full data (Equation (13)) which was a fixed value 50,000. Our simulation results suggest that, in Bayesian MR with a large sample size, there is a tradeoff between data partitioning for more efficient computations, and large enough sample size of each subset for preventing estimates from a decrease in unbiasedness.
The same plots were presented in Fig. 4 when the true causal effects were zero. The performances of the three data partition strategies were very similar to those when the true causal effects were nonzero.
Discussion and conclusions
Numerous MR methods have been developed in recent years. To the best of our knowledge, little attention has been focused on study heterogeneity. In this study, we further elaborated our Bayesian MR method [8, 12] by including random effects to explicitly account for study heterogeneity. We also adopted a subset posterior aggregation method [13] to address the computational challenge of MCMC, which is important especially when dealing with large studies. For the cases being simulated in our study, the results have indicated that the “divide (data) and combine (estimated subset causal effects)” can help improve computational efficiency, for an acceptable cost in terms of bias in the causal effect estimates, as long as the size of the subsets is reasonably large. However, when instruments are weak and data missing rate is high, the results obtained using data partitioning are noticeably different from those obtained using full data. Hence, there is room for further development of robust and computationally efficient methods for Bayesian MR.
Despite the simplicity of the model we have used in the simulations, we hope the present work would effectively point to MR studies that allow modelling flexibility, especially in relation to the integration of heterogeneous studies and computational practicality.
Availability of data and materials
The code of data simulations is available from the corresponding author upon request.
Abbreviations
 MR:

Mendelian randomization
 IV:

Instrumental variable
 MCMC:

Markov chain Monte Carlo
 IVW:

InverseVariance Weighted
 GKDEJP:

Gaussian kernel density estimation of the joint posterior
References
Katan MB. Apolipoprotein e isoforms, serum cholesterol, and cancer. Lancet. 1986; 327:507–8.
Smith GD, Ebrahim S. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease?Int J Epidemiol. 2003; 32:1–22.
Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Smith GD. Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Int J Epidemiol. 2008; 27:1133–63.
Johnson T. Efficient calculation for multisnp genetic risk scores. Technical report. 2013. http://cran.rproject.org/web/packages/gtx/vignettes/ashg2012.pdf.
Bowden J, Smith GD, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through egger regression. Int J Epidemiol. 2015; 44(2):512–25.
Bowden J, Smith GD, Haycock PC, Burgess S. Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016; 40:304–14.
Zhao Q, Wang J, Hemani G, Bowden J, Small DS. Statistical Inference in Twosample Summarydata Mendelian Randomization Using Robust Adjusted Profile Score. Ann Statist; 48(3):1742–69.
Berzuini C, Guo H, Burgess S, Bernardinelli L. A bayesian approach to mendelian randomization with multiple pleiotropic variants. Biostatistics. 2018; 21(1):86–101.
Burgess S, Thompson SG. MENDELIAN RANDOMIZATION Methods for Using Genetic Variants in Causal Estimation. London: Chapman & Hall/CRC Press; 2015.
Kleibergen F, Zivot E. Bayesian and classical approaches to instrumental variable regression. J Econ. 2003; 114(1):29–72.
Jones EM, Thompson JR, Didelez V, Sheehan NA. On the choice of parameterisation and priors for the bayesian analyses of mendelian randomisation studies. Stat Med. 2012; 31(14):1483–501.
Zou L, Guo H, Berzuini C. Overlappingsample mendelian randomisation with multiple exposures: a bayesian approach. BMC Med Res Methodol. 2020; 20:295.
Xue J, Liang F. Doubleparallel monte carlo for bayesian analysis of big data. Stat Comput. 2019; 29(1):23–32.
Sandhu MS, Waterworth DM, Debenham SL, Wheeler E, Papadakis K, Zhao JH, Song K, Yuan X, Johnson T, Ashford S, Inouye M, Luben R, Sims M, Hadley D, McArdle W, Barter P, Kesäniemi YA, Mahley RW, McPherson R, Grundy SM, Consortium WTCC, Bingham SA, Khaw KT, Loos RJF, Waeber G, Barroso I, Strachan DP, Deloukas P, Vollenweider P, Wareham NJ, Mooser V. Ldlcholesterol concentrations: a genomewide association study. Lancet (London, England). 2008; 371(9611):483–91. https://doi.org/10.1016/S01406736(08)602081.
Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, Buchkovich ML, Mora S, Beckmann JS, BraggGresham JL, Chang HY, Demirkan A, Den Hertog HM, Do R, et al.Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013; 45(11):1274–83. https://doi.org/10.1038/ng.2797.
Adams B, Jacocks L, Guo H. Higher bmi is linked to an increased risk of heart attacks in european adults: a mendelian randomisation study. BMC Cardiovasc Disord. 2020; 20(1):258. https://doi.org/10.1186/s1287202001542w.
Burgess S. Sample size and power calculations in mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol. 2014; 43(3):922–9. https://doi.org/10.1093/ije/dyu005.
Stan Development Team. STAN: A C++ Library for Probability and Sampling, Version 2.2. 2014. http://mcstan.org/.
Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Found Trends Mach Learn. 2008; 1:1–305.
Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992; 7(4):457–72. https://doi.org/10.1214/ss/1177011136.
Acknowledgements
We thank Professor Jeanine HouwingDuistermaat for insightful discussion on study heterogeneity.
Funding
This work was funded by ManchesterCSC. The funder had no role in study design, data generation and statistical analysis, interpretation of data or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
HG conceived the study. HG and CB supervised the study. LZ performed simulations and statistical analysis. LZ, HG and CB interpreted statistical results and wrote the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Zou, L., Guo, H. & Berzuini, C. Bayesian mendelian randomization with study heterogeneity and data partitioning for large studies. BMC Med Res Methodol 22, 162 (2022). https://doi.org/10.1186/s12874022016194
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874022016194
Keywords
 Mendelian randomization
 Bayesian inference
 Study heterogeneity
 Data partitioning