Meta-analysis using Python: a hands-on tutorial
BMC Medical Research Methodology volume 22, Article number: 193 (2022)
Meta-analysis is a central method for generating high-quality evidence, and it is gaining momentum as the volume of quantitative information grows. Several software applications can process data and produce the expected results, and open-source applications in particular are receiving increasing attention. This paper uses Python’s capabilities to provide applicable instruction for performing a meta-analysis.
We used the PythonMeta package, with several modifications, to perform a meta-analysis on an open-access dataset from Cochrane. The analyses were complemented with Python’s zEpid package, which can create forest plots. We also developed Python scripts for contour-enhanced funnel plots to assess funnel plot asymmetry. Finally, we ran the analyses in R and STATA to cross-validate the results.
We provide stepwise instructions on installing the software and packages and performing the meta-analysis, and we share the Python code so that meta-analysts can follow along and generate the standard outputs. Our results were similar to those yielded by R and STATA.
We successfully produced standard meta-analytic outputs using Python. The language is flexible enough to improve meta-analysis results even further.
The use of quantitative evidence synthesis methods, i.e., meta-analysis, is rising. The compelling need to apply evidence-based medicine in clinical practice and the enormous amount of evidence being generated are the presumed motivations behind the upward trend in conducting meta-analyses [1, 2]. Cochrane Training, a well-known public institution that aims to standardize systematic review and meta-analysis methods in medicine, has developed RevMan to meet the growing need for meta-analysis. Several other specialized software applications for meta-analysis exist, e.g., Comprehensive Meta-Analysis. These applications typically offer a more or less comprehensive set of standard outputs used by meta-analysts. Generic statistical programs like STATA also provide a full range of typical meta-analysis results.
In parallel with commercial programs, the use of open-source applications such as R is also increasing. R provides a host of standard results and graphical displays for meta-analysis. Python is new to the world of meta-analysis; however, given its ease of use and popularity among data scientists, it would not be surprising to see its use for meta-analysis grow soon. The automation of systematic reviews using natural language processing in Python is also gaining recognition [7, 8]. Hence, integrating automated systematic review and meta-analysis in Python is a promising future direction for evidence synthesis.
Python developers have introduced several meta-analysis applications at different stages of development; two with satisfying features are PythonMeta (PyMeta) and PyMARE. However, these applications have been infrequently applied to real-world data, and to date, few researchers have reported on the capabilities and accuracy of Python-based meta-analysis packages in peer-reviewed journals. This paper applies Python’s meta-analysis features to a publicly available dataset prepared for this purpose. We aim to explain a stepwise approach to analyzing the data and to compare the results against the output of R and STATA.
We used the dataset provided by Higgins et al., a subset of the data from the Cochrane study titled “haloperidol versus placebo for schizophrenia”. The dataset comprises 17 clinical trials comparing haloperidol’s efficacy with placebo. The Cochrane study data is publicly available [11, 13].
The following variables and labels (in parentheses) have been specified for each of these trials: author (author), year of publication (year), haloperidol responders (resp.h), placebo responders (resp.p), haloperidol non-responders (fail.h), and placebo non-responders (fail.p). The dataset also includes two additional variables, labeled drop.h and drop.p, designating the dropouts in the haloperidol and placebo arms. To perform the meta-analysis, PythonMeta needs four input variables: haloperidol responders (resp.h), placebo responders (resp.p), and the total numbers in the haloperidol (T.h) and placebo (T.p) groups. Accordingly, we modified the dataset to facilitate its future use with Python. The modified dataset is available for readers (Additional file 1).
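To illustrate the kind of modification involved, the sketch below derives the two totals from the original variables using only the standard library. The row values and the choice to exclude dropouts from the totals are our assumptions for illustration; the authors’ actual preprocessing may differ.

```python
# Deriving the group totals (T.h, T.p) that PythonMeta needs from the
# original Cochrane variables. We assume available-case totals here
# (responders + non-responders, excluding dropouts).
import csv
from io import StringIO

# A hypothetical excerpt of the original dataset (values are illustrative).
raw = """author,year,resp.h,fail.h,drop.h,resp.p,fail.p,drop.p
Arvanitis,1997,25,25,2,18,33,0
Beasley,1998,29,18,22,20,14,34
"""

rows = list(csv.DictReader(StringIO(raw)))
for row in rows:
    row["T.h"] = int(row["resp.h"]) + int(row["fail.h"])
    row["T.p"] = int(row["resp.p"]) + int(row["fail.p"])

print(rows[0]["author"], rows[0]["T.h"], rows[0]["T.p"])
```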
The outcome of interest is clinical improvement measured as the risk ratio (RR), which serves as the selected effect size for the evidence synthesis in this study. An RR greater than unity suggests haloperidol is more effective than placebo.
Fixed-effect models assume a common effect size across studies, whereas random-effects models allow the effect size to vary from study to study. While understanding the two models’ conceptual differences is crucial for model selection, the discussion is beyond this paper’s scope; for a quick review of the basics of meta-analysis, we highly recommend the paper by Borenstein et al. Importantly, the analyst needs an adequate level of familiarity with the statistical methods used to estimate these models. In PythonMeta, the default method for the fixed-effect model is Mantel–Haenszel (MH), which can be changed to “Peto” or “IV” (inverse variance). For the random-effects model, the package estimates the between-study variance (tau²) with the DerSimonian and Laird (DL) method.
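For intuition about what these two estimators compute, the sketch below pools three invented 2×2 tables with the MH fixed-effect formula and derives the DL between-study variance on the log risk-ratio scale. PythonMeta performs these computations internally; this plain-Python version is for illustration only.

```python
# Mantel-Haenszel pooled risk ratio and DerSimonian-Laird tau^2,
# computed by hand on invented data for illustration.
# Each study is (events_trt, total_trt, events_ctl, total_ctl).
import math

studies = [(15, 40, 8, 42), (22, 55, 12, 50), (9, 30, 7, 33)]

# Mantel-Haenszel pooled risk ratio (fixed-effect).
num = sum(a * (n2 / (n1 + n2)) for a, n1, c, n2 in studies)
den = sum(c * (n1 / (n1 + n2)) for a, n1, c, n2 in studies)
rr_mh = num / den

# DerSimonian-Laird between-study variance (tau^2) from the
# inverse-variance Q statistic on the log risk-ratio scale.
y = [math.log((a / n1) / (c / n2)) for a, n1, c, n2 in studies]
v = [1 / a - 1 / n1 + 1 / c - 1 / n2 for a, n1, c, n2 in studies]
w = [1 / vi for vi in v]
y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(studies) - 1)) / c_term)

print(round(rr_mh, 3), round(tau2, 4))
```

With these toy tables, Q falls below its degrees of freedom, so the DL estimate is truncated to zero, as the method prescribes.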
Step 1: installing the program and reading the data
To perform meta-analysis in Python, PythonMeta (v1.23) needs to be installed via “pip install PythonMeta” (https://pypi.org/project/PythonMeta/). After installing the package, the Help() function displays the package’s help information. PythonMeta supports common evidence-based medicine (EBM) tasks: combining effect measures, with OR (odds ratio), RR (risk ratio), and RD (risk difference) for count data and MD (mean difference) and SMD (standardized mean difference) for continuous data; heterogeneity testing (Q/chi-square test); subgroup analysis; and plot drawing, including forest and funnel plots. PyMeta is an online version of the PythonMeta tool (https://www.pymeta.com/).
After preparing the dataset (see the “variables” section above), the dataset, sitting in the same directory as the Python scripts, can be loaded directly via readfile(“Haloperidol.text”). Of note, PythonMeta offers a web-based application that facilitates direct data entry and provides a few additional analytics.
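For orientation, the input file is a plain-text listing with one study per line: a study label followed by the event count and total for each arm, with optional “&lt;subgroup&gt;name = …” lines interleaved to define subgroups (as done in Step 3). The layout below is assumed from the package’s documentation, and the names and counts are illustrative; consult the package help for the authoritative format.

```text
Arvanitis-1997, 25, 50, 18, 51
Beasley-1998, 29, 47, 20, 34
Bechelli-1983, 12, 49, 2, 50
```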
Step 2: generating the main results
First, we selected the binary outcome type (“CATE” in PythonMeta) and the risk ratio (“RR”) as the desired effect size. Other options are continuous (“CONT”) for the outcome of interest and odds ratio (“OR”) and risk difference (“RD”) for the effect size. Second, we preferred to run both fixed-effect and random-effects models; this choice was for demonstration purposes, although our a priori assumption was compatible with the latter. Third, we selected MH (Mantel–Haenszel) to run the fixed-effect model and DL (DerSimonian and Laird) to run the random-effects model. Forest plots and funnel plots are the main outputs of this analysis step. One can update the default Python scripts to generate cleaner and more informative visuals.
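To make the count-data effect-size options concrete, the toy 2×2 table below (invented numbers) shows how the three measures differ for the same data:

```python
# One toy 2x2 table: events/total in treatment and control (invented numbers).
a, n1, c, n2 = 20, 50, 10, 50

rr = (a / n1) / (c / n2)               # risk ratio ("RR")
od = (a / (n1 - a)) / (c / (n2 - c))   # odds ratio ("OR")
rd = a / n1 - c / n2                   # risk difference ("RD")

print(rr, round(od, 3), round(rd, 3))
```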
Step 3: assessing the impact of missing data
To understand the impact of missing data, we cleaned the dataset via a simple script available in Additional file 2. After preparing the dataset, the studies with and without missing participants were labeled with “<subgroup>name = Missing” and “<subgroup>name = non-Missing,” and we analyzed them as subgroups. The dataset is available in Additional file 3.
It is common to impute the dataset in several ways to evaluate the impact of the missing data on the results. Unlike R, Python meta-analysis packages do not offer a comprehensive set of standard missing data imputation methods. Hence, we added a selection of missing data imputation methods after the meta-analysis in this paper. The methods are Available Case Study (ACS), Imputed Case Analysis (ICA), and the best-case and worst-case scenarios. ICA-0 designates the assumption that none of the missing participants experienced the event, while ICA-1 assumes that all of them did. ICA-b denotes the best-case scenario, assuming all missing participants in the experimental group and none in the control group experienced the event; ICA-w, the worst-case scenario, is the reverse of ICA-b. To create a dataset for each of these methods, we used the original Cochrane dataset with six variables (resp.h, fail.h, drop.h, resp.p, fail.p, drop.p) (Additional file 4) and wrote code for each method. Next, we ran a separate random-effects model with the IV method on each. Using the zEpid package, we generated the relevant forest plots.
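The four ICA scenarios can be expressed compactly. The sketch below applies them to one invented pair of study arms (ACS would simply drop the dropouts from both numerator and denominator); the counts and the helper function are ours, for illustration only.

```python
# Applying the four imputation scenarios described above to one hypothetical
# study (counts invented). h = haloperidol arm, p = placebo arm.
resp_h, fail_h, drop_h = 29, 18, 22
resp_p, fail_p, drop_p = 20, 14, 34

def arm(resp, fail, drop, dropouts_respond):
    # Return (events, total) after assigning dropouts per the scenario.
    return resp + (drop if dropouts_respond else 0), resp + fail + drop

scenarios = {
    "ICA-0": (arm(resp_h, fail_h, drop_h, False), arm(resp_p, fail_p, drop_p, False)),
    "ICA-1": (arm(resp_h, fail_h, drop_h, True),  arm(resp_p, fail_p, drop_p, True)),
    "ICA-b": (arm(resp_h, fail_h, drop_h, True),  arm(resp_p, fail_p, drop_p, False)),
    "ICA-w": (arm(resp_h, fail_h, drop_h, False), arm(resp_p, fail_p, drop_p, True)),
}

rrs = {name: (eh / th) / (ep / tp)
       for name, ((eh, th), (ep, tp)) in scenarios.items()}
for name, rr in rrs.items():
    print(name, round(rr, 2))
```

As expected, the best-case scenario yields the largest risk ratio and the worst-case scenario the smallest, bracketing the ICA-0 estimate.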
Step 4: evaluating the small study effect
Small-study effects occur when small studies, relative to larger ones, demonstrate different, often larger, treatment effects. Funnel plots are a standard way of showing such an effect by assessing their symmetry [15, 18]. In assessing funnel plot asymmetry, tests such as Egger’s test indicate whether the association between estimated effects and study size is greater than expected by chance [15, 18]. There are complementary methods to enhance the assessment of small-study effects and conduct sensitivity analysis on the results; however, Python packages do not offer these extended analyses. We performed Egger’s test using Statsmodels’ linear regression.
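The paper runs Egger’s test with Statsmodels; for intuition, the stripped-down sketch below fits the same regression by hand — the standardized effect (y/se) against precision (1/se) — on invented numbers constructed so the fit is exact. An intercept far from zero flags small-study effects; a real analysis would also compute a standard error and t-test for the intercept.

```python
# Illustrative Egger's regression in plain Python (closed-form simple OLS).
# Toy data: effects shrink as studies get larger (smaller se), so the
# intercept is far from zero.
log_rr = [0.90, 0.78, 0.66, 0.54, 0.42]   # invented study effects (log RR)
se     = [0.50, 0.40, 0.30, 0.20, 0.10]   # invented standard errors

z = [y / s for y, s in zip(log_rr, se)]   # standardized effects
p = [1 / s for s in se]                   # precisions

n = len(z)
p_bar = sum(p) / n
z_bar = sum(z) / n
slope = sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z)) / \
        sum((pi - p_bar) ** 2 for pi in p)
intercept = z_bar - slope * p_bar
print(round(intercept, 3), round(slope, 3))
```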
Comparison with R and STATA
We used STATA (Release 16. College Station, TX: StataCorp LLC) and R (R Core Team, 2021) for the comparison of the results. Balduzzi et al. and Chaimani et al. used the same dataset we employed in the current study to conduct a meta-analysis. We used the respective STATA and R scripts these authors provided to obtain the results for this comparison.
Additional file 2 contains the Python scripts to obtain the outputs. To generate the illustrations in this paper, we modified the original Python scripts where needed and added more commands to complete the analysis.
Fixed-effect and random-effects models
Figure 1 is the printout of the PythonMeta function and conveys the essential information about the individual studies, the fixed-effect and random-effects results, heterogeneity, and methods. It shows both fixed-effect and random-effects outputs for non-missing cases, with both models indicating statistically significantly higher haloperidol efficacy than placebo. The overall treatment effect estimated by the fixed-effect model was a risk ratio of 2.09 (95% CI 1.69, 2.59), and the corresponding random-effects estimate was 2.28 (95% CI 1.54, 3.37). The two diamonds in Fig. 2 represent the overall treatment effects; they lie to the right of the no-effect vertical bar (RR = 1) and do not cross it. The confidence interval for the overall treatment effect under the random-effects model was slightly wider than that of the fixed-effect model. The relatively wide prediction interval (0.73–7.17), which accounts for between-study heterogeneity, crosses the no-effect bar, indicating that future studies may not confirm haloperidol’s superior efficacy. Several individual studies showed non-overlapping confidence intervals; this finding and the Q test (35.18, p value = 0.004) indicated heterogeneity in the results. The I² of 54.51% also indicated moderate heterogeneity.
Impact of missing data
Figure 3 is a forest plot separating studies with and without missing data. The overall treatment effect for both subgroups indicates statistically significantly higher haloperidol efficacy than placebo. However, the overall treatment effect for the studies without missing data is larger than for those with missing data. Several confidence intervals for the subgroup estimates do not include the corresponding overall treatment effect. The chi-square test under the random-effects model showed a statistically significant difference between the two subgroups (χ² = 5.60, DF = 1, p = 0.02). Figure 4 illustrates the summary results of the sensitivity analysis after imputing missing data under five different assumptions about the missing-data pattern. For example, the risk ratios range between 1.97 and 2.71 for the worst- and best-case scenarios. Despite the different assumptions, the risk ratios and their confidence intervals all lie to the right of the no-effect bar and do not cross it.
Assessing small study effect
The funnel plot (Fig. 5) is asymmetric, raising the concern of small-study effects: smaller studies tend to show greater efficacy of haloperidol. A contour-enhanced plot helps discern whether asymmetry is due to publication bias by demarcating the areas of statistical significance for the treatment effect. Here, the contour-enhanced plot shows that small studies appear both inside and outside the contoured areas. To evaluate funnel plot asymmetry and the small-study effect, we performed Egger’s meta-regression test (Table 1). The confidence interval of the intercept does not include zero, indicating small-study effects; combined with the contour-enhanced plot, this suggests that publication bias alone is unlikely to explain the asymmetry.
Sensitivity analyses such as trim-and-fill, yet to be implemented in Python, can further help examine the presence of publication bias.
The comparison with R and STATA
Table 2 summarizes the results of the critical meta-analysis parameters across the three applications. The discrepancies across the three applications are shown in bold font. Risk ratios obtained using Python matched those of STATA to the second decimal place in every row. They differed from those of R in three of seventeen rows; however, the risk ratios were an exact match across rows and columns at the integer level. The 95% confidence intervals of the risk ratios estimated using Python were equal to those calculated using STATA and R in most cases, with disagreements occurring mainly at the second decimal place. One can observe the same level of absolute agreement across the applications for the fixed-effect, random-effects, and summary statistics shown at the bottom of the table. Differences in workflow and computation time among the three applications are negligible; generating the results takes no more than a few seconds.
Meta-analyses and systematic reviews are evolving tools for evidence generation and synthesis [1, 2]. Automated systematic reviews using NLP algorithms are also a growing field [7, 8]. Hence, one can anticipate the increasing use of omnibus data-handling and data-analysis environments like Python for evidence generation and meta-analysis at the same time. To introduce Python’s capabilities and demonstrate the accuracy of its meta-analysis estimates, we used the PythonMeta package to run the meta-analysis. We selected PythonMeta over competitors such as PyMARE to fit our purpose. PythonMeta’s strength lies in its web-based version, which eases its application, and its diverse options for generating standard outputs for scientific publications [9, 10, 16].
Using a binary outcome from a publicly available dataset and employing the zEpid package to create a forest plot for the missing data imputations, we could demonstrate the accuracy of the results. Python, STATA, and R generated comparable results for the standard parameters. Evaluation of funnel plot asymmetry, combined with the contour-enhanced funnel plot, revealed a small-study effect that publication bias could not entirely explain. While the Python package lacked sensitivity analysis tests, we showed a non-significant treatment effect using the standard R and STATA packages for meta-analysis. Clinical heterogeneity is another unchecked source of variability that might explain the diversity of treatment effects.
By analyzing subgroups with and without missing data, we found a larger haloperidol effect in the subgroup without missing data than in the subgroup with missing data. Unfortunately, the Python package lacked the capability to quantify the between-group heterogeneity. We could, however, assess this heterogeneity by visually inspecting the overlap of the confidence intervals of the summary estimates.
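For readers who want to fill this gap by hand: with two subgroup summaries on the log scale, the between-subgroup statistic is Q_between = (μ₁ − μ₂)² / (se₁² + se₂²), referred to a chi-square distribution with 1 degree of freedom. The sketch below uses invented subgroup summaries, not the paper’s actual estimates.

```python
# Between-subgroup heterogeneity test computed by hand from two subgroup
# summaries (invented numbers, log risk-ratio scale).
import math

mu1, se1 = math.log(2.6), 0.18   # e.g., studies without missing data
mu2, se2 = math.log(1.7), 0.20   # e.g., studies with missing data

q_between = (mu1 - mu2) ** 2 / (se1 ** 2 + se2 ** 2)
# Two-sided p-value: a chi-square with 1 df is a squared standard normal,
# so p = P(|Z| > sqrt(q)), computed with the complementary error function.
p_value = math.erfc(math.sqrt(q_between / 2))
print(round(q_between, 2), round(p_value, 4))
```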
We identified several gaps concerning Python’s meta-analytic capabilities:
Algorithms for sensitivity analysis
Missing data imputations
Contour-enhanced funnel plots
Subtle but indispensable details, such as between-group heterogeneity quantification and the prediction interval
We tried to address some of these gaps by modifying the existing Python scripts for meta-analysis. However, these items provide a roadmap for future meta-analytic improvements in Python.
In this paper, we introduce Python as a tool for meta-analysis. We took advantage of Python-based packages written for meta-analysis, modified them, and generated standard meta-analytic results. The comparison of these results with STATA and R’s outputs supports the accuracy of our algorithms.
Availability of data and materials
The dataset(s) supporting the conclusions of this article are included within the article and its additional file(s). Modified sample code and new Python scripts are available in the Additional files. The PythonMeta package can be installed via pip install PythonMeta. The source code is available at https://pypi.org/project/PythonMeta/#files. Archived versions are available from https://pypi.org/project/PythonMeta/#history.
The dataset used in this article is accessible via the links below:
Cochrane dataset of haloperidol versus placebo for schizophrenia.
DL: DerSimonian and Laird
ACS: Available Case Study
ICA: Imputed Case Analysis
NLP: Natural Language Processing
Shin I-S. Recent research trends in meta-analysis. Asian Nurs Res. 2017;11(2):79–83.
Vetter TR. Systematic review and meta-analysis: sometimes bigger is indeed better. Anesth Analg. 2019;128(3):575–83.
Bax L, Yu L-M, Ikeda N, Moons KG. A systematic comparison of software dedicated to meta-analysis of causal studies. BMC Med Res Methodol. 2007;7:40.
Bradburn S. 13 Best Free Meta-Analysis Software To Use. https://toptipbio.com/free-meta-analysis-software/. Accessed 30 Sept 2021.
StataCorp. Stata statistical software: release 16. College Station: StataCorp LLC; 2019.
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for statistical computing; 2021. https://www.R-project.org/. Accessed 30 Aug 2021.
Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8:163.
Raschka S, Patterson J, Nolet C. Machine learning in python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information. 2020;11(4):193.
Deng H. PythonMeta, Python module of Meta-analysis. http://www.pymeta.com. Accessed 30 Sept 2021.
Yarkoni T, Salo T, Nichols T, Peraza J. PyMARE: Python Meta-Analysis & Regression Engine. https://pymare.readthedocs.io/en/latest/index.html. Accessed 30 Sept 2021.
Higgins JP, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clin Trials. 2008;5(3):225–39.
Adams CE, Bergman H, Irving CB, Lawrie S. Haloperidol versus placebo for schizophrenia. Cochrane Database Syst Rev. 2013;11:CD003082.
Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Mental Health. 2019;22(4):153–60.
Borenstein M, Hedges L. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 2010;1:97–111.
Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions version 6.2 (updated February 2021). Wiley; 2021. Available from: https://www.training.cochrane.org/handbook. Accessed 30 Sept 2021.
Deng H. PythonMeta. https://pypi.org/project/PythonMeta/. Accessed 30 Sept 2021.
Zivich P. zEpid. https://zEpid.readthedocs.io/en/latest/index.html. Accessed 30 Sept 2021.
Debray TP, Moons KG, Riley RD. Detecting small-study effects and funnel plot asymmetry in meta-analysis of survival data: a comparison of new and existing tests. Res Synth Methods. 2018;9(1):41–50.
Chaimani A, Mavridis D, Salanti G. A hands-on practical tutorial on performing meta-analysis with Stata. Evid Based Ment Health. 2014;17:111–6.
Palmer TM, Sutton AJ, Peters JL, Moreno SG. Contour-enhanced funnel plots for meta-analysis. Stata J. 2008;8(2):242–54.
Mavridis D, Salanti G. How to assess publication bias: funnel plot, trim-and-fill method and selection models. Evid Based Ment Health. 2014;17:30.
Rücker G, Carpenter JR, Schwarzer G. Detecting and adjusting for small-study effects in meta-analysis. Biom J. 2011;53(2):351–68.
Deeks JJ, Higgins JPT, Altman DG. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions (updated February 2021). www.training.cochrane.org/handbook. Accessed 30 Sept 2021.
The authors declare that there are no conflicts of interest.
Masoumi, S., Shahraz, S. Meta-analysis using Python: a hands-on tutorial. BMC Med Res Methodol 22, 193 (2022). https://doi.org/10.1186/s12874-022-01673-y