A systematic comparison of software dedicated to meta-analysis of causal studies

Background: Our objective was to systematically assess the differences in features, results, and usability of currently available meta-analysis programs.
Methods: Systematic review of software. We did an extensive search on the internet (Google, Yahoo, AltaVista, and MSN) for specialized meta-analysis software. We included six programs in our review: Comprehensive Meta-analysis (CMA), MetAnalysis, MetaWin, MIX, RevMan, and WEasyMA. Two investigators compared the features of the software and their results. Thirty independent researchers evaluated the programs on their usability while analyzing one data set.
Results: The programs differed substantially in features, ease-of-use, and price. Although most results from the programs were identical, we did find some minor numerical inconsistencies. CMA and MIX scored highest on usability and these programs also have the most complete set of analytical features.
Conclusion: In consideration of differences in numerical results, we believe the user community would benefit from openly available and systematically updated information about the procedures and results of each program's validation. The most suitable program for a meta-analysis will depend on the user's needs and preferences and this report provides an overview that should be helpful in making a substantiated choice.


Background
Meta-analysis has been characterized in various ways, from "making order of scientific chaos" [1] to "mega-silliness" [2], and has been the subject of much debate. However, time has taught both opponents and proponents that things are not black and white; meta-analysis, executed with care, has become an important and influential cornerstone of scientific medicine. As the quantitative part of a systematic review, the merit of meta-analysis over qualitative approaches lies in the formal and reproducible investigation of heterogeneity, small study effects, and other data trends. Although meta-analysis is applied in many types of research, the bulk of published meta-analyses are in the domain of therapeutic and, albeit to a lesser extent, observational etiologic studies. This paper focuses on this area of causal medical research and in particular on the software that is used in the corresponding meta-analyses.
Computer software has become indispensable in meta-analysis and in recent decades many programs have been developed. To aid potential users in choosing the software that fits their needs, there are a number of reviews and comparisons available [3][4][5][6][7]. The most recent one, however, dates back 5 years and in the meantime the spectrum of available software has changed substantially. Also, most of the existing reviews have focused on numerical features, such as which analytical models were available or what graphs could be produced. We believe that information on the validity or comparability of results and ease-of-use are equally important factors in the total applicability of the software. Therefore, the purpose of our study was to systematically compare features, results, and usability of the currently available meta-analysis software.

Methods

Software search and selection
We decided, a priori, to focus on software that was solely dedicated to meta-analysis of randomized therapeutic or observational causal studies. General statistics packages were excluded. Furthermore, the software had to be actively maintained and supported, which was judged by either the time of the last software update (less than 5 years), bug report (less than 5 years), or website update (less than 3 years). We also decided to select only software with a graphical interface and mouse-click compatibility, which essentially excluded the DOS programs.
Searches for software and publications related to their development and usage were done by two authors (LB, LMY) with combinations of the following keywords in the Internet search engines Google, Yahoo, AltaVista, and MSN: "meta-analysis", "meta-analyses", "systematic review", "software", "program", "package", "macro", "add-in", and "add-on". The first search was done in mid-2005 and the last search in June 2006. The software was purchased or downloaded if it appeared to fulfill the inclusion criteria.

Assessment of numerical and graphical features
The assessment of the numerical and graphical features in the included meta-analysis programs was handled independently by two investigators (LB, LMY) and reviewed by all authors until there was consensus on all items. The programs were installed and tested on Windows XP and Windows 2000 systems in English and Japanese. Details of the documented features are provided in the tables of the results section.

Validity and comparability of meta-analysis results
We searched the internet and literature databases of medical and social sciences (PubMed, Embase, ERIC, and PsycINFO) for articles that reported validations of meta-analysis software. We also checked the website of each included program and made inquiries with its authors about their validation procedures.
In addition to the search for validation reports, two reviewers (LB, LMY) actively investigated the comparability of the numerical results with data sets from three previously published [8][9][10] meta-analyses (Table 1). These data sets have been used as examples in methodological meta-analysis publications [11][12][13] and are representative of those commonly encountered in therapeutic or etiologic meta-analyses. The first data set [8] contains per-group data from 16 randomized controlled trial articles with a dichotomous outcome, i.e. group sizes and event rates. One of the 16 included studies has no events in one of the treatment arms and the data set itself is subject to substantial small study effects. The second data set [9] contains per-group data typically found in meta-analyses of controlled trials with a continuous outcome (group sizes, means, standard deviations). It contains data from 11 studies with heterogeneous results. The third data set [10] contains data as they could be found in meta-analyses of observational studies. The data are from 19 studies with a dichotomous outcome, as in the first data set. However, this time no per-group data are available for each study, only the comparative association measures (odds ratios) and their standard errors.
For each data set, we compared the combined association measures, tests for heterogeneity, and tests for small study effects (publication bias) derived from each of the studied meta-analysis programs. We focused on the most common association measures such as the risk difference, risk ratio, odds ratio, mean difference, Hedges' g, and Cohen's d, including their 95% confidence intervals. We used the metan (version 1.81) [14], metabias (version 1.4.2) [15], and metatrim (version 1.5.1) [16] programs of the general statistics software STATA [17] as 'reference' in the software comparisons.
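As an illustration of the quantities compared here, the following is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q and the I² statistic, applied to odds ratios and standard errors of the kind found in the third data set. It is not the code of any of the reviewed programs or of metan; the function name `pool_fixed_effect` and the three example studies are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(log_effects, std_errors, level=0.95):
    """Inverse-variance fixed-effect pooling on the log scale.

    Returns the pooled log effect, its standard error, a z-based
    confidence interval, Cochran's Q, and I^2 (in percent).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Normal (z) critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci": (pooled - z * pooled_se, pooled + z * pooled_se),
        "q": q,
        "df": df,
        "i2": i2,
    }


# Hypothetical odds ratios and standard errors of the log odds ratios.
odds_ratios = [0.50, 0.80, 1.20]
std_errors = [0.20, 0.30, 0.25]

result = pool_fixed_effect([math.log(o) for o in odds_ratios], std_errors)
print("Pooled OR:", math.exp(result["pooled"]))
print("95% CI:", tuple(math.exp(x) for x in result["ci"]))
print("Q:", result["q"], "df:", result["df"], "I2 (%):", result["i2"])
```

Pooling is done on the log scale and exponentiated afterwards, which is the usual convention for ratio measures; programs that differ in this or in the choice of critical value will report different intervals for the same data.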

Assessment of usability
Finally, we performed a usability assessment amongst 30 researchers from various institutes and countries: Kitasato University (Japan), Tokai University (Japan), Utrecht University (The Netherlands), University of Amsterdam (The Netherlands), the Dutch Cochrane Center (The Netherlands), the University of Leuven (Belgium), and the Centre for Statistics in Medicine (UK). There were no specific inclusion criteria and the sample consisted of individuals from various departments and with various levels of experience with meta-analysis.
During the assessment sessions, participants were asked to install (evaluation versions of) each of the studied meta-analysis programs and to analyze one small data set of a meta-analysis with a dichotomous outcome (a shortened version of the previously described meta-analysis by Teo et al. [8]). As they completed this task, they scored the usability of each program in an electronic scoring list. This list [see Additional file 1] was developed via a consensus session with (meta-analysis) experts from the disciplines of epidemiology, biostatistics, and medical informatics, who were asked which elements they considered important in meta-analysis software and what items they would use to judge its usability. The order in which each program was installed and assessed was determined by a computer-generated randomization list and was different for each participant.
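A randomization list of this kind could be produced as in the sketch below. How the actual list was generated is not described here, so the function name `assign_orders`, the seed, and the choice to sample distinct permutations are all assumptions for illustration. The five programs listed are those that took part in the usability assessment.

```python
import itertools
import random

# Programs included in the usability assessment.
PROGRAMS = ["CMA", "MetaWin", "MIX", "RevMan", "WEasyMA"]


def assign_orders(programs, n_participants, seed):
    """Draw a distinct random assessment order for each participant."""
    rng = random.Random(seed)  # fixed seed makes the list reproducible
    all_orders = list(itertools.permutations(programs))
    if n_participants > len(all_orders):
        raise ValueError("more participants than distinct orders")
    # Sampling without replacement guarantees no two participants
    # receive the same order.
    return rng.sample(all_orders, n_participants)


# Hypothetical seed; 30 participants as in the assessment.
orders = assign_orders(PROGRAMS, 30, seed=2006)
for participant, order in enumerate(orders, start=1):
    print(participant, " -> ".join(order))
```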

Results

Software search and selection
We found 10 meta-analysis packages that were available for download or purchase via the internet (Table 2). Many were no longer updated or had remained in their DOS stage and were excluded from our study. We included six programs in our comparison: Comprehensive Meta-analysis (CMA) Version 2 [18], MetAnalysis [19], MetaWin 2.1 [20], MIX 1.5 [21], RevMan 4.2.8 [22], and WEasyMA 2.5 [23] (in alphabetical order). Using less stringent inclusion/exclusion criteria did not change this software selection. Using more stringent criteria would exclude WEasyMA as various signs indicate that it may no longer be developed and supported. Initially, our search did not pick up the still relatively unknown program called MetAnalysis. This software comes with a book and cannot be purchased separately. Neither the software nor the book is supported by a website, which is why we did not find it at first. By the time it was included, we could no longer assess it in the usability part, but we have included it post hoc in the assessment of comparability and features.

Assessment of numerical and graphical features
Below is a short summary of the numerical and graphical features in each of the reviewed programs; details are available in Tables 3 and 4.
Comprehensive Meta-Analysis (commercial software) has the highest profile in the Internet search engines of all included programs. It distinguishes itself from other programs by the option to enter effect sizes of different formats and the comprehensiveness of the numerical options and output. Data can be entered manually or via copy-and-paste in the CMA spreadsheet; direct import of text or other data files is not possible. The program features all major graphical presentations. The tutorial and manual are to-the-point and extensive. The program is actively maintained and the website is modern and regularly updated.
MetAnalysis 1.0 (commercial software) is not sold separately, but comes as a bonus feature of a book [19]. It is limited to studies with descriptive data on dichotomous outcomes. Data cannot be pasted or imported and must be entered manually, cell by cell. Once the data are entered and the calculations performed, numerical data can be produced in a print preview screen and graphs in separate windows. A nice feature is the radial part of the Galbraith plot, which is lacking in most other software. The software also has facilities to enter loss to follow-up/drop-out information and to include the studies in the meta-analyses with a per-protocol or intention-to-treat analysis. The software does not contain help files and does not have a website, but users can consult the book instead.

[Table fragment. Export options: copy output to clipboard; export to office application(s); report creation; setting copy file type (e.g. bmp, jpg or wmf). Legend: a mark indicates the presence and no mark the absence of a feature; a mark in parentheses means that the feature is limited or partially in development; a separate mark means it was not working correctly at the time of our assessments.]
itself and it requires Microsoft Excel 2000 or later to run. Another limitation is the maximum number of data sets, which is currently 100. Data sets can be created by manual input as well as by importing text delimited data files or Excel workbooks. The numerical and graphical options are diverse and comprehensive.
RevMan 4.2.8 (free for private and academic use) was developed by and for the Cochrane Collaboration. It stands out due to its extensive features for collaborative management of systematic reviews. The analytical functions of the program cannot be accessed without first creating a review structure, and because import and copy-and-paste functionality are also limited, getting started requires more preparation than with most other software. Once data are in the analysis module, analysis is straightforward. Output is detailed, though without tests for publication bias and with no graphs other than the forest and funnel plot. The help resources in RevMan are extremely thorough. A new version is to be released in the near future.
WEasyMA 2.5 (commercial software) stands out by the speed with which results become available after data set creation. Data cannot be imported or pasted and need to be entered manually, cell by cell. Another limitation of this program is that it can only handle data from clinical trials with dichotomous outcomes, e.g. two-by-two table data. Although limited to these types of data, the program produces a wide variety of numerical and graphical output. The original author has indicated that the software is currently unsupported by a development team and may soon no longer be available.

Validity and comparability of meta-analysis results
Our internet and database search did not yield any publications on the validity or validation of any of the programs, except for MIX [24,25]. Authors of all programs were contacted to determine whether (as yet unpublished) evidence of validation procedures was available. Authors of RevMan indicated that validation data were made public via notes and abstracts at Cochrane Collaboration meetings and conferences. The authors of CMA, MetAnalysis, and MetaWin stated that all procedures had been checked extensively with external programs, spreadsheets, and occasionally by hand, though the results had not been made public. For CMA, Excel sheets with such data are available upon request. We received no information on validation procedures from the authors of WEasyMA.
We found no discrepancies in meta-analysis results between STATA, MIX and RevMan. In CMA, we found a small inconsistency in results of publication bias tests, but this was corrected via an update while we were writing this article.
MetaWin's results were different from STATA's results (and thus also from results in CMA, MIX, and RevMan) because MetaWin mostly uses a t-distribution where the aforementioned programs use a z-distribution (although a recent version of MIX also allowed us to use a t-distribution). We also found what seemed to be a terminological inconsistency: the method labeled Mantel-Haenszel in MetaWin for odds ratio analyses gave results that were identical to those from Peto's method in the other programs (albeit with confidence limits based on a t-distribution).
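The labeling matters because the two estimators generally give different pooled odds ratios for the same data, which is how such a mismatch can be detected. The sketch below computes both from two-by-two tables using the standard textbook formulas; it is not taken from any of the reviewed programs, and the function names and example tables are hypothetical.

```python
import math

# Each table is (a, b, c, d):
#   a = events in treatment,  b = non-events in treatment,
#   c = events in control,    d = non-events in control.


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def peto_or(tables):
    """Peto (one-step) pooled odds ratio."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        n_treat, n_ctrl = a + b, c + d
        n_event, n_nonevent = a + c, b + d
        expected = n_treat * n_event / n  # expected events in treatment
        variance = n_treat * n_ctrl * n_event * n_nonevent / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    return math.exp(sum_o_minus_e / sum_v)


# Hypothetical trials, for illustration only.
example_tables = [(10, 90, 20, 80), (5, 45, 9, 41), (30, 170, 35, 165)]
print("Mantel-Haenszel OR:", mantel_haenszel_or(example_tables))
print("Peto OR:           ", peto_or(example_tables))
```

Running both estimators on the same data set and comparing them with a program's output is a simple way to check which method the program actually applies under a given label.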
Since MetAnalysis and WEasyMA can only analyze data from two-by-two tables, the comparability assessments were limited to one data set [8]. Analyses in MetAnalysis were very similar though not always identical to those from STATA. We found that if we entered experimental group data first (as is the case in all other software), an incorrect event coding is applied that causes the software to calculate risk differences and odds ratios of survival even if mortality is entered as the event. For risk differences this only changes the sign, but for odds and odds ratios it gives the reciprocal of the intended results [26]. Although the book mentions that control data are to be entered in the first data column, the software currently has no built-in guard against this and we therefore urge users to be careful.
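The effect of the event coding can be verified with a small calculation. The sketch below uses one hypothetical trial (the numbers are not from any of the data sets) and compares the risk difference and odds ratio when mortality is the event with the same measures when survival is the event.

```python
import math


def risk_difference(events_t, n_t, events_c, n_c):
    """Risk in the treatment group minus risk in the control group."""
    return events_t / n_t - events_c / n_c


def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds of the event in the treatment group over odds in control."""
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    return odds_t / odds_c


# Hypothetical trial: 10/100 deaths on treatment, 20/100 on control.
deaths_t, n_t, deaths_c, n_c = 10, 100, 20, 100

# Intended analysis: mortality as the event.
rd_mortality = risk_difference(deaths_t, n_t, deaths_c, n_c)
or_mortality = odds_ratio(deaths_t, n_t, deaths_c, n_c)

# Miscoded analysis: survival as the event.
rd_survival = risk_difference(n_t - deaths_t, n_t, n_c - deaths_c, n_c)
or_survival = odds_ratio(n_t - deaths_t, n_t, n_c - deaths_c, n_c)

# The risk difference only changes sign; the odds ratio is inverted.
assert math.isclose(rd_survival, -rd_mortality)
assert math.isclose(or_survival, 1.0 / or_mortality)
print("RD mortality:", rd_mortality, "RD survival:", rd_survival)
print("OR mortality:", or_mortality, "OR survival:", or_survival)
```

A pooled odds ratio below 1 for a treatment that is expected to reduce mortality turning into one above 1 is therefore a quick sign that the columns or the event coding were exchanged.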
In WEasyMA, we found results that could not be reproduced if a data set with zero events in one study arm was used. Even when using the same continuity correction as reported in the 'Calculation options' dialog in WEasyMA, the results remained different in STATA. The WEasyMA authors did not respond to our inquiry into reasons for the discrepancies.
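To show why the continuity correction matters for such a study, the sketch below computes a log odds ratio and its standard error with a constant added to every cell whenever a cell is zero. Adding 0.5 is a common convention and is assumed here; it is not confirmed as the correction used by either WEasyMA or STATA, and the function name and example table are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio and standard error from a two-by-two table.

    a, b = events and non-events in the treatment group;
    c, d = events and non-events in the control group.
    The correction is added to all four cells only if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, std_error


# Hypothetical study with no events in the treatment arm.
zero_arm = (0, 50, 5, 45)
for correction in (0.5, 0.25, 0.1):
    log_or, std_error = log_odds_ratio(*zero_arm, correction=correction)
    print(correction, math.exp(log_or), std_error)
```

Because both the estimate and its standard error depend on the constant, two programs can only be expected to agree on such a data set if they apply the same correction in the same way, for example to the affected study only rather than to all studies.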

Assessment of usability
Of the 30 participating researchers, 26 provided quantitative data that were suitable for analysis (Table 5). Trouble with the electronic user form or installation of software made the data from 4 researchers incomplete and they were excluded from the quantitative part. MIX scored highest on the overall usability (8.6), followed by CMA (6.9), MetaWin (6.2), RevMan (6.1), and WEasyMA (4.2).
RevMan was most familiar to the participating researchers. MIX had not been used by any of the participants but the name was familiar to some as they were affiliated to the same institutions as the makers of the MIX software. Stratifying the results in analogous subgroups did not reveal any specific trends in the ratings. Experienced users appeared to be more critical than less experienced users, but relative scores were identical. Installation of WEasyMA and CMA was troublesome for some researchers. Qualitative statements mostly concerned problems with the installation (WEasyMA, CMA), error messages in French (WEasyMA), and difficulties with data set creation (WEasyMA, RevMan). Favorable comments included praise for the user interfaces (MIX, RevMan, CMA), help system (RevMan), speed of analysis (WEasyMA), and within-program tutoring (MIX, CMA).

Discussion
Meta-analysis is an indispensable tool in current-day synthesis of research data from multiple studies, and systematic reviews with meta-analyses occupy the top position in the hierarchy of evidence. Software for meta-analysis has evolved over the years and available reviews are relatively outdated. We therefore considered it timely to provide a systematic overview of the features, criterion validity, and usability of the currently available software that is dedicated to meta-analysis of causal (therapeutic and etiologic) studies. It has some overlaps with existing reviews [3][4][5][6][7], but includes other more recent programs, contains more detailed information on the merits and demerits of the available programs, and follows a more systematic approach.
We studied four commercial programs (CMA, WEasyMA, MetaWin, and MetAnalysis) and two free programs (RevMan and MIX). The features of the commercial programs were not necessarily more extensive than those of the free ones. In particular MIX stood out in terms of numerical options and graphical output. CMA was generally most versatile, in particular in options for analysis of various types of data. With regard to the comparability of results, MIX, RevMan, and CMA produced numerical results that were identical to results from STATA's metan, metabias, and metatrim. MetaWin's results are different and slightly more conservative, since the confidence intervals are based on a t-distribution or bootstraps. WEasyMA produces results that can be disparate from the other programs, especially in data sets with studies with zero events in one or both of the comparison groups. Although most differences were small in the data sets we used, we have reservations about how this will carry over to data sets with more extreme data. The MetAnalysis program should also be used with care as data have to be entered manually and in the correct columns. Exchanging the columns is currently not prevented by warning or error messages and can lead to invalid results.
The usability study shows that preparing data for analysis is the hardest part in each program. MIX and CMA are identified as the most user-friendly programs. WEasyMA scored least favorably. Stratifying user evaluations based on experience with meta-analysis and previous experience or knowledge of the software did not reveal any trends in the ratings.
Our comparison has been limited to software dedicated to meta-analysis only and does not include general statistics packages. The primary reason to leave them out was because they are structurally very different, making direct comparisons inappropriate. Central to this issue is software syntax: most general packages require thorough knowledge of their syntax in order to produce and alter graphs that are common in meta-analysis; the dedicated packages, however, produce such graphs with a few or sometimes even a single click. In addition, the syntax knowledge required to do more advanced meta-analyses with the general packages means that in a usability survey all participants would have to be expert statisticians, capable of writing and adapting syntax for meta-analysis in all major general software packages. This is not only not feasible in the current setting, it would also make the participating individuals no longer representative of the (sometimes relatively inexperienced) users of the software in the scientific and academic community. Although a different approach would be necessary, we believe the user community of meta-analysis software would benefit from an additional review of meta-analysis options in general statistics software.
Due to the lack of a 'gold' standard, we resorted to between-program comparisons and a criterion validation with STATA's user-written commands metan, metabias and metatrim as reference. Our choice for STATA was based on its versatility and use in two major books on meta-analysis [11,12]. We realize that these STATA commands are themselves user-written and potentially subject to the same validity issues as the other programs. The fact that CMA, MIX, and RevMan produced results that were identical to results from STATA, at least with the three data sets we selected, justifies to some extent our use of STATA as a reference standard.
The results of our usability survey should be regarded as exploratory and serve as a rough indication. First, the number of participants was relatively small. Second, it is not unlikely that there may be some bias in favor of RevMan and MIX because some users were already familiar with these programs. Subgroup analyses, however, did not reveal such trends. MetAnalysis could unfortunately not be included as it was added to the review after the start of the usability assessment. A further point regarding MIX is that it was created following a development focus list [25] that was drawn up in a similar fashion to our usability scoring list. Assessment of both lists reveals that a number of items are very similar. Although this may indicate that the lists are indeed reflecting the demands of statistical software users, it also means that the MIX program was likely to do well in our assessment. We believe, however, that any program that is systematically developed to satisfy its users' demands should perhaps deservedly score high.
Another point to which we would like to draw attention is the lack of accessible public information about the manner in which meta-analysis programs have been validated. Only the website of the MIX program includes specific references to this and MIX is the only program with a peer-reviewed and published validation report [25]. Without such reports, authors, reviewers, editors, and consumers of evidence have no reference for judgments about the suitability of the software for scientific purposes. This is of course equally applicable to the user-written meta-analysis macros for general statistics software. We argue for more rigor and transparency in this area.
Finally, we are fully aware that the world of information technology changes constantly and by the time this manuscript is published, it is possible that some updates have become available or that new products have been launched. We apologize beforehand for our lack of timing. Like a traditional review, we intend to update this investigation in due time.

Conclusion
In conclusion, the most suitable meta-analysis software for a user depends on his or her demands; no single program may be best for everybody. The information provided in this article, in particular the data in Tables 3 and 4, should give users the opportunity to make a substantiated decision.