Development and validation of MIX: comprehensive free software for meta-analysis of causal research data

Background Meta-analysis has become a well-known method for synthesis of quantitative data from previously conducted research in applied health sciences. So far, meta-analysis has been particularly useful in evaluating and comparing therapies and in assessing causes of disease. Consequently, the number of software packages that can perform meta-analysis has increased over the years. Unfortunately, it can take a substantial amount of time to get acquainted with some of these programs and most contain little or no interactive educational material. We set out to create and validate an easy-to-use and comprehensive meta-analysis package that would be simple enough programming-wise to remain available as a free download. We specifically aimed at students and researchers who are new to meta-analysis, with important parts of the development oriented towards creating internal interactive tutoring tools and designing features that would facilitate usage of the software as a companion to existing books on meta-analysis. Results We took an unconventional approach and created a program that uses Excel as a calculation and programming platform. The main programming language was Visual Basic, as implemented in Visual Basic 6 and Visual Basic for Applications in Excel 2000 and higher. The development took approximately two years and resulted in the 'MIX' program, which can be downloaded from the program's website free of charge. Next, we set out to validate the MIX output with two major software packages as reference standards, namely STATA (metan, metabias, and metatrim) and Comprehensive Meta-Analysis Version 2. Eight meta-analyses that had been published in major journals were used as data sources. All numerical and graphical results from analyses with MIX were identical to their counterparts in STATA and CMA. The MIX program distinguishes itself from most other programs by the extensive graphical output, the click-and-go (Excel) interface, and the educational features. 
Conclusion The MIX program is a valid tool for performing meta-analysis and may be particularly useful in educational environments. It can be downloaded free of charge via the program's website.


Background
The amount of data produced by researchers in health sciences has been growing explosively and advances in genetics, genomics, and information technology are likely to further contribute to this growth. In the past two decades, meta-analysis has evolved into the statistical method par excellence to make sense of the growing number of research reports. As the quantitative analytical part of a systematic review, it has been used for evaluating data from both experimental and observational studies in therapeutic, diagnostic, prognostic, and etiologic settings. In the commonly used definition of the hierarchy of scientific data for medical decision making, meta-analyses are considered as providing the highest level of evidence [1,2]. As such, they can have a major impact on medical practice and health care policies, especially if aggregating data and investigating sources of heterogeneity provide new insights. Two well-known examples are the meta-analyses by Yusuf et al. [3] and Lau et al. [4], both showing that meta-analysis can be a powerful tool to show intervention effects that would remain beneath the surface of single study data without proper synthesis and re-analysis.
Although meta-analysis can be applied to all types of medical research, its primary application so far has been in the therapeutic realm. One of the main forces behind the rise of therapeutic meta-analysis is the Cochrane Collaboration [5], whose effort to systematically assess and synthesize evidence from randomized controlled trials has so far produced more than 4400 Cochrane systematic reviews, many with quantitative meta-analyses. The increasing interest in meta-analysis in health sciences over the past twenty years has been reported by several authors [6][7][8][9][10][11], and a small search we did in preparation for this project reveals that between 1990 and 2005 approximately 12,000 publications were classified as a meta-analysis by PubMed. A bar graph of the annual numbers suggests that the interest in meta-analysis is still increasing (figure 1).
Many general statistical software packages have included options for meta-analysis in their basic program configuration, and user-communities have written numerous meta-analysis add-ons. Specialized software packages, meant exclusively for meta-analysis, are also available in various types and price ranges. Although the number of software packages for performing meta-analysis is substantial, in our opinion, most share one common limitation: low applicability in educational settings or environments with beginning researchers. Even though numerous researchers in health care are nowadays confronted with data from published meta-analyses or are even requested to do a meta-analysis themselves, there is still little or no electronic educational material and none of the existing software has explicit educational features. Cost is another issue that may have an impact on the use of software by students and lecturers: only a few of the modern meta-analysis packages are free and if academic pricing is available, prices can still be rather high for many.
Figure 1. The annual number of meta-analyses registered by PubMed. An overview of studies of the publication type "meta-analysis" from 1990 to 2005 in PubMed.
After reading previously published software reviews [12][13][14][15] and using existing meta-analysis software, we made an inventory of what we thought was lacking or could be improved. Next, we set out to implement our ideas and create an innovative and comprehensive statistical meta-analysis package that would be freely accessible and user-friendly enough for students and beginning researchers. The program, called MIX (Meta-analysis with Interactive eXplanations), has been developed over the past two years and has been presented at several stages of the development at a number of conferences [16][17][18][19]. In October 2005, the first public version (1.0) was released during the Cochrane Colloquium in Melbourne [19] and became available for download via the MIX website [20]. It has been receiving a lot of interest (100-150 unique visitors to the MIX website each week) and was downloaded over 1800 times within 6 months of its first release. This has prompted us to validate the results of all tests in the program formally, and this article provides the official introduction of the MIX program together with the results of the validation.

Objectives
Our primary objective was to develop a free program for meta-analysis of causal research (therapeutic trials as well as etiologic cohorts and case-control studies) that could be applied in both analytical and educational settings. Our secondary aim was to validate the analytical tests in the program with output from established reference standards.

Program development
Before the actual development, we started with making an inventory of the most important meta-analytical tests and approaches, and brainstormed on ideas for an interface. Since causal meta-analysis methods are relatively well-established (in contrast to diagnostic or prognostic approaches to meta-analysis), we focused on meta-analysis of controlled trials and cohort or case-control studies.
In these studies, outcome differences between exposed or treated and non-exposed or untreated groups are compared to assess a causal relationship between the determinant (treatment or exposure) and an outcome (mortality or morbidity). As far as the program structure was concerned, our a priori idea was to create an add-in for Excel. Although a rather unorthodox approach in this area (all existing meta-analysis programs are stand-alone programs and work independently of Microsoft Office), Excel provides a sophisticated calculation and graphics platform that is well-suited to many meta-analytical methods and at the programmer's disposal before any programming is done. Consequently, development and maintenance are relatively easy and costs can be kept to a minimum (one of the main aims in our program development). Furthermore, the spreadsheet environment of Microsoft Excel is familiar to almost all researchers in medical, social, and economic sciences, which was very much in line with our attempt to develop a package that is fit for beginning researchers. Although we realized that even recent versions of Excel can be inaccurate with regard to some statistical calculations [21][22][23], we were confident that we could program around these difficulties if necessary.
Since we wanted to move beyond the occasional spreadsheet that can perform meta-analytical calculations, we started by designing a programming structure in which the already existing Excel functionality could be exploited to its maximum. Sophisticated procedures were custom-programmed with Visual Basic in the Visual Basic for Applications (VBA) editor of Excel 2003 (and tested in Excel 2000 and onward). The so-called front-loader (a start-up program initiated with an icon) and some small assistant programs, all being non-Excel entities, were developed with Visual Basic 6.0 (VB6).

Program architecture and operation
The current version of the program (version 1.5) is still only compatible with Windows operating systems running Excel 2000 or later, but versions for use with Excel on Macintosh and Linux are in preparation. The descriptions below apply to the Windows version, though most of it can be extended to future versions for other operating systems.
Installation is made easy with a set-up program that installs the necessary files in a folder that can be specified by the user (default is C:\Program Files\MIX). It will also create a MIX item in the Windows Start Menu (installing additional start-up icons on the Desktop or in the Quick-Launch bar is optional) and provides the option to start a Flash®-based program introduction. The MIX menu item contains an icon for starting up the MIX program, a folder with a shortcut to the uninstall program, a folder with shortcuts to programs for loading and unloading the Excel add-in, and a folder with educational programs and information. Loading the small MIX add-in that is supplied with the main program (typically automatically loaded during installation) results in a MIX menu-item under the Tools menu in Excel. This MIX menu contains several functions that can be accessed when the MIX program itself is not running. The files that form the core of the program are recognizable by their MIX file extension (*.mix) and currently contain approximately 16,000 lines of command code in 26 code modules and 17 custom user forms. These core files take up approximately 22 MB of space on a hard-disk and their primary functions are (A) running interface procedures, (B) showing and manipulating output, (C) performing analyses, and finally (D) exporting and communicating with external files and programs. One of the core files is a large Excel workbook with 23 worksheets that forms the calculation engine of the program. It contains 6 sheets with primarily worksheet formulas and 10 sheets with various kinds of pre-calculated graphical and numerical results from meta-analytical tests. The remaining sheets contain information for help functions or programming purposes. This Excel workbook remains hidden from the users at all times. Figure 3 shows the MIX program's user-interface with a forest plot and a format box to change the graph's format.
The MIX program provides several options for importing or creating data sets for meta-analysis. The most convenient option is to create an Excel or CSV file with data (standard output option in Excel) and import this file into the MIX program. The variable ranges are then selected in Excel-manner to create a data set (see figure 4), which is subsequently loaded for analysis and optionally saved as a MIX data set file (*.mxd). The program accepts descriptive data from studies with continuous outcomes, e.g. sample size, mean, standard deviation, and dichotomous outcomes, e.g. group sizes and event numbers (two-by-two table data). Comparative data can also be loaded by means of association measures with their standard error. Initially, however, it is not necessary to make a data set since 19 data sets from the most authoritative books on the subject ("Meta-analysis in Medical Research" by Sutton et al. [10], "Systematic Reviews in Health Care, Meta-Analysis in Context" by Egger et al. [6], and "Systematic Reviews in Health Care, A Practical Guide" by Glasziou et al. [7]) have been included in the program. Most analyses and graphs presented in these books can be reproduced with a few clicks and the program can be used as a learning or teaching companion to these books. We hope to support more books in this way in the future. In addition, the MIX website also contains a data set repository where users can contribute and download MIX data sets.
Figure 3. The MIX program's graphical interface with a forest plot. The standard Excel menu and toolbars have been replaced by the MIX interface through which graphical and numerical output can be created and manipulated. Custom shortcut menus are available via right-clicks, and double-clicking graphical items shows the formatting options that Excel users are familiar with.
A large variety of numerical and graphical output can be produced by the program. Besides the association measure values from the meta-analysis, several formal tests for heterogeneity, small study effects (publication bias), single study influence, and cumulative trends are also available in MIX. The graphical output is particularly comprehensive, with no fewer than eighteen informative plots that can be formatted in detail.
Possible association measures from continuous outcome data input are mean difference (MD), Hedges' g (HG), and Cohen's d (CD), analyzed by inverse variance fixed or random effects models. Data from studies with dichotomous outcomes can be analyzed with a risk difference (RD), risk ratio (RR), or odds ratio (OR), weighted by inverse variance, Mantel-Haenszel, Peto (only odds ratio), or DerSimonian-Laird approaches. Analyses based on correlation coefficients or Fisher's Z are also possible, though only if the data are provided as comparative input, i.e. the association measures themselves with their standard errors. If correlation or effect size data are not in this format, they can be transformed via the MIX Statistics Converter that comes with the program. Table 1 gives an overview of the general features and the numerical and graphical methods in version 1.5 of the MIX program.
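To make the weighting approaches named above more concrete, the following is a minimal Python sketch of an inverse variance odds ratio analysis with an optional DerSimonian-Laird between-study variance. It is not the MIX source code (which is written in Visual Basic); the function names and the three example two-by-two tables are hypothetical, and tables with zero cells (which would need a continuity correction) are not handled.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance from one two-by-two table.

    a, b = events and non-events in the treated (exposed) group;
    c, d = events and non-events in the control (non-exposed) group.
    Assumes all four cells are non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def pool_inverse_variance(effects, variances, tau2=0.0):
    """Inverse variance pooled effect and its standard error.

    tau2 = 0 gives the fixed effect model; a positive tau2 is added to
    every study variance, which gives random effects weights.
    """
    weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments (DerSimonian-Laird) between-study variance."""
    w = [1 / v for v in variances]
    fixed, _ = pool_inverse_variance(effects, variances)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)

# Hypothetical example tables: (a, b, c, d) per study.
tables = [(10, 90, 20, 80), (5, 45, 10, 40), (30, 170, 35, 165)]
effects, variances = zip(*(log_odds_ratio(*t) for t in tables))

fixed, fixed_se = pool_inverse_variance(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
random_, random_se = pool_inverse_variance(effects, variances, tau2)

# Back-transform the pooled log odds ratios to the odds ratio scale.
print("Fixed effect OR:", math.exp(fixed))
print("Random effects OR:", math.exp(random_))
```

Pooling is done on the log scale because the log odds ratio is approximately normally distributed; the pooled value is exponentiated only for reporting.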
The most important educational features are the program's Output Tutor and Concept Tutor. Both are interactive dialog boxes that provide information about epidemiological and statistical concepts and tests. The Output Tutor changes with each analysis and always explains tests and results that are displayed or changed at the very moment. Additional teaching material includes a Flash®-based Theory Tour that explains the fundamentals of systematic reviews and meta-analyses and a Program Tour that shows the basics of how to use the program. The educational materials take up approximately 25 MB and can also be downloaded separately.
Figure 4. Creation of a data set with the MIX program. Data sets can be created from Excel files, Comma Separated Value (CSV) files, or via manual input. Once the data are prepared on a spreadsheet within the program, the user can select the cell ranges that correspond to the relevant variables and load the data for analysis with a simple click.
To increase program stability and prevent users from accidentally altering the Visual Basic procedures, the source code cannot be accessed while the program is running. Codes to unlock the VBA modules are provided by the first author upon request.

Validation
Version 9.2 of STATA [24], and more specifically version 1.81 of the metan program [25], version 1.2.4 of the metabias program [26], and version 1.0.5 of the metatrim program [27] were used as the general reference standards for most tests. Details on the development of these user-written programs themselves can be found in the STATA Technical Bulletins [25][26][27]. The meta-analysis software Comprehensive Meta-Analysis (CMA) version 2 [28] was used for validation of the Fail-safe N output and to double check the results of the other tests. Two investigators (LB, LMY) performed the validation independently with the MIX program (version 1.5 running in Excel 2003) and the reference standard(s) by analyzing eight data sets from meta-analyses that have been published in major journals [4,[29][30][31][32][33][34][35].
Table 1 note: The overview gives a general summary of the features in version 1.5 of the MIX program. More details are provided on the website. Abbreviations: "n" = group size, "m" = mean, "sd" = standard deviation, "am" = association measure, "se" = standard error, "N" = sample size.
The data sets represent three of the most often used types of data for meta-analysis in health care research: 1) descriptive data for dichotomous outcomes, 2) descriptive data for continuous outcomes, and 3) comparative (association measure) data. For all three data types we chose a relatively small (less than 10 studies) and large data set (more than 20 studies) and we used two extra data sets in the 'descriptive dichotomous' category (one representing a meta-analysis of substantially heterogeneous studies and one with a rare event). The data sets are summarized in table 2. The tests that were subject to the validation procedures are shown in table 3. The items include individual study association measures, combined association measures, and several heterogeneity and small study effect assessments. Whenever applicable, p-values and/or confidence intervals were also compared.
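As an illustration of the heterogeneity assessments among the validated items, the sketch below computes Cochran's Q and the inconsistency measure I² from study effects and their variances. It is a minimal Python example under stated assumptions, not the MIX implementation: the function name and input values are hypothetical, and the P value for Q (a chi-square test on k - 1 degrees of freedom) is left out to keep the example free of external libraries.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I-squared (in percent).

    effects   = per-study association measures (e.g. log odds ratios)
    variances = the corresponding within-study variances
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of Q in excess of its expectation (df),
    # truncated at zero.
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical log odds ratios and variances for four studies.
q, df, i2 = heterogeneity([-0.8, -0.2, 0.1, -0.5], [0.04, 0.09, 0.06, 0.12])
print("Q =", q, "on", df, "df; I2 =", i2, "%")
```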
Results from the analyses of the eight data sets with MIX and the reference software were entered independently in identical custom-made spreadsheets. These spreadsheets were later compared in separate analysis sheets that used a cell-based formula to check for discrepancies of results up to 4 decimals.
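The comparison step described above can be sketched in a few lines of Python. This is only an analogue of the cell-based spreadsheet formula, whose exact form is not given here: we assume that "up to 4 decimals" means agreement after rounding to four decimal places, that both result sets use the same labels, and the labels and values below are hypothetical.

```python
def find_discrepancies(mix_results, reference_results, decimals=4):
    """Return the labels whose values differ after rounding.

    Both arguments map a result label (e.g. "OR, fixed IV") to a number.
    Assumes both mappings contain exactly the same labels.
    """
    mismatches = []
    for label, mix_value in mix_results.items():
        reference_value = reference_results[label]
        if round(mix_value, decimals) != round(reference_value, decimals):
            mismatches.append(label)
    return mismatches

# Hypothetical output from the two programs for one data set.
mix = {"OR, fixed IV": 0.56781, "Q": 12.34561}
reference = {"OR, fixed IV": 0.56784, "Q": 12.34610}
print(find_discrepancies(mix, reference))
```

Comparing rounded values rather than raw floating point numbers avoids flagging differences that stem only from the internal precision of each program.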

Results and Discussion
In summary, we have been able to achieve our objective of developing a comprehensive and yet free program for meta-analysis. The Excel platform, although not without problems, has proved to be flexible enough to create an easy-to-use, and graphically and numerically comprehensive program.
In its current state (version 1.5) all results from the MIX program are identical (to at least 4 decimals) to results from the most recent versions of the metan, metabias, and metatrim commands in STATA. The small study effect regression test by Macaskill [36] that was tested via STATA's regress command also turned out to be accurate. Tables 4 and 5 are examples of the odds ratio validation results for data set 1 [4].
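The small study effect tests that were validated are regression-based. As a hedged illustration, the Python sketch below implements the usual formulation of Egger's regression, in which the standardized effect (effect divided by its standard error) is regressed on precision (one divided by the standard error) and the intercept serves as the measure of funnel plot asymmetry. It is not the code used in MIX or metabias; the function name and input values are hypothetical, and the confidence interval and P value of the intercept are omitted.

```python
def egger_regression(effects, standard_errors):
    """Ordinary least squares fit of Egger's regression.

    Returns (intercept, slope). An intercept far from zero suggests
    small study effects; the slope estimates the underlying effect.
    """
    precision = [1 / se for se in standard_errors]
    standardized = [y / se for y, se in zip(effects, standard_errors)]
    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical log odds ratios with their standard errors.
intercept, slope = egger_regression([-0.9, -0.6, -0.4, -0.3],
                                    [0.50, 0.35, 0.20, 0.10])
print("Egger intercept:", intercept, "slope:", slope)
```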
With regard to the trim-and-fill analysis [37], the MIX program allows for calculations using the weighting method applied in the original meta-analysis, whereas both CMA and STATA use only fixed or random effects inverse variance methods when trimming and filling. While the calculations in MIX for trim-and-fill analyses with other weighting methods were verified manually and we have no reason to believe anything is wrong, we recommend using the inverse variance methods until more is known about approaches with alternative weighting.
Although we are in the process of completing a formal software comparison project, we are confident that the MIX program can compete in many respects (usability, analytical options, comprehensiveness, and export options) with most of the existing meta-analysis programs like Comprehensive Meta-Analysis [28], MetaWin [38], RevMan [39], or WEasyMA [40]. However, there are also still some limitations. One is the maximum number of studies that can be analyzed in the meta-analysis, which is now 100. Though systematic reviews finding 100 studies for analysis are still very rare, this is something that may change in the future. Furthermore, while sub-group analyses are easy to perform within MIX, they are currently not automated and during a sub-group analysis not all subgroups can be shown simultaneously in a single forest plot. The subgroup forest plot can however be created manually because the Excel graphs of individual forest plots are relatively easily formatted and stacked. We intend to improve the program with regard to these limitations in the near future.
Table 2. Summary of the data sets.
No.  Source                   Year  Studies  Data type
2    [29]                     2001  5        DD - small
3    Teo et al. [30]          1991  16       DD - publication bias
4    Crowley [31]             2000  17       DD - rare events
5    Lightowler et al. [32]   2003  5        DC - small
6    Wahlbeck et al. [33]     2000  11       DC - medium large
7    Pagliaro et al. [34]     1992  19       C - odds ratio
8    Law et al. [35]          1994  10       C - risk difference
The validation was done with eight data sets from meta-analyses that have been published in major peer-reviewed journals. The data sets were selected to represent a wide spectrum of potential input for meta-analysis. Abbreviations: "DD" = descriptive data for dichotomous outcomes (two-by-two table data), "DC" = descriptive data for continuous outcomes (means with their standard deviations and sample sizes), and "C" = comparative data (association measures with standard errors or confidence intervals).
Table 3. Tests that were subject to the validation procedures.
Meta-analysis (per association measure/weighting)
- Association measure with 95% CI and P value
- Heterogeneity Q with 95% CI and/or P value
- Inconsistency I² with 95% CI and/or P value
- Fail-safe N with tolerance level
- Begg's rank correlation test with z-score and P value
- Egger's regression intercept with 95% CI and/or P value
- Macaskill's regression slope with 95% CI and/or P value
- Trim-and-fill studies with new association measure and 95% CI
Essentially all major numerical output that is produced by a comprehensive meta-analysis was assessed during the validation. The tests were repeated with all available (fixed effect and random effects) weighting models. Abbreviations: "CI" = confidence interval.
Another important issue that we will focus on in upcoming updates is meta-regression. Although some univariable regression methods are integrated in the tests for small study effects, the MIX program cannot currently perform meta-regression. We realize that meta-regression, especially with multiple independent variables, is a valuable tool for assessing heterogeneity and adapting a meta-analysis accordingly, but it requires matrix calculations that are far more difficult to program in Excel or VBA than the standard tests. Currently, univariable meta-regression is possible with Comprehensive Meta-Analysis [28] and MetaWin [38]. However, like all dedicated meta-analysis packages they lack the option for multivariable meta-regression. We have started working on facilities for meta-regression within the MIX program and we hope it will be integrated sometime in 2007.
Table note: In a meta-analysis, each study is given a weight that determines its influence on the overall result and this weight depends on the weighting method. Proper weighting is crucial to get correct results, so we validated all individual study weights for each data set and weighting method. The table shows the odds ratio weighting validation for data set 1. Abbreviations: "IV" = inverse variance weighting, "MH" = Mantel-Haenszel weighting, "PETO" = Peto weighting, and "IV+t" = inverse variance plus tau, which refers to random effects weighting according to the DerSimonian-Laird method.
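To show how study weights depend on the weighting method, the Python sketch below computes Mantel-Haenszel weights and the Mantel-Haenszel pooled odds ratio from two-by-two tables. It is a minimal illustration, not the validated MIX code: the function name and the example tables are hypothetical, and percentage weights are shown because that is how weights are commonly reported next to a forest plot.

```python
def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio and percentage study weights.

    Each table is (a, b, c, d): events and non-events in the treated
    group, then events and non-events in the control group.
    The weight of a study is b * c / n, with n its total sample size.
    """
    numerators = []
    weights = []
    for a, b, c, d in tables:
        n = a + b + c + d
        numerators.append(a * d / n)
        weights.append(b * c / n)
    pooled_or = sum(numerators) / sum(weights)
    total = sum(weights)
    percent_weights = [100 * w / total for w in weights]
    return pooled_or, percent_weights

# Hypothetical example tables for three studies.
tables = [(10, 90, 20, 80), (5, 45, 10, 40), (30, 170, 35, 165)]
pooled_or, percent_weights = mantel_haenszel_odds_ratio(tables)
print("MH odds ratio:", pooled_or)
print("Study weights (%):", percent_weights)
```

Unlike inverse variance weights, these weights do not require the variance of each study's log odds ratio, which is one reason the Mantel-Haenszel method is often preferred when event numbers are small.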
Finally, because we are still frequently updating the program and including new features, we have postponed the making of a hard-copy manual or methods guide until this process has stabilized.

Conclusion
The MIX program provides researchers, students, and lecturers with a free tool to perform state-of-the-art meta-analyses and learn or teach about what it is they are doing. It uses an innovative approach with Excel as a computing platform and even provides some numerical and graphical output that is not provided by other software. Results from version 1.5 of the MIX program are identical to those from STATA, and MIX can be regarded as a comprehensive and valid tool for performing causal meta-analyses.