Meta-DiSc: a software for meta-analysis of test accuracy data
© Zamora et al; licensee BioMed Central Ltd. 2006
- Received: 31 March 2006
- Accepted: 12 July 2006
- Published: 12 July 2006
Systematic reviews and meta-analyses of test accuracy studies are increasingly being recognised as central in guiding clinical practice. However, there is currently no dedicated and comprehensive software for meta-analysis of diagnostic data. In this article, we present Meta-DiSc, a Windows-based, user-friendly, freely available (for academic use) software that we have developed, piloted, and validated to perform diagnostic meta-analysis.
Meta-DiSc a) allows exploration of heterogeneity, with a variety of statistics including chi-square, I-squared and Spearman correlation tests, b) implements meta-regression techniques to explore the relationships between study characteristics and accuracy estimates, c) performs statistical pooling of sensitivities, specificities, likelihood ratios and diagnostic odds ratios using fixed and random effects models, both overall and in subgroups and d) produces high quality figures, including forest plots and summary receiver operating characteristic curves that can be exported for use in manuscripts for publication. All computational algorithms have been validated through comparison with different statistical tools and published meta-analyses. Meta-DiSc has a Graphical User Interface with roll-down menus, dialog boxes, and online help facilities.
Meta-DiSc is a comprehensive and dedicated test accuracy meta-analysis software. It has already been used and cited in several meta-analyses published in high-ranking journals. The software is publicly available at http://www.hrc.es/investigacion/metadisc_en.htm.
- Accuracy Estimate
- Forest Plot
- Threshold Effect
- Diagnostic Odds Ratio
- Respective Confidence Interval
Accurate diagnosis forms the basis of good clinical care, as without it one can neither prognosticate correctly nor choose the right treatment. Indeed, a wrong diagnosis can harm patients by exposing them to inappropriate or sub-optimal therapy [1]. Thus studies of diagnostic accuracy, and particularly their systematic reviews and meta-analyses, are being recognised as instrumental in underpinning evidence-based clinical practice. Initiatives such as STARD [2] and developments within the Cochrane Collaboration [3] to accept protocols and reviews of test accuracy studies highlight the emphasis being given to evidence-based diagnosis.
Currently, there is only one test accuracy meta-analysis package, Meta-Test [4], which addresses some of the unique statistical issues related to test accuracy, such as pooling of sensitivities and specificities and summary receiver operating characteristics (sROC) analysis. However, it is a DOS-based application with an interface that many find difficult to use and to integrate with Windows-based applications. Moreover, it lacks crucial analytical tools such as pooling of likelihood ratios (LRs), tests for heterogeneity and meta-regression facilities.
We, therefore, developed, piloted and validated a comprehensive, Windows-based test accuracy meta-analysis software, Meta-DiSc, which is presented in this article, with a worked example.
Meta-DiSc software was created in Microsoft Visual Basic 6, and some mathematical routines have been linked from the NAG C mathematical library [5]. The software is distributed as a single file, downloadable freely from http://www.hrc.es/investigacion/metadisc_en.htm. Its installation is simple, guided by onscreen instructions. The programme has a user-friendly interface with roll-down menus, dialog boxes and online HTML compiled help files. These help files include a user manual and a description of the implemented statistical methods.
Meta-DiSc allows data entry into its datasheet in three different ways: a) directly by typing data into the datasheet using the keyboard, b) copying from another spreadsheet (e.g. Microsoft Excel) and pasting into Meta-DiSc datasheet, or c) importing text files from other sources (for example, in the comma delimited format). Several variables can be defined in the datasheet, including study identifiers, accuracy data from each study (true positives, false positives, true negatives and false negatives) and study level co-variates, such as those defining population spectrum or methodological quality of the studies.
Describing the results of individual studies
When describing accuracy results from several studies, it is important to get an indication of the magnitude and precision of the accuracy estimates derived from each study, as well as to assess the presence or absence of inconsistencies in accuracy estimates across studies (heterogeneity). As accuracy estimates are paired and often inter-related (sensitivity and specificity, or LR positive and LR negative), it is necessary to report these simultaneously. One accuracy measure that combines these paired measures is the diagnostic odds ratio (dOR), which has limited clinical use but is useful in procedures such as meta-regression (see below).
Meta-DiSc computes accuracy estimates and confidence intervals from individual studies and shows results either as numerical tabulations or graphical plots in two formats: a) forest plots, for sensitivities, specificities, LRs or dOR, with respective confidence intervals; and b) plots of individual study results in ROC space, with or without an sROC curve.
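As an illustration of the per-study quantities described above, the following is a minimal sketch of how sensitivity, specificity, LRs and dOR can be derived from one 2-by-2 table. It is not Meta-DiSc's own code: the function name `accuracy_estimates`, the choice to add 1/2 to every cell only when a zero cell is present, and the log-scale normal approximation for the confidence intervals are assumptions made for this example. Confidence intervals for sensitivity and specificity are omitted.

```python
import math


def accuracy_estimates(tp, fp, fn, tn, z=1.96):
    """Accuracy estimates from one 2x2 table (hypothetical helper).

    tp, fp, fn, tn: true positives, false positives,
    false negatives and true negatives of a single study.
    """
    # Assumption: add 1/2 to every cell when any cell is zero,
    # so that ratios and standard errors stay defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    dor = (tp * tn) / (fp * fn)

    # Standard errors on the natural-log scale.
    se_ln_lr_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_ln_lr_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    se_ln_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)

    def log_ci(estimate, se):
        # Interval built on the log scale, then back-transformed.
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_pos_ci": log_ci(lr_pos, se_ln_lr_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": log_ci(lr_neg, se_ln_lr_neg),
        "dor": dor,
        "dor_ci": log_ci(dor, se_ln_dor),
    }
```

For a hypothetical study with 90 true positives, 20 false positives, 10 false negatives and 80 true negatives, `accuracy_estimates(90, 20, 10, 80)` gives a sensitivity of 0.9, a specificity of 0.8, an LR positive of 4.5, an LR negative of 0.125 and a dOR of 36.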
Exploring heterogeneity (threshold effect)
Exploring heterogeneity is critical in order to a) understand the possible factors that influence accuracy estimates, and b) evaluate the appropriateness of statistical pooling of accuracy estimates from various studies. One of the primary causes of heterogeneity in test accuracy studies is threshold effect, which arises when differences in sensitivities and specificities or LRs occur due to different cut-offs or thresholds used in different studies to define a positive (or negative) test result. When threshold effect exists, there is a negative correlation between sensitivities and specificities (or a positive correlation between sensitivities and 1-specificities), which results in a typical "shoulder arm" pattern in an sROC space. It is worth noting that correlation between sensitivity and specificity could arise for a number of reasons other than threshold (e.g. partial verification bias, different spectrum of patients or different settings).
Meta-DiSc allows assessment for threshold effect in three different ways: a) visual inspection of relationship between pairs of accuracy estimates in forest plots. If threshold effect is present, the forest plots will show increasing sensitivities with decreasing specificities, or vice versa. The same inverse relationship will be apparent with LR positive and LR negative; b) representation of accuracy estimates from each study in a sROC space – a typical "shoulder arm" pattern would suggest presence of threshold effect; and c) computation of Spearman correlation coefficient between the logit of sensitivity and logit of 1-specificity. A strong positive correlation would suggest threshold effect.
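The third approach, a Spearman correlation between the logit of sensitivity and the logit of 1-specificity, can be sketched as follows. This is an illustrative example under stated assumptions, not the program's own routine: the function names are hypothetical, 1/2 is added to every cell of every study, and ties receive average ranks.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    # Undefined (division by zero) if either input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_threshold(tables):
    """tables: list of (tp, fp, fn, tn) tuples, one per study."""
    logit_tpr, logit_fpr = [], []
    for tp, fp, fn, tn in tables:
        # Assumption: 1/2 added to all four cells of every study.
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        logit_tpr.append(logit(tp / (tp + fn)))  # logit(sensitivity)
        logit_fpr.append(logit(fp / (fp + tn)))  # logit(1 - specificity)
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(logit_tpr), average_ranks(logit_fpr))
```

A coefficient close to +1 would be consistent with a threshold effect, since studies with higher sensitivity then also tend to have a higher false positive rate.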
Exploring for heterogeneity (other than threshold effect)
Apart from variations due to threshold effect, there are several other factors that can result in variations in accuracy estimates amongst different test accuracy studies in a review. These reasons include chance as well as variations in study population (e.g. severity of disease and co-morbidities), index test (differences in technology, assays, operator etc.), reference standard, and the way a study was designed and conducted. Since such heterogeneity is almost always present in accuracy systematic reviews, testing for the presence and the extent of heterogeneity of results between primary studies, prior to undertaking any meta-analysis, is a critical part of any diagnostic review, as is exploration of the possible causes of heterogeneity.
Meta-DiSc allows users to test for heterogeneity amongst various studies in two different ways: a) visual inspection of forest plots of accuracy estimates. If the studies are reasonably homogeneous, the accuracy estimates from individual studies will lie along a line corresponding to the pooled accuracy estimate. Large deviations from this line will indicate possible heterogeneity; b) statistical tests, including Chi-square and Cochran-Q, which are automatically implemented during analysis to evaluate if the differences across the studies are greater than expected by chance alone. A low p-value will suggest presence of heterogeneity beyond what could be expected by chance alone. In addition to these heterogeneity statistics, Meta-DiSc computes the inconsistency index (I-squared), which has been proposed as a measure to quantify the amount of heterogeneity [15].
If substantial heterogeneity is found to be present from the analyses detailed above, then reasons for such heterogeneity can be explored by relating study level co-variates (e.g., population, test, reference standard or methodological features) to an accuracy measure, using meta-regression techniques. The accuracy measure that is normally used is dOR, as it is a unitary measure of diagnostic performance that encompasses both sensitivity and specificity or both LR positive and LR negative. Using dOR as a global measure of accuracy is a suitable method to compare the overall diagnostic accuracy of different tests. However, its use is limited because it cannot be used directly in clinical practice and, furthermore, possible opposing effects of a study characteristic on sensitivity or specificity may be masked by using dOR.
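The Cochran-Q statistic and the inconsistency index can be illustrated with a short sketch. It assumes the study estimates are supplied on a scale where a normal approximation is reasonable (for example ln(LR) or ln(dOR)) together with their variances; the function name `cochran_q_i2` is hypothetical, and the p-value (from a chi-square distribution with k - 1 degrees of freedom) is left out to keep the example free of external libraries.

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q and I-squared (as a percentage).

    effects:   per-study estimates, e.g. ln(dOR)
    variances: their within-study variances
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond chance, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```

With two hypothetical studies reporting estimates of 0 and 4, each with variance 1, `cochran_q_i2([0.0, 4.0], [1.0, 1.0])` returns Q = 8 on 1 degree of freedom and an I-squared of 87.5%.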
Meta-DiSc implements meta-regression using a generalization of the Littenberg and Moses linear model [8, 13], weighted by the inverse of the variance or by study size, or unweighted. Random effects between studies can be estimated by different methods and added to the weighting scheme. Coefficients of the model are estimated by the least squares method as implemented in NAG mathematical routines. The outcome variable is ln(dOR), which is related via a linear model to any number of study level covariates, optionally including the variable representing threshold effect. The outputs from meta-regression modelling in Meta-DiSc are the co-efficients of the model, as well as the ratio of dOR (rdOR) with respective confidence intervals. If a particular study level co-variate is significantly associated with diagnostic accuracy, then its co-efficient will have a low p-value, and the rdOR will give a measure of the magnitude of the association.
More advanced meta-regression techniques such as the hierarchical sROC model [17] and bivariate analysis of sensitivity and specificity [18] have been developed. These methods overcome some of the statistical shortcomings inherent to the Littenberg and Moses model [8, 19].
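To make the structure of this regression concrete, here is a minimal unweighted sketch with a single study level covariate. It is an illustration only and does not reproduce the NAG routines or the weighting and random effects options: the function names are hypothetical, D denotes ln(dOR) = logit(sensitivity) - logit(1-specificity), S denotes logit(sensitivity) + logit(1-specificity) as the threshold variable, and 1/2 is added to every cell as an assumption.

```python
import math


def d_and_s(tp, fp, fn, tn):
    """Return (D, S) for one study; 1/2 is added to every cell."""
    tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    logit_tpr = math.log(tp / fn)   # logit(sensitivity)
    logit_fpr = math.log(fp / tn)   # logit(1 - specificity)
    return logit_tpr - logit_fpr, logit_tpr + logit_fpr


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_moses_littenberg(d, s, covariate):
    """Unweighted least squares fit of D = a + b*S + c*covariate."""
    rows = [[1.0, si, zi] for si, zi in zip(s, covariate)]
    k = 3
    # Normal equations: (X'X) beta = X'D
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * di for r, di in zip(rows, d)) for i in range(k)]
    a, b, c = solve(xtx, xty)
    # exp(c) is the ratio of dORs per unit change in the covariate.
    return {"constant": a, "S": b, "covariate": c, "rdOR": math.exp(c)}
```

Because the outcome is ln(dOR), exponentiating a covariate coefficient gives the rdOR: a value of 2, for instance, would mean that the dOR is twice as high in studies where the covariate is one unit larger, holding S fixed.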
Statistical pooling is not always appropriate or necessary in every systematic review of test accuracy studies. However, when used appropriately, pooling can provide useful summary information. The necessary precondition for simple pooling (weighted averaging) of each of sensitivities, specificities, LR positives and LR negatives, is that the studies and results are reasonably homogeneous (i.e. no substantial heterogeneity, including threshold effect, is present). If heterogeneity due to threshold effect is present, the accuracy data can be pooled by fitting an sROC curve and summarising that curve by means of the Area Under the Curve (AUC) or using other statistics such as the Q* index (i.e. the point on the curve at which sensitivity equals specificity). If there is heterogeneity due to sources other than threshold effect, then pooling should only be attempted within homogeneous subsets, which would normally have been defined a priori.
Meta-DiSc has comprehensive functionality for statistical pooling: a) It allows pooling of sensitivities, specificities, LR positive and LR negative each separately, using either fixed or random effect [10, 20] models. The outputs from these analyses are presented numerically in tables, and graphically as forest plots. Pooled estimates are provided with their respective confidence intervals; b) It implements several ways to fit an sROC curve when threshold effect is present. The default option is to compute a symmetrical sROC curve after fitting the linear model proposed by Littenberg and Moses. However, users can choose different options to fit this curve, for example, combining individual dORs by the Mantel-Haenszel or the DerSimonian Laird methods [10, 20] to estimate an overall dOR, and then fitting an sROC curve. When the dOR changes with diagnostic threshold, the sROC curve is asymmetrical. Meta-DiSc allows the user to check for asymmetry of the sROC curve, and fit an asymmetrical sROC curve if appropriate. Finally, Meta-DiSc allows estimation of AUC and the Q* index, along with their standard errors, as a summary measure of global accuracy which also aids inter-test comparisons; c) Meta-DiSc allows pooling of various summary measures within subgroups defined by study level co-variates with the help of a filter utility.
Validation of statistical procedures. Validation of different statistical procedures using a simulated data-set. Results of Meta-DiSc (version 1.4) are compared with those obtained with metan (version 1.86) and metareg (version 1.06) STATA commands. Prior to the analyses, 1/2 was added to all four cells of every study to avoid division by zero when computing some indices or standard errors. Meta-DiSc and STATA data-sets are provided as additional files [see Additional file 1] and [see Additional file 2].
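The sROC summaries mentioned above can be sketched from the coefficients of the linear model D = a + b*S, where D = logit(sensitivity) - logit(1-specificity) and S is their sum. Rearranging gives logit(sensitivity) = a/(1-b) + ((1+b)/(1-b)) * logit(1-specificity), and setting sensitivity equal to specificity (S = 0) gives Q* = 1/(1 + exp(-a/2)). The function names, the midpoint-rule integration for the AUC and the restriction to |b| < 1 are assumptions of this example rather than a description of the program's algorithm; standard errors are not computed.

```python
import math


def sroc_sensitivity(fpr, a, b=0.0):
    """Sensitivity on the sROC curve at a given false positive rate.

    a, b: intercept and slope of D = a + b*S; b = 0 gives a
    symmetrical curve. Assumes 0 < fpr < 1 and |b| < 1.
    """
    logit_fpr = math.log(fpr / (1 - fpr))
    logit_tpr = a / (1 - b) + (1 + b) / (1 - b) * logit_fpr
    return 1 / (1 + math.exp(-logit_tpr))


def q_star(a):
    """Point on the curve where sensitivity equals specificity."""
    return 1 / (1 + math.exp(-a / 2))


def sroc_auc(a, b=0.0, steps=10000):
    """Area under the sROC curve by the midpoint rule.

    Midpoints avoid evaluating the logit at exactly 0 or 1.
    """
    h = 1.0 / steps
    return h * sum(sroc_sensitivity((i + 0.5) * h, a, b)
                   for i in range(steps))
```

For a symmetrical curve the intercept a is the common ln(dOR), so the numerical AUC can be checked against the closed form dOR/(dOR-1)^2 * ((dOR-1) - ln(dOR)).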
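The difference between fixed and random effects pooling can be sketched for estimates on the log scale, such as ln(LR) or ln(dOR). The example below uses inverse variance weights for the fixed effect model and the DerSimonian Laird moment estimator of the between-study variance for the random effects model. The function name `pool_log_estimates` is hypothetical, and the inverse variance fixed effect weighting is an assumption of this sketch (it is not the Mantel-Haenszel method).

```python
import math


def pool_log_estimates(effects, variances, random_effects=True, z=1.96):
    """Pool per-study log-scale estimates.

    Returns (pooled, (ci_low, ci_high), tau2), all on the log scale.
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

    tau2 = 0.0
    if random_effects:
        # DerSimonian Laird moment estimator of between-study variance.
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
        df = len(effects) - 1
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        # Random effects weights add tau2 to each study variance.
        w = [1.0 / (v + tau2) for v in variances]

    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return pooled, (pooled - z * se, pooled + z * se), tau2
```

Applying `math.exp` to the pooled value and to both interval limits returns the result to the ratio scale. When the between-study variance is estimated as zero, the random effects result coincides with the fixed effect result.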
Measure (confidence interval) | Meta-DiSc (version 1.4) | STATA (ver 8.2)
Random Effect Model, Pooled +ve LR | (2.085 – 2.871) | (2.085 – 2.871)
Random Effect Model, Pooled -ve LR | (0.095 – 0.257) | (0.095 – 0.257)
Fixed Effect Model, Pooled +ve LR | (2.208 – 2.459) | (2.208 – 2.459)
Fixed Effect Model, Pooled -ve LR | (0.073 – 0.149) | (0.073 – 0.148)
Constant coefficient (SE)
S coefficient (SE)
Covariable coefficient (SE)
We illustrate the various procedures that Meta-DiSc implements in a case-study of ultrasound test in the diagnosis of uterine pathology [21, 22]. Ultrasound measurement of the lining of the uterus (endometrium) can predict pathology such as endometrial hyperplasia (a precancerous condition) or cancer. The greater the thickness of endometrium, the more likely that the target condition is present. Various thresholds (such as 3, 4 or 5 mm etc) have been used to define a positive ultrasound result.
Tabulation of Likelihood ratio for positive test result (LR+) with respective 95% confidence intervals from all test accuracy studies included in systematic review of ultrasound for prediction of endometrial cancer.
[95% Conf. Interval]
Altuncu et al.
(REM) pooled LR+
Results of Spearman rank correlation of sensitivity against (1 – specificity) to assess the threshold effect in all test accuracy studies included in systematic review of ultrasound for prediction of endometrial cancer.
Results of meta-regression analysis for predicting the presence or absence of endometrial carcinoma with variables: use or non-use of hormone replacement therapy (HRT); technique of ultrasound measurement (single or double layer); and population enrolment (consecutive or other).
Meta-Regression(Inverse Variance weights) (1)
Meta-Regression(Inverse Variance weights) (2)
Meta-Regression(Inverse Variance weights) (3)
Meta-DiSc allows description of individual study results; exploration of heterogeneity with a variety of statistics including chi-square, I-squared and Spearman correlation tests; implements meta-regression techniques to explore the relationships between study characteristics and accuracy estimates; performs statistical pooling of sensitivities, specificities, likelihood ratios and diagnostic odds ratios, using fixed and random effects models, both overall and in subgroups; and produces high quality figures, including forest plots and summary receiver operating characteristic curves that can be exported for use in manuscripts for publication.
Meta-DiSc is an evolving software. As new diagnostic meta-analytic methods become established over time, they will be implemented in the program. For example, the bivariate method of pooling sensitivity and specificity [18] is currently being developed. We will carefully follow the progress in this field. Once accepted as an established meta-analytic method, it will be implemented in Meta-DiSc. On similar lines, methods of data extraction from individual studies that only provide accuracy measures are currently being developed within our department. Once these methods have been verified, we will implement this option to assist systematic reviewers in extracting 2-by-2 tables from such studies.
Meta-DiSc is a comprehensive and dedicated test accuracy meta-analysis software. All computational algorithms in it have been validated through comparison with different statistical tools and published meta-analyses. Its use and citation in several meta-analyses published in high-ranking journals is evidence of external validation of its high quality [23–28].
The software is publicly available at http://www.hrc.es/investigacion/metadisc_en.htm.
Operating system: The software runs on Windows based personal computers (Windows 95 or higher) with Pentium-class processor or equivalent, with minimum of 32 MB of RAM and minimum of 20 MB of hard disk space. SVGA color monitor; minimum 800 × 600 screen resolution and 256 colors.
Licence: Freeware for academic use.
This work has been partly funded by Spanish Health Ministry Grants no PI02/0954, G03/090 and PI04/1055.
- Thomson R, McElroy H, Sudlow M: Guidelines on anticoagulant treatment in atrial fibrillation in Great Britain: variation in content and implications for treatment. BMJ. 1998, 316: 509-513.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC: Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Radiology. 2003, 226: 24-28.
- Cochrane Collaboration: Methods Groups Newsletter. 2006, [http://www.cochrane.org/newslett/MGNews-2004.pdf]
- Lau J: Meta-Test. 1997, Boston: New England Medical Center
- The NAG C Library, Mark 6. 2004, Oxford: Numerical Algorithms Group
- Zamora J, Muriel A, Abraira V: Meta-DiSc Statistical Methods. 2006, [ftp://ftp.hrc.es/pub/programas/metadisc/MetaDisc_StatisticalMethods.pdf]
- Irwig L, Tosteson ANA, Gatsonis C, Lau J, Colditz G, Chalmers TC, Mosteller F: Guidelines for Metaanalyses Evaluating Diagnostic-Tests. Annals of Internal Medicine. 1994, 120: 667-676.
- Moses LE, Shapiro D, Littenberg B: Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993, 12: 1293-1316.
- Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HC, van der Windt DA, Bezemer PD: Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol. 2002, 2: 9. 10.1186/1471-2288-2-9.
- Deeks JJ: Systematic reviews of evaluations of diagnostic and screening tests studies. Systematic reviews in health care: meta-analysis in context. Edited by: Egger M, Davey SG and Altman DG. 2001, BMJ Books, 2nd Edition
- Honest H, Khan KS: Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Services Research. 2002, 2:
- Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM: The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003, 56: 1129-1135. 10.1016/S0895-4356(03)00177-X.
- Lijmer JG, Bossuyt PM, Heisterkamp SH: Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med. 2002, 21: 1525-1537. 10.1002/sim.1185.
- Dinnes J, Deeks J, Kirby J, Roderick P: A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess. 2005, 9: 1-113.
- Higgins JP, Thompson SG: Quantifying heterogeneity in a meta-analysis. Stat Med. 2002, 21: 1539-1558. 10.1002/sim.1186.
- Thompson SG, Sharp SJ: Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med. 1999, 18: 2693-2708. 10.1002/(SICI)1097-0258(19991030)18:20<2693::AID-SIM235>3.0.CO;2-V.
- Rutter CM, Gatsonis CA: A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001, 20: 2865-2884. 10.1002/sim.942.
- Reitsma JB, Glas AS, Rutjes AWS, Scholten RJPM, Bossuyt PM, Zwinderman AH: Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology. 2005, 58: 982-990. 10.1016/j.jclinepi.2005.02.022.
- Walter SD: Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med. 2002, 21: 1237-1256. 10.1002/sim.1099.
- DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials. 1986, 7: 177-188. 10.1016/0197-2456(86)90046-2.
- Gupta JK, Chien PF, Voit D, Clark TJ, Khan KS: Ultrasonographic endometrial thickness for diagnosing endometrial pathology in women with postmenopausal bleeding: a meta-analysis. Acta Obstet Gynecol Scand. 2002, 81: 799-816. 10.1034/j.1600-0412.2001.810902.x.
- Khan KS, Kunz R, Kleijnen J, Antes G: Case study 4: Reviewing evidence on test accuracy. Systematic Review to Support Evidence-based Medicine. 2003, London, The Royal Society of Medicine, 109-119.
- Morgan M, Kalantri S, Flores L, Pai M: A commercial line probe assay for the rapid detection of rifampicin resistance in Mycobacterium tuberculosis: a systematic review and meta-analysis. BMC Infectious Diseases. 2005, 5: 62. 10.1186/1471-2334-5-62.
- Flores L, Pai M, Colford JM, Riley LW: In-house nucleic acid amplification tests for the detection of Mycobacterium tuberculosis in sputum specimens: meta-analysis and meta-regression. BMC Microbiol. 2005, 5: 55. 10.1186/1471-2180-5-55.
- Gisbert J, Abraira V: Accuracy of Helicobacter pylori Diagnostic Tests in Patients with Bleeding Peptic Ulcer: A Systematic Review and Meta-analysis. The American Journal of Gastroenterology. 2006, 101: 848-863. 10.1111/j.1572-0241.2006.00528.x.
- Shiga T, Wajima Z, Inoue T, Sakamoto A: Predicting difficult intubation in apparently normal patients: a meta-analysis of bedside screening test performance. Anesthesiology. 2005, 103: 429-437. 10.1097/00000542-200508000-00027.
- Zijlstra JM, van der Werf G, Hoekstra OS, Hooft L, Huijgens PC: F-fluoro-deoxyglucose positron emission tomography for post-treatment evaluation of malignant lymphoma: a systematic review. Haematologica. 2006, 91: 522-9.
- Goodacre S, Sutton AJ, Sampson FC: Meta-Analysis: The Value of Clinical Assessment in the Diagnosis of Deep Venous Thrombosis. Annals of Internal Medicine. 2005, 143: 129-139.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/6/31/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.